Introduction

We have recently isolated a large variety of factors that function at eukaryotic forks, and have reconstituted them into working machines. The specific functions of many of these proteins are unknown beyond their requirement for replication. One might expect to understand the eukaryotic fork from knowledge of the bacterial replisome. This is only partially true, and much less so than for translation and transcription, because the core catalytic components of the replisome are not conserved between prokaryotes and eukaryotes. The only homologies between them are in the clamp and clamp loader, which look and operate essentially the same. The helicase, primase, and DNA polymerases show no homology, and eukaryotes contain many factors that have no bacterial homologue. Hence the structure and function of the eukaryotic replisome may be quite different from those of the prokaryotic replisome.

Biochemical Study of the Eukaryotic Replisome

Figure 1. Some purified proteins from the eukaryotic replisome

There are over 30 proteins needed at a eukaryotic replication fork, so studying this process in detail requires that they all be cloned, expressed and purified. Although this is difficult, my lab has worked as a team to obtain these proteins and to reconstitute them into active complexes that form the functional leading/lagging strand replisome. A few of our protein preparations are shown in the SDS-PAGE gels in Figure 1.

The Eukaryotic Replisome

We work on the replisome of the budding yeast Saccharomyces cerevisiae, which has all the major replisome components in common with humans. The "protein pieces" of the eukaryotic replisome are illustrated below. The eukaryotic helicase, called CMG, is composed of 11 proteins, although only 6 are illustrated in the diagram. CMG is an acronym for Cdc45/Mcm2-7/GINS, coined by Mike Botchan, whose lab first isolated the complex. The Mcm2-7 heterohexamer forms a ring that encircles the leading strand and tracks along it, and in this way it may seem similar to the prokaryotic DnaB homohexamer, which encircles the lagging strand. However, the two are completely different structures and are not homologous in sequence. The Mcm subunits are built on the AAA+ fold, whereas DnaB is built on a RecA-like fold. Furthermore, to become active as a helicase, Mcm2-7 requires five additional proteins: Cdc45 and the four-subunit GINS complex. These additional proteins form a second channel, whereas prokaryotic helicases have only one central channel. The function of the second channel in CMG, and whether it encircles the second DNA strand, are unknown.

The eukaryotic primase, Pol alpha-primase, is composed of 4 different subunits. Discovered by Bob Lehman, Pol alpha-primase is very different from the prokaryotic primase. The two smallest subunits synthesize the RNA and form a heterodimer that defines a Pol X fold, whereas prokaryotic primase functions as a single subunit and its structure is based on a topoisomerase fold. In addition to the two priming subunits, Pol alpha-primase contains two larger subunits, the largest of which is a DNA polymerase (polymerase alpha). The exact function of the polymerase alpha subunit is still obscure, but it extends the RNA primer by 15-20 nucleotides, making a hybrid RNA-DNA primer. The second-largest of the four subunits is referred to as the B subunit; it is essential for viability, but its function is unknown. Unlike prokaryotic primase, which is not an integral part of the replisome, eukaryotic Pol alpha-primase is integral to the replisome and travels with replication forks. One contact that secures Pol alpha-primase within the replisome is Ctf4, a homotrimer of 104 kDa subunits. Ctf4 also binds CMG, thus connecting Pol alpha-primase to the CMG helicase. Mcm10 is an essential protein that also binds Pol alpha-primase and CMG and travels with replication forks. Very little is known about the function of Mcm10 (despite its name, it shares no homology with the Mcm subunits of CMG).

The eukaryotic replisome uses two different DNA polymerases, Pol epsilon and Pol delta, to synthesize the leading and lagging strands. It is widely thought that Pol epsilon is utilized for the leading strand and Pol delta for the lagging strand, although aspects of this arrangement remain in dispute. Both polymerases interact with the PCNA sliding clamp. The clamp loader, RFC (Replication Factor C), is homologous to the prokaryotic clamp loader, and its clamp-loading action is explained in the prokaryotic section of this website. However, RFC is not known to be an integral part of the replisome, unlike in E. coli, where the clamp loader is the central organizing component of the replisome.

How do these proteins function together in a replisome?

Figure 2. Eukaryotic replisomes require over 30 different proteins

The eukaryotic replisome is highly regulated by modification enzymes and a variety of binding proteins. Regulation of the replisome is required for the DNA damage checkpoint response (phosphorylation, ubiquitination, and other modifications). Replisome regulation is also involved in cell cycle control, nucleosome handling and inheritance, and other pathways involved in genome integrity. Thus, we have cloned and purified many additional proteins beyond those discussed above, and are studying how these processes interweave their actions with replisome action. We have published on a few aspects of these processes. Given the complexity of these important pathways and how rapidly knowledge of them is evolving, we continue to purify new factors and modification enzymes that act on the replication apparatus.

What does the replisome look like?

We began our studies of eukaryotic replication only in the last few years. Nonetheless, we have addressed several important aspects of eukaryotic replisome function by the biochemical approach using pure proteins (see publications list). Of course, many more questions remain than have been answered so far. But our initial studies have reconstituted a fully functional leading/lagging strand replication fork, and have identified how the proteins are organized and how the DNA polymerases are targeted to the leading and lagging DNA strands. For example, we have identified a stable 15-protein complex of CMG and Pol epsilon, which we refer to as CMGE. This CMGE complex is active only on the leading strand of replication fork DNA, consistent with the evidence that Pol epsilon is the leading strand polymerase. The lagging strand is much more complicated and we are still working out the details. However, we find that Pol delta is functional on the lagging strand, while the same Pol delta has very low activity on the leading strand. Therefore, there is something quite specific about the activity of these two polymerases on the leading and lagging strands. Exactly how Pol epsilon is repressed on the lagging strand, and how Pol delta is repressed on the leading strand, are puzzles we intend to solve in future studies. The same goes for the structure of the replisome. We have collaborated with Huilin Li (Brookhaven National Laboratory/SUNY Stony Brook), who has determined the EM structure of the bulk of the replisome. The initial studies of the replisome structure are summarized in the sections below.

Architecture of the CMGE leading strand replisome

Figure 3. Structure of CMG reveals two channels. The top view shows a channel composed of the Mcm2-7 motor proteins, and a channel made by the accessory factors and the outside surface of Mcm2 and Mcm5. The side view shows the two-tiered appearance of the Mcms.

Figure 4. Structure of CMG (top) and CMGE (bottom)

Figure 5. Pol epsilon sits on top of CMG. The green region is the density attributed to Pol epsilon. The remainder is the CMG helicase complex.

Movie 1. Rotation of CMGE (horizontal). Pol epsilon sits mainly on top of the accessory factors, Cdc45 and GINS, although it makes some contact with the Mcm ring.

Movie 2. Rotation of CMGE (vertical). The central channel through Mcm2-7 is visible. The secondary channel made by the accessory factors is largely occluded by Pol epsilon, which resides over the top of the accessory factors.

The 3D reconstruction of CMGE from multiple 2D averages taken at many different angles gives a 3D map at about 15 Å resolution, shown in Fig. 5 and Movies 1 and 2. The green density attributed to Pol epsilon includes Pol2 (the catalytic subunit), Dpb2 (presumed location marked as a dot), Dpb3, and Dpb4. Each Mcm subunit has two domains, so the Mcm complex is divided into a C-terminal domain (CTD) tier and an N-terminal domain (NTD) tier.

The structures of the 11-protein CMG helicase and 15-protein CMGE (CMG-Pol epsilon) were examined by negative-stain electron microscopy (EM) and reconstructed in 3D at about 15 Å resolution. The structure of CMG looked essentially the same as the CMG of Drosophila published by the Botchan/Berger/Nogales groups. The yeast CMG structure, like Drosophila CMG, contains two channels: the Mcm2-7 channel, and a side channel formed by the interaction of Cdc45-GINS with the side of the Mcm2-7 ring, as shown in Fig. 3.

In EM images of the CMG-Pol epsilon complex, Pol epsilon appears as an appendage on top of CMG with multiple connections to CMG. Fig. 4 shows some 2D averages of side-view images that illustrate this point.

Architecture of the core leading/lagging replisome

A core replisome containing CMG helicase, Pol epsilon, the Ctf4 trimer scaffold and Pol alpha-primase was built up in stages to a final total of 20 different subunits. 2D averages of the subassemblies and of the final core replisome are shown in Figure 6.
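The subunit counts quoted in this section follow from the complexes described earlier: CMG is Cdc45 plus Mcm2-7 plus the four GINS subunits (11), CMGE adds the four Pol epsilon subunits (15), and the core replisome adds Ctf4 (one distinct protein, even though it is a trimer) and the four Pol alpha-primase subunits (20). The short Python sketch below makes that bookkeeping explicit. It is only a counting aid under stated assumptions: the GINS and Pol alpha-primase subunit names are the standard yeast gene names but are not given in this text, and the intermediate CMGE-Ctf4 stage is an illustrative step, not a description of the actual subassemblies in Figure 6.

```python
# Counting aid for the replisome subassemblies described in the text.
# Assumption: the GINS and Pol alpha-primase subunit names are standard yeast
# gene names not stated in the text; the stage list is illustrative only.

CDC45 = ["Cdc45"]
MCM2_7 = ["Mcm2", "Mcm3", "Mcm4", "Mcm5", "Mcm6", "Mcm7"]
GINS = ["Sld5", "Psf1", "Psf2", "Psf3"]
POL_EPSILON = ["Pol2", "Dpb2", "Dpb3", "Dpb4"]
CTF4 = ["Ctf4"]  # homotrimer: three copies of one distinct protein
POL_ALPHA_PRIMASE = ["Pol1", "Pol12", "Pri1", "Pri2"]  # Pol12 is the B subunit

# Each stage names the resulting complex and the components it adds.
STAGES = [
    ("CMG", CDC45 + MCM2_7 + GINS),
    ("CMGE", POL_EPSILON),
    ("CMGE-Ctf4", CTF4),  # hypothetical intermediate, for illustration only
    ("CMGE-Ctf4-Pol alpha-primase", POL_ALPHA_PRIMASE),
]


def build_up(stages):
    """Return (complex name, number of distinct subunits) after each stage."""
    present = set()
    counts = []
    for name, added in stages:
        present.update(added)  # identical copies (e.g. Ctf4) count once
        counts.append((name, len(present)))
    return counts


for name, n in build_up(STAGES):
    print(f"{name}: {n} distinct subunits")
```

Counting distinct proteins with a set, rather than polypeptide chains, matches how the text arrives at 20: the three identical Ctf4 chains contribute a single subunit type.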

Figure 6. Buildup of the replisome. 2D averages of the complexes indicated in the cartoons to the right. The lagging strand Pol alpha-primase binds to Ctf4 below the Mcm ring, on the opposite side of the Mcms from Pol epsilon.

Figure 7. Architecture of the eukaryotic replisome.

The DNA path through CMG was studied earlier in the Drosophila system by the Mike Botchan and James Berger labs, which showed that the leading strand enters CMG via the C-terminal (CTD) side. Earlier studies in the Xenopus system by Johannes Walter's lab showed that the lagging strand is excluded to the outside of the CMG complex. The EM structures of the reconstituted yeast CMGE and CMGE-Ctf4-Pol alpha show that Pol epsilon is located at the CTD tier of the Mcms, and Ctf4-Pol alpha-primase is located on the NTD side (Fig. 7). Thus Pol epsilon is located ahead of the unwinding point of the parental DNA. The leading strand enters CMG via the motor domains of Mcm2-7 and, after being unwound, must make a U-turn at the bottom to reach Pol epsilon at the top. The lagging strand is excluded to the outside of CMG and is drawn reaching down below CMG to bind Pol alpha-primase for primer synthesis. The location of the lagging strand Pol delta within the replisome is yet to be established, as are the locations of numerous proteins known to travel with the replisome, including histones and histone chaperones.