| Journal of General Virology |
| First posted online 10 July 2001 | FULL-LENGTH ARTICLE |
| Received 5 April 2001; accepted 29 June 2001 | DOI: 10.1099/vir.0.17784-0 |
Teresa Luque,1 Ruth Finch,2 Norman Crook,2 David R. O'Reilly1 and Doreen Winstanley2
1 Department of Biology, Imperial College of Science, Technology and Medicine, Imperial College Road, London SW7 2AZ, UK
2 Horticulture Research International, Wellesbourne, Warwick CV35 9EF, UK
The nucleotide sequence of the DNA genome of Cydia pomonella granulovirus (CpGV) was determined and analysed. The genome is 123500 bp long and has a G+C content of 45.2 %. It contains 143 ORFs of 150 nucleotides or more that show minimal overlap. One-hundred-and-eighteen (82.5 %) of these putative genes are homologous to genes previously identified in other baculoviruses. Of these, 73 are homologous to genes of Autographa californica nucleopolyhedrovirus (AcMNPV), whereas 108 and 98 are homologous to genes of Xestia c-nigrum GV (XcGV) and Plutella xylostella GV (PxGV), respectively. On average, these homologues show 37.4 % overall amino acid sequence identity to those of AcMNPV and 45 % to those of XcGV and PxGV. The CpGV gene content was compared with that of other baculoviruses. Several genes reported to have major roles in baculovirus biology were not found in the CpGV genome. These include gp64, which encodes the major budded virus glycoprotein of some nucleopolyhedroviruses, and lef-7, which is involved in DNA replication. However, the CpGV genome encodes the large and small subunits of ribonucleotide reductase, three inhibitor of apoptosis (iap) homologues and two protein tyrosine phosphatases. The CpGV, PxGV and XcGV genomes show a notably high level of conservation of gene order and orientation. A striking feature of the CpGV genome is the absence of typical homologous repeat sequences. However, it contains one major repeat region and 13 copies of a single 73–77 bp imperfect palindrome.
Introduction
The Baculoviridae are a family of invertebrate viruses with large, double-stranded DNA genomes. To date, most baculoviruses have been isolated from lepidopteran insects. The Baculoviridae are subdivided into two genera, Nucleopolyhedrovirus (NPV) and Granulovirus (GV) (Murphy et al., 1995). NPVs form large polyhedral occlusion bodies that contain multiple virus particles, while GVs form smaller, ovoid occlusion bodies called granules that generally contain a single virion. NPVs and GVs differ markedly in the pathology of the infection cycle (Crook, 1991; Federici, 1997). In GV infections, unlike NPV infections, the nuclear membrane breaks down before occlusion bodies are formed. Little is known about the molecular causes of these differences. Several NPVs have been completely sequenced, namely Autographa californica (Ac) MNPV (Ayres et al., 1994), Orgyia pseudotsugata (Op) MNPV (Ahrens et al., 1997), Lymantria dispar (Ld) MNPV (Kuzio et al., 1999), Bombyx mori (Bm) NPV (Gomi et al., 1999), Spodoptera exigua (Se) MNPV (IJkel et al., 1999) and Helicoverpa armigera (Ha) SNPV (Chen et al., 2001). In contrast, only two GVs have been sequenced so far, Xestia c-nigrum (Xc) GV (Hayakawa et al., 1999) and Plutella xylostella (Px) GV (Hashimoto et al., 2000).
GVs can be subdivided into two classes, the 'slow' and the 'fast' GVs (Winstanley & O'Reilly, 1999). XcGV is a slow GV, whereas PxGV is a fast GV. The type species of the genus, Cydia pomonella GV (CpGV), is also a fast GV, i.e. the host typically dies in the same instar in which it was infected. CpGV is highly pathogenic for the codling moth, C. pomonella, a worldwide pest of apples, pears and walnuts (Glen & Payne, 1984). Several strains of the virus have been isolated, although most field trials have used the Mexican isolate CpGV-M (Crook et al., 1985). The molecular biology of this virus is poorly understood. Its genome has been mapped and its size estimated at about 125.6 kbp (Crook et al., 1985, 1997), but only a small number of CpGV genes have been identified. Here, we describe the complete genome sequence of CpGV and compare it with other baculovirus genomes.
Methods
Cloning and sequence analysis. The SalI cosmid library of CpGV-M1, an in vivo cloned genotype from the Mexican isolate, has been described previously (Crook et al., 1997). DNA fragments were subcloned from the cosmids into plasmid vectors for sequencing, and cosmid DNA or PCR products were also used as templates. M13 single-stranded DNA templates were sequenced using Sequenase (US Biochemicals) according to the manufacturer's protocol. Double-stranded DNA was sequenced using ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction kits (PE Biosystems) and ABI 373 and 377 automated sequencers. Sequence data were assembled and analysed using the Wisconsin Genetics Computer Group (GCG) programmes (Devereux et al., 1984) and DNAStar (Lasergene). ORFs of 50 codons or more were considered for further analysis; smaller ORFs were included when similarity to a known gene was demonstrated. Overlapping ORFs were considered when the non-overlapping portion of the ORF exceeded 50 amino acids. The possibilities of splicing or alternative start sites were not considered. Homology searches were carried out using the BLAST and FASTA programmes (Altschul et al., 1990; Pearson & Lipman, 1988). Pairwise sequence alignments were performed with the GCG Gap programme using a gap creation penalty of 4 and a gap extension penalty of 2. Multiple alignments were performed with ClustalW (Thompson et al., 1994) and phylogenetic analyses with PAUP*4.0b4a (Swofford, 2000). The MacVector program was used to identify repeats by comparing the genome with itself using a window size of 30 and a minimum similarity of 65 %. The conservation of gene order between baculovirus genomes was visualized by generating gene parity plots essentially as described by Hu et al. (1998).
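A gene parity plot places each pair of homologous ORFs at (position in genome A, position in genome B). Conserved gene order then shows up as runs of points along a diagonal. The Python sketch below is illustrative only and does not reproduce the procedure of Hu et al. (1998) in detail. It assumes homologues have already been assigned and are supplied as a hypothetical dictionary that maps ORF numbers in one genome to ORF numbers in the other. The collinear_fraction score is a hypothetical summary added for illustration, not a measure used in this study.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def gene_parity_points(homologues: Dict[int, int]) -> List[Point]:
    """Return (ORF number in genome A, ORF number in genome B) pairs,
    ordered along genome A, ready to be drawn as a scatter plot."""
    return sorted(homologues.items())


def collinear_fraction(points: List[Point]) -> float:
    """Hypothetical summary score: the fraction of consecutive points whose
    genome-B ORF numbers differ by exactly 1 (in either direction), i.e.
    neighbouring genes that remain neighbours in the other genome."""
    if len(points) < 2:
        return 0.0
    kept = sum(
        1 for (_, b1), (_, b2) in zip(points, points[1:]) if abs(b2 - b1) == 1
    )
    return kept / (len(points) - 1)


# Hypothetical homologue assignments (not real CpGV data).
example = {1: 1, 2: 2, 3: 3, 5: 10, 6: 11}
points = gene_parity_points(example)
print(points)                      # [(1, 1), (2, 2), (3, 3), (5, 10), (6, 11)]
print(collinear_fraction(points))  # 0.75
```

If matplotlib is available, the points can be drawn with plt.scatter(*zip(*points)). A block of genes in conserved order then appears as a diagonal run, and an inverted block appears as an anti-diagonal run.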
Results and Discussion
Sequence analysis of the CpGV genome
The CpGV genome consists of 123500 bp, in good agreement with the previous estimate of 125.6 kbp based on restriction mapping (Crook et al., 1997). This compares with 178733 bp for XcGV (Hayakawa et al., 1999) and 100999 bp for PxGV (Hashimoto et al., 2000). The A of the granulin start codon was designated nt 1, and the sequence was numbered in the direction of granulin transcription. The G+C content of the CpGV genome is 45.2 %. This is slightly higher than that of XcGV, PxGV, AcMNPV, BmNPV, SeMNPV and HaSNPV (39–44 %), but lower than that of OpMNPV and LdMNPV (55–58 %).
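The numbering convention and the G+C calculation above can be illustrated with a short Python sketch. It assumes that the genome is held as a plain string and that the 0-based index of the A of the granulin ATG is already known. The index and the example sequences below are hypothetical. Reverse complementation only swaps G with C, and rotation only moves bases, so the G+C content does not depend on the numbering convention adopted.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def renumber_circular(seq: str, start_index: int, forward: bool = True) -> str:
    """Rotate a circular genome so that the base at 0-based start_index becomes nt 1.

    If the reference gene is read on the reverse strand (forward=False),
    start_index is the position, on the given strand, of the T that pairs
    with the A of the ATG. The genome is reverse-complemented first, so the
    result is numbered in the direction of that gene's transcription.
    """
    if not forward:
        seq = reverse_complement(seq)
        start_index = len(seq) - 1 - start_index
    return seq[start_index:] + seq[:start_index]


def gc_content(seq: str) -> float:
    """Percentage G+C among unambiguous bases; N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Hypothetical toy sequences, not CpGV data.
print(renumber_circular("CCATGGG", 2))                 # ATGGGCC
print(renumber_circular("TTCATCC", 4, forward=False))  # ATGAAGG
print(round(gc_content("ATGCNN"), 1))                  # 50.0
```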
Our criteria for selecting ORFs for further study were that they should be methionine-initiated ORFs of at least 50 codons with minimal overlap with other ORFs. However, following comparisons with other baculoviruses, some exceptions to these criteria were allowed. ORF86 (Cp86) and Cp53 are 49 and 48 codons long, respectively. Cp86 encodes the homologue of the core protein P6.9, and Cp53 is a homologue of Ac110. Cp5, Cp77 and Cp122 are each contained within a larger ORF. However, these three ORFs were considered more likely to be biologically meaningful because they have homologues in other baculoviruses. Thus, Cp5, Cp77 and Cp122, rather than the larger ORFs that contain them, are included in Table 1. Finally, five pairs of ORFs overlap substantially (by more than 45 codons): Cp37 and Cp38, Cp51 and Cp52, Cp82 and Cp83, Cp87 and Cp88, and Cp137 and Cp138. In total, 143 ORFs were selected for further study (Table 1 and Fig. 1). They are numbered clockwise, starting from granulin. The numbers of ORFs described for other completely sequenced baculoviruses range from 120 (PxGV) to 181 (XcGV).
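The primary selection rule (ATG-initiated ORFs of at least 50 codons on either strand) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the GCG or DNAStar procedure actually used. Codons are counted from the ATG up to, but not including, the stop codon. Only the longest ORF (the first ATG after an in-frame stop) is reported for each stop. The genome is treated as linear. The manual exceptions and the overlap filtering described above are not implemented.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 50) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for ATG-initiated ORFs with >= min_codons
    codons before the stop codon.

    Coordinates are 0-based and half-open on the forward strand, and they
    include the stop codon. The sequence is treated as linear, so ORFs that
    span the end of a circular genome, or that lack a stop codon, are not
    reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last in-frame stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start, i + 3, "+"))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append((n - (i + 3), n - start, "-"))
                    start = None
    return sorted(orfs)


# Hypothetical test ORF: ATG + 49 sense codons (50 codons) + TAA.
orf = "ATG" + "GCT" * 49 + "TAA"
print(find_orfs(orf))  # [(0, 153, '+')]
```

Exceptions such as Cp86 (49 codons) would have to be added by hand, as was done here on the basis of homology, or by lowering min_codons and filtering on database hits.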
Table 1. CpGV ORFs
The positions and orientations of 143 putative ORFs in the CpGV genome are shown and compared with homologues in selected other completely sequenced baculoviruses. Red, ORFs unique to CpGV; blue, ORFs present in all GVs and absent from NPVs; green, ORFs conserved in all nine completely sequenced baculoviruses. A Cp79 homologue is found in all baculoviruses included in this table but is absent from HaSNPV. The 'Prom.' column indicates baculovirus early (E) and late (L) promoter elements located within 120 nt of the ATG. 'E' indicates a TATA sequence with a CAC/GT start site sequence 20–40 nt downstream. 'L' indicates the presence of an A/T/GTAAG motif. Homologous ORFs in the genomes of AcMNPV (Ayres et al., 1994), OpMNPV (Ahrens et al., 1997), LdMNPV (Kuzio et al., 1999), SeMNPV (IJkel et al., 1999), XcGV (Hayakawa et al., 1999) and PxGV (Hashimoto et al., 2000) are shown with the percentage amino acid sequence identity to the homologous CpGV ORF. Pairwise alignments were carried out using the GCG Gap algorithm (Devereux et al., 1984).
Fig. 1. Circular map of the CpGV genome. The
inner circles indicate the positions of cleavage sites for the following
enzymes: inner circle, SalI; middle circle, EcoRI; and outer
circle, BamHI. CpGV ORFs discussed in the text are indicated
outside these circles, with the arrow indicating the direction of
transcription. The positions of repeat sequences are also shown, with the
major repeat region from 20 to 21 kbp indicated by the bar. A scale in bp
is provided in the centre of the figure.
As in other baculoviruses, CpGV ORFs are separated by minimal intergenic distances. They show no obvious preferred orientation (51.7 % clockwise, 48.3 % anticlockwise) and no clustering according to function or expression. Coding sequences represent 88.4 % of the genome, similar to XcGV and PxGV (88 and 89 %, respectively). Fifty-three ORFs overlap an adjacent ORF; apart from the five pairs noted above, these overlaps are shorter than 36 codons. Thirty-eight CpGV ORFs possess a consensus early promoter motif (a TATA box with a CAC/GT motif 20–40 bp downstream) within 120 bp of the initiation codon (Table 1). Nineteen of these also have a late promoter motif (A/T/GTAAG) within 120 bp of the initiation codon. A further 52 ORFs have only a late promoter motif. The remaining 53 ORFs lack any recognized consensus promoter motif within 120 nt of the ATG.
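The promoter annotation rules above can be expressed as simple pattern searches. The Python sketch below is an approximation and rests on assumptions that the text does not state:

- 'CAC/GT' is read as CA[C or G]T.
- 'A/T/GTAAG' is read as [A, T or G]TAAG.
- The TATA element is taken to be the bare tetranucleotide TATA.
- The 20–40 nt spacing is measured between the first bases of the TATA and start-site motifs.

The function name classify_promoter is hypothetical.

```python
import re

# Lookaheads allow overlapping motif occurrences to be found.
TATA = re.compile(r"(?=TATA)")
EARLY_START_SITE = re.compile(r"(?=CA[CG]T)")  # assumed reading of "CAC/GT"
LATE_MOTIF = re.compile(r"[ATG]TAAG")          # "A/T/GTAAG"


def classify_promoter(upstream: str, window: int = 120) -> str:
    """Return 'E', 'L', 'EL' or '' for the sequence 5' of an ATG.

    upstream is read 5'->3' on the coding strand and should end immediately
    before the ATG; only its last `window` nucleotides are examined.
    """
    region = upstream.upper()[-window:]
    tata_sites = [m.start() for m in TATA.finditer(region)]
    start_sites = [m.start() for m in EARLY_START_SITE.finditer(region)]
    # Early: some start-site motif begins 20-40 nt after some TATA element.
    early = any(20 <= s - t <= 40 for t in tata_sites for s in start_sites)
    late = LATE_MOTIF.search(region) is not None
    return ("E" if early else "") + ("L" if late else "")


# Hypothetical upstream sequences, not CpGV data.
early_only = "TATA" + "G" * 26 + "CAGT" + "G" * 10
print(classify_promoter(early_only))            # E
print(classify_promoter(early_only + "GTAAG"))  # EL
print(classify_promoter("C" * 50 + "ATAAGC"))   # L
```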
Comparison of CpGV ORFs to those of other baculoviruses
One-hundred-and-eighteen CpGV ORFs have homologues in other baculoviruses, while 25 are so far unique to CpGV; none of these 25 shows significant similarity to sequences in GenBank (Table 1). Sixty-three ORFs are conserved among all nine baculoviruses sequenced so far. Seventy-three CpGV ORFs have homologues in AcMNPV, while 108 and 98 CpGV ORFs have homologues in XcGV and PxGV, respectively. Ninety-five CpGV ORFs have homologues in both XcGV and PxGV, of which 27 are so far unique to GVs. Seven CpGV ORFs have homologues in NPVs but not in XcGV or PxGV. Finally, 64 XcGV ORFs and 21 PxGV ORFs lack CpGV homologues (Table 2).
Table 2. Granulovirus ORFs without homologues in CpGV
Where known, gene names and homologues in other baculoviruses follow the ORF number.
| Virus | ORFs lacking CpGV homologues |
|---|---|
| XcGV (64 ORFs) | 4, 5 p10 (Ac137, Px2), 6, 14 ie-0 (Ac141, Px15), 21 p94 (Ac134), 22, 23, 24, 28, 37, 38, 41, 42, 44, 48, 49, 57, 59, 61, 62, 63, 64, 65, 67 he65 (Ac105), 69, 70, 71, 72, 73, 74, 83, 102 (Ac60), 104, 106, 108, 115, 117, 124 (Px90), 127 ctl (Ac3), 128, 129, 138, 147 (Ac112+113), 150 enhancin-1, 151, 152 enhancin-2, 153, 154 enhancin-3, 155, 156, 157, 158, 160 (Ac111), 161, 162, 163, 164, 166 enhancin-4, 167, 168, 176 (Px116), 177, 179, 181 |
| PxGV (21 ORFs) | 1, 2 p10 (Ac137, Xc5), 3, 15 ie-0 (Ac141, Xc14), 18, 19, 22, 27, 33, 39, 48, 58, 62, 73, 77, 81, 90 (Xc124), 105, 111, 116 (Xc176), 119 |
The average amino acid sequence identity between CpGV and XcGV, PxGV and AcMNPV homologues is 44.6, 45.4 and 37.4 %, respectively. The most conserved ORFs are ubiquitin (Cp54), polyhedrin/granulin (Cp1), p6.9 (Cp86), sod (Cp59), lef-9 (Cp117) and lef-8 (Cp131) (Table 1). In contrast, 39K (Cp57) and lef-3 (Cp113) are among the least conserved.
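The mean identities above are averages of per-pair values. As an illustration only, the Python sketch below computes percentage identity from a pair of sequences that have already been aligned, then averages it over several homologue pairs. It counts identities over columns in which neither sequence has a gap. This is one common convention and is an assumption here, because GCG Gap may normalize differently. The aligned sequences are invented for the example.

```python
from statistics import mean
from typing import Iterable, Tuple


def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence is gapped."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def mean_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-pair identities (each homologue pair counts once)."""
    return mean(percent_identity(a, b) for a, b in pairs)


# Invented alignments, not CpGV data.
print(percent_identity("MK-LV", "MKALI"))                     # 75.0
print(mean_identity([("MK-LV", "MKALI"), ("ACDE", "ACFG")]))  # 62.5
```

An unweighted mean gives a short, highly conserved protein such as ubiquitin the same weight as a long, poorly conserved one. This is worth keeping in mind when comparing the averages for XcGV, PxGV and AcMNPV.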
Genes specific to GVs
Twenty-seven ORFs are present in all three sequenced GV genomes and absent from NPVs (Table 1). These GV-specific genes show 35.3 % mean amino acid sequence identity. The most conserved GV-specific ORFs are related to the previously described orf16L (Kang et al., 1997), whereas Cp2 is among the least conserved. CpGV has two genes in the orf16L family, Cp20 and Cp23. They are approximately 35 % identical to each other, but each is approximately 55 % identical to its orthologue in XcGV and PxGV. This suggests that there are two independent lineages of genes in this family. The functions of most GV-specific ORFs are unknown. Cp116 encodes a putative novel inhibitor of apoptosis that appears to be GV-specific (see below). Cp46 (homologous to Xc40 and Px35) is likely to be a member of the stromelysin family within the matrix metalloproteinase superfamily (Hayakawa et al., 1999; Hashimoto et al., 2000). Ko et al. (2000) recently confirmed that Xc40 encodes an active proteinase. They suggested that it is retained within infected cells until death and is then released into the body of the insect, where it causes proteolysis of host tissues. Cp46 is about 70 amino acids longer than Xc40 at its N terminus and may have a signal sequence.
Genes involved in DNA replication and expression
Many genes implicated in DNA replication and expression are present in all sequenced baculovirus genomes, presumably reflecting their critical roles in virus replication. Early baculovirus genes are transcribed by the host cell machinery, but this is often modulated by viral transcription regulators such as ie-0, ie-1, ie-2 and pe38 (Friesen, 1997). Both ie-0 and ie-2 are absent from CpGV, whereas ie-1 (Cp7) and pe38 (Cp24) are present but poorly conserved. These genes are not well conserved among baculoviruses in general (about 26 % mean amino acid identity). Both ie-2 and pe38 are absent from XcGV, PxGV, LdMNPV, SeMNPV and HaSNPV.
Six genes are reported to be essential for DNA replication in AcMNPV and OpMNPV: lef-1, lef-2, lef-3, dnapol, helicase and ie-1 (Lu et al., 1997). Homologues of all six are present in CpGV. With the exception of lef-3 and ie-1, they are moderately well conserved (Table 1). AcMNPV and BmNPV lef-7 stimulates DNA replication in transient assays (Morris et al., 1994; Gomi et al., 1997), although in AcMNPV it acts in a cell-specific manner. Like XcGV, PxGV, LdMNPV, SeMNPV and HaSNPV, CpGV lacks lef-7. In common with LdMNPV, XcGV and PxGV, CpGV encodes a DNA ligase (Cp120) and a second helicase (Cp126). The LdMNPV DNA ligase displays the catalytic properties of a type III DNA ligase (Pearson & Rohrmann, 1998). The helicase-2 gene shows similarity to a yeast mitochondrial helicase called pif-1 (Pearson & Rohrmann, 1998). In LdMNPV, neither helicase-2 nor the DNA ligase gene stimulates DNA replication in transient assays, and it has been suggested that they may be involved in DNA repair (Kuzio et al., 1999).
Cp127 and Cp128 encode the large (rr1) and small (rr2) subunits of ribonucleotide reductase. OpMNPV, LdMNPV and SeMNPV also encode ribonucleotide reductase subunits (Ahrens et al., 1997; IJkel et al., 1999; Kuzio et al., 1999). These three viruses also encode a dUTPase, but no dUTPase homologue is present in CpGV. These enzymes are involved in nucleotide metabolism and may facilitate virus replication in non-dividing cells, in which dNTP synthesis pathways are inactive. There appear to be two separate lineages of baculovirus ribonucleotide reductase genes. Those from OpMNPV and two from LdMNPV appear to belong to a novel lineage found only in some baculoviruses, whereas those from SeMNPV belong to a eukaryotic lineage (Jordan & Reichard, 1998). The CpGV rr1 gene appears to fall in the novel baculovirus lineage. It shows 51 and 52 % amino acid identity to its homologues in OpMNPV and LdMNPV, respectively, but only 31 % identity to the SeMNPV homologues. CpGV rr2 is most similar to OpMNPV rr2 and LdMNPV rr2a (Ld147). It shows little or no similarity to the LdMNPV or SeMNPV rr2b genes (Ld120 and Se45). Op31 appears to be a fusion of two ORFs: an N-terminal ORF of unknown function and a C-terminal ORF encoding dUTPase (Fig. 2). In LdMNPV (Ld116 and Ld138) and SeMNPV (Se55 and Se54), however, these ORFs are separate. Cp16 is homologous to the N-terminal parts of Op31, Ld138 and Se54. The C-terminal part of Se54 is homologous to Ac33, but no homologue of this gene is present in CpGV. XcGV, PxGV, AcMNPV, BmNPV and HaSNPV do not encode ribonucleotide reductase subunits or dUTPase (Ayres et al., 1994; Gomi et al., 1999; Hayakawa et al., 1999; Hashimoto et al., 2000; Chen et al., 2001).
Fig. 2. Relationships of dutpase genes
and associated ORFs from several baculoviruses. The solid black box
indicates the 5´ region of Op31 and its homologous ORFs. The
diagonally striped box represents dutpase. Ac33 homologous
sequences are indicated by a horizontally striped box, whereas the 3´
portion of Ld138 (not related to other ORFs) is denoted by the stippled
box. Comparisons were based on GCG Gap alignments. The length of each ORF is
indicated on the right-hand side.
Many genes required for late gene transcription have been described, including lef-4 to lef-6, lef-8 to lef-11, 39K, p47 and vlf-1 (Lu & Miller, 1997). All of these are found in CpGV, with the possible exception of lef-6. Generally, these genes are more conserved than the early transcription activators (IJkel et al., 1999), and this is also the case for CpGV (42 % mean amino acid identity). Of these genes, lef-8 (Cp131) and lef-9 (Cp117) are the most conserved (about 58 % mean amino acid identity). In contrast, lef-6, if present at all, is very poorly conserved. Hayakawa et al. (1999) reported that it is absent from XcGV. However, Hashimoto et al. (2000) reported that Px60, which is homologous to Xc88, shows some similarity to NPV lef-6 genes. Cp80 is a clear homologue of Px60 and Xc88 (Table 1). These three genes are smaller than the NPV lef-6 genes (86–101 vs 138–187 amino acids). Functional studies will be necessary to determine whether Cp80, Xc88 and Px60 are true lef-6 homologues.
CpGV structural genes
The most conserved baculovirus structural protein is polyhedrin/granulin (66.5 % mean amino acid identity), the major component of occlusion bodies (Rohrmann, 1992). Other conserved CpGV structural genes are p6.9 (Cp86) and odv-e25 (Cp91), with 64.2 and 56 % mean amino acid identity, respectively (Table 1). CpGV lacks homologues of some structural genes, such as calyx/pep and p80/p87-capsid, both of which are also absent from XcGV and PxGV. Hayakawa et al. (1999) reported that Xc2 encodes a homologue of ORF1629 (p78/83), which is thought to be an essential virion component. Xc2 shows only low similarity to the NPV ORF1629 genes (e.g. 24.1 % identity over 162 amino acids to Op2) and is substantially shorter (231 amino acids, compared with 462–555 amino acids). Cp2, Xc2 and Px5 are clearly homologues of each other (Table 1) and occupy the same genomic position as ORF1629. However, Cp2 (174 amino acids) is shorter still and shows no significant similarity to any of the NPV ORF1629 genes. Similarly, Px5 was not identified as an ORF1629 homologue. Cp2/Xc2/Px5 may therefore represent GV-specific genes (Table 1), and our analysis suggests that GVs may not possess an ORF1629 homologue.
The absence of a p10 homologue from CpGV is noteworthy. In NPV-infected cells, P10 forms fibrillar structures in the nucleus and cytoplasm. The protein is implicated in occlusion body morphogenesis and in disintegration of the nuclear matrix, thereby disseminating the polyhedra (van Oers & Vlak, 1997). Three XcGV ORFs (Xc5, Xc19 and Xc83) show similarity to p10. Xc5 is the most similar and was named p10, although it is poorly conserved. Homologues of these three ORFs are present in PxGV (Px2, Px21 and Px50), and Hashimoto et al. (2000) suggested that all three are p10 homologues. No Xc5, Xc83 or Px2 homologues are present in CpGV. We have previously described ORF17R (Cp22), which is 56 % identical to Xc19 and 39 % identical to Px21 (Kang et al., 1997). Cp22 shares a number of motifs with P10, including a proline-rich domain and a heptad repeat sequence, and it is 30 % identical to AcMNPV P10. However, it is substantially larger (329 vs 137 amino acids), and much of the sequence identity lies in regions of low complexity. A similar situation occurs with Cp62, which is homologous to Px50. Thus, although Cp22 and Cp62 may be functionally analogous to p10, we consider it unlikely that they are true homologues in the evolutionary sense.
Like XcGV, PxGV, LdMNPV, SeMNPV and HaSNPV, CpGV does not encode the envelope glycoprotein GP64, the major envelope fusion protein of AcMNPV, BmNPV and OpMNPV (Monsma et al., 1996). Indeed, recent evidence suggests that gp64 is unique to group I NPVs (Pearson et al., 2000). In LdMNPV, the envelope fusion protein is the product of the Ld130 gene. This protein contains a furin-like proprotein convertase cleavage site that is also conserved in its SeMNPV homologue (IJkel et al., 2000). Ld130 homologues are present in all completely sequenced baculoviruses, including species that contain gp64. The role of the Ld130 homologue in these gp64-containing species is unclear (Pearson et al., 2000). CpGV encodes an Ld130 homologue, Cp31, which shows 30 % amino acid identity to Ld130.
Auxiliary genes
Besides structural genes and those directly implicated in DNA replication and transcription, baculoviruses have other genes that reduce their dependence on the host or enhance their fitness in other ways (O'Reilly, 1997). Among them, ubiquitin is the most conserved and is present in all sequenced genomes. The main function of cellular ubiquitin is to signal protein degradation (Haas et al., 1996); viral ubiquitin is non-essential and its role is unclear (Reilly & Guarino, 1996). Superoxide dismutase (sod) is also well conserved and present in all sequenced baculovirus genomes. Its product is presumably involved in the removal of free radicals, but the gene is non-essential (Tomalski et al., 1991) and its role in the virus life cycle is not known. As previously reported, chitinase (Cp10) and cathepsin (Cp11) are also well conserved (Kang et al., 1998), and these genes function together to promote liquefaction of the host. Other genes in CpGV include gp37 (Cp13) and fibroblast growth factor, fgf (Cp123). Hashimoto et al. (2000) reported that PxGV has two fgf homologues, Px56 and Px104. Cp123 is a homologue of Px104. CpGV also has a homologue of Px56 (Cp76), but this does not show significant similarity to fgf. No enhancin homologue is present in CpGV or PxGV, whereas XcGV has four enhancin homologues and LdMNPV has two. Enhancin is a metalloproteinase, and evidence suggests that it may digest components of the insect peritrophic membrane, facilitating the initiation of infection (Derksen & Granados, 1988; Wang & Granados, 1998). GP37 (spindlin), encoded in CpGV by Cp13, is related to the fusolins of entomopoxviruses, which also act as enhancing factors (Yuen et al., 1990). Both CpGV and PxGV lack a homologue of the conotoxin-like gene ctl (Eldridge et al., 1992). Such a gene is present in XcGV but absent from BmNPV, SeMNPV and HaSNPV, and its biological role is unknown. Cp9 and Cp79 are homologous to Ac145 and Ac150. Members of this gene family contain a six-cysteine motif similar to that of chitin-binding proteins and are also found in entomopoxviruses (Dall et al., 2001).
Genes implicated in phosphorylation and dephosphorylation
CpGV possesses genes encoding a protein kinase (PK; Cp3) and two protein tyrosine phosphatases (PTPs; Cp66 and Cp98). Phosphorylation is a common mechanism for regulating protein activity, and several baculovirus proteins, such as IE-1 and P78/83, are known to be phosphorylated (Vialard & Richardson, 1993; Choi & Guarino, 1995). CpGV PK is homologous to AcMNPV PK-1, which is present in many other baculoviruses. No homologue of AcMNPV PK-2 was found in CpGV. This gene has been described only in AcMNPV and BmNPV and is non-essential (Ayres et al., 1994; Ahrens et al., 1997). Two lineages of dual specificity PTPs (dsPTPs) have been identified in baculoviruses. OpMNPV encodes one copy of each, whereas other baculoviruses encode one or none. CpGV-M1 is unique in encoding two homologues of dsPTP-2. However, one of these, Cp66, is truncated and encodes only the C-terminal end of a dsPTP. It is unclear whether it encodes a functional dsPTP, although it does include the catalytic loop [HCXXGXXR(S/T)]. Cp66 is more similar to Cp98 (54 % amino acid identity) than to any other dsPTP, suggesting that the two may derive from a duplication in CpGV. No dsPTP-1 is present in CpGV. XcGV, PxGV, LdMNPV and HaSNPV lack dsPTPs of either lineage (Kuzio et al., 1999; Hayakawa et al., 1999; Hashimoto et al., 2000; Chen et al., 2001).
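The catalytic-loop signature quoted above, HCXXGXXR(S/T), translates directly into a regular expression in which X matches any residue. The Python sketch below scans a protein sequence for this motif. The example sequence is made up. A match alone does not show that a protein is an active phosphatase, which is the same caveat raised above for Cp66.

```python
import re

# H-C-X-X-G-X-X-R-(S/T), where X is any amino acid.
# The lookahead with a capture group also reports overlapping matches.
PTP_CATALYTIC_LOOP = re.compile(r"(?=(HC..G..R[ST]))")


def find_ptp_loops(protein: str):
    """Return (1-based position, matched motif) for each catalytic-loop match."""
    return [
        (m.start() + 1, m.group(1))
        for m in PTP_CATALYTIC_LOOP.finditer(protein.upper())
    ]


# Made-up sequence for illustration.
print(find_ptp_loops("MAVHCKAGKDRTGL"))  # [(4, 'HCKAGKDRT')]
```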
Inhibitors of apoptosis
Programmed cell death is triggered early in baculovirus infection. To counter it, baculoviruses encode proteins that inhibit apoptosis, such as P35 and IAPs (inhibitors of apoptosis). P35 homologues have been described only in AcMNPV, BmNPV and Spodoptera littoralis NPV, whereas iap genes are present in all sequenced baculovirus genomes and in many other baculoviruses (Clem, 1997). The first iap, now referred to as iap-3, was identified in CpGV by complementation of AcMNPV p35 mutants (Crook et al., 1993). IAP homologues generally contain two baculovirus IAP repeats (BIRs) (Birnbaum et al., 1994) and a C-terminal zinc finger-like (RING) Cys/His motif. The BIRs are associated with binding to apoptosis-inducing proteins (Vucic et al., 1997). Two iaps, iap-1 and iap-2, are present in AcMNPV and BmNPV, but their functions are poorly understood (Griffiths et al., 1999). Epiphyas postvittana MNPV encodes four iap genes. Two of these, iap-1 and iap-2, were shown to possess anti-apoptotic activity, whereas no anti-apoptotic activity was demonstrated for iap-3 or iap-4 (Maguire et al., 2000). In addition to iap-3 (Cp17), our sequence analysis has revealed two further CpGV iap genes (Cp94 and Cp116), although Cp94 has only a single BIR motif. The relationships between baculovirus iaps were explored by phylogenetic analyses based on the BIR and RING motifs (Fig. 3). Insect IAPs from Spodoptera frugiperda and Trichoplusia ni were included in these analyses, and they were most similar to the IAP-3 group. The CpGV IAPs did not group together, suggesting that they do not derive from duplications within CpGV. Cp-IAP-3 clearly grouped with the IAP-3 group. In contrast, the position of Cp94 was ambiguous. It generally grouped with either IAP-1 or IAP-3 sequences, but never with IAP-2s, so it could not be assigned unambiguously to any particular iap group. The same was true for Ld-IAP-3, which, although numbered IAP-3, does not clearly belong to any group. This may be because Ld-IAP-3 is truncated and therefore difficult to classify. Cp116, however, grouped strongly with Xc-IAP, Px-IAP and a T. ni GV IAP. These sequences form a well-supported clade that does not group strongly with any previously recognized iap group, and we therefore propose to designate it the iap-5 group. Furthermore, Cp116, Xc-iap and Px-iap are located at homologous positions in their respective genomes, so it is highly likely that they are true homologues.
Fig. 3. Phylogenetic analysis of the
baculovirus IAP gene family. The figure shows a strict consensus tree of
the two most parsimonious trees found using 174 characters from the BIR
and RING-finger domains, of which 150 were parsimony-informative. The tree
has 1647 steps, a consistency index of 0.574 and a retention index of
0.678 after removing the uninformative sites. The tree search strategy
adopted involved a heuristic algorithm with 10 random addition-sequence
replicates, TBR branch swapping, and used the protpars matrix. Gaps were
considered as missing data. Bootstrap support (%, n=1000) is
indicated above the nodes. Bootstrap values lower than 50 % are omitted.
Ac, AcMNPV; Bm, BmNPV; Busu, Buzura suppressaria NPV; Cf,
Choristoneura fumiferana NPV; Cp, CpGV; Eppo, Epiphyas
postvittana NPV; Ha, HaSNPV; Ld, LdMNPV; Op, OpMNPV; Px, PxGV; Se,
SeMNPV; Xc, XcGV. The major clades of IAP genes are indicated on the
right-hand side.
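For reference, the two tree statistics quoted in the legend have the following standard definitions. Here s is the number of steps on the tree, m is the minimum number of steps the characters could require on any tree, and g is the maximum (the number required on a star tree). Both indices equal 1 when there is no homoplasy (s = m). The legend does not state m and g, so they are not reconstructed here.

$$
\mathrm{CI} = \frac{m}{s}, \qquad \mathrm{RI} = \frac{g - s}{g - m}
$$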
Baculovirus repeated ORFs (bro genes)
A striking feature of many baculovirus genomes is the presence of the bro (baculovirus repeated ORF) gene family (Kuzio et al., 1999). This family appears to be widespread, highly repetitive and highly conserved. The number of copies varies from none in PxGV to 16 in LdMNPV. Most bro genes have a related core in their N-terminal region but show differing degrees of similarity elsewhere. Kuzio et al. (1999) divided the family into four groups based on the variable regions. Their comparison of the bro genes suggests that the family arose through several gene duplication events. Ac-bro, Bm-bro-d and Ld-bro-n are the most highly conserved genes between these viruses (82 % amino acid sequence identity). The function of these genes is not yet clear, although their products have been shown to bind DNA (Zemskov et al., 2000). BmNPV bro genes have been characterized at the transcriptional and translational levels (Kang et al., 1999). They are transcribed early, and their products are localized in both the nucleus and the cytoplasm (Zemskov et al., 2000). A single bro-related ORF (Cp63) was identified in CpGV. This is a truncated gene encoding a 55 amino acid protein that contains most of the core described by Kuzio et al. (1999). Cp-bro is most similar to the group III bro genes but shows no significant similarity to the OpMNPV bro genes. Interestingly, further similarities to the core sequence can be identified upstream of the predicted Cp63 start codon, but no in-frame Met codon precedes these sequences (Fig. 4). The sequence of this region has been thoroughly verified, and we are confident that this is not an artefact. We can find no homology to either the N terminus or the C terminus of bro genes elsewhere in the genome. This argues against the possibility that Cp63 represents an exon of a spliced bro homologue. Consistent with this, Cp63 is not flanked by recognizable consensus splice sites. It seems most likely that Cp63 represents a bro homologue that has been truncated at both the N and C termini. It remains to be seen whether it is functional.
Fig. 4. Alignment of BRO core sequences. The
figure shows a ClustalW alignment (using default parameters) of BRO core
sequences, including eight residues upstream from the start codon of Cp63
(indicated by the asterisk). Conserved residues are shaded in black
(identical amino acids) or grey (similar amino acids). Amino acid residue
numbers are given on the sides of the figure.
Repeated sequences
Repeated sequence elements, known as homologous regions (hrs), have been identified in all baculoviruses sequenced to date (Hayakawa et al., 2000). These sequences are thought to function as origins of DNA replication (ori) and as enhancers of early gene transcription (Cochran & Faulkner, 1983; Guarino & Summers, 1986; Theilmann & Stewart, 1992; Kool et al., 1995; Xie et al., 1995). An individual NPV hr typically comprises repeats of a 60–80 bp unit centred around a palindrome. The putative hrs identified recently in HaSNPV (Chen et al., 2001) deviate somewhat from this pattern, in that they are thought to include two distinct repeat motifs. Nonetheless, like all classical hrs, each HaSNPV hr element comprises multiple tandem copies of these repeat units. NPVs have several hrs dispersed around the genome, each with a variable number of repeat units. A second type of replication origin, the non-hr ori, has been identified in some NPVs (Kool et al., 1994; Heldens et al., 1997; Habib & Hasnain, 2000). These are complex structures comprising multiple direct and inverted repeats within a region spanning 800–4000 bp. They appear to be present only once per genome.
The situation in GVs is less clear. Hayakawa et al. (1999) identified nine putative hrs in XcGV. These are quite different from NPV hrs, comprising three to six direct repeats of a 120 bp element that lacks a palindromic core. PxGV hrs seem more like NPV hrs, in that the repeat unit is centred around a palindrome, although this palindrome is shorter than those found in NPVs (15 bp vs approximately 30 bp). The PxGV repeat unit, of approximately 105 bp, is larger than that typically found in NPVs. Four large hrs are present in PxGV, containing 10 to 26 copies of the repeat unit.
To identify CpGV hrs, the entire sequence was compared to itself and its complement by dot matrix analyses, using a 30 bp window allowing up to 35 % mismatches. A major region of repeated sequences, spanning 1.13 kbp, was identified from 20.1 to 21.2 kbp. Three ORFs (Cp25–27) lie within this region. None of them has known homologues, and it is not known whether they are transcribed. This region includes several different classes of repeat (Fig. 5 A) and is reminiscent of the non-hr-type ori. It is not repeated elsewhere in the genome. An AT-rich section comprises six short imperfect repeats of consensus TTTTTATAATNATAATACA. This is followed by three large imperfect repeats, each of which we have subdivided into three sections (Fig. 5 A, B). In repeat 1, section B is extended by multiple repeats of parts of its 5´ end (designated section b). Within repeat 3, sections B and C are separated by 141 nt that are not repeated elsewhere.
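The dot matrix screen described above can be illustrated with a minimal sketch. We assume an ungapped comparison: a dot is recorded at (i, j) when the 30 bp window starting at position i of the query differs from the window starting at j of the target at no more than 35 % of positions. The function names and the naive all-against-all scan are illustrative assumptions, not the software actually used for the CpGV analysis.

```python
# Hypothetical sketch of a windowed dot-matrix self-comparison (ungapped).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_matrix(query: str, target: str, window: int = 30,
               max_mismatch_frac: float = 0.35) -> list[tuple[int, int]]:
    """Return (i, j) pairs where query[i:i+window] matches target[j:j+window].

    A match allows at most int(window * max_mismatch_frac) mismatches
    (10 for a 30 bp window at 35 %). This is a naive scan, quadratic in
    sequence length; a whole-genome run would need a faster method.
    """
    max_mismatches = int(window * max_mismatch_frac)
    dots = []
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(target) - window + 1):
            t = target[j:j + window]
            if sum(a != b for a, b in zip(q, t)) <= max_mismatches:
                dots.append((i, j))
    return dots


def self_comparison(seq: str, window: int = 30,
                    max_mismatch_frac: float = 0.35):
    """Compare a sequence with itself and with its reverse complement.

    Off-diagonal dots in the direct comparison indicate direct repeats;
    dots against the reverse complement indicate inverted repeats and
    palindromes (j is a coordinate on the reverse-complement strand).
    """
    direct = [(i, j) for i, j in dot_matrix(seq, seq, window, max_mismatch_frac)
              if i != j]  # drop the trivial main diagonal
    inverted = dot_matrix(seq, reverse_complement(seq), window, max_mismatch_frac)
    return direct, inverted
```

For a sequence made of two identical 40 bp units, the direct comparison contains the pair (0, 40), marking the tandem repeat. A perfect palindrome is identical to its own reverse complement, so its inverted comparison contains (0, 0).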
Fig. 5. CpGV repeat sequences. (A) Graphical
representation of the CpGV major repeat region. The AT-rich repeats are
represented by open boxes. Sections A, B and C of the large repeats are
indicated by stippled, horizontal-striped and diagonal-striped boxes,
respectively. The multiple repetitions of part of section B within repeat
1 are indicated by the horizontal-striped box labelled b. (B) Alignment of
repeats 1–3 within the CpGV major repeat region. Sections A, b, B and
C are indicated. A PvuII restriction site in section C is
indicated. (C) ClustalW alignment (using default parameters) of CpGV
dispersed repeat sequences. Nucleotides conserved in at least 10 of the
sequences are shaded.
Surprisingly, these analyses did not identify any other regions of the genome containing multiple tandemly repeated sequences. We were readily able to identify hrs in other baculoviruses using this approach (data not shown). Indeed, the criteria we used (up to 35 % mismatches permitted) are considerably less stringent than those used to identify hrs in other baculoviruses. Thus, we conclude that CpGV does not possess hr elements similar to those found in other baculoviruses.
As noted above, an individual hr element
normally comprises several tandem repeats of a 60–120 bp unit.
However, there are examples of potential hr elements with a single
copy of the repeat unit. Furthermore, it has been shown that a single
repeat element is sufficient for ori function (Leisy et al., 1995).
We therefore considered the possibility that each CpGV hr contains
only a single copy of the repeat unit, and examined the genome for all
instances where a sequence of 60 bp or more was repeated elsewhere in
the genome, permitting up to 30 % mismatches.
This identified a repeated imperfect palindrome of approximately 75 bp.
FASTA searches of this sequence against the complete genome revealed that
it is present in the genome 13 times (Fig. 5 C). It is
never present as multiple tandem repeats like a typical hr element.
However, repeats 1 and 2 are separated by 144 bp and repeats 3 and 4 by
only 90 bp (Fig. 1). All other copies of this sequence
element are widely distributed around the genome. Six of these repeats (2, 5,
7, 9, 10 and 13) are within predicted ORFs. It remains to be seen whether
these elements actually function as replication origins or as
transcription enhancers.
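The search for single-copy repeat elements can be sketched in simplified form. The sketch below is an assumption, not the FASTA program used in the study: it performs an ungapped scan for copies of a query element on either strand, allowing up to 30 % mismatches. It also scores how palindromic a sequence is, as the fraction of positions that agree with its own reverse complement, and computes the spacing between consecutive copies, which is how separations such as 144 bp and 90 bp would be measured. All function names are hypothetical.

```python
# Hypothetical, ungapped stand-in for the FASTA copy search (no indels).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count positional differences between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def palindrome_identity(seq: str) -> float:
    """Fraction of positions where seq agrees with its reverse complement.

    1.0 is a perfect palindrome; an imperfect palindrome scores below 1.0.
    """
    rc = reverse_complement(seq)
    return sum(x == y for x, y in zip(seq, rc)) / len(seq)


def find_copies(genome: str, element: str,
                max_mismatch_frac: float = 0.30) -> list[tuple[int, str]]:
    """Return (start, strand) for every window matching element on either strand."""
    k = len(element)
    max_mm = int(k * max_mismatch_frac)
    element_rc = reverse_complement(element)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        if mismatches(window, element) <= max_mm:
            hits.append((start, "+"))
        elif mismatches(window, element_rc) <= max_mm:
            hits.append((start, "-"))  # copy on the complementary strand
    return hits


def spacings(hits: list[tuple[int, str]], element_len: int) -> list[int]:
    """Gap in bp between the end of one copy and the start of the next.

    A gap of 0 would indicate tandem copies; overlapping hits give
    negative values.
    """
    starts = sorted(start for start, _ in hits)
    return [b - (a + element_len) for a, b in zip(starts, starts[1:])]
```

Because a palindromic element matches both strands, the `elif` records each window only once. With a real 75 bp element, a gapped aligner such as FASTA would also find copies that contain insertions or deletions, which this ungapped scan misses.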
Fig. 6. Comparison of the CpGV gene
organization versus PxGV (A), XcGV (B) and AcMNPV (C). Homologues are
plotted based on their relative location in the genome. ORFs with no
homologues are aligned on the vertical and horizontal axes. In (A) and
(B), homologues that have been separated from their expected neighbours
are named. A bar indicates the largest contiguous stretch of CpGV unique
ORFs.
Organization of the CpGV genome
Comparison of gene arrangements
between species can give valuable information about baculovirus evolution.
In general, gene order is highly conserved between CpGV, PxGV and XcGV,
with only a small number of rearrangements (Fig. 6).
In contrast to the NPVs, there is no large genome inversion among the GVs.
On the other hand, genome organization is poorly conserved between CpGV
and AcMNPV. Similar observations have been made when comparing XcGV and PxGV
with NPVs (Hashimoto et al., 2000; Hayakawa et al., 1999). In CpGV, 96 of
98 PxGV homologues and 100 of 108 XcGV homologues are in a conserved
position. The homologues group in discontinuous clusters, with a major gap
on the CpGV genome corresponding to Cp24–28. Cp24 is homologous to
pe-38, whereas Cp25 to Cp28 have no homologues in other baculoviruses and
represent the largest contiguous stretch of ORFs unique to CpGV. However,
as noted above, these ORFs span a large repeat sequence and may not be
transcribed.
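The text does not spell out how a homologue is judged to be "in a conserved position". The sketch below is one possible operational definition, offered as an assumption. Each genome is restricted to the homologues it shares with the other, and the shared genes are treated as circular orders. A gene counts as conserved if at least one of its immediate neighbours is also its neighbour in the other genome. Neighbour pairs are unordered, so local inversions are tolerated. Gene identifiers and function names are hypothetical.

```python
# Hypothetical neighbour-based test for conserved gene position.


def shared_order(genome_a: list[str], genome_b: list[str]):
    """Restrict both gene orders to homologues present in both genomes."""
    shared = set(genome_a) & set(genome_b)
    return ([g for g in genome_a if g in shared],
            [g for g in genome_b if g in shared])


def adjacency(order: list[str], circular: bool = True) -> set[frozenset]:
    """Unordered neighbour pairs; baculovirus genomes are circular."""
    pairs = {frozenset((order[i], order[i + 1])) for i in range(len(order) - 1)}
    if circular and len(order) > 2:
        pairs.add(frozenset((order[-1], order[0])))  # close the circle
    return pairs


def conserved_position_count(genome_a: list[str], genome_b: list[str],
                             circular: bool = True) -> tuple[int, int]:
    """Return (genes with a conserved neighbour, total shared homologues)."""
    a, b = shared_order(genome_a, genome_b)
    conserved_pairs = adjacency(a, circular) & adjacency(b, circular)
    conserved_genes = set().union(*conserved_pairs)
    return len(conserved_genes), len(a)
```

In a toy example where one of six shared genes has been translocated (comparable to the Ac79 homologue discussed below), the function reports 5 of 6 homologues in a conserved position. Genes unique to either genome, such as Cp25–28, are excluded before the comparison.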
Four small regions were identified in XcGV that show
a relatively conserved gene arrangement with AcMNPV and OpMNPV. Three of
these are also present in PxGV (Hashimoto et al., 2000; Hayakawa et al.,
1999). In CpGV, four main regions can be identified
showing a moderately conserved gene arrangement with AcMNPV. These include
three of the regions described in XcGV and PxGV and contain Cp7–9,
Cp83–96, Cp101–108 and Cp111–114, corresponding to
Ac147–145, Ac103–89, Ac83–75 and Ac65–68, respectively
(Fig. 6). However, gene order within these regions is
not exactly conserved. For instance, the Ac79 homologue, which one would
expect to be around Cp105, is actually Cp65. A similar translocation is
observed for its XcGV homologue (Xc75). The gene arrangement within these
regions is also well conserved in the NPVs.
The complete sequence of CpGV has revealed many differences in gene content and arrangement compared to NPVs. Not surprisingly, it shows a greater similarity to XcGV and PxGV in terms of both gene arrangement and conservation of homologous genes. A striking feature of the CpGV genome is the lack of typical hrs, which are present in all other baculoviruses sequenced to date. This suggests it may be necessary to re-evaluate what constitutes an hr element, and/or the role they play in the virus life-cycle.
We are grateful to Salvador Carranza for help with the sequence analysis, and to Elisabeth Herniou for extensive help with the analysis of the data and critical comments on the manuscript. We thank John Kuzio for the production of Fig. 1, and George Rohrmann, Julie Olszewski and Sally Wormleaton for helpful comments on the manuscript. This work was supported in part by a grant from the BBSRC (G01317) to D.R.O'R., and by BBSRC core support to D.W.
The GenBank accession number of the CpGV genome sequence reported in this paper is U53466.
References
Rohrmann, G. F. (1992). Baculovirus structural proteins. Journal of General Virology 73, 749–761.
© 2001 SGM
This article is now available in the October 2001 print issue of JGV (vol. 82, 2531–2547). The complete issue of the journal may be seen in electronic form on JGV Online.