LOCUS       AACZ04011842            2641 bp    DNA     linear   PRI 26-APR-2016
DEFINITION  Pan troglodytes isolate Yerkes chimp pedigree #C0471 (Clint)
            Contig1730.1, whole genome shotgun sequence.
ACCESSION   AACZ04011842 AACZ04000000
VERSION     AACZ04011842.1
DBLINK      BioProject: PRJNA13184
            BioSample: SAMN02981217
KEYWORDS    WGS.
SOURCE      Pan troglodytes (chimpanzee)
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Pan.
REFERENCE   1  (bases 1 to 2641)
  CONSRTM   Chimpanzee Sequencing and Analysis Consortium
  TITLE     Initial sequence of the chimpanzee genome and comparison with the
            human genome
  JOURNAL   Nature 437 (7055), 69-87 (2005)
   PUBMED   16136131
REFERENCE   2  (bases 1 to 2641)
  AUTHORS   Hughes,J.F., Skaletsky,H., Pyntikova,T., Minx,P.J., Graves,T.,
            Rozen,S., Wilson,R.K. and Page,D.C.
  TITLE     Conservation of Y-linked genes during human evolution revealed by
            comparative sequencing in chimpanzee
  JOURNAL   Nature 437 (7055), 100-103 (2005)
   PUBMED   16136134
  REMARK    Erratum:[Nature. 2006 May 11;441(7090):248]
REFERENCE   3  (bases 1 to 2641)
  AUTHORS   Yang,S.P., Hillier,L.W., Chinwalla,A.T., Fulton,L.A., Huang,X. and
            Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-NOV-2003) Genome Sequencing Center, Washington
            University School of Medicine, 4444 Forest Park, St. Louis, MO
            63108, USA
REFERENCE   4  (bases 1 to 2641)
  AUTHORS   Warren,W. and Wilson,R.K.
  CONSRTM   International Chimpanzee Genome Analysis Consortium
  TITLE     Direct Submission
  JOURNAL   Submitted (22-APR-2016) Washington University School of Medicine,
            McDonnell Genome Institute, 4444 Forest Park, St. Louis, MO 63108,
            USA
COMMENT     Assembly Release Notes for chimpanzee 'Clint', version Pan_tro 3.0:
            The chimpanzee (Pan troglodytes) genome was originally sequenced to
            4X coverage using a male captive-born chimp of West African origin
            known as 'Clint' from the Yerkes Primate Research Center (Atlanta,
            USA).
            The revised assembly (Pan_troglodytes-2.1.4) incorporates an
            additional 2X of whole genome shotgun plasmid reads, which were
            generated as part of an improvement plan for the existing 4X chimp
            assembly (Pan_troglodytes-1.0). Both of these prior versions were
            assembled using the PCAP software (Genome Res. 13(9):2164-70
            2003). A very small fragment of this assembly was complemented
            with BACs from two other chimpanzees. The chromosome Y sequence
            was finished at the McDonnell Genome Institute, Washington
            University School of Medicine, with detailed mapping and extensive
            collaboration with David Page's group at the Whitehead Institute
            (Hughes et al., Nature, 2005 437:100-3).
            For a pure 'Clint' version of the chimpanzee genome we generated
            55x coverage of overlapping paired 250 bp Illumina reads, 2 lanes
            of a Chicago library (Dovetail Genomics) and 9x coverage of PacBio
            long single-molecule reads (P5C3 chemistry). The combined Illumina
            sequence reads were assembled using the DiscoVAR de novo assembler
            (Weisenfeld NI et al., Nat Genet. 2014 46(12):1350-5). We
            attempted to scaffold all contigs from this assembly using in
            vitro HiC content mapping (Dovetail Genomics). We then filled
            scaffold gaps where possible using the 9x PacBio reads with
            PBJelly (English AC et al. PLoS One 2012 7:e47768). The whole
            assembly was corrected for residual base substitution and small
            insertion and deletion errors using mapped 'Clint' paired-end
            250 bp reads with Raccoon (Kuderna et al., unpublished). The de
            novo assembly is made up of a total of 3,554 non-singleton
            scaffolds with an N50 scaffold length of 27 Mb (N50 contig length
            was 334 kb). The total assembled size is 3.02 Gb.
            To create a chromosomal version of the Pan_tro 3.0 assembly we
            first aligned the assembled scaffold sequences to the
            Pan_troglodytes-2.1.4 and human GRCh38 references with Nucmer to
            initially order and orient them along the Pan_troglodytes-2.1.4
            chromosomes.
            The assembled Pan_tro 3.0 genome was also broken into 1 kb
            segments and then aligned against the chimpanzee
            Pan_troglodytes-2.1.4 and human genomes using BLAT (Kent 2002) to
            identify uniquely aligning segments of the chimpanzee genome, to
            aid in identifying breakpoints and to confirm alignment
            localization. Discordant alignments of paired 'Clint' fosmid end
            sequences revealed misassembly events that were manually
            corrected. In the final phase only finished BAC clones from the
            male 'Clint' chimpanzee were integrated into the assembly.
            Finally, centromeres were placed along each chromosome using the
            localization data from human. There are 2.95 Gb (including Ns in
            gaps) on ordered/oriented chromosomes, 140 Mb on the chr*_random,
            and 123 Mb on chromosome Un. The scaffold N50 length is 27 Mb
            (count=39) and the contig N50 length is 334 kb (count=2503). This
            draft assembly is referred to as Pan_tro 3.0.
            Credits:
            Funding - The sequence characterization of the chimp genome was
            funded by the National Human Genome Research Institute (NHGRI),
            National Institutes of Health (NIH) and Spanish grant MINECO
            BFU2014-55090-P (FEDER).
            BAC sequencing - McDonnell Genome Institute at Washington
            University School of Medicine, St Louis, MO
            Sequence generation, assembly and data integration for creation of
            chromosomal AGP files - Lukas Kuderna and Tomas Marques-Bonet,
            ICREA at Institut de Biologia Evolutiva, (UPF-CSIC), PRBB, 08003
            Barcelona, Spain, and LaDeana Hillier, McDonnell Genome Institute
            at Washington University School of Medicine, St Louis, MO. Wes
            Warren, Lars Feuk (Uppsala U), Andrew Sharp (Mt Sinai), Ed Green
            (Dovetail), Mikkel Schierup (Aarhus U).
            Thanks also to Illumina (Bojan Obradovic).
            ##Genome-Assembly-Data-START##
            Assembly Method       :: DiscoVar v. 51280; PBJelly v.
14.9.9 Assembly Name :: Pan_tro 3.0 Genome Coverage :: 6x Sanger; 55x Illumina; 9x PacBio Sequencing Technology :: Sanger; Illumina; PacBio ##Genome-Assembly-Data-END## FEATURES Location/Qualifiers source 1..2641 /organism="Pan troglodytes" /mol_type="genomic DNA" /isolate="Yerkes chimp pedigree #C0471 (Clint)" /db_xref="taxon:9598" /sex="male" BASE COUNT 555 a 771 c 756 g 559 t ORIGIN 1 ctgggagagg gtccccccat ttcacaggca aggaagttca tagagggtct ccccatttca 61 cagacaagga agttcgaaga gggtccccca tttcatacgc aaggaagctg ggagagggtc 121 cccccatttc ataggcaagg aagtttggag agggtccccc catttcacag acaaggagct 181 gggagagcgt ccccccattt cacaggcaag gaagttcaga gagggtcccc ccatttcaca 241 gacaaggaag ttcggagagg gtccccccat ttcacaggca aggaagttca gagagggtcc 301 tcccatttca cagacgaagt tcggagaggg tccccccatt tcataggcaa ggaagctcgg 361 agagggtccc cccatttcat aggcaaggaa gctgggagag gatcccccat ttcataggca 421 aggaagctgg gagagggtcc cccatttcat aggcaaggaa gctcggagag ggtcccccca 481 tttcataggc aaggaagctg ggagagggtc cccccatttc ataggcaagg aggctgggag 541 agtgtccccc catttcatag gcaaggaagc tcggagaggg ccagtgcttt ttaacagtca 601 cgcagttctg agcatcaggc tgtcagcctg gagcctggct cggtactgag gttcccagga 661 tgtataaaga actgaacacg cacctttgct gttggctctt ctgtgcgagt gctctgagag 721 ttttctgtac tcccagcaag tttgggttgg tacagattca gcacagcgaa tctgaccatg 781 tggtgaccac aagcacgcgc ttactgtgcg cctaagttca cactccctga agaagctcct 841 gccccaacgc cgcagcctat gggcctttgc gtctacgcag gaccatgagg ctgtgtcccc 901 gagcctggcc cggggagtaa ctggcatggc ctgcggtgct gggggtgtgt ccacaggagc 961 tggtgccaca gaggacacgt gtgtctggtg tcttcctgga ctcggtcgga ccagtttccc 1021 agtgtgtccg tgagggtgac actgagtgag acttctggga tccacacctt tcctgggacc 1081 ctcactcttc ctgccacaga aagcacctgt gaaagtctga gatcaagaat gtgcacaggg 1141 gaggtgtctg cgggtcaagg tcctgagctc atgtctcaca ggcctctggt agagacgaga 1201 ttgttcgtgt ctctgcgaag aaggccatgt tttattgcag ccaagtacaa agtcctctct 1261 gccactcgct ggctgggact ggggtcatgt tctgtcaggt tccaaagagc gccaggaaga 1321 ctctgagacc acagagcacg gcctgctttc tcccagcccc tggtaaagat 
gtccagcctg 1381 ggacctcccc ctgtatatag tgccttaagt gggagaggac caccacccag atcaggggag 1441 gacctgtccc ctgcaatgct gctgcctgct ttccagattg acttgggagt gcttgtgctg 1501 acggagcatg ttctgtggtt tttctctttc ttccctctgt tgttgtgcct tttccgtagg 1561 tacctgcacc agtcaggggc cctgaccatg gaggccctgg aggacccttc ccccgagctc 1621 atggagggcc cagaggagga cattgctgac aaggtaggcc ctggagggct gggtaggtgg 1681 caagtagggg atttagaaca cagccacgcc taagggcctc tgcagacacc ccgggaggtg 1741 gggacagcac agccggaggt gaccccgtgt cctccagggc tccccagact gtccgtccag 1801 ccccgatgct ctttggcagg tttccccaag gggtctcgtg gccatgtgag aaaaaggagt 1861 cttcctgttg tgcacgaagg gccagctggg aggagtggac tgggcagtga gtgagcaccc 1921 tcgggtatcc cacccccact gtgtctgagt cgggctgggg gacacccaga cattcagtcc 1981 accaggcccg tggagatgcg acctggggca cccatgattt gggagaaaag ggctggccct 2041 gcagttactg cagagccaac agcctgagga acgggcttct cctggggcct tgaatggaat 2101 gggtgacagc agccagggca aggcagcctt gccgtggtca agacccgctt ttcagccgga 2161 tgtggtggct cacacctgga atcccagcac attaggaggc ctaggcgggc agatcacttg 2221 aggtcaggag tttgagatca gcctggccaa catagtgaaa acccgtctct actaaaaata 2281 caaaagttag ccaggcatgg tggcgggcaa ctgtaatccc agctatttgg gaggctgagg 2341 caggagaatc acttgaacct gggaggtgga ggttgcagtg agccgagatt gcaccactac 2401 actccagcct gggcgacaga gggagactcc gtctcacctt ctgcaccccc tcaccttctg 2461 tgaccccttc acttcctcct gtaccccctc aattcctcct gtacccctca tctcctcctg 2521 taccccctca ccacctcctg taccccctca ctacctcctg taccccctca cctcttcctg 2581 cactccctca cctcctgcat ccccctcacc tcctataccc catctcctcc tgtaccccct 2641 c //