LOCUS KI669146 85783 bp DNA linear CON 23-MAR-2015 DEFINITION Necator americanus unplaced genomic scaffold N_americanus-1.0_Cont880, whole genome shotgun sequence. ACCESSION KI669146 ANCG01000000 VERSION KI669146.1 DBLINK BioProject: PRJNA72135 BioSample: SAMN02953824 KEYWORDS WGS; HIGH_QUALITY_DRAFT. SOURCE Necator americanus (New World hookworm) ORGANISM Necator americanus Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae; Bunostominae; Necator. REFERENCE 1 (bases 1 to 85783) AUTHORS Mitreva,M. TITLE Draft genome of the hookworm Necator americanus JOURNAL Unpublished REFERENCE 2 (bases 1 to 85783) AUTHORS Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C., Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K. TITLE Direct Submission JOURNAL Submitted (17-APR-2013) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA COMMENT Necator americanus is a roundworm that causes most human hookworm infections. N. americanus is a blood-feeding nematode infecting people in the rural areas of the tropics and subtropics, causing an estimated disease burden of 22 million disability-adjusted life years. The life cycle of the parasite begins when parasite eggs from an infected host are passed in the feces. Under favorable environmental conditions they hatch and develop through 2 larval stages to the infective stage (L3). L3s penetrate human skin and migrate through the circulatory system and lung to finally reside in the duodenum. Development continues, and the adult stages attach to the intestinal mucosa and feed on blood. The strain being sequenced was obtained from the laboratory of Dr. Peter Hotez by Dr. Bin Zhan, originally isolated from an infected patient in Hunan province, China, and has been maintained in hamsters since 1976. The adult worms were collected from intestines of hamsters infected subcutaneously with N.
americanus L3 for 8 weeks. Worm isolation and DNA extraction were performed by Bin Zhan with the QIAamp DNA mini kit according to the manufacturer's instructions (Qiagen). Jian et al., 2003 Exp Parasitol. This assembly consists of fragment, 3 kb and 8 kb insert whole genome shotgun libraries. The sequences were generated on the Roche/454 platform and assembled using Newbler. To improve scaffolding, an in-house tool, CIGA (Cdna tool for Improving Genome Assembly), was used to map 454 cDNA reads to the genomic assembly using blat and to link genomic contigs based on cDNA evidence. Only joins confirmed by additional independent data typing were accepted. The repeat library was generated using RepeatModeler (A.F.A. Smit, R. Hubley & P. Green http://repeatmasker.org). The ribosomal RNA genes were identified using RNAmmer (Lagesen et al., 2007 Nucleic Acids Res.) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (Griffiths-Jones et al., 2003 Nucleic Acids Res.). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev V. 2000, Genome Res.) and Augustus (M. Stanke, et al., 2008 Bioinformatics) and the annotation pipeline tool Maker (M. Yandell et al., 2007 Genomic Research), which aligns mRNA, EST and protein information from the same species or cross-species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at the Genome Institute. Gene product naming was determined by BER (http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of N.
americanus to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes. For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person. For specific questions regarding the N. americanus genome project contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project. ##Genome-Assembly-Data-START## Finishing Goal :: High-Quality Draft Current Finishing Status :: High-Quality Draft Assembly Method :: Newbler v. MapAsmResearch-04/19/2010-patch- 08/17/2010 Assembly Name :: N_ americanus_v1 Genome Coverage :: 26.15x Sequencing Technology :: 454 ##Genome-Assembly-Data-END## FEATURES Location/Qualifiers source 1..85783 /organism="Necator americanus" /mol_type="genomic DNA" /submitter_seqid="N_americanus-1.0_Cont880" /host="Homo sapiens" /db_xref="taxon:51031" /chromosome="Unknown" /lab_host="hamster" /country="China: Hunan Province" gene <2236..2913 /locus_tag="NECAME_15445" mRNA join(<2236..2410,2477..2595,2650..2725,2801..2913) /locus_tag="NECAME_15445" /product="shTK domain protein" CDS join(<2236..2410,2477..2595,2650..2725,2801..2913) /locus_tag="NECAME_15445" /inference="protein motif:HMMPfam:IPR003582" /codon_start=1 /product="shTK domain protein" /protein_id="ETN69203.1" /db_xref="InterPro:IPR003582" /translation="KKFSLVTRTISSKFLACKDRTNRKTGISECPFRAGLCDIPIYSR IMTVQCPRTCGKCPGQQPLTKTACVDLVNPATNSSECSSRIDLCMDPVYQDVMMKQCR KTCGFCSSTTNKTAVINPAMRGNLKKVFPRRHQKNVQNDNKTYFLKITLQDFIGNGMR " assembly_gap 3835..4085 /estimated_length=251 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 5173..5272 /estimated_length=100 /gap_type="within 
scaffold" /linkage_evidence="paired-ends" gene complement(5776..8072) /locus_tag="NECAME_15446" mRNA complement(join(5776..5903,5968..6079,6135..6244, 6641..6790,6846..6902,7760..7906,7982..8072)) /locus_tag="NECAME_15446" /product="shTK domain protein" CDS complement(join(5776..5903,5968..6079,6135..6244, 6641..6790,6846..6902,7760..7906,7982..8072)) /locus_tag="NECAME_15446" /inference="protein motif:HMMPfam:IPR003582" /codon_start=1 /product="shTK domain protein" /protein_id="ETN69204.1" /db_xref="InterPro:IPR003582" /translation="MFIYSSILLYSALYISQITAQTCAAGADNGPCLNGVCFAGTTCL TALDICCSDTGIIPDTTLASTVASTLASDSSVASTVTSITSASLASSATTTTSATCVD KLNPRTGVSDCSMRASLCNDPTYLTVMTEQCPRTCGRCSSSSGTITTTTSTTCVDKVN PRTGTSDCPMRSSLCLDSNYIALMRTECPRTCGFCTSTGSTVSGTATVATVTSATATT RAAGTCVDAINPRTGVSDCPQRVSLCNDSVYRDLMQSQCPLTCGLC" assembly_gap 6341..6440 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 8372..8697 /estimated_length=326 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 9474..9573 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 11321..11420 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" gene complement(14817..23940) /locus_tag="NECAME_15447" mRNA complement(join(14817..14920,15024..15087,15218..15336, 21508..21650,21709..21866,23255..23352,23883..23940)) /locus_tag="NECAME_15447" /product="oxidoreductase, short chain dehydrogenase/reductase family protein" CDS complement(join(14817..14920,15024..15087,15218..15336, 21508..21650,21709..21866,23255..23352,23883..23940)) /locus_tag="NECAME_15447" /inference="protein motif:HMMPfam:IPR002198" /codon_start=1 /product="oxidoreductase, short chain dehydrogenase/reductase family protein" /protein_id="ETN69205.1" /db_xref="InterPro:IPR002198" /translation="MSASYDFKGKKALVTGASKGIGCAIAAALSQAGAKVVALARDKE KLEELRKNHSNISIVVGDVTSSESSLRALLAPYQPFDILINNAGTGFVEPCHSLTEDA 
ITKQLDVNLKAPIILTKIVTSEMIRNSVRGAVVNISSQASMRPLEHHTVYCASKAGLD MAARCFAKELGQYGIRVNCVNPTVVMTELGKMAWSDPNKSEPLLSQMPISRFAEIEEV VNAVMFLLSDAASMTTGMALPVDGGFTQM" gene complement(24404..38503) /locus_tag="NECAME_15448" mRNA complement(join(24404..24445,27239..27344,31317..31348, 31408..31558,31687..31781,32440..32523,34228..34291, 35268..35355,38422..38503)) /locus_tag="NECAME_15448" /product="5' nucleotidase family protein" CDS complement(join(24404..24445,27239..27344,31317..31348, 31408..31558,31687..31781,32440..32523,34228..34291, 35268..35355,38422..38503)) /locus_tag="NECAME_15448" /inference="protein motif:HMMPfam:IPR008380" /note="KEGG: dre:324845 nt5c2a, nt5c2, wu:fc36f11, wu:fc45b10; 5'-nucleotidase, cytosolic IIa; K01081 5'-nucleotidase 6.5e-09" /codon_start=1 /product="5' nucleotidase family protein" /protein_id="ETN69206.1" /db_xref="InterPro:IPR008380" /translation="MMEACGFFYSVLSRAGQIWVHPQGVHSRCATEFIQRMGFAGKDI LYVGDHIFGDVLKSKKVGGWRTLLIVPELDTEMKPQHHDEAREQSISSVLAVGIEAEP LRIVRITDDMEAEFGAMGSMLRCGWRQTHFAAQLKKYADLYTCNVYNLIYYSGTHYFN SPVLLLPHEEKILLNSVEVEATGDVSEKASKLVGEFLIRMQLKIVLAFIMLSAVVAYP YQILPWQEAKKTWNKIRNKKFQNNVCHYV" assembly_gap 25208..26852 /estimated_length=1645 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 29593..29869 /estimated_length=277 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 31827..31926 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 33144..33765 /estimated_length=622 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 35672..37106 /estimated_length=1435 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 38549..40100 /estimated_length=1552 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 40844..41673 /estimated_length=830 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 43283..43982 /estimated_length=700 /gap_type="within scaffold" /linkage_evidence="paired-ends" 
assembly_gap 46789..47463 /estimated_length=675 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 48412..50485 /estimated_length=2074 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 51698..53217 /estimated_length=1520 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 56221..57035 /estimated_length=815 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 58064..59792 /estimated_length=1729 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 60556..62112 /estimated_length=1557 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 62698..64277 /estimated_length=1580 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 65833..65932 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 66572..68083 /estimated_length=1512 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 69126..69599 /estimated_length=474 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 70093..71204 /estimated_length=1112 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 72839..73233 /estimated_length=395 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 74455..74752 /estimated_length=298 /gap_type="within scaffold" /linkage_evidence="paired-ends" gene 77089..84817 /locus_tag="NECAME_15449" mRNA join(77089..77104,82265..82416,84749..84817) /locus_tag="NECAME_15449" /product="hypothetical protein" CDS join(77089..77104,82265..82416,84749..84817) /locus_tag="NECAME_15449" /codon_start=1 /product="hypothetical protein" /protein_id="ETN69207.1" /translation="MELTKRKVSHNEELCAEADLPYCQMTRILIDRIYRHHVCLKYPQ KSFALESCGKDEFYILFWIFANTNLKNLQQFIKE" assembly_gap 78069..78837 /estimated_length=769 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 80667..81951 /estimated_length=1285 /gap_type="within scaffold" 
/linkage_evidence="paired-ends" assembly_gap 82540..82639 /estimated_length=100 /gap_type="within scaffold" /linkage_evidence="paired-ends" assembly_gap 83590..84347 /estimated_length=758 /gap_type="within scaffold" /linkage_evidence="paired-ends" CONTIG join(ANCG01044634.1:1..3834,gap(251),ANCG01044635.1:1..1087, gap(100),ANCG01044636.1:1..1068,gap(100),ANCG01044637.1:1..1931, gap(326),ANCG01044638.1:1..776,gap(100),ANCG01044639.1:1..1747, gap(100),ANCG01000785.1:1..13787,gap(1645),ANCG01044640.1:1..2740, gap(277),ANCG01044641.1:1..1957,gap(100),ANCG01044642.1:1..1217, gap(622),ANCG01044643.1:1..1906,gap(1435),ANCG01044644.1:1..1442, gap(1552),ANCG01044645.1:1..743,gap(830),ANCG01044646.1:1..1609, gap(700),ANCG01044647.1:1..2806,gap(675),ANCG01044648.1:1..948, gap(2074),ANCG01044649.1:1..1212,gap(1520),ANCG01044650.1:1..3003, gap(815),ANCG01044651.1:1..1028,gap(1729),ANCG01044652.1:1..763, gap(1557),ANCG01044653.1:1..585,gap(1580),ANCG01044654.1:1..1555, gap(100),ANCG01044655.1:1..639,gap(1512),ANCG01044656.1:1..1042, gap(474),ANCG01044657.1:1..493,gap(1112),ANCG01044658.1:1..1634, gap(395),ANCG01044659.1:1..1221,gap(298),ANCG01044660.1:1..3316, gap(769),ANCG01044661.1:1..1829,gap(1285),ANCG01044662.1:1..588, gap(100),ANCG01044663.1:1..950,gap(758),ANCG01044664.1:1..1436) //