LOCUS       KI669146               85783 bp    DNA     linear   CON 23-MAR-2015
DEFINITION  Necator americanus unplaced genomic scaffold
            N_americanus-1.0_Cont880, whole genome shotgun sequence.
ACCESSION   KI669146 ANCG01000000
VERSION     KI669146.1
DBLINK      BioProject: PRJNA72135
            BioSample: SAMN02953824
KEYWORDS    WGS; HIGH_QUALITY_DRAFT.
SOURCE      Necator americanus (New World hookworm)
  ORGANISM  Necator americanus
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida;
            Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae;
            Bunostominae; Necator.
REFERENCE   1  (bases 1 to 85783)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the hookworm Necator americanus
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 85783)
  AUTHORS   Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C.,
            Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-APR-2013) The Genome Institute, Washington University
            School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     Necator americanus is a roundworm that causes most human hookworm
            infections. N. americanus is a blood-feeding nematode that infects
            people in rural areas of the tropics and subtropics, causing an
            estimated disease burden of 22 million disability-adjusted life
            years. The life cycle of the parasite begins when eggs from an
            infected host are passed in the feces. Under favorable
            environmental conditions they hatch and develop through two larval
            stages to the infective third stage (L3). L3s penetrate human
            skin, migrate through the circulatory system and lungs, and
            finally reside in the duodenum, where development continues and
            the adult worms attach to the intestinal mucosa and feed on blood.
            
            The sequenced strain was obtained from the laboratory of Dr. Peter
            Hotez by Dr. Bin Zhan. It was originally isolated from an infected
            patient in Hunan Province, China, and has been maintained in
            hamsters since 1976. Adult worms were collected from the
            intestines of hamsters 8 weeks after subcutaneous infection with
            N. americanus L3. Worm isolation and DNA extraction were performed
            by Bin Zhan with the QIAamp DNA Mini Kit (Qiagen) according to the
            manufacturer's instructions. See Jian et al., 2003, Exp.
            Parasitol.
            
            This assembly was built from fragment, 3 kb insert and 8 kb insert
            whole-genome shotgun libraries. The reads were generated on the
            Roche/454 platform and assembled with Newbler. To improve
            scaffolding, CIGA (cDNA tool for Improving Genome Assembly), an
            in-house tool, was used to map 454 cDNA reads to the genomic
            assembly with BLAT and to link genomic contigs based on cDNA
            evidence. Only joins confirmed by additional, independent data
            types were accepted.
            
            The repeat library was generated using RepeatModeler (A.F.A. Smit,
            R. Hubley & P. Green, http://repeatmasker.org). Ribosomal RNA
            genes were identified using RNAmmer (Lagesen et al., 2007, Nucleic
            Acids Res.) and transfer RNAs with tRNAscan-SE (Lowe and Eddy,
            1997, Nucleic Acids Res.). Non-coding RNAs, such as microRNAs,
            were identified by sequence homology searches against the Rfam
            database (Griffiths-Jones et al., 2003, Nucleic Acids Res.).
            Repeats and predicted RNAs were then masked using RepeatMasker
            (A. Smit, R. Hubley & P. Green, http://repeatmasker.org).
            Protein-coding genes were predicted using a combination of the ab
            initio programs SNAP (Korf, 2004, BMC Bioinformatics), Fgenesh
            (Salamov A., Solovyev V., 2000, Genome Res.) and Augustus (M.
            Stanke et al., 2008, Bioinformatics), and the annotation pipeline
            tool MAKER (M. Yandell et al., 2007, Genome Research), which
            aligns mRNA, EST and protein evidence from the same species or
            from other species to aid in determining and refining gene
            structures. A consensus gene set was generated from these
            predictions using a logical, hierarchical approach developed at
            The Genome Institute. Gene product names were assigned using BER
            (http://ber.sourceforge.net).
            
            Our goal is to explore this WGS draft sequence of N. americanus to
            better define proteins involved in nematode parasitism that impact
            health and disease and are relevant to both host-parasite
            relationships and basic biological processes.
            
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the N.
            americanus genome project contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) provided funds for this
            project.
            
            ##Genome-Assembly-Data-START##
            Finishing Goal           :: High-Quality Draft
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: Newbler v.
                                        MapAsmResearch-04/19/2010-patch-
                                        08/17/2010
            Assembly Name            :: N_americanus_v1
            Genome Coverage          :: 26.15x
            Sequencing Technology    :: 454
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..85783
                     /organism="Necator americanus"
                     /mol_type="genomic DNA"
                     /submitter_seqid="N_americanus-1.0_Cont880"
                     /host="Homo sapiens"
                     /db_xref="taxon:51031"
                     /chromosome="Unknown"
                     /lab_host="hamster"
                     /country="China: Hunan Province"
     gene            <2236..2913
                     /locus_tag="NECAME_15445"
     mRNA            join(<2236..2410,2477..2595,2650..2725,2801..2913)
                     /locus_tag="NECAME_15445"
                     /product="shTK domain protein"
     CDS             join(<2236..2410,2477..2595,2650..2725,2801..2913)
                     /locus_tag="NECAME_15445"
                     /inference="protein motif:HMMPfam:IPR003582"
                     /codon_start=1
                     /product="shTK domain protein"
                     /protein_id="ETN69203.1"
                     /db_xref="InterPro:IPR003582"
                     /translation="KKFSLVTRTISSKFLACKDRTNRKTGISECPFRAGLCDIPIYSR
                     IMTVQCPRTCGKCPGQQPLTKTACVDLVNPATNSSECSSRIDLCMDPVYQDVMMKQCR
                     KTCGFCSSTTNKTAVINPAMRGNLKKVFPRRHQKNVQNDNKTYFLKITLQDFIGNGMR
                     "
     assembly_gap    3835..4085
                     /estimated_length=251
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    5173..5272
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     gene            complement(5776..8072)
                     /locus_tag="NECAME_15446"
     mRNA            complement(join(5776..5903,5968..6079,6135..6244,
                     6641..6790,6846..6902,7760..7906,7982..8072))
                     /locus_tag="NECAME_15446"
                     /product="shTK domain protein"
     CDS             complement(join(5776..5903,5968..6079,6135..6244,
                     6641..6790,6846..6902,7760..7906,7982..8072))
                     /locus_tag="NECAME_15446"
                     /inference="protein motif:HMMPfam:IPR003582"
                     /codon_start=1
                     /product="shTK domain protein"
                     /protein_id="ETN69204.1"
                     /db_xref="InterPro:IPR003582"
                     /translation="MFIYSSILLYSALYISQITAQTCAAGADNGPCLNGVCFAGTTCL
                     TALDICCSDTGIIPDTTLASTVASTLASDSSVASTVTSITSASLASSATTTTSATCVD
                     KLNPRTGVSDCSMRASLCNDPTYLTVMTEQCPRTCGRCSSSSGTITTTTSTTCVDKVN
                     PRTGTSDCPMRSSLCLDSNYIALMRTECPRTCGFCTSTGSTVSGTATVATVTSATATT
                     RAAGTCVDAINPRTGVSDCPQRVSLCNDSVYRDLMQSQCPLTCGLC"
     assembly_gap    6341..6440
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    8372..8697
                     /estimated_length=326
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    9474..9573
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    11321..11420
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     gene            complement(14817..23940)
                     /locus_tag="NECAME_15447"
     mRNA            complement(join(14817..14920,15024..15087,15218..15336,
                     21508..21650,21709..21866,23255..23352,23883..23940))
                     /locus_tag="NECAME_15447"
                     /product="oxidoreductase, short chain
                     dehydrogenase/reductase family protein"
     CDS             complement(join(14817..14920,15024..15087,15218..15336,
                     21508..21650,21709..21866,23255..23352,23883..23940))
                     /locus_tag="NECAME_15447"
                     /inference="protein motif:HMMPfam:IPR002198"
                     /codon_start=1
                     /product="oxidoreductase, short chain
                     dehydrogenase/reductase family protein"
                     /protein_id="ETN69205.1"
                     /db_xref="InterPro:IPR002198"
                     /translation="MSASYDFKGKKALVTGASKGIGCAIAAALSQAGAKVVALARDKE
                     KLEELRKNHSNISIVVGDVTSSESSLRALLAPYQPFDILINNAGTGFVEPCHSLTEDA
                     ITKQLDVNLKAPIILTKIVTSEMIRNSVRGAVVNISSQASMRPLEHHTVYCASKAGLD
                     MAARCFAKELGQYGIRVNCVNPTVVMTELGKMAWSDPNKSEPLLSQMPISRFAEIEEV
                     VNAVMFLLSDAASMTTGMALPVDGGFTQM"
     gene            complement(24404..38503)
                     /locus_tag="NECAME_15448"
     mRNA            complement(join(24404..24445,27239..27344,31317..31348,
                     31408..31558,31687..31781,32440..32523,34228..34291,
                     35268..35355,38422..38503))
                     /locus_tag="NECAME_15448"
                     /product="5' nucleotidase family protein"
     CDS             complement(join(24404..24445,27239..27344,31317..31348,
                     31408..31558,31687..31781,32440..32523,34228..34291,
                     35268..35355,38422..38503))
                     /locus_tag="NECAME_15448"
                     /inference="protein motif:HMMPfam:IPR008380"
                     /note="KEGG: dre:324845 nt5c2a, nt5c2, wu:fc36f11,
                     wu:fc45b10; 5'-nucleotidase, cytosolic IIa; K01081
                     5'-nucleotidase 6.5e-09"
                     /codon_start=1
                     /product="5' nucleotidase family protein"
                     /protein_id="ETN69206.1"
                     /db_xref="InterPro:IPR008380"
                     /translation="MMEACGFFYSVLSRAGQIWVHPQGVHSRCATEFIQRMGFAGKDI
                     LYVGDHIFGDVLKSKKVGGWRTLLIVPELDTEMKPQHHDEAREQSISSVLAVGIEAEP
                     LRIVRITDDMEAEFGAMGSMLRCGWRQTHFAAQLKKYADLYTCNVYNLIYYSGTHYFN
                     SPVLLLPHEEKILLNSVEVEATGDVSEKASKLVGEFLIRMQLKIVLAFIMLSAVVAYP
                     YQILPWQEAKKTWNKIRNKKFQNNVCHYV"
     assembly_gap    25208..26852
                     /estimated_length=1645
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    29593..29869
                     /estimated_length=277
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    31827..31926
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    33144..33765
                     /estimated_length=622
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    35672..37106
                     /estimated_length=1435
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    38549..40100
                     /estimated_length=1552
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    40844..41673
                     /estimated_length=830
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    43283..43982
                     /estimated_length=700
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    46789..47463
                     /estimated_length=675
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    48412..50485
                     /estimated_length=2074
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    51698..53217
                     /estimated_length=1520
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    56221..57035
                     /estimated_length=815
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    58064..59792
                     /estimated_length=1729
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    60556..62112
                     /estimated_length=1557
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    62698..64277
                     /estimated_length=1580
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    65833..65932
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    66572..68083
                     /estimated_length=1512
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    69126..69599
                     /estimated_length=474
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    70093..71204
                     /estimated_length=1112
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    72839..73233
                     /estimated_length=395
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    74455..74752
                     /estimated_length=298
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     gene            77089..84817
                     /locus_tag="NECAME_15449"
     mRNA            join(77089..77104,82265..82416,84749..84817)
                     /locus_tag="NECAME_15449"
                     /product="hypothetical protein"
     CDS             join(77089..77104,82265..82416,84749..84817)
                     /locus_tag="NECAME_15449"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="ETN69207.1"
                     /translation="MELTKRKVSHNEELCAEADLPYCQMTRILIDRIYRHHVCLKYPQ
                     KSFALESCGKDEFYILFWIFANTNLKNLQQFIKE"
     assembly_gap    78069..78837
                     /estimated_length=769
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    80667..81951
                     /estimated_length=1285
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    82540..82639
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    83590..84347
                     /estimated_length=758
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
CONTIG      join(ANCG01044634.1:1..3834,gap(251),ANCG01044635.1:1..1087,
            gap(100),ANCG01044636.1:1..1068,gap(100),ANCG01044637.1:1..1931,
            gap(326),ANCG01044638.1:1..776,gap(100),ANCG01044639.1:1..1747,
            gap(100),ANCG01000785.1:1..13787,gap(1645),ANCG01044640.1:1..2740,
            gap(277),ANCG01044641.1:1..1957,gap(100),ANCG01044642.1:1..1217,
            gap(622),ANCG01044643.1:1..1906,gap(1435),ANCG01044644.1:1..1442,
            gap(1552),ANCG01044645.1:1..743,gap(830),ANCG01044646.1:1..1609,
            gap(700),ANCG01044647.1:1..2806,gap(675),ANCG01044648.1:1..948,
            gap(2074),ANCG01044649.1:1..1212,gap(1520),ANCG01044650.1:1..3003,
            gap(815),ANCG01044651.1:1..1028,gap(1729),ANCG01044652.1:1..763,
            gap(1557),ANCG01044653.1:1..585,gap(1580),ANCG01044654.1:1..1555,
            gap(100),ANCG01044655.1:1..639,gap(1512),ANCG01044656.1:1..1042,
            gap(474),ANCG01044657.1:1..493,gap(1112),ANCG01044658.1:1..1634,
            gap(395),ANCG01044659.1:1..1221,gap(298),ANCG01044660.1:1..3316,
            gap(769),ANCG01044661.1:1..1829,gap(1285),ANCG01044662.1:1..588,
            gap(100),ANCG01044663.1:1..950,gap(758),ANCG01044664.1:1..1436)
//