LOCUS       ETN68992.1 1990 aa PRT CON 23-MAR-2015
DEFINITION  Necator americanus myosin head protein.
ACCESSION   KI669176-32
PROTEIN_ID  ETN68992.1
SOURCE      Necator americanus (New World hookworm)
  ORGANISM  Necator americanus
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae; Bunostominae; Necator.
REFERENCE   1 (bases 1 to 1237885)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the hookworm Necator americanus
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 1237885)
  AUTHORS   Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C., Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-APR-2013) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     Necator americanus is a roundworm that causes most human hookworm infections. N. americanus is a blood-feeding nematode that infects people in rural areas of the tropics and subtropics, causing an estimated disease burden of 22 million disability-adjusted life years. The life cycle of the parasite begins when eggs from an infected host are passed in the feces. Under favorable environmental conditions they hatch and develop through 2 larval stages to the infective stage (L3). L3s penetrate human skin and migrate through the circulatory system and lungs to finally reside in the duodenum. Development continues, and the adult stages attach to the intestinal mucosa and feed on blood. The strain being sequenced was obtained from the laboratory of Dr. Peter Hotez by Dr. Bin Zhan; it was originally isolated from an infected patient in Hunan province, China, and has been maintained in hamsters since 1976. The adult worms were collected from the intestines of hamsters infected subcutaneously with N. americanus L3 for 8 weeks. Worm isolation and DNA extraction were performed by Bin Zhan with the QIAamp DNA mini kit according to the manufacturer's instructions (Qiagen). Jian et
al., 2003 Exp Parasitol. This assembly consists of fragment, 3 kb and 8 kb insert whole-genome shotgun libraries. The sequences were generated on the Roche/454 platform and assembled using Newbler. To improve scaffolding, an in-house tool, CIGA (Cdna tool for Improving Genome Assembly), was used to map 454 cDNA reads to the genomic assembly with BLAT and to link genomic contigs based on cDNA evidence. Only joins confirmed by additional independent data typing were accepted. The repeat library was generated using RepeatModeler (A.F.A. Smit, R. Hubley & P. Green http://repeatmasker.org). Ribosomal RNA genes were identified using RNAmmer (Lagesen et al., 2007 Nucleic Acids Res.) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (Griffiths-Jones et al., 2003 Nucleic Acids Res.). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev V. 2000, Genome Res.) and Augustus (M. Stanke et al., 2008 Bioinformatics) and the annotation pipeline tool Maker (M. Yandell et al., 2007 Genome Research), which aligns mRNA, EST and protein information from the same species or across species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at the Genome Institute. Gene product naming was determined by BER (http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of N. americanus to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes.
For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person. For specific questions regarding the N. americanus genome project, contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project.
##Genome-Assembly-Data-START##
Finishing Goal           :: High-Quality Draft
Current Finishing Status :: High-Quality Draft
Assembly Method          :: Newbler v. MapAsmResearch-04/19/2010-patch-08/17/2010
Assembly Name            :: N_ americanus_v1
Genome Coverage          :: 26.15x
Sequencing Technology    :: 454
##Genome-Assembly-Data-END##
FEATURES    Qualifiers
source      /organism="Necator americanus"
            /mol_type="genomic DNA"
            /submitter_seqid="N_americanus-1.0_Cont9"
            /host="Homo sapiens"
            /db_xref="taxon:51031"
            /chromosome="Unknown"
            /lab_host="hamster"
            /country="China: Hunan Province"
protein     /locus_tag="NECAME_01102"
            /inference="protein motif:HMMPfam:IPR001609"
            /inference="protein motif:HMMPfam:IPR002928"
            /inference="protein motif:HMMPfam:IPR004009"
            /note="KEGG: cel:F52B10.1 nmy-1; Non-muscle MYosin family member (nmy-1); K10352 myosin heavy chain 0."
/db_xref="InterPro:IPR001609" /db_xref="InterPro:IPR002928" /db_xref="InterPro:IPR004009" intron_pos 46:2 (1/46) intron_pos 82:0 (2/46) intron_pos 111:0 (3/46) intron_pos 163:1 (4/46) intron_pos 172:2 (5/46) intron_pos 213:0 (6/46) intron_pos 244:0 (7/46) intron_pos 298:1 (8/46) intron_pos 346:1 (9/46) intron_pos 378:1 (10/46) intron_pos 404:0 (11/46) intron_pos 463:2 (12/46) intron_pos 504:1 (13/46) intron_pos 579:1 (14/46) intron_pos 619:1 (15/46) intron_pos 680:0 (16/46) intron_pos 741:2 (17/46) intron_pos 775:2 (18/46) intron_pos 803:0 (19/46) intron_pos 834:0 (20/46) intron_pos 878:0 (21/46) intron_pos 910:0 (22/46) intron_pos 943:0 (23/46) intron_pos 985:0 (24/46) intron_pos 1025:0 (25/46) intron_pos 1091:2 (26/46) intron_pos 1135:2 (27/46) intron_pos 1174:0 (28/46) intron_pos 1227:0 (29/46) intron_pos 1278:0 (30/46) intron_pos 1319:0 (31/46) intron_pos 1358:0 (32/46) intron_pos 1388:0 (33/46) intron_pos 1420:0 (34/46) intron_pos 1458:0 (35/46) intron_pos 1508:0 (36/46) intron_pos 1553:2 (37/46) intron_pos 1577:0 (38/46) intron_pos 1632:0 (39/46) intron_pos 1674:2 (40/46) intron_pos 1711:0 (41/46) intron_pos 1754:0 (42/46) intron_pos 1789:0 (43/46) intron_pos 1883:0 (44/46) intron_pos 1936:2 (45/46) intron_pos 1976:1 (46/46) BEGIN 1 MEESDLRFLQ VQRAAVADPA RASEWAGKKL CWVPHEKDGF VAGSIKQETN DEVIVEICDT 61 GKTVTISKDD VQKANPPKFD KVEDMSELTY LNEASVLHNL KERYFSSLIY TYSGLFCVVI 121 NPYKRLPIYS ESLIEEFKGK KRHEMPPHIF AIADSAYRSM LQDREDQSIL CTGESGAGKT 181 ENTKKVIQYL AHVAGATRSK GGPQAPAASP AKGELEHQLL QANPILEAFG NSKTVKNDNS 241 SRFGKFIRIN FDMSGYISGA NIEFYLLEKS RTLRQAPDER SFHIFYQFLR GTSAAEKANY 301 LLEDIDKYRF LVNGNITLPN VDDAQEFQST LKSMRIMGFA EDEITSVLRV VSATVLMGNF 361 EFTQEKKSDQ AILPDDRVIQ KVCHLLGLPV IELTKAFLRP RIKVGREFVN KAQNKEQAEF 421 AVEAIAKASY ERMFKWLVNR INKSLDRTRR QGASFIGILD IAGFEIFELN SFEQLCINFT 481 NEKLQQLFNN TMFIMEQEEY QREGIEWQFI DFGLDLQPTI DLIEKPMGLL ALLDEQCLFP 541 KATDKTLVEK LQKTHSKHPK FIVPDMRAKS DFAVVHYAGR VDYSADQWLM KNMDPLNENV 601 VALMQASTDP FVCGIWKDAE FAGICAAEMN 
ETAFGVRAKK GMFRTVSQLH KEQLTRLMTT 661 LRNTSPHFVR CIIPNHEKKA GKINSMLVLE QLRCNGVLEG IRICRQGFPN RVPFQEFRHR 721 YEILTPNVIP RGFMDGKEAV KKMIEYLEVD SNLYRIGQSK VFFRTGVLAH LEEERDLKLT 781 DLIIQFQAQC RAFLARRLYV KRMQQSSAIR VLQRNGLAWM KLRNWQWWRL FTKVKPLLQV 841 TNQEAAISAK EDELRAIREK LDKVETEFKE SLTKIDQVMA ERNVLQDQLQ QETDNNAELE 901 EVKNRLQLKK NELEEMVNEM RDRLVDEEQR TEKMSQEKKK LVETVRDLEE QLEQEEQARQ 961 KLQLDKANVD QRVKNVEIKL VDITDAHDKL LKEKRILEDK MNQLNMQLSE EEERVKQVAR 1021 QRGKVEGHVQ ELEQELLRER QIKSELEQQK RKLITELEDS RELLEEKRGK LEELNGQLMK 1081 REEELSQVLT RSDEEAATIA LLQKQIRDMQ ATIDELREDI ETERAARNKA EMARREVVAQ 1141 LEKVKGDMLD KVDETSVLQD IMRRKEDEVR DLKKALESTT HALENKLEEQ KAKYNRQIEE 1201 LHEKIEQQKK VNSQQEKYKH QAENERAELT QELANIQAQK AEADKRRKQQ EVQFLDMQSQ 1261 LAECDEHRLQ ALEQLDKARE ELEHISRTRE DEEQLVSNLN RKVAALEVQL HEVSDQVQEE 1321 TRAKLAQINR VRQLEEEKAT IAEERDEIDA ARQHMERDIN VLRQQLTEAR KKADEGVIQQ 1381 MEELRKKAQR DLENTQHQLE ESEASKERLI QSKKKLQQEL EDANIELENI RTASREMEKR 1441 QKKFDMQLAE ERANVQKAIL ERDAHAQESR DRETRILSLV NELEQLKGTI DETERVRRML 1501 QLELDESISS KDDVGKNVHE LEKAKRQLEQ TVQEQKATIE ELEDQLGFAE DARLRLEVNI 1561 QALRAEQDRN LNAKDQEAED KRRSLVKQLR DLEQELENER RSKAGAISQK KKMEAHIAEI 1621 EQQLDVANRL KDEYNKQLKK NQQMIKEYQH DSEEARQMKE EIASQLRDIE RRLRSAEAEN 1681 QRLSEANEML TSQKRQLEQE KDELEELRGR GGSFSSEEKR RLEQKLAQLE EELEEEQNNA 1741 EIAIDKQRKA QQQLEQLTTE LSMERSVAQK SEAERQGLER QNRELKAKIA ELESTAQSRA 1801 RAQIAALEAK IQYLEEQNSV ESQERHNATR QYRFVLSSCS YLALQSVMST SIVLLRRIEK 1861 RLHDTILQLE DERRNVEQQK EIAEKCNLRA KQMRRQLDEQ EEEMTRERAK SRNLQREIDD 1921 LTEANDTLTR ENNSLRGGAA RRNRENMRLR SAYQIPGSSD NLTRNDDEDG SIGTEVTGSD 1981 HTDELKKTSV //