LOCUS EPB73264.1 1961 aa PRT CON 06-JUN-2013 DEFINITION Ancylostoma ceylanicum myosin head protein. ACCESSION KE124997-2 PROTEIN_ID EPB73264.1 SOURCE Ancylostoma ceylanicum ORGANISM Ancylostoma ceylanicum Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae; Ancylostomatinae; Ancylostoma. REFERENCE 1 (bases 1 to 444250) AUTHORS Mitreva,M. TITLE Draft genome of the parasitic nematode Anyclostoma ceylanicum JOURNAL Unpublished REFERENCE 2 (bases 1 to 444250) AUTHORS Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C., Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K. TITLE Direct Submission JOURNAL Submitted (14-MAY-2013) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA COMMENT Ancylostoma ceylanicum is a parasite of humans and carnivores in Asia. The parasite was adapted to the Syrian golden hamster (Mesocricetus auratus) in 1972 by Ray and Bhopale. The strain (Indian) was distributed worldwide from the lab of Dr. Jerzy Behnke in the 1980s. The sequenced strain was obtained by Dr. John M. Hawdon (jhawdon@gwu.edu) from Dr. Ricardo Fujiwara at the Federal University of Minas Gerais, Brazil. The strain has been maintained in Dr. Hawdon's lab in dogs and hamsters since 2007. Worm isolation and extraction of nucleic acids were performed by Dr. Verena Gelmedin and others in the Hawdon lab, or the Genome Institute production team. Voucher specimens are on deposit in the U.S. National Parasite Collection (accession number 102954). For the original isolation and adaptation to hamsters see Ray, D.K., Bhopale, K.K., 1972. Complete development of Ancylostoma ceylanicum (Looss, 1911) in golden hamsters, Mesocricetus auratus. Experientia 28, 359-361. This assembly consists of fragment, 3 kb and 8 kb insert whole genome shotgun libraries. The sequences were generated on the Roche/454 platform and assembled using Newbler. 
To improve scaffolding, the in-house tools CIGA (Cdna tool for Improving Genome Assembly) and Pygap (Gap closure tool) were used to map 454 cDNA reads to the genomic assembly using blat and to link genomic contigs based on cDNA evidence. Only joins confirmed by additional independent data typing were accepted, followed by the Pyramid assembler and Illumina paired reads to close gaps and extend contigs. The repeat library was generated using RepeatModeler (A.F.A. Smit, R. Hubley & P. Green http://repeatmasker.org). Ribosomal RNA genes were identified using RNAmmer (Lagesen et al., 2007 Nucleic Acids Res.) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (Griffiths-Jones et al., 2003 Nucleic Acids Res.). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev V. 2000, Genome Res.) and Augustus (M. Stanke et al., 2008 Bioinformatics) and the annotation pipeline tool Maker (M. Yandell et al., 2007 Genome Research), which aligns mRNA, EST and protein information from the same species or cross-species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at the Genome Institute. Gene product naming was determined by BER (http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of A. ceylanicum to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes. 
For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person. For specific questions regarding the A. ceylanicum genome project contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project. ##Genome-Assembly-Data-START## Current Finishing Status :: High-Quality Draft Assembly Method :: Newbler v. MapAsmResearch-04/19/2010-patch-08/17/2010 Assembly Name :: A_ceylanicum1.3.ec.cg.pg Genome Coverage :: 26.10x Sequencing Technology :: 454 ##Genome-Assembly-Data-END## FEATURES Qualifiers source /organism="Ancylostoma ceylanicum" /mol_type="genomic DNA" /submitter_seqid="A_ceylanicum-1.0_Cont223" /specimen_voucher="USDA:USNPC:102954" /db_xref="taxon:53326" /chromosome="Unknown" protein /locus_tag="ANCCEY_07634" /inference="protein motif:HMMPfam:IPR001609" /inference="protein motif:HMMPfam:IPR002928" /inference="protein motif:HMMPfam:IPR004009" /note="KEGG: phu:Phum_PHUM098460 0. 
myosin-9, putative K10352" /db_xref="InterPro:IPR001609" /db_xref="InterPro:IPR002928" /db_xref="InterPro:IPR004009" intron_pos 44:1 (1/46) intron_pos 68:0 (2/46) intron_pos 115:0 (3/46) intron_pos 167:1 (4/46) intron_pos 214:0 (5/46) intron_pos 231:1 (6/46) intron_pos 266:1 (7/46) intron_pos 299:1 (8/46) intron_pos 334:0 (9/46) intron_pos 380:1 (10/46) intron_pos 420:0 (11/46) intron_pos 442:2 (12/46) intron_pos 471:0 (13/46) intron_pos 528:0 (14/46) intron_pos 575:0 (15/46) intron_pos 618:0 (16/46) intron_pos 666:0 (17/46) intron_pos 758:2 (18/46) intron_pos 802:0 (19/46) intron_pos 832:2 (20/46) intron_pos 849:1 (21/46) intron_pos 875:0 (22/46) intron_pos 916:0 (23/46) intron_pos 963:0 (24/46) intron_pos 990:0 (25/46) intron_pos 1022:0 (26/46) intron_pos 1070:2 (27/46) intron_pos 1102:2 (28/46) intron_pos 1152:0 (29/46) intron_pos 1200:0 (30/46) intron_pos 1298:0 (31/46) intron_pos 1344:2 (32/46) intron_pos 1410:2 (33/46) intron_pos 1451:1 (34/46) intron_pos 1487:0 (35/46) intron_pos 1524:2 (36/46) intron_pos 1561:0 (37/46) intron_pos 1584:0 (38/46) intron_pos 1627:0 (39/46) intron_pos 1671:2 (40/46) intron_pos 1695:0 (41/46) intron_pos 1742:0 (42/46) intron_pos 1798:0 (43/46) intron_pos 1834:2 (44/46) intron_pos 1889:0 (45/46) intron_pos 1927:0 (46/46) BEGIN 1 MSNSDFEQDP GFQYLGMSRE ARAASAARPF DSKKNVWVPD PEEGFIAAEI QSVQGDQVTV 61 VTAKGNTVTV KKDEAQEMNP PKFDKTEDMA NLTFLNEASV LANLKDRYKD MMIYTYSGLF 121 CVVINPYKRL PIYTESVIKF YMGKRRNEMP PHLFATSDEA YRNMVQDREN QSMLITGESG 181 AGKTENTKKV ISYFAIVGAT QAAKGAKGEG TKGGTLEEQI VQTNPVLEAF GNAKTVRNNN 241 SSRFGKFIRT HFSAQGKLAG GDIEHYLLEK SRVVRQAAGE RSYHIFYQIM SGHDPKLRDQ 301 LKLNNDIKYY HFCSQAELTI DGVNDKEEMG LTQEAFDIMG FEDEEVMDLY KSCAAILHMG 361 EMKFKQRPRE EQAEPDGDED AQNVAHNLGV NHEEFLKALT KPRVRVGTEW VNKGQNLEQV 421 HWAVAGLGKA IYARMFKWLI GRCNKTLDAK QIERRYFIGV LDIAGFEIFD FNSFEQLWIN 481 FVNERLQQFF NHHMFVLEQE EYKREGIQWT FIDFGLDLQA CIELIEKPLG LISMLDEECI 541 VPKATDMTYV QKLNDQHLGK HPNFQKPKPP KGKQSEAHFA VVHYAGTVRY NATNFLEKNK 601 VCPHLILTPD 
SIDSIQTDPL NDTAVALLKT HSHGCKLMLE IWADYQTQEE AAEAAKSGAG 661 GGKKKGKSAS FMTVSMIYRE SLNNLMNMLY QTHPHFIRCI IPNEKKTSGM TMPMDSLSRT 721 RRHNPVLLLQ RSRNLNSFRF DRLGACTQPV DLQWCTGSYA VLAADQAKSS DDVKVASVAI 781 TDKLVTDGSL KDEEFKIGNT KVFFKAGILA RLEDMRDEIL RVIMTNFQSR VRWYLGQTDL 841 RRRMQQQAGL LIIQRNVRSW CTLRTWEWFK LYGKVKPLLK AGKEAEEMEK LSDKIKSLEE 901 AVAKGDESRK QLESQVAGLV EEKNQLFLNL EKEKANLQDA EERNQKLAAL KADLDKQLAE 961 VQYEQEIAEH KKHAQDLELS LKKAESEKQA RDHNIRSLQD EMANQDEAVA RLNKEKKHQE 1021 EVNRKLMEDL QAEEDRVNHM EKVRAKLEQQ LDDLEDAMDR EKRSRQDLEK AKRKVEGELK 1081 VAQENIDEIT KQKHDVEQNL KKKEAELHQL STRLEEEQSL VAKLQRQIKE LQARIAELEE 1141 ELENERQSRA KADRSRSELQ RELEEISERL EEQGGATAAQ LEANKKREAE LAKLRRDQEE 1201 ANLNHETALA SLRKKHHDAV AELTDQLEQL QKLKAKADKE KAQLQRELEE LSASVDSEVR 1261 SRQDIEKQLK VVEVQYAEAQ TKADEQSRQL NDFAALKNRL HNENGDLGRQ LEDMENQLNS 1321 LHRLKAQLTS QLEETKRSYD EEARERQALA AQVKNFEHEN DSLRDQLDTE SEAKAELLRQ 1381 ISKQNAEIQQ WKARFESEGL AKLDEIEEAK RKLQGKVQEL TDANEMAFAK IGSLEKTRHK 1441 LMQDLDDAQA RFDKIIDEWR KKHDDLAAEL DAAQRDNRNL STDLFRAKTA QDELTEHLES 1501 VRRENKQLAQ EVKDLADQLG EGGRSVHELQ KMVRRLEVEK EELQKALDEA EAALEAEEAK 1561 VLRAQVEVSQ IRSEIEKRIQ EKEEEFENTR KNHQRALESM QATLEAETKH KEEALRIKKK 1621 LEADINELEI ALDHANRANA DAQKTIKKYM ETVRELQLQV EDEQRQKDEI REQFLNSEKR 1681 NAILQTEKEE LSQVAEAAER ARRNAETDCI ELREHNNDLS AQLNGITAVK RKLEGELQAM 1741 HAELDETLAE LKNVDEMGKK AAADAARLAE ELRQEQEHSM HVERIRKGLE VQIKEMQIRL 1801 DEAEAAALKG GKKIIAQLES RIRSLEQELD GEQRRHQETD KNWRKSERRV KEVEFQLEED 1861 KKNQERLTEL IDKLQAKLKV FKRQVEEAEE VAATNLGKYR QLQAQLDDAE ERADVAENAL 1921 SKMRNKIRAS ASMVPSGSGG LAQSASSAVI RSTSFARSQD F //