LOCUS       EPB73264.1              1961 aa    PRT              CON 06-JUN-2013
DEFINITION  Ancylostoma ceylanicum myosin head protein.
ACCESSION   KE124997-2
PROTEIN_ID  EPB73264.1
SOURCE      Ancylostoma ceylanicum
  ORGANISM  Ancylostoma ceylanicum
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida;
            Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae;
            Ancylostomatinae; Ancylostoma.
REFERENCE   1  (bases 1 to 444250)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the parasitic nematode Ancylostoma ceylanicum
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 444250)
  AUTHORS   Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C.,
            Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-MAY-2013) The Genome Institute, Washington University
            School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     Ancylostoma ceylanicum is a parasite of humans and carnivores in
            Asia. The parasite was adapted to the Syrian golden hamster
            (Mesocricetus auratus) in 1972 by Ray and Bhopale. The strain
            (Indian) was distributed worldwide from the lab of Dr. Jerzy Behnke
            in the 1980s. The sequenced strain was obtained by Dr. John M.
            Hawdon (jhawdon@gwu.edu) from Dr. Ricardo Fujiwara at the Federal
            University of Minas Gerais, Brazil, and has been maintained in
            Dr. Hawdon's lab in dogs and hamsters since 2007. Worm isolation
            and extraction of nucleic acids were performed by Dr. Verena
            Gelmedin and others in the Hawdon lab, or by the Genome Institute
            production team. Voucher specimens are on deposit in the U.S.
            National Parasite Collection (accession number 102954). For the
            original isolation and adaptation to hamsters, see Ray, D.K.,
            Bhopale, K.K., 1972. Complete development of Ancylostoma ceylanicum
            (Looss, 1911) in golden hamsters, Mesocricetus auratus. Experientia
            28, 359-361.
            
            This assembly was generated from fragment, 3 kb insert, and 8 kb
            insert whole-genome shotgun libraries. The sequences were
            generated on the Roche/454 platform and assembled using Newbler.
            To improve scaffolding, the in-house tools CIGA (cDNA tool for
            Improving Genome Assembly) and Pygap (gap closure tool) were used
            to map 454 cDNA reads to the genomic assembly with BLAT, linking
            genomic contigs based on cDNA evidence. Only joins confirmed by
            additional independent data were accepted. Gaps were then closed
            and contigs extended using the Pyramid assembler and Illumina
            paired reads.
            
            The repeat library was generated using RepeatModeler (A.F.A. Smit,
            R. Hubley & P. Green http://repeatmasker.org). Ribosomal RNA
            genes were identified using RNAmmer (Lagesen et al., 2007 Nucleic
            Acids Res.) and transfer RNAs were identified with tRNAscan-SE
            (Lowe and Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as
            microRNAs, were identified by sequence homology search of the Rfam
            database (Griffiths-Jones et al., 2003 Nucleic Acids Res.).
            Repeats and predicted RNAs were then masked using RepeatMasker (A.
            Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding
            genes were predicted using a combination of the ab initio programs
            Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev
            V. 2000, Genome Res.) and Augustus (M. Stanke et al., 2008
            Bioinformatics) and the annotation pipeline tool Maker (M. Yandell
            et al., 2007 Genome Research), which aligns mRNA, EST and protein
            evidence from the same species or from other species to aid in
            determining and refining gene structures. A consensus gene set was
            generated from these predictions using a logical, hierarchical
            approach developed at the Genome Institute. Gene product names
            were assigned using BER (http://ber.sourceforge.net).
            
            Our goal is to explore this WGS draft sequence of A. ceylanicum to
            better define proteins involved in nematode parasitism that impact
            health and disease and are relevant to both host-parasite
            relationships and basic biological processes.
            
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the A.
            ceylanicum genome project contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) provided funds for this
            project.
            
            ##Genome-Assembly-Data-START##
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: Newbler v.
                                        MapAsmResearch-04/19/2010-patch-
                                        08/17/2010
            Assembly Name            :: A_ceylanicum1.3.ec.cg.pg
            Genome Coverage          :: 26.10x
            Sequencing Technology    :: 454
            ##Genome-Assembly-Data-END##
FEATURES             Qualifiers
     source          /organism="Ancylostoma ceylanicum"
                     /mol_type="genomic DNA"
                     /submitter_seqid="A_ceylanicum-1.0_Cont223"
                     /specimen_voucher="USDA:USNPC:102954"
                     /db_xref="taxon:53326"
                     /chromosome="Unknown"
     protein         /locus_tag="ANCCEY_07634"
                     /inference="protein motif:HMMPfam:IPR001609"
                     /inference="protein motif:HMMPfam:IPR002928"
                     /inference="protein motif:HMMPfam:IPR004009"
                     /note="KEGG: phu:Phum_PHUM098460 0. myosin-9, putative
                     K10352"
                     /db_xref="InterPro:IPR001609"
                     /db_xref="InterPro:IPR002928"
                     /db_xref="InterPro:IPR004009"
     intron_pos      44:1 (1/46)
     intron_pos      68:0 (2/46)
     intron_pos      115:0 (3/46)
     intron_pos      167:1 (4/46)
     intron_pos      214:0 (5/46)
     intron_pos      231:1 (6/46)
     intron_pos      266:1 (7/46)
     intron_pos      299:1 (8/46)
     intron_pos      334:0 (9/46)
     intron_pos      380:1 (10/46)
     intron_pos      420:0 (11/46)
     intron_pos      442:2 (12/46)
     intron_pos      471:0 (13/46)
     intron_pos      528:0 (14/46)
     intron_pos      575:0 (15/46)
     intron_pos      618:0 (16/46)
     intron_pos      666:0 (17/46)
     intron_pos      758:2 (18/46)
     intron_pos      802:0 (19/46)
     intron_pos      832:2 (20/46)
     intron_pos      849:1 (21/46)
     intron_pos      875:0 (22/46)
     intron_pos      916:0 (23/46)
     intron_pos      963:0 (24/46)
     intron_pos      990:0 (25/46)
     intron_pos      1022:0 (26/46)
     intron_pos      1070:2 (27/46)
     intron_pos      1102:2 (28/46)
     intron_pos      1152:0 (29/46)
     intron_pos      1200:0 (30/46)
     intron_pos      1298:0 (31/46)
     intron_pos      1344:2 (32/46)
     intron_pos      1410:2 (33/46)
     intron_pos      1451:1 (34/46)
     intron_pos      1487:0 (35/46)
     intron_pos      1524:2 (36/46)
     intron_pos      1561:0 (37/46)
     intron_pos      1584:0 (38/46)
     intron_pos      1627:0 (39/46)
     intron_pos      1671:2 (40/46)
     intron_pos      1695:0 (41/46)
     intron_pos      1742:0 (42/46)
     intron_pos      1798:0 (43/46)
     intron_pos      1834:2 (44/46)
     intron_pos      1889:0 (45/46)
     intron_pos      1927:0 (46/46)
BEGIN
        1 MSNSDFEQDP GFQYLGMSRE ARAASAARPF DSKKNVWVPD PEEGFIAAEI QSVQGDQVTV
       61 VTAKGNTVTV KKDEAQEMNP PKFDKTEDMA NLTFLNEASV LANLKDRYKD MMIYTYSGLF
      121 CVVINPYKRL PIYTESVIKF YMGKRRNEMP PHLFATSDEA YRNMVQDREN QSMLITGESG
      181 AGKTENTKKV ISYFAIVGAT QAAKGAKGEG TKGGTLEEQI VQTNPVLEAF GNAKTVRNNN
      241 SSRFGKFIRT HFSAQGKLAG GDIEHYLLEK SRVVRQAAGE RSYHIFYQIM SGHDPKLRDQ
      301 LKLNNDIKYY HFCSQAELTI DGVNDKEEMG LTQEAFDIMG FEDEEVMDLY KSCAAILHMG
      361 EMKFKQRPRE EQAEPDGDED AQNVAHNLGV NHEEFLKALT KPRVRVGTEW VNKGQNLEQV
      421 HWAVAGLGKA IYARMFKWLI GRCNKTLDAK QIERRYFIGV LDIAGFEIFD FNSFEQLWIN
      481 FVNERLQQFF NHHMFVLEQE EYKREGIQWT FIDFGLDLQA CIELIEKPLG LISMLDEECI
      541 VPKATDMTYV QKLNDQHLGK HPNFQKPKPP KGKQSEAHFA VVHYAGTVRY NATNFLEKNK
      601 VCPHLILTPD SIDSIQTDPL NDTAVALLKT HSHGCKLMLE IWADYQTQEE AAEAAKSGAG
      661 GGKKKGKSAS FMTVSMIYRE SLNNLMNMLY QTHPHFIRCI IPNEKKTSGM TMPMDSLSRT
      721 RRHNPVLLLQ RSRNLNSFRF DRLGACTQPV DLQWCTGSYA VLAADQAKSS DDVKVASVAI
      781 TDKLVTDGSL KDEEFKIGNT KVFFKAGILA RLEDMRDEIL RVIMTNFQSR VRWYLGQTDL
      841 RRRMQQQAGL LIIQRNVRSW CTLRTWEWFK LYGKVKPLLK AGKEAEEMEK LSDKIKSLEE
      901 AVAKGDESRK QLESQVAGLV EEKNQLFLNL EKEKANLQDA EERNQKLAAL KADLDKQLAE
      961 VQYEQEIAEH KKHAQDLELS LKKAESEKQA RDHNIRSLQD EMANQDEAVA RLNKEKKHQE
     1021 EVNRKLMEDL QAEEDRVNHM EKVRAKLEQQ LDDLEDAMDR EKRSRQDLEK AKRKVEGELK
     1081 VAQENIDEIT KQKHDVEQNL KKKEAELHQL STRLEEEQSL VAKLQRQIKE LQARIAELEE
     1141 ELENERQSRA KADRSRSELQ RELEEISERL EEQGGATAAQ LEANKKREAE LAKLRRDQEE
     1201 ANLNHETALA SLRKKHHDAV AELTDQLEQL QKLKAKADKE KAQLQRELEE LSASVDSEVR
     1261 SRQDIEKQLK VVEVQYAEAQ TKADEQSRQL NDFAALKNRL HNENGDLGRQ LEDMENQLNS
     1321 LHRLKAQLTS QLEETKRSYD EEARERQALA AQVKNFEHEN DSLRDQLDTE SEAKAELLRQ
     1381 ISKQNAEIQQ WKARFESEGL AKLDEIEEAK RKLQGKVQEL TDANEMAFAK IGSLEKTRHK
     1441 LMQDLDDAQA RFDKIIDEWR KKHDDLAAEL DAAQRDNRNL STDLFRAKTA QDELTEHLES
     1501 VRRENKQLAQ EVKDLADQLG EGGRSVHELQ KMVRRLEVEK EELQKALDEA EAALEAEEAK
     1561 VLRAQVEVSQ IRSEIEKRIQ EKEEEFENTR KNHQRALESM QATLEAETKH KEEALRIKKK
     1621 LEADINELEI ALDHANRANA DAQKTIKKYM ETVRELQLQV EDEQRQKDEI REQFLNSEKR
     1681 NAILQTEKEE LSQVAEAAER ARRNAETDCI ELREHNNDLS AQLNGITAVK RKLEGELQAM
     1741 HAELDETLAE LKNVDEMGKK AAADAARLAE ELRQEQEHSM HVERIRKGLE VQIKEMQIRL
     1801 DEAEAAALKG GKKIIAQLES RIRSLEQELD GEQRRHQETD KNWRKSERRV KEVEFQLEED
     1861 KKNQERLTEL IDKLQAKLKV FKRQVEEAEE VAATNLGKYR QLQAQLDDAE ERADVAENAL
     1921 SKMRNKIRAS ASMVPSGSGG LAQSASSAVI RSTSFARSQD F
//