LOCUS       EPB73275.1              1174 aa    PRT              CON 06-JUN-2013
DEFINITION  Ancylostoma ceylanicum hypothetical protein.
ACCESSION   KE124997-13
PROTEIN_ID  EPB73275.1
SOURCE      Ancylostoma ceylanicum
  ORGANISM  Ancylostoma ceylanicum
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida;
            Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae;
            Ancylostomatinae; Ancylostoma.
REFERENCE   1  (bases 1 to 444250)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the parasitic nematode Ancylostoma ceylanicum
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 444250)
  AUTHORS   Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C.,
            Pepin,K.H., Palsikar,V.B., Zhang,X.W.E. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-MAY-2013) The Genome Institute, Washington University
            School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     Ancylostoma ceylanicum is a parasite of humans and carnivores in
            Asia. The parasite was adapted to the Syrian golden hamster
            (Mesocricetus auratus) in 1972 by Ray and Bhopale. The strain
            (Indian) was distributed worldwide from the lab of Dr. Jerzy Behnke
            in the 1980s. The sequenced strain was obtained by Dr. John M.
            Hawdon (jhawdon@gwu.edu) from Dr. Ricardo Fujiwara at the Federal
            University of Minas Gerais, Brazil. The strain has been maintained
            in Dr. Hawdon's lab in dogs and hamsters since 2007. Worm isolation
            and extraction of nucleic acids were performed by Dr. Verena
            Gelmedin and others in the Hawdon lab, or by the Genome Institute
            production team. Voucher specimens are on deposit in the U.S.
            National Parasite Collection (accession number 102954).  For the
            original isolation and adaptation to hamsters see Ray, D.K.,
            Bhopale, K.K., 1972. Complete development of Ancylostoma ceylanicum
            (Looss, 1911) in golden hamsters, Mesocricetus auratus. Experientia
            28, 359-361.
            
            This assembly was generated from fragment, 3 kb and 8 kb insert
            whole genome shotgun libraries. The sequences were generated on the
            Roche/454 platform and assembled using Newbler. To improve
            scaffolding, the in-house tools CIGA (Cdna tool for Improving
            Genome Assembly) and Pygap (Gap closure tool) were used to map 454
            cDNA reads to the genomic assembly with blat and to link genomic
            contigs based on cDNA evidence. Only joins confirmed by additional
            independent data typing were accepted. The Pyramid assembler and
            Illumina paired reads were then used to close gaps and extend
            contigs.
            
            The repeat library was generated using RepeatModeler (A.F.A. Smit,
            R. Hubley & P. Green http://repeatmasker.org). Ribosomal RNA genes
            were identified using RNAmmer (Lagesen et al., 2007 Nucleic Acids
            Res.) and transfer RNAs were identified with tRNAscan-SE (Lowe and
            Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as
            microRNAs, were identified by sequence homology search of the Rfam
            database (Griffiths-Jones et al., 2003 Nucleic Acids Res.).
            Repeats and predicted RNAs were then masked using RepeatMasker (A.
            Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding
            genes were predicted using a combination of the ab initio programs
            Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev
            V. 2000, Genome Res.) and Augustus (M. Stanke et al., 2008
            Bioinformatics) and the annotation pipeline tool Maker (M. Yandell
            et al., 2007 Genome Research), which aligns mRNA, EST and protein
            information from the same species or cross-species to aid in gene
            structure determination and modifications. A consensus gene set
            from the above prediction algorithms was generated using a
            logical, hierarchical approach developed at the Genome Institute.
            Gene product naming was determined by BER
            (http://ber.sourceforge.net).
            
            Our goal is to explore this WGS draft sequence of A. ceylanicum to
            better define proteins involved in nematode parasitism that impact
            health and disease and are relevant to both host-parasite
            relationships and basic biological processes.
            
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the A.
            ceylanicum genome project contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) provided funds for this
            project.
            
            ##Genome-Assembly-Data-START##
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: Newbler v.
                                        MapAsmResearch-04/19/2010-patch-
                                        08/17/2010
            Assembly Name            :: A_ceylanicum1.3.ec.cg.pg
            Genome Coverage          :: 26.10x
            Sequencing Technology    :: 454
            ##Genome-Assembly-Data-END##
FEATURES             Qualifiers
     source          /organism="Ancylostoma ceylanicum"
                     /mol_type="genomic DNA"
                     /submitter_seqid="A_ceylanicum-1.0_Cont223"
                     /specimen_voucher="USDA:USNPC:102954"
                     /db_xref="taxon:53326"
                     /chromosome="Unknown"
     protein         /locus_tag="ANCCEY_07645"
                     /inference="protein motif:HMMPfam:IPR019103"
                     /note="KEGG: cdu:CD36_50510 1.3e-22 ATP-dependent RNA
                     helicase, putative"
                     /db_xref="InterPro:IPR019103"
     intron_pos      203:2 (1/5)
     intron_pos      252:2 (2/5)
     intron_pos      328:1 (3/5)
     intron_pos      373:0 (4/5)
     intron_pos      512:0 (5/5)
BEGIN
        1 MVATYSPDGP TPKIVSFSGS GDDSSQFSLW LRRLEDIMRI RASPMTSQQK ANFLICYLEG
       61 VAREKVEELG EEDRSNYDTV VAHLKRFFEG PQHRYMARQS LSTCQQHPGE SSATFANRLL
      121 NLVRAATTGQ DPASQKERVL EEFVARLRPD IRYYVKLDNP ATFEQALAKA QMVEQLLAEA
      181 TAERLISPAG PSRTIEVKSA APQLLHLNED GTMVEIMAVV MVFPANDLFL AKQTVTIVEE
      241 SVIRLVSVLH LGNLLDRHDL AASHLGNDLV QVHRCMPIPP KNYRLLASNG TCYTKPKVEL
      301 SLSNGPPLHM FIDPTTYVLS NEAPPGDWLP RGPSSETTSN TLSLWPSWFS PWSLYDIWVF
      361 VCCVSITTGL LKRRHGADDA PLAIAVPPLP WASPPAREQD TVEVARVEID TTNVWPPRAS
      421 LSPINVLTIA NNEQFFVAQI PVKVNGIQVL ALVDTGAAIS ITSKATAPLL GVFALADTDI
      481 PCAVGMAGVP VKIIGRARLR FEIGSFTLHQ PIVIKTTTET PPTKFRPPRI PVKFQKELDE
      541 HINKLLRAGR IVESDTWTKR RPVVLWVVGT HSVKRALVRR VEPLAERGAL KVTLDAYGWS
      601 SNPLEEDIKK RGRLMEDGYL LDICVRLTKA PAAANPIYEN ISRMQVFENL ETDSTASAIL
      661 SHVYGAAPLG CPNRDEPQPM PDVEPRVDVL VNDNIVTFSA EQRKAVALGT SGFPIAAIQA
      721 AFGTGKTLVG AVIAAQLVDR DEIVLVTAST NAAVAQFAQT ILSLSAYRHL RVLRYVSDAA
      781 VLENMAPTNV DMNKILISLH DTYTNRLSPE AMELCNKFTI GRRILERYIE NPDLALYLTD
      841 EEKEEYAIAE RNVSRTLEKM IALMLTLRPP HILCITTASL MNTIGTPDGA FNAYRDKFTV
      901 LIGDEASQIP EPALTAISNR LPNLRQIYIG DIHQLEPHAK CPRDSHAAVY GARSVMSVLC
      961 SARAVPVASL VRTFRAHPAL NELPNRVAYD GTLVSGITAF ARPMLIRAME FPAPGIPFML
     1021 IDVDGQSTRA ENMSHFNPVE VETCVKLIEL LKARGIAPEY ICVITFYREQ FRRVEQATLD
     1081 QGIEISTVDS IQGREKEIVI LLTTKTHFTP ESADFLDEYR RMNVALSRCR QGQFILGHVP
     1141 SLATVTFWRR VIDWATSLQA IVTPETLERY FHDV
//