LOCUS KIH65423.1 301 aa PRT CON 16-JAN-2015 DEFINITION Ancylostoma duodenale collagen triple helix repeat protein. ACCESSION KN727564-2 PROTEIN_ID KIH65423.1 SOURCE Ancylostoma duodenale ORGANISM Ancylostoma duodenale Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Strongyloidea; Ancylostomatidae; Ancylostomatinae; Ancylostoma. REFERENCE 1 (bases 1 to 27074) AUTHORS Mitreva,M. TITLE Draft genome of the parasitic nematode Ancylostoma duodenale JOURNAL Unpublished REFERENCE 2 (bases 1 to 27074) AUTHORS Mitreva,M., Abubucker,S., Martin,J., Minx,P., Warren,C., Pepin,K.H., Palsikar,V.B., Zhang,X.W. and Wilson,R.K. TITLE Direct Submission JOURNAL Submitted (16-DEC-2013) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA COMMENT Ancylostoma duodenale is one of two major species of hookworm that infect humans worldwide. An Indian A. duodenale isolate was first adapted to dogs by Dr. G.A. Schad in 1979 (Schad 1979) using corticosteroid treatment. This strain has been lost. The sequenced strain of A. duodenale (Zhejiang) was isolated from humans in Zhejiang Province, China by Dr. Wen Li-yong and passaged once through a dog (unpublished). Genomic DNA was isolated by Dr. John M. Hawdon and confirmed as A. duodenale by PCR. No voucher has been deposited. This strain has also been lost, and no laboratories are known to be currently maintaining A. duodenale in animals. For the original isolation and adaptation to dogs see Schad, G.A. 1979. Ancylostoma duodenale: maintenance through six generations in helminth-naive pups. Experimental Parasitology 47(2), 246-253. This assembly was built from fragment, 3 kb insert, and 8 kb insert whole genome shotgun libraries. The sequences were generated on the Roche/454 platform and assembled using Newbler. To improve scaffolding, the in-house tools CIGA (Cdna tool for Improving Genome Assembly) and PyGap (Gap closure tool) were used.
CIGA mapped 454 cDNA reads to the genomic assembly using blat and linked genomic contigs based on cDNA evidence; only joins confirmed by additional independent data typing were accepted. PyGap closed gaps and extended contigs using the Pyramid assembler with Illumina paired end reads. The repeat library was generated using RepeatModeler (A.F.A. Smit, R. Hubley & P. Green http://repeatmasker.org). Ribosomal RNA genes were identified using RNAmmer (Lagesen et al., 2007 Nucleic Acids Res.) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, Nucleic Acids Res. 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (Griffiths-Jones et al., 2003 Nucleic Acids Res.). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (Korf, 2004 BMC Bioinformatics), Fgenesh (Salamov A., Solovyev V. 2000, Genome Res.) and Augustus (M. Stanke, et al., 2008 Bioinformatics) and the annotation pipeline tool Maker (M. Yandell et al., 2007 Genome Research), which aligns mRNA, EST and protein information from the same species or cross-species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at the Genome Institute. Gene product naming was determined by BER (http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of A. duodenale to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes. For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person.
For specific questions regarding the A. duodenale genome project contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project. ##Genome-Assembly-Data-START## Current Finishing Status :: High-Quality Draft Assembly Method :: Newbler Version MapAsmResearch v. 04/19/2010-patch-08/17/2010 Assembly Name :: A_duodenale_2.2.ec.cg.pg Genome Coverage :: 19.00x Sequencing Technology :: 454 ##Genome-Assembly-Data-END## FEATURES Qualifiers source /organism="Ancylostoma duodenale" /mol_type="genomic DNA" /submitter_seqid="A_duodenale-1.0_Cont1433" /strain="Zhejiang" /host="Homo sapiens" /db_xref="taxon:51022" /chromosome="Unknown" /lab_host="dog" /country="China: Zhejiang Province" protein /locus_tag="ANCDUO_04255" /inference="protein motif:HMMPfam:IPR008160" /note="KEGG: isc:IscW_ISCW022619 6.0e-36 glycine rich secreted cement protein, putative K06237" /db_xref="InterPro:IPR008160" intron_pos 51:0 (1/6) intron_pos 84:1 (2/6) intron_pos 114:1 (3/6) intron_pos 157:1 (4/6) intron_pos 190:1 (5/6) intron_pos 263:0 (6/6) BEGIN 1 MVENSKDRIA WVLSFGCLAF IVGAATLVAR LHSELASFTE QAEVELATFN QSMEIKAAHE 61 DLWKEALKLA DKHGRNLVKR NVVYPQQING YKTVRCECPS GPPGPPGPNG YDGIPGFPGK 121 DGQNGEDDHT LRLHYSDACA LCPAGPPGPP GDPGPEGETG RKGFPGLPGV DGKPGTPGPK 181 GPTGDRGAPG QPGLPGPPGI PGQKGTTNAF IPGPPGPPGP IGLPGMVGEP GLPGIEGPPG 241 RPGPPGWPGQ NGGQGPDGIY GPPGEQGLPG ESGGYCPCAG RATRNSSDSV TEISQKIHST 301 A //