LOCUS KHJ47266.1 2365 aa PRT CON 17-DEC-2014 DEFINITION Trichuris suis hypothetical protein. ACCESSION KN538379-22 PROTEIN_ID KHJ47266.1 SOURCE Trichuris suis (pig whipworm) ORGANISM Trichuris suis Eukaryota; Metazoa; Ecdysozoa; Nematoda; Enoplea; Dorylaimia; Trichinellida; Trichuridae; Trichuris. REFERENCE 1 (bases 1 to 2343098) AUTHORS Mitreva,M. TITLE Draft genome of Trichuris suis JOURNAL Unpublished REFERENCE 2 (bases 1 to 2343098) AUTHORS Mitreva,M., Pepin,K.H., Abubucker,S., Martin,J., Minx,P., Warren,C., Palsikar,V.B., Zhang,X., Rosa,B.A. and Wilson,R.K. TITLE Direct Submission JOURNAL Submitted (27-JAN-2014) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA COMMENT Trichuris suis has contaminated a dirt lot located at the USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Animal Parasitic Disease Laboratory in Beltsville, MD since the early 1960s. Adult worms were isolated for passage from pigs placed on the lot and naturally infected. The T. suis adults were manually removed from the cecum and proximal colon tissue and cultured in vitro to release fertilized eggs that were removed after 24-48 hours and embryonated to an infective stage (Hill et al., Experimental Parasitology 77, 170-178, 1993). The strain has been actively passed in pigs one to two times per year since that time and characterized for pathogenesis in pigs (Mansfield and Urban, Veterinary Immunology and Immunopathology 50, 1-17, 1996). This strain of T. suis has also been used therapeutically in human subjects with inflammatory bowel disease (Trichuris suis therapy in Crohn's disease. Summers RW, Elliott DE, Urban JF Jr, Thompson R, Weinstock JV. Gut. 2005 Jan;54(1):87-90). The Genome Institute collaborators who provided material for the genome/transcriptome sequencing are: USDA - Urban, Jr., J.F., Hill, D. E. and Michigan State University - Mansfield L.S. 
The repeat library was generated using RepeatModeler (A. Smit, R. Hubley http://www.systemsbiology.org/). Ribosomal RNA genes were identified using RNAmmer (http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (http://selab.janelia.org/software.html). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (I. Korf, 2004), Fgenesh (Softberry Corp.) and Augustus (M. Stanke et al., 2008) and the annotation pipeline tool Maker (M. Yandell et al., 2007), which aligns mRNA, EST and protein information from the same species or cross-species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at The Genome Institute. Gene product naming was determined by BER (JCVI: http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of Trichuris suis to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes. For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person. For specific questions regarding the Trichuris suis genome project, contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project. 
##Genome-Assembly-Data-START## Finishing Goal :: High-Quality Draft Current Finishing Status :: High-Quality Draft Assembly Method :: ALLPATHS_LG v. 2012-11-02 Assembly Name :: T_suis_1.0.allpaths Genome Coverage :: 392x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## FEATURES Qualifiers source /organism="Trichuris suis" /mol_type="genomic DNA" /submitter_seqid="T_suis-1.0_Cont4" /isolation_source="cecum and proximal colon of infected animals which were naturally infected" /host="Sus scrofa (pig)" /db_xref="taxon:68888" /chromosome="Unknown" /dev_stage="adult" /country="USA: Beltsville, MD" protein /locus_tag="D918_02126" /inference="protein motif:HMMPfam:IPR001747" /inference="protein motif:HMMPfam:IPR015255" /db_xref="InterPro:IPR001747" /db_xref="InterPro:IPR015255" intron_pos 19:1 (1/28) intron_pos 35:1 (2/28) intron_pos 92:0 (3/28) intron_pos 164:0 (4/28) intron_pos 213:0 (5/28) intron_pos 278:1 (6/28) intron_pos 423:1 (7/28) intron_pos 473:0 (8/28) intron_pos 524:0 (9/28) intron_pos 572:1 (10/28) intron_pos 715:0 (11/28) intron_pos 771:0 (12/28) intron_pos 894:2 (13/28) intron_pos 982:0 (14/28) intron_pos 1053:0 (15/28) intron_pos 1102:1 (16/28) intron_pos 1208:0 (17/28) intron_pos 1333:0 (18/28) intron_pos 1466:0 (19/28) intron_pos 1529:2 (20/28) intron_pos 1762:2 (21/28) intron_pos 1941:0 (22/28) intron_pos 1985:2 (23/28) intron_pos 2095:2 (24/28) intron_pos 2196:1 (25/28) intron_pos 2259:0 (26/28) intron_pos 2310:2 (27/28) intron_pos 2359:0 (28/28) BEGIN 1 MKSHLASFVL FLFLCLHQEV YCKTHGDACF AECSGKNDKF HFDRGYRYAY DWHLKTKVAY 61 NGTDRSPEVL TYNLNGKVHL YAMAPCEFTM KIISAEQTGG GTAQLDLEVL EKYTTQFSFD 121 NGAIRSICPD PVEPIWSVNI KRGILSAFQN LYEETALESE VEEADVSGLC PTSYRGKASG 181 SDLLVTKTKY LNSCRNRGHG SFGRMVPYKS RSNYQNVPLF SGQQRCTQEI RNKILHRASC 241 EESNTFAPAY TDARGGLFVY AEMELSYRDS AAETVPAEQR NHQPLYFDYS DQEREMSSDA 301 VSKMEALLPK FCAASSEVVR DTPDLFSELV HCLTRVTRTD IEAVWNRVEK QHECDAQLNY 361 IADGLSACGS PSCVSFIIDH LSQRQQTEDQ KKRWLDSLIF IDNPTLDVIS SLTPLIKKSE 421 
KEELFRLSNV IHKYCAHDSH CASHREIIQA LDALSGHLTH GCNPRSPEDF DKMLFALRAA 481 GNIDVPSTGL QNQLKCLSDK TAADELRLAA VQSVRRLPCH PSANTHLFAL LNDQSESVEI 541 RISAFMVLMK CPTDSMVDRI LALLKTETSM QVASFIWSYL NSIIKGNSPS QEHLRHLLMF 601 KTLEPKVDRD ARKFSKFSQF STYSTHGNMG LDVEKSLIFS PDEYIPRQAL LNLTMHIYGQ 661 SVNFLEIGAR TVGLDSLVEH LFGPDGYFTN PEGHYFKEPI SAHSNPSIRT LTDKFQRHMK 721 TKENFDLAVY ARIFGDDVFH YSGREMDAKH FVRESATIEN LMRQLAKERH FEKKRNVVFA 781 DWKYTMPTVG GLPLQFDLNG VASVNVALTG KLDVKDLLKG KTDVDIKTNV RGSVAARTAA 841 VVSILPGKSS FGMSIKTRFF ANRAVELAAL LKGGKSLHLK FLLPKEEMDL LRIESETFVV 901 RGDTWAPLYP PDASGHLTHA CSNKELNEWL GFKVCYSQRL PEKLISPQAL LTGPTLFRLN 961 LKNNVHGQQA YEMTVTWKMT PEEKRVHVKA EHSDYTADFV YYPKGELLME VKSDKRPMNL 1021 RINAEYTHEK KLANFTFSGD KGQIFSGHAY LKRKVASPLE SSVEFDLLIS AHHERKLLLH 1081 YDYNHKYPQA GRQTRSLGKR YGSRRRRDNS DEEAYLLNLN FESPWRNLHL NGKLNASPHV 1141 SLMALTADYQ RGTKDDTAKL KAEYRRSVQG EHAFIVLASS LEFTKHPERR TEFSLKASNM 1201 SNSLSVEAHL TSGERTMSFV GNAKKDPANG IYSAEGKIGG NVLSEHLLSG TYQNKLPNLF 1261 KFGARVQGPN IGEAAYDVKY TKKNERLTRV EIESKAQYGK KTLDFVSTVS EIEPKIYDVA 1321 AELKSPERNY IMIKGKAMAR IHSTKKFNCS LNSILRILDQ PDLTLVKSIQ RDGFNIYIDS 1381 ILSDAASELY HGTFKTASKD AGVYDGHLSF KSKVGKPMDL VVDSHWTPTD NGHVSILKMR 1441 NMDHEIFKTE LIIPKSFKQK FNQIKLQIIS NWRNAHKHLI MDYSLTRSDV ITHSLDLEGK 1501 LMDHNHYLNV KLIKAKVNQL TVKLGKGGSV NLQIHSNYSV QQSPTKDGYN TVLTTALTSN 1561 LGGQSCSWTG NLTFEKHPKG HLAHAEVKSA ENKNARFAFL YGSRMDVGER FRKTDFLVHV 1621 KLHGLQFEVK DEFWRQMQKQ NFSNKLTVTW NKNSLVNSDQ RFTNDFHICL SPFAVSTSTE 1681 FGSGHSSMLH SSVLFGVRPE HFDASIKLDL ANKSVLHFNG MYRLRGRHEV FEGMLASQMG 1741 NSYNLGVHLS RDRTLYGTFS AELTFGDRNV IKINRRKETN SEGNHVERLS FEQHSKGELV 1801 RYLTTVVEIQ KQTQLQKFEI AFKVGGIDYS AHGEYSTKDK SGRIQIKKTK EGTALTDGKF 1861 TEQAPNEYAT VLTGGQLVGE VDIKINTHAD KQMLKIDVRH VDSPFTLEAE SHWQAQRASI 1921 RILVIQRDSQ GKETSKSGFA FSVQTPKTHS ITIDCEFLRS NGDILIHLSY VNERPALLNF 1981 KFLINPARPS REQWSGLSFV HRLESSEKRS LKISLIDPKI ADNVYLTTNW QLSGDCRLLK 2041 PQSCRISVDS EASYSSDKNR LIKATFNFNR ENAADGVWAV DAGIQNAFTN LDFHCYHTYK 2101 FRKCKNSDLW 
CERQVEWVIR RLDEQGQAKR YELSYNEHRG QSAQLKAKCN EGEWVLQVHK 2161 EEQGYKVNIL RDGAFKYTLA CTYVREEKLA KCQVENPYTI LLHGYAQLVT NEWFTGSVWH 2221 SKTNGERIND IDVSSKIEAN DILTTRVYIR PDIKSSVKSW LDGIDARSKL NEYLSGISRT 2281 SGLLKKLMII ADDQVVQLAS TREQWQKEFS QLVNDHADAV RELTRVRQAA RSWLKGASME 2341 LYEIFIGMGR RFVEATKHIY LLIAM //