LOCUS KHJ47492.1 4892 aa PRT CON 17-DEC-2014 DEFINITION Trichuris suis hypothetical protein. ACCESSION KN538379-248 PROTEIN_ID KHJ47492.1 SOURCE Trichuris suis (pig whipworm) ORGANISM Trichuris suis Eukaryota; Metazoa; Ecdysozoa; Nematoda; Enoplea; Dorylaimia; Trichinellida; Trichuridae; Trichuris. REFERENCE 1 (bases 1 to 2343098) AUTHORS Mitreva,M. TITLE Draft genome of Trichuris suis JOURNAL Unpublished REFERENCE 2 (bases 1 to 2343098) AUTHORS Mitreva,M., Pepin,K.H., Abubucker,S., Martin,J., Minx,P., Warren,C., Palsikar,V.B., Zhang,X., Rosa,B.A. and Wilson,R.K. TITLE Direct Submission JOURNAL Submitted (27-JAN-2014) The Genome Institute, Washington University School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA COMMENT Trichuris suis has contaminated a dirt lot located at the USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Animal Parasitic Disease Laboratory in Beltsville, MD since the early 1960s. Adult worms were isolated for passage from pigs placed on the lot and naturally infected. The T. suis adults were manually removed from the cecum and proximal colon tissue and cultured in vitro to release fertilized eggs that were removed after 24-48 hours and embryonated to an infective stage (Hill et al., Experimental Parasitology 77, 170-178, 1993). The strain has been actively passed in pigs one to two times per year since that time and characterized for pathogenesis in pigs (Mansfield and Urban, Veterinary Immunology and Immunopathology 50, 1-17, 1996). This strain of T. suis has also been used therapeutically in human subjects with inflammatory bowel disease (Trichuris suis therapy in Crohn's disease. Summers RW, Elliott DE, Urban JF Jr, Thompson R, Weinstock JV. Gut. 2005 Jan;54(1):87-90). The Genome Institute collaborators who provided material for the genome/transcriptome sequencing are: USDA - Urban, Jr., J.F., Hill, D. E. and Michigan State University - Mansfield L.S. 
The repeat library was generated using RepeatModeler (A. Smit, R. Hubley http://www.systemsbiology.org/). Ribosomal RNA genes were identified using RNAmmer (http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer) and transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). Non-coding RNAs, such as microRNAs, were identified by sequence homology search of the Rfam database (http://selab.janelia.org/software.html). Repeats and predicted RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P. Green http://repeatmasker.org). Protein-coding genes were predicted using a combination of the ab initio programs Snap (I. Korf, 2004), Fgenesh (Softberry, Corp) and Augustus (M. Stanke et al., 2008) and the annotation pipeline tool Maker (M. Yandell et al., 2007), which aligns mRNA, EST and protein information from the same species or cross-species to aid in gene structure determination and modifications. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach developed at the Genome Institute. Gene product naming was determined by BER (JCVI: http://ber.sourceforge.net). Our goal is to explore this WGS draft sequence of Trichuris suis to better define proteins involved in nematode parasitism that impact health and disease and are relevant to both host-parasite relationships and basic biological processes. For information regarding this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person. For specific questions regarding the Trichuris suis genome project contact Makedonka Mitreva (mmitreva@genome.wustl.edu) at Washington University School of Medicine. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) provided funds for this project. 
##Genome-Assembly-Data-START## Finishing Goal :: High-Quality Draft Current Finishing Status :: High-Quality Draft Assembly Method :: ALLPATHS_LG v. 2012-11-02 Assembly Name :: T_suis_1.0.allpaths Genome Coverage :: 392x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## FEATURES Qualifiers source /organism="Trichuris suis" /mol_type="genomic DNA" /submitter_seqid="T_suis-1.0_Cont4" /isolation_source="cecum and proximal colon of infected animals which were naturally infected" /host="Sus scrofa (pig)" /db_xref="taxon:68888" /chromosome="Unknown" /dev_stage="adult" /country="USA: Beltsville, MD" protein /locus_tag="D918_02352" /inference="protein motif:HMMPfam:IPR013098" /note="KEGG: ecb:100053844 0. hypothetical LOC100053844; K12567 titin" /db_xref="InterPro:IPR013098" intron_pos 55:0 (1/56) intron_pos 105:0 (2/56) intron_pos 161:1 (3/56) intron_pos 199:1 (4/56) intron_pos 234:1 (5/56) intron_pos 278:2 (6/56) intron_pos 338:1 (7/56) intron_pos 522:2 (8/56) intron_pos 661:0 (9/56) intron_pos 1407:2 (10/56) intron_pos 1449:0 (11/56) intron_pos 1494:2 (12/56) intron_pos 1554:1 (13/56) intron_pos 1660:1 (14/56) intron_pos 1766:1 (15/56) intron_pos 1871:1 (16/56) intron_pos 1977:1 (17/56) intron_pos 2082:1 (18/56) intron_pos 2188:1 (19/56) intron_pos 2288:1 (20/56) intron_pos 2394:1 (21/56) intron_pos 2499:1 (22/56) intron_pos 2605:1 (23/56) intron_pos 2725:1 (24/56) intron_pos 2831:1 (25/56) intron_pos 2868:0 (26/56) intron_pos 2936:1 (27/56) intron_pos 2973:0 (28/56) intron_pos 3042:1 (29/56) intron_pos 3076:0 (30/56) intron_pos 3203:2 (31/56) intron_pos 3264:1 (32/56) intron_pos 3301:2 (33/56) intron_pos 3360:1 (34/56) intron_pos 3396:2 (35/56) intron_pos 3455:1 (36/56) intron_pos 3595:2 (37/56) intron_pos 3653:1 (38/56) intron_pos 3689:2 (39/56) intron_pos 3748:1 (40/56) intron_pos 3888:2 (41/56) intron_pos 3946:1 (42/56) intron_pos 3982:2 (43/56) intron_pos 4085:2 (44/56) intron_pos 4146:1 (45/56) intron_pos 4183:2 (46/56) intron_pos 4286:2 (47/56) 
intron_pos 4346:1 (48/56) intron_pos 4383:2 (49/56) intron_pos 4545:1 (50/56) intron_pos 4639:1 (51/56) intron_pos 4681:2 (52/56) intron_pos 4740:1 (53/56) intron_pos 4787:2 (54/56) intron_pos 4846:1 (55/56) intron_pos 4880:0 (56/56) BEGIN 1 MIRYSITANV PTDSLQRALE LMLWIPKRAS DLNLIHNILD LPTETAKLGR LLRHEQFEVW 61 QDSLDRPKGQ LHVFLFKTKI VATEKVEPED PDEVPEFKHV FTVRLDKYDI REYLGNSNIV 121 QLVPVDASLP TYFFKATAPD NAEIVKQAWI KDVQENKETT GELPESEVEV QGDFIDFSDI 181 KSEFSEYSSV SRKSSEYGDG RDDESPPAKK PKTPPAISRS TSAQSVYSMN MESLTQTGSI 241 EMEGSSVTRT QYGFRTVHET TAKMSLKVTG NPMPVITWYK DGVLLQEDER KKFYSDDDGY 301 FALTIEPVQV EDTGRYTCVA TNEYGQARTS AFFRVVRVDR EPEQPKFLKV MRDLELHEGD 361 TATFTCEVEG WPEPEIQFYL DGQPIHISRE HNIEYDGRTV RLTVREVQPE DGGSYVLKAV 421 NGSGEVQCAA TLTVIKDLEK NKMPPYFQQQ LNDVVVVEDQ SVKFKTVCSG DPTPEVVWYI 481 NGVQLTNSDK VHMIAEDGVY ILTIDNITQH FDGELTCCAF NRLGEISCCC RITVNKADYP 541 PSFEQELRDQ VVTAGEAVKL QVVISAQPEA TVSWWFNDEQ ISEYHPSVRL SAKPQAGIYS 601 IEITKATTDM CGVIKCQASN YLGTVFSSAN LSVEEAKSAP AFRNMLQDIV ALPREYLKIH 661 VSTTGYPRPT VAWKLNGEEI KESPNVKISS SGNDYYLEIL SFTENDAGEL ECTAVNSLGI 721 ASSKCQVSTS PAKMKSNFEK DLPKSTTAEE GKPISFSVQS KQSATFSWFL NGRELHDSDS 781 GIRIMAIGDQ ESRLIIESFS ESLSGRLTCT ATSPLGVSET STDIKSTLGM KEGAGPLPPI 841 VLAEYGGRVL LKVTIESDQA EVCWRLNGEP LKNSEKVHVG RKGADFFLEI EDIDDSASGE 901 LLCEVRQGTR SDTFGTTVKV ERRALTTLLD GLKDTTVRVG DTVQFNLSLK DAKKYEVKWF 961 LNDHELVESE KVHINVYPEQ AECCLKIESI TEEFNGELKC NIMTPSGDYV SSAQLHVVPR 1021 AAPVVVKGLS DSFVQAGDTV KFSVFAENAV EGKVKWVLNG SELLPSDSVI IATEQQNECS 1081 LTLKNINPQQ SGVLECLIST PYGVSRSSAK LTVKPLPEKL MKPEFRAIMA PVVIYEGDTL 1141 ETRVVLEGEP QPEVKWYIND VPLVEDSNVE ITTEKGISKV EIKKVNFDLN GVLKCVAKNE 1201 YGEVATSTSV SIRRQIPVEF EQFLCDTTCR EGDTLKLKAV LLGQPRPEVS WYLKGKKLVQ 1261 TDKIHIYTEQ NTYVAIISDI TCDYSGEVLC KAVNEFGEAS SSAMLTVLPR GVPPDFLEWM 1321 NSISAVDGAE VVHHVKFTGE PTPTIRWFIN NQEVHNSDNF AIHTDKDVCT LTIKHFSASL 1381 VGEIICKAEN DAGEVSCTSQ MSLAPAGYIR EEVSERSELE AVALSTGEIE GSEAGTDFAV 1441 SLPDEDMEEE TSRLESSVLA PKFITKIKDT RVTVGKQAVF ECIVPGTKGV CVKWLKDNKE 1501 
IELLARIRVG SHKEENLIIH RLTVDDVTKA DAGTYTCVVM NEHGQEICSA RLEVDEQWTV 1561 AQLPETVPEI VEVLHSCVVE ENEEAIMHCT VTGCPDAQVR WYKDNVQLES SKRHQMVSEA 1621 EGVFVLKIPN VTMKDAGEYK CEVFNASGVA SSTATLTVTV PTAKEPTVGE AMAPKFVEPL 1681 MVSEDAHKKV TMFTCKIYGQ PRPQVRWFKN EKMLNSSYKY EIINEEQDSY VLKIHDTTTE 1741 DNGEYRCEAF NENGIASSSV ALTMKFEQVT ESTLPESAPE FSKPLKSAVL NKGEALHLEC 1801 TVVGQPAPSI QWFKSEEELK TTETVTITSL PNGVECLDMK ETMPSDSGDY RCIATNRLGY 1861 SSSEATVAVH APEEMETAEL LESTTEFVES LHNQTVKENE TGILSCRVTG PPVTSVQWFK 1921 EDVLLESSEK YEIISEANGV FALQVHDSTA EDSGEFKCEV TTEKGSSISK AYLTVEKSSV 1981 KEEFAEEQPP EILKSLTPTS LREGESLTLE CTVTGKPRPT VQWFRNGEEI EVDESVKLET 2041 SSGGVCRLTV HNVTQKDAGE YRCIATNTCG ASWSDASVTV NVAEQAVPTS LTEAAPLFVR 2101 QLQSCTVKAN EQHILQCRVT GAPKPQVRWL KNGVELEPSA KYEIVCEDGE THILKVQNFN 2161 KDDSAVYRCE ASNEKGFAAS EANIEIQRAE VATAVPTFSE TLVSTSVVEG EPLLLECTVS 2221 GEQDSIVEWF KNGQKIEATQ NVKIESLAGG VQHLEVRNIA LSDSGVYRCV VANQLGDSST 2281 EARVTVQTHE TVEEAELCEA VPVFIEVLKS QVAKENEVAS LSCKVYGLPT AEVQWFKDGV 2341 ELKSSEKYEL ISEVSGVFLL QVHNIGKQDA GEYTCVATNK MGSVSSNAFL TVSVAEERVS 2401 ASQQMAPAFL SAITEAFPMC GEKVHLECVV TGNPPPEVHW LKNGQELLST DMMKVSSFSD 2461 GTQCLEIEHV AVKDAGDYCC VATNPLGEAS TKIAVVVQTA ETEGDFKLYG TVPQFVETLH 2521 NCDVQENETA VMKCKVIGTP LPEVRWFKEN ASLESSAKFE LTSDVDGLFL LKIHNAKEED 2581 VGEYRCEVFN CKGVASSKAQ LSVTVGETSE AVKALEAPLF VKTLTSCSLT EGEHLQLNCA 2641 VSGQPTPTIQ WFKNGEEIKA TGLVKIESLP EGILTLVVQN AHVGDSGMYR CLATNEAGEC 2701 STEATVSVQG KYFPSIIQFF PLNTACETME LEELAETAPE FVQVLHFCDV AETQEGILSC 2761 KLTGFPKPQV RWFKDGVPIQ PSEKYEMVTD DNGLVLLKVH VAGREDSGEY RCEAFNSRGV 2821 AWTEAPLNVK AAELMEYEEG EEVAPDFLEP VQACVVNKGE DAVLICRISG VPTPQVCWYK 2881 NGVPLVPDER VEITSFGDRH TLVVHNAQQE DVADYRCEAK NDAGVVWSDA TVCVLSEDYL 2941 MESVQDEVAP FFVREIQQES VNAGDRAVLT CQVSGNPTPE VHWYRDGKLL DLSKDVEIAS 3001 TSDGTVTLVI QHAKVEDQGN YRCEAVNILG SACSKAPLSV FPTEEIMEVE ESASMQFIEP 3061 LHHYMREEDT VAVFECKLKA QPVSTIVWYK DGAPILPDNF TVIESLPDGL QRLTLRCTTA 3121 SAVGHYACLA KSDITEVKTE DDLLSSSACC FSSNEIYSSV SKVPLALEFL QGLKRQCVKK 3181 GDTVTMRCQV 
NMKGRSKLPT VKWYKSGREL LPDKRVKMEA TMDGWFTLTV SKFEENDAGM 3241 YTCIITENSS VIKSEAPVEL TVTEAGGEFT IVKELASQTI SAGEKLELEL VTSESWDDIR 3301 WLHNNEVVLN DKRTKIEQPE ACVCRLTVAE TSPKDQGNYF VIATKGNKTV ESQAKVIVTE 3361 GKHLEITKAL EDITVPAGAE VTLEVQFNEP IDHAQWYLDS KELVSDAKVS LEQLDERILR 3421 LHLKNADKND AGVYGVVAHS GEQTTECKAN LSVQAQQFLT ITKGLDDVLV EPHKPLTLTV 3481 NVDGLPEKVE WLKDGEPLKE QANLKIEVPA NGVYRLAISD TLPDSAGLYT FRAIDETGVI 3541 ESSGTLSLMD MAEELPQKRI TKLEMVEGLM DQAVAEREPV CLRIKLNKKP KVVKWYKNGK 3601 EIIPSNRFKA EVDETGASLE ISSLLAEDSG LYEVVASDER DSVASSGKLL VTSASQLDIT 3661 RGLQDRTVMK GTELTLEIQL SKPADMATWY HGTEKLANGQ NLRLEEIDGR VYRLHVQHAD 3721 IQDAGEYRVV VKGDGEKAES KANITVKTIP NLRISKELQD VSPTLHETVI MEVLLEGLPD 3781 NVEWFKNGSK LSSVPGMRIE VGGDGWHRLI LEDVLPDSAG LYKFRASNPD GSVESSGTVI 3841 LKQPVEEKPG DAEAALTLMK GLEDQTVDLG HPISLSVKLN KRPKDIKWYS NGKEIRPSMK 3901 KKIKFDGLEA TLEVDKASED DGGIYEVVVS DENTTIRSSG NVRIAVPTAL KIISGLKPCK 3961 VTVGEPLVLS VGLEGNADLI EWLKDGVKLT NVPNYSFTCT EGLYNLQVKQ AEMGNAGEYM 4021 FIARKASDAV SSVGSVVVKS PPSEEAVEQP EKLTFVECLE DQQLQEGDEL KLKVQTNKKP 4081 MTVKWTRNGI PVTSTGGTKL VDNGDRVFEL IVENASRGDA GSYKVIIGDE HASAESVADV 4141 VVIEKAAEKP LQVLKGMADL TCDIGSKACF EVQISGKPKS YRWYKNGREV KQTPRISLKE 4201 MDGGRYCLEI EKTVTDDGGE FSFEAENDVG KVHSQALLTV QSPAARKGPA AEPLVITRPL 4261 DDQTAEEGAE ISFEAEFNRG PKEVHWYKGM DVVTGNEKVK ISSPAVNASK LHVTRVSPED 4321 SSSYRVEAID DLGNIVTSTA RLTVNASPER LEFIKPMADV QVAKGDTATL EVQVKGIPQS 4381 VKWYKNGRLL PSQGRQQEIG KGTYILKIPN ASDDDQATYK CELENQLGSI STEGTLVVLP 4441 SVEEAGEKEI GALKVLKGLA DITLYVGDDL LLEVELSSKP EEVLWFNNGH PVLARNCEVE 4501 VMPNGHAVCR CRLPNIDISC DGTFTVKARN PYGSADSSNK LTVKGKPPKI LTGLEDRRTS 4561 PGNRVVFEVE VDRKPKLVKW YKNGRLVKEN ERTVLISVDD CTYQLVLNDV DKEDVGNYLV 4621 EVSNDFGVAK SEAKLTIIEP VDEMWRSSPR IVKGLNDVEI FDGNHATFSV TVEGKPTSVR 4681 WYRNGTELTS SQRVLPTQLD GFTYKLTLRE CHKDDMGVIK VLAMNDFGSD TSEGRLIVKE 4741 VPTGRMPSGM EKAARFIIPL EDVAAEESKK AVLSCKVEGL PTPLITWFKD GKEIDKNERV 4801 TYSMDEDKVC SLSIGAISPE DEGCYAALAK NSLGQDRTEC YLTIAAPKGA EELEHGVAPE 4861 FIKPLRSKSI IEGEDLTLDA 
RIIGNPLPEI CW //