KHJ47415.1

LOCUS       KHJ47415.1              1106 aa    PRT              CON 17-DEC-2014
DEFINITION  Trichuris suis receptor L domain protein.
ACCESSION   KN538379-171
PROTEIN_ID  KHJ47415.1
SOURCE      Trichuris suis (pig whipworm)
  ORGANISM  Trichuris suis
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Enoplea; Dorylaimia;
            Trichinellida; Trichuridae; Trichuris.
REFERENCE   1  (bases 1 to 2343098)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of Trichuris suis
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 2343098)
  AUTHORS   Mitreva,M., Pepin,K.H., Abubucker,S., Martin,J., Minx,P.,
            Warren,C., Palsikar,V.B., Zhang,X., Rosa,B.A. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-JAN-2014) The Genome Institute, Washington University
            School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     Trichuris suis has contaminated a dirt lot located at the USDA,
            Agricultural Research Service, Beltsville Agricultural Research
            Center, Animal Parasitic Disease Laboratory in Beltsville, MD since
            the early 1960s.  Adult worms were isolated for passage from pigs
            placed on the lot and naturally infected.  The T. suis adults were
            manually removed from the cecum and proximal colon tissue and
            cultured in vitro to release fertilized eggs that were removed
            after 24-48 hours and embryonated to an infective stage (Hill et
            al., Experimental Parasitology 77, 170-178, 1993).  The strain has
            been actively passed in pigs one to two times per year since that
            time and characterized for pathogenesis in pigs (Mansfield and
            Urban, Veterinary Immunology and Immunopathology 50, 1-17, 1996).
            This strain of T. suis has also been used therapeutically in human
            subjects with inflammatory bowel disease (Trichuris suis therapy in
            Crohn's disease. Summers RW, Elliott DE, Urban JF Jr, Thompson R,
            Weinstock JV. Gut. 2005 Jan;54(1):87-90). The Genome Institute
            collaborators who provided material for the genome/transcriptome
            sequencing are: USDA - Urban, Jr., J.F. and Hill, D.E.; Michigan
            State University - Mansfield, L.S.
            
            The repeat library was generated using RepeatModeler (A. Smit, R.
            Hubley http://www.systemsbiology.org/). Ribosomal RNA genes
            were identified using RNAmmer
            (http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer) and
            transfer RNAs were identified with tRNAscan-SE (Lowe and Eddy,
            1997). Non-coding RNAs, such as microRNAs, were identified by
            sequence homology search of the Rfam database
            (http://selab.janelia.org/software.html). Repeats and predicted
            RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P.
            Green http://repeatmasker.org). Protein-coding genes were predicted
            using a combination of ab initio programs Snap (I. Korf, 2004),
            Fgenesh (Softberry Corp.) and Augustus (M. Stanke et al., 2008) and
            the annotation pipeline tool Maker (M. Yandell et al., 2007), which
            aligns mRNA, EST and protein information from the same species or
            cross-species to aid in gene structure determination and
            modifications. A consensus gene set from the above prediction
            algorithms was generated using a logical, hierarchical approach
            developed at the Genome Institute. Gene product naming was
            determined by BER (JCVI: http://ber.sourceforge.net).
            
            Our goal is to explore this WGS draft sequence of Trichuris suis to
            better define proteins involved in nematode parasitism that impact
            health and disease and are relevant to both host-parasite
            relationships and basic biological processes.
            
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the
            Trichuris suis genome project contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) provided funds for this
            project.
            
            ##Genome-Assembly-Data-START##
            Finishing Goal           :: High-Quality Draft
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: ALLPATHS_LG v. 2012-11-02
            Assembly Name            :: T_suis_1.0.allpaths
            Genome Coverage          :: 392x
            Sequencing Technology    :: Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Qualifiers
     source          /organism="Trichuris suis"
                     /mol_type="genomic DNA"
                     /submitter_seqid="T_suis-1.0_Cont4"
                     /isolation_source="cecum and proximal colon of infected
                     animals which were naturally infected"
                     /host="Sus scrofa (pig)"
                     /db_xref="taxon:68888"
                     /chromosome="Unknown"
                     /dev_stage="adult"
                     /country="USA: Beltsville, MD"
     protein         /locus_tag="D918_02275"
                     /inference="protein motif:HMMPfam:IPR000494"
                     /inference="protein motif:HMMPfam:IPR001245"
                     /inference="protein motif:HMMPfam:IPR006211"
                     /note="KEGG: phu:Phum_PHUM581050 4.0e-228 Epidermal growth
                     factor receptor precursor, putative"
                     /db_xref="InterPro:IPR000494"
                     /db_xref="InterPro:IPR001245"
                     /db_xref="InterPro:IPR006211"
     intron_pos      30:0 (1/23)
     intron_pos      77:1 (2/23)
     intron_pos      113:1 (3/23)
     intron_pos      156:2 (4/23)
     intron_pos      207:2 (5/23)
     intron_pos      275:1 (6/23)
     intron_pos      319:2 (7/23)
     intron_pos      342:0 (8/23)
     intron_pos      387:2 (9/23)
     intron_pos      416:0 (10/23)
     intron_pos      455:0 (11/23)
     intron_pos      500:2 (12/23)
     intron_pos      550:0 (13/23)
     intron_pos      578:2 (14/23)
     intron_pos      662:0 (15/23)
     intron_pos      756:0 (16/23)
     intron_pos      808:0 (17/23)
     intron_pos      833:1 (18/23)
     intron_pos      882:1 (19/23)
     intron_pos      915:0 (20/23)
     intron_pos      968:0 (21/23)
     intron_pos      1005:1 (22/23)
     intron_pos      1054:1 (23/23)
BEGIN
        1 MCFGEYNPQY CCHAECAAGC HGPSDRDCYG CRAMRDDGRC VDKCPTPELY DPVTTQYIKN
       61 PNGKYAFNRE CVTVCPSHMV IYKDGCVSRC PEGYFAEVGS NVCEPCQGVC PKTCIIEHHV
      121 SSFNIKDFIG CTKVDGFIEI RKDTFEYGAL FLANDSFVRY DPMSEDQLEA LSSVQQVTHY
      181 VLIQSERLKS LSFLRNLEKI EGRKLFESSY ALYITHSFSM QYLGTVSLRS ILNGDVYIAS
      241 NYDLCYIHNI PWSKRIISVG HSSRVRSNRK ADICELEGKV CDPSCDHSQG CWGPGPEMCF
      301 DCMHWRLGNS CVDECNADGL YRAAPKQCAY CHVECMKCTG AGPRNCTECK HVSLDGECVL
      361 SCPKYTHYEN ILTRKCEKCH QNCYGYGCTG PGNFVGPGGC KRCKYGVLDE ESQTITRCLQ
      421 ELSAEKPCSH DADLENYYWT VPLSRKIQMS VALAVCIKCH PVCRRCYGFG TDYAHYGCEC
      481 LKYSLRESSN STTCVLECPR NTFTATPLQD DAAGECIPCD PQCNSCTGPE STDCLECLHY
      541 KDFIAESDKF NCTSSCPEQR PFVSQDRLCT DVDIMMEKHK KSQVIIGVVV AALVSLCILF
      601 FICLLFIRPK ASLLAHFKEP NPKLEIVAAV TPNLARLLLI RDFELNRGGV LGYGAFGTVY
      661 KGIWTPEKEK VKIPVAIKVL HEANASAQQE TLEEARIMAS MSHPHLVQLI GVCVGQQMML
      721 VTPLMPLGNL LDYVQNNRSK IGSAALICWC TQIADGMSYL EEHRLVHRDL AARNVLVQKP
      781 YHIRITDFGL AKLLEYGENE VKIFEGKMPI KWLALECIKY RRYTHKSDIW AFGITLWELF
      841 TFGDKPYKDV PLHQIPQLLE NGERLSQPKT ATLDMYMIMI RCWMVDADAR PTFKELKDIF
      901 VKMAKDPGRY LVVEGDALLR LPNYTPQDQR DMIRQLIDDP EVVDPDEYFS AAPSSSPPVS
      961 PTTLRKPLIE DREARASCCN QRHTSAVSQR YASDPLKNMP GLNTSKFSDS PTLTTDVENY
     1021 LIPDSQQASQ GLSPCDDGVF MVYGGPGELP CCTGGHAYYN EFSKPNTGIC LENLEYMQGS
     1081 LSEQDYHNCV PKNLQMSDIV QNETVV
//