LOCUS       CAF31492.2  2209 aa  PRT  CON  06-FEB-2024
DEFINITION  Caenorhabditis elegans Transcription factor sma-9 protein.
ACCESSION   BX284606-2543
PROTEIN_ID  CAF31492.2
SOURCE      Caenorhabditis elegans
  ORGANISM  Caenorhabditis elegans
            Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida;
            Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae;
            Peloderinae; Caenorhabditis.
REFERENCE   1  (bases 1 to 17718942)
  AUTHORS   WormBase.
  CONSRTM   WormBase Consortium
  JOURNAL   Submitted (04-FEB-2024) to the INSDC. WormBase Group, European
            Bioinformatics Institute, Cambridge, CB10 1SA, UK. Email:
            help@wormbase.org
REFERENCE   2  (bases 1 to 17718942)
  AUTHORS   Sulston J.E., Waterston R.
  JOURNAL   Submitted (03-MAR-2003) to the INSDC. Nematode Sequencing Project:
            Sanger Institute, Hinxton, Cambridge CB10 1SA, UK and The Genome
            Institute at Washington University, St. Louis, MO 63110, USA.
REFERENCE   3  (bases 1 to 17718942)
  AUTHORS   Sulston J.E., Waterston R.
  CONSRTM   Caenorhabditis elegans Sequencing Consortium
  TITLE     Genome sequence of the nematode C. elegans: a platform for
            investigating biology
  JOURNAL   Science 282(5396), 2012-2018(1998).
COMMENT     Annotated features correspond to WormBase release WS292.
            Protein-coding gene structures below are the result of
            integration and manual review of the following types of data: ab
            initio predictions by Genefinder (P. Green and L. Hillier, pers.
            comm.); alignments to published proteins and cDNAs; genome
            sequence conservation with other nematodes (e.g. to C. briggsae
            using WABA: Genome Res. 2000. 10:1115-1125); sequence features
            (such as trans-splice and polyA sites). Sources of data:
            large-scale EST projects of Yuji Kohara
            (http://www.ddbj.nig.ac.jp/c-elegans/html/CE_INDEX.html); ORFeome
            cloning project (http://worfdb.dfci.harvard.edu); RST large-scale
            sequencing project (Genome Res. 2009. 19:2334-2342); IST library
            (Science. 2004. 303:540-3); RT-PCR EST set (Ewing B. Green P. 2010
            Unpublished); UTRome EST data submission (UTRome v1 Mangone M.
            Piano F. 2009); TEC-RED data (PNAS 2004. 101:1650-1655); RNA Deep
            sequencing data (454 read clusters - Makedonka Mitreva,
            unpublished; Illumina sequence data, Genome Res. 2009.
            19:657-66); Numerous data sets from the modENCODE project
            (Science. 2010. 330:1775-87); Individual C. elegans Nucleotide
            Database submissions; Personal communications with C. elegans
            researchers. Non-Coding gene structures below are derived using
            the following methods and data: ab initio prediction of tRNAs by
            tRNAscan-SE (Nucl. Acids. Res., 25, 955-964); integration and
            appraisal of miRNAs from miRBase (http://www.mirbase.org);
            integration and appraisal of RFAM predictions
            (rfam.sanger.ac.uk); 21U-RNAs (Cell. 2006. 127:1193-1207);
            modENCODE data (Science. 2010. 330:1775-87); manual curation of
            novel published ncRNAs from the literature.
FEATURES             Qualifiers
     source          /organism="Caenorhabditis elegans"
                     /chromosome="X"
                     /strain="Bristol N2"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:6239"
     protein         /transl_table=1
                     /gene="sma-9"
                     /locus_tag="CELE_T05A10.1"
                     /standard_name="T05A10.1e"
                     /note="Partially confirmed by transcript evidence"
                     /db_xref="EnsemblGenomes-Gn:WBGene00004862"
                     /db_xref="EnsemblGenomes-Tr:T05A10.1e"
                     /db_xref="GOA:Q7JM43"
                     /db_xref="InterPro:IPR003604"
                     /db_xref="InterPro:IPR013087"
                     /db_xref="InterPro:IPR036236"
                     /db_xref="UniProtKB/TrEMBL:Q7JM43"
                     /db_xref="WormBase:WBGene00004862"
     intron_pos      189:0 (1/25)
     intron_pos      241:0 (2/25)
     intron_pos      420:1 (3/25)
     intron_pos      571:0 (4/25)
     intron_pos      596:1 (5/25)
     intron_pos      733:0 (6/25)
     intron_pos      779:0 (7/25)
     intron_pos      852:0 (8/25)
     intron_pos      1359:2 (9/25)
     intron_pos      1394:2 (10/25)
     intron_pos      1435:0 (11/25)
     intron_pos      1467:2 (12/25)
     intron_pos      1506:1 (13/25)
     intron_pos      1542:2 (14/25)
     intron_pos      1616:1 (15/25)
     intron_pos      1669:0 (16/25)
     intron_pos      1761:0 (17/25)
     intron_pos      1810:2 (18/25)
     intron_pos      1858:1 (19/25)
     intron_pos      1927:0 (20/25)
     intron_pos      1980:0 (21/25)
     intron_pos      2013:2 (22/25)
     intron_pos      2095:0 (23/25)
     intron_pos      2142:1 (24/25)
     intron_pos      2188:0 (25/25)
BEGIN
        1 MSHQAIGISN NFQQVQREQL NHQRLLQAQL QTNGPGSVSQ QQQASQQQQQ VQHVQQQQQS
       61 QQQQQQVASQ QNQPQLQMNA QILQALSTPQ GQNLVNMLLM QQALNAQQTD TQNPQQIMQH
      121 QLAQQQAQQA QQAQQAQQAR QQAEQQAQAQ ARHQAEQQAQ QQAQQQAQAR QQEQQAQLAA
      181 IQQQVTPQQF AQILHMQQQL QQQQFQQQQL QQQQLQQQQL QQQQLQQQQL QQQQLQQVLQ
      241 ISQAQQQAQQ AQHVQSRQMQ PSQQSQVQAQ LQQQQQLQQQ QAQQLSQQQA QQQQQLQQLQ
      301 LQQFLQQQQQ LHQQRAAAQQ AQAQNNASQQ RPSVASTPAL SSTPQLNDLT QTMQAQLQQQ
      361 LLLQQQQAQA QQAQQAQQAQ LAQQAQQQQQ GQSQNRTVSQ ALQYIQSMQL QQRADGTPNA
      421 ESQEERLAQM LNEQQQRMLQ NQAREAQHRQ LLISSTPAPR GGITMGTPIG IARREEQPVP
      481 STVAVTTAPA AVRTPVAVPP MKQNSNPSMN PSSTSTSASA TSSHQILAPS LSKPLEQPSS
      541 SKAASSGNES MSDHISRIIS ENEVILQGDP VIRKKRPYHR QIGAQSSVDH DSNSGGSTRT
      601 SPGPKDSRML QAASRSQSLF ELSGSKHFMG SLTSGQPLLR PIQAHNDPNY TPECIYCKLT
      661 FPNEAGLQAH EVVCGKKKEL EKAQIAQEGN PHSALKRRHT HQDATLAMHS PLAAHTPSNM
      721 PGPSEPAIKL KKDDSTELDG TSKPDALQSS SSFPRSLPKE WEQHMLTLQN LAAIIPPFVQ
      781 AFLAKVLQTK LTMVSSGTLT HIYENYNISP ICVKQFLHFA SQLTDQQLEE MTVESEKQYL
      841 ENAEEYQAKG IIQALPCDNL EVTEMLKQQE TIMKGQVPDL LVSTQAVLHH CSGKRDNKDP
      901 SKNFVMIKLL HANGKDITEI PMKAETDLDE CNARFVYQMR EFQKINNMND RLIEVLKTDP
      961 AAAATPLFQV AREQSASVKF LIRQTGMVHL TIVASVAKRL GMSSMQQDVP ESNQIPNGII
     1021 AFDGVQIKQE PEDPPGDDDD DDDDCIIEVV DEDQKQHIAM LTAAASGQDI GIGGQFQKVT
     1081 QNLEPAIVNQ QSIVNHGDSI VNNASIPIAN NAPVQIVQNG GPMQHEVVAM PVQLKLEDLR
     1141 LEPQTSGDTD KPYWLVINGD IGGRPSFMTT AGMTSRTHRT RNITSETYVT VPRQQPMFAV
     1201 QDGTLSMYAK WNVPVHNDAE TKMNLSFMGM VSLRRRTGQQ KFFKYTTANK DQGHYRMTHS
     1261 SFWDISTKIR DRQASMSEEK TPEESVDYDA QFIERLVGGT YTNTDLQGPS NAPVTIPVLV
     1321 APEDSTSAEP STSGQSLLMR SPRPQSPPLR DIKMDLSDDD YSATDLATTC KLEPKQESFE
     1381 DVKDVKRENS PDARPTVIIS DDAGRIRRER FANKYISRIR PKHDQIIGGH RTDEVYVYVR
     1441 GRGRGRYICD RCGIRCKKPS MLKKHIKSHT DVRAFNCTAC NFSFKTKGNL TKHLSSKTHQ
     1501 RRISNIQAGN DSDGTTPSTS SMMNMDDGYH RNQPLFDDYD NNSSDEEDYD HLNRMQAEHK
     1561 FKFGQEHILF ERTAHTPPTR WCLVEAQNDH YWPSPDRRSC MSAPPVAMQR DFDDRAMTPV
     1621 SGANSPYLSQ QVHSPMSTSS QSNIILDIPN NQKSNCSSVS NVSPSNSQNF QSLSTVPTCA
     1681 SSSSNVLVPN VNFLQKDETL KCDQCDRTFR KISDLTLHQH THNIEMQQSK NRMYQCSECK
     1741 IPIRTKAQLQ KHLERNHGVH MDESVTACID PLASTQSVLG GPSTSNPRSF MCVDCDIGFR
     1801 KHGILAKHLR SKTHVMKLES LQRLPVDTLS LITKKDNGAC LNDIDTTDCE KARISLLAIV
     1861 EKLRNEADKD EQGSVVPTTS IPAPQPVALT PEMIRALANA QTPVTASMTN TPSTAQFPVG
     1921 VVSTPSVFPT PNPSSPLESS SMQFRKKAVL DSATHANDMP RTILRISEIP SSLPVNHQLH
     1981 RDLSFLAHTT SRSESSITSP IVSSSTNFSY RKRSESSLSG SSPTHTKKLM VWNPPLAEPS
     2041 FYSPKAALHP LSTDKAHASE SLSDRLHNKR PRPIPDNTKC QICADEFSTP IELQVHLHVD
     2101 HVRMMDGAEY KCPRKFCGLN YESLDSLRAH VTAHYETDRQ KLLEEKVLLA EADFPIDNSK
     2161 IEKLNSPKKE SMNKFTTPFK AISDHHELYA QTQQGAGSST SNQSPKAAN
//