LOCUS       CAB36627.1  1540 aa  PRT  BCT  23-OCT-2008
DEFINITION  Mycobacterium leprae putative polyketide synthase protein.
ACCESSION   AL035480-1
PROTEIN_ID  CAB36627.1
SOURCE      Mycobacterium leprae
  ORGANISM  Mycobacterium leprae
            Bacteria; Actinobacteria; Corynebacteriales; Mycobacteriaceae;
            Mycobacterium.
REFERENCE   1  (bases 1 to 34316)
  AUTHORS   Murphy L., Harris D.
  JOURNAL   Unpublished.
REFERENCE   2  (bases 1 to 34316)
  AUTHORS   James K.D., Parkhill J., Barrell B.G., Rajandream M.A.
  JOURNAL   Submitted (15-FEB-1999) to the INSDC. Mycobacterium leprae
            sequencing project, Sanger Centre, Wellcome Trust Genome Campus,
            Hinxton, Cambridge CB10 1SA. E-mail: barrell@sanger.ac.uk.
            Cosmids supplied by Dr. Stewart T. Cole, [3] Unite de Genetique
            Moleculaire Bacterienne, Institut Pasteur, 28 rue du Docteur Roux,
            75724 Paris Cedex 15, France. Requests for cosmids should be sent
            to Karin Eiglmeier (kei@pasteur.fr).
REFERENCE   3  (bases 1 to 34316)
  AUTHORS   Eiglmeier K., Honore N., Woods S.A., Caudron B., Cole S.T.
  TITLE     Use of an ordered cosmid library to deduce the genomic organization
            of Mycobacterium leprae
  JOURNAL   Mol. Microbiol. 7(2), 197-206 (1993).
   PUBMED   8446027
COMMENT     Notes: The Sanger Centre is funded to complete the sequence of
            M. leprae by the Heiser Program for Research in Leprosy and
            Tuberculosis of The New York Community Trust. Work in Paris is
            supported by the Heiser Trust, the Association Francaise Raoul
            Follereau and the Groupement de Recherches et des Etudes des
            Genomes (GIP-GREG). Details of M. leprae sequencing at the Sanger
            Centre are available on the World Wide Web
            (URL, http://www.sanger.ac.uk/Projects/).
            CDS are numbered using the following system, e.g. MLCB33.01c:
            ML (M. leprae), cB33 (cosmid name), .01 (first CDS),
            c (complementary strand).
            The more significant matches with motifs in the PROSITE database
            are also included, but some of these may be fortuitous. The length
            in codons is given for each CDS.
            Usually the highest-scoring match found by fasta -o is given for
            CDS that show significant similarity to other CDS in the database.
            The positions of possible ribosome binding site sequences are given
            where these have been used to deduce the initiation codon. All CDS
            over 100 codons have been analysed. Gene prediction is based on
            positional base preference in codons, especially where there is an
            increase in the observed/expected third-position G + C.
            CAUTION: We may not have predicted the correct initiation codon.
            Where possible we choose an initiation codon (atg, gtg, or ttg)
            that is preceded by an upstream ribosome binding site sequence
            (optimally 5-13 bp before the initiation codon). If this cannot be
            identified, we choose the most upstream initiation codon.
            IMPORTANT: This sequence MAY NOT be the entire insert of the
            sequenced clone. It may be shorter, because we only sequence
            overlapping sections once, or longer, because we arrange for a
            small overlap between neighbouring submissions.
FEATURES             Qualifiers
     source          /organism="Mycobacterium leprae"
                     /mol_type="genomic DNA"
                     /clone="cosmid B12"
                     /db_xref="taxon:1769"
     protein         /transl_table=11
                     /gene="MLCB12.01c"
                     /note="MLCB12.01c, probable polyketide synthase, len:
                     1540 aa; similar to ERY1_SACER (EMBL:M63676)
                     Saccharopolyspora erythraea erythronolide synthase
                     (3491 aa), fasta scores; opt: 2230 z-score: 2283.7 E(): 0,
                     36.6% identity in 1550 aa overlap. N-terminus similar to
                     the N-terminus of MCAS_MYCBO (EMBL:M95808) Mycobacterium
                     bovis mycocerosic acid synthase (2110 aa) (42.2% identity
                     in 879 aa overlap). Also similar to many mycobacterial
                     putative polyketide synthases, e.g. ppsB, Rv2932,
                     (MTV011.01-MTCY338.21) M. tuberculosis putative polyketide
                     synthase involved in phenolphthiocerol synthesis
                     (1538 aa) (76.3% identity in 1561 aa overlap). Annotated
                     as ORF TR:Q49932, designated pksC in M. leprae cosmid
                     EMBL:U00023.
                     Contains Pfam match to entry PF00550 pp-binding,
                     Phosphopantetheine attachment site, score 42.90, E-value
                     3e-10, Pfam match to entry PF00698 Acyl_transf,
                     Acyltransferase domain, score 467.00, E-value 1.6e-136,
                     Pfam match to entry PF00109 ketoacyl-synt, Beta-ketoacyl
                     synthase, score 694.00, E-value 7.4e-205, Pfam match to
                     entry PF00501 AMP-binding, AMP-binding enzyme, score
                     -162.30, E-value 4.9e-06. Contains PS00606 Beta-ketoacyl
                     synthases active site"
                     /db_xref="GOA:Q9S384"
                     /db_xref="InterPro:IPR009081"
                     /db_xref="InterPro:IPR013968"
                     /db_xref="InterPro:IPR014030"
                     /db_xref="InterPro:IPR014031"
                     /db_xref="InterPro:IPR014043"
                     /db_xref="InterPro:IPR016035"
                     /db_xref="InterPro:IPR016036"
                     /db_xref="InterPro:IPR016039"
                     /db_xref="InterPro:IPR018201"
                     /db_xref="InterPro:IPR020801"
                     /db_xref="InterPro:IPR020806"
                     /db_xref="InterPro:IPR020841"
                     /db_xref="InterPro:IPR032821"
                     /db_xref="InterPro:IPR036291"
                     /db_xref="InterPro:IPR036736"
                     /db_xref="UniProtKB/TrEMBL:Q9S384"
BEGIN
        1 MRTAFSRISG MTTQQRAALI EEFTKLSRIA VAEPIAVVGI GCRFPGDVTG PDSFWDLLID
       61 GRNAISRVPA DRWDADAFYD PDPLTPGRMT TKWGGFVSDI AGFDAAFFGI TPREAAAMDP
      121 QQRILLEVAW EALENAGIPP DSLGNSRTGV MIGVYFNEYQ SMLASSLENV DAYSGTGNAH
      181 SITVGRISYL LGLRGPSVAV DTACSSSLVA VHLACQSLRL RETDLVLAGG VSITLRPETQ
      241 IAISAWGLLS PHGRCAAFDA AADGFVRGEG AGVVVLKRLT DAVRDGDLVL AVVRGSAVNQ
      301 DGRSNGVTAP NTAAQCDVIT DALRSSDVAP ESVNYVESHG TGTVLGDPIE FEALAATYGR
      361 GESACALGAV KTNLGHLEAA AGIAGFIKTV LAVQRGQIPP NLHFSQWNPA IDAASTRFFV
      421 PTENSSWPIS DGQAGPRRAA VSSFGLGGTN AHVVIEQGPE LTPVTECSSN TAVSTLVVTG
      481 KTASRVAAMA GMLADWVEGP GAEVALADVA HTLNHHRSRH AKFGTVVARD RIQAVAGLRA
      541 LAAGKQAPGV VGQQDGTPGS GTVFVYSGRG SQWAGMGRQL LADEPAFTAA VAELEPVFVV
      601 HAGFSLHDVL ANGKELVGIE QIQLGLIGMQ LTLTELWRSY GVQPDLVIGH SLGEVAAAVV
      661 AGALTAAEGL RVTATRSRLM APLSGQGGMA LLELDAVETE ALIVDYSQVT LAIYNSPRQT
      721 VIAGPTEQID ELIDRVRAQN RFASRVNIEV APHNPAMDAL QPQMRSELAD VAPRTPTIPI
      781 LSTTYADLGS CPVFDAQHWA TNMRNPVHFQ QAIMTAGTDH RTFIEISAHP LLTQAITDTL
      841 HGTRCISIGT LQRDADDTVT FHTNLNNVHT VHPPHTPHPA EPHVTIPSTP WQHTRHWITT
      901 PLASTTALQH LDRNRVTTHA TGIGNTELDD WIYQLDWPTR PLTSEPGATW ASSGSWLVVS
      961 DAGLSDELAR LVVLADPASR VEYLAPSALD HDVSTLHDLH NALRGVDNVL YAPSVSTESV
     1021 DPVDPESGYR LFHAVRRLAA AMVAGTSKPK LVVVTRNAQP VAVGERANPA HAVLWGFGRT
     1081 LALEHPEIWG SVIDLDASMP PELAARCILD EVASQDEKDR EDQVVYRACL RRAPRLQRRT
     1141 ASAVSSVKLS SGQDTTQLVI GATGNIGPHL IRQLAAMGAT TIVAVSRNPS TRLLELDKNL
     1201 AATGTNLISV AADATDPTAM TALFDRFGVD LPPLEGIYLA AFAGRPVLLS DMTDNDVVAM
     1261 FRPKLDVASL LHRLSLKQPV KYFLLFSSIS GLIGSRWLAH YTATSAFLDI LGYARRLVGL
     1321 PATVVAWGLW KSLADAQQDA TQVSAESGLQ PMTDDVAIGA LPLVMSPDAP VHSVVAAADW
     1381 PLLAAAYQTR GVLRMVDDLL PAPGEVLMQE SEFRKALRTC QAERRHNMLF DHVSALVAKA
     1441 MGLLPAETLD PSAGFFQLGM DSLMSVTLQR ALSDSLGEFL PASVVFDYPT VYSLTDYLAT
     1501 ILPEFLEPEA PSGRTTADAY DEFTESELLD QLSERLRGTQ
//
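The CDS numbering system described in the COMMENT (ML for M. leprae, then the cosmid name, then .01 for the first CDS, then an optional c for the complementary strand) can be decoded mechanically. The sketch below is an illustration only and is not part of the original annotation: the names parse_cds_id and CdsId are hypothetical, and the pattern assumes the cosmid name is "C" followed by letters and digits. That assumption is inferred from the two identifiers in this record (MLCB33.01c and MLCB12.01c), not from any published specification.

```python
import re
from typing import NamedTuple


class CdsId(NamedTuple):
    """Fields of a CDS identifier such as MLCB33.01c (hypothetical helper type)."""
    organism: str     # "ML" = Mycobacterium leprae
    cosmid: str       # cosmid name as written in the identifier, e.g. "CB33"
    index: int        # CDS number on the cosmid; .01 = first CDS
    complement: bool  # True when the trailing "c" (complementary strand) is present


# Assumption: the cosmid name is "C" followed by letters/digits and the CDS
# number is one or more digits; only MLCB33.01c and MLCB12.01c are attested here.
_CDS_RE = re.compile(r"(ML)(C[A-Za-z0-9]+)\.(\d+)(c?)")


def parse_cds_id(name: str) -> CdsId:
    """Split an identifier like 'MLCB12.01c' into its documented parts."""
    m = _CDS_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a CDS identifier of the form ML<cosmid>.<nn>[c]: {name!r}")
    organism, cosmid, index, strand = m.groups()
    return CdsId(organism, cosmid, int(index), strand == "c")


if __name__ == "__main__":
    # The /gene qualifier of this record.
    print(parse_cds_id("MLCB12.01c"))
```

Run as a script, this prints CdsId(organism='ML', cosmid='CB12', index=1, complement=True): the first CDS on cosmid B12 (cf. /clone="cosmid B12"), on the complementary strand. Using fullmatch rather than match rejects identifiers with trailing text, such as the M. tuberculosis name Rv2932 cited in the /note.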