LOCUS       CAB36627.1              1540 aa    PRT              BCT 23-OCT-2008
DEFINITION  Mycobacterium leprae putative polyketide synthase protein.
ACCESSION   AL035480-1
PROTEIN_ID  CAB36627.1
SOURCE      Mycobacterium leprae
  ORGANISM  Mycobacterium leprae
            Bacteria; Actinobacteria; Corynebacteriales; Mycobacteriaceae;
            Mycobacterium.
REFERENCE   1  (bases 1 to 34316)
  AUTHORS   Murphy L., Harris D.
  JOURNAL   Unpublished.
REFERENCE   2  (bases 1 to 34316)
  AUTHORS   James K.D., Parkhill J., Barrell B.G., Rajandream M.A.
  JOURNAL   Submitted (15-FEB-1999) to the INSDC. Mycobacterium leprae
            sequencing project, Sanger Centre, Wellcome Trust Genome Campus,
            Hinxton, Cambridge CB10 1SA. E-mail: barrell@sanger.ac.uk. Cosmids
            supplied by Dr. Stewart T. Cole, [3] Unite de Genetique Moleculaire
            Bacterienne, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris
            Cedex 15, France. Requests for cosmids should be sent to Karin
            Eiglmeier (kei@pasteur.fr).
REFERENCE   3  (bases 1 to 34316)
  AUTHORS   Eiglmeier K., Honore N., Woods S.A., Caudron B., Cole S.T.
  TITLE     Use of an ordered cosmid library to deduce the genomic organization
            of Mycobacterium leprae
  JOURNAL   Mol. Microbiol. 7(2), 197-206(1993).
   PUBMED   8446027
COMMENT     Notes:
            
            The Sanger Centre is funded to complete the sequence of M. leprae
            by the Heiser Program for Research in Leprosy and Tuberculosis of
            The New York Community Trust.
            
            Work in Paris is supported by the Heiser Trust, the Association
            Francaise Raoul Follereau and the Groupement de Recherches et des
            Etudes des Genomes (GIP-GREG).
            
            Details of M. leprae sequencing at the Sanger Centre
            are available on the World Wide Web.
            (URL, http://www.sanger.ac.uk/Projects/)
            
            CDS are numbered using the following system, e.g. MLCB33.01c.
            ML (M. leprae), cB33 (cosmid name), .01 (first CDS),
            c (complementary strand).
            
            The more significant matches with motifs in the PROSITE
            database are also included but some of these may be fortuitous.
            
            The length in codons is given for each CDS.
            
            Usually the highest scoring match found by fasta -o is given for
            CDS which show significant similarity to other CDS in the database.
            The positions of possible ribosome binding site sequences are
            given where these have been used to deduce the initiation codon.
            
            All CDS over 100 codons have been analysed.  Gene prediction
            is based on positional base preference in codons especially
            where there is an increase in the observed/expected third
            position G + C.  CAUTION:  We may not have predicted the
            correct initiation codon.  Where possible we choose an
            initiation codon (atg, gtg, or ttg) which is preceded by an
            upstream ribosome binding site sequence (optimally 5-13bp
            before the initiation codon).  If this cannot be identified
            we choose the most upstream initiation codon.
            
            IMPORTANT: This sequence MAY NOT be the entire insert of
            the sequenced clone.  It may be shorter because we only
            sequence overlapping sections once, or longer, because we
            arrange for a small overlap between neighbouring submissions.
FEATURES             Qualifiers
     source          /organism="Mycobacterium leprae"
                     /mol_type="genomic DNA"
                     /clone="cosmid B12"
                     /db_xref="taxon:1769"
     protein         /transl_table=11
                     /gene="MLCB12.01c"
                     /note="MLCB12.01c, probable polyketide synthase, len: 1540
                     aa; similar to ERY1_SACER (EMBL:M63676) Saccharopolyspora
                     erythraea erythronolide synthase (3491 aa), fasta scores;
                     opt: 2230 z-score: 2283.7 E(): 0, 36.6% identity in 1550
                     aa overlap. N-terminus similar to the N-terminus of
                     MCAS_MYCBO (EMBL:M95808) Mycobacterium bovis mycocerosic
                     acid synthase (2110 aa) (42.2% identity in 879 aa
                     overlap). Also similar to many mycobacterial putative
                     polyketide synthases e.g. ppsB, Rv2932,
                     (MTV011.01-MTCY338.21) M.tuberculosis putative polyketide
                     synthase involved in phenolphthiocerol synthesis (1538 aa)
                     (76.3% identity in 1561 aa overlap). Annotated as ORF
                     TR:Q49932, designated pksC in M.leprae cosmid EMBL:U00023.
                     Contains Pfam match to entry PF00550 pp-binding,
                     Phosphopantetheine attachment site, score 42.90, E-value
                     3e-10, Pfam match to entry PF00698 Acyl_transf,
                     Acyltransferase domain, score 467.00, E-value 1.6e-136,
                     Pfam match to entry PF00109 ketoacyl-synt, Beta-ketoacyl
                     synthase, score 694.00, E-value 7.4e-205, Pfam match to
                     entry PF00501 AMP-binding, AMP-binding enzyme, score
                     -162.30, E-value 4.9e-06. Contains PS00606 Beta-ketoacyl
                     synthases active site"
                     /db_xref="GOA:Q9S384"
                     /db_xref="InterPro:IPR009081"
                     /db_xref="InterPro:IPR013968"
                     /db_xref="InterPro:IPR014030"
                     /db_xref="InterPro:IPR014031"
                     /db_xref="InterPro:IPR014043"
                     /db_xref="InterPro:IPR016035"
                     /db_xref="InterPro:IPR016036"
                     /db_xref="InterPro:IPR016039"
                     /db_xref="InterPro:IPR018201"
                     /db_xref="InterPro:IPR020801"
                     /db_xref="InterPro:IPR020806"
                     /db_xref="InterPro:IPR020841"
                     /db_xref="InterPro:IPR032821"
                     /db_xref="InterPro:IPR036291"
                     /db_xref="InterPro:IPR036736"
                     /db_xref="UniProtKB/TrEMBL:Q9S384"
BEGIN
        1 MRTAFSRISG MTTQQRAALI EEFTKLSRIA VAEPIAVVGI GCRFPGDVTG PDSFWDLLID
       61 GRNAISRVPA DRWDADAFYD PDPLTPGRMT TKWGGFVSDI AGFDAAFFGI TPREAAAMDP
      121 QQRILLEVAW EALENAGIPP DSLGNSRTGV MIGVYFNEYQ SMLASSLENV DAYSGTGNAH
      181 SITVGRISYL LGLRGPSVAV DTACSSSLVA VHLACQSLRL RETDLVLAGG VSITLRPETQ
      241 IAISAWGLLS PHGRCAAFDA AADGFVRGEG AGVVVLKRLT DAVRDGDLVL AVVRGSAVNQ
      301 DGRSNGVTAP NTAAQCDVIT DALRSSDVAP ESVNYVESHG TGTVLGDPIE FEALAATYGR
      361 GESACALGAV KTNLGHLEAA AGIAGFIKTV LAVQRGQIPP NLHFSQWNPA IDAASTRFFV
      421 PTENSSWPIS DGQAGPRRAA VSSFGLGGTN AHVVIEQGPE LTPVTECSSN TAVSTLVVTG
      481 KTASRVAAMA GMLADWVEGP GAEVALADVA HTLNHHRSRH AKFGTVVARD RIQAVAGLRA
      541 LAAGKQAPGV VGQQDGTPGS GTVFVYSGRG SQWAGMGRQL LADEPAFTAA VAELEPVFVV
      601 HAGFSLHDVL ANGKELVGIE QIQLGLIGMQ LTLTELWRSY GVQPDLVIGH SLGEVAAAVV
      661 AGALTAAEGL RVTATRSRLM APLSGQGGMA LLELDAVETE ALIVDYSQVT LAIYNSPRQT
      721 VIAGPTEQID ELIDRVRAQN RFASRVNIEV APHNPAMDAL QPQMRSELAD VAPRTPTIPI
      781 LSTTYADLGS CPVFDAQHWA TNMRNPVHFQ QAIMTAGTDH RTFIEISAHP LLTQAITDTL
      841 HGTRCISIGT LQRDADDTVT FHTNLNNVHT VHPPHTPHPA EPHVTIPSTP WQHTRHWITT
      901 PLASTTALQH LDRNRVTTHA TGIGNTELDD WIYQLDWPTR PLTSEPGATW ASSGSWLVVS
      961 DAGLSDELAR LVVLADPASR VEYLAPSALD HDVSTLHDLH NALRGVDNVL YAPSVSTESV
     1021 DPVDPESGYR LFHAVRRLAA AMVAGTSKPK LVVVTRNAQP VAVGERANPA HAVLWGFGRT
     1081 LALEHPEIWG SVIDLDASMP PELAARCILD EVASQDEKDR EDQVVYRACL RRAPRLQRRT
     1141 ASAVSSVKLS SGQDTTQLVI GATGNIGPHL IRQLAAMGAT TIVAVSRNPS TRLLELDKNL
     1201 AATGTNLISV AADATDPTAM TALFDRFGVD LPPLEGIYLA AFAGRPVLLS DMTDNDVVAM
     1261 FRPKLDVASL LHRLSLKQPV KYFLLFSSIS GLIGSRWLAH YTATSAFLDI LGYARRLVGL
     1321 PATVVAWGLW KSLADAQQDA TQVSAESGLQ PMTDDVAIGA LPLVMSPDAP VHSVVAAADW
     1381 PLLAAAYQTR GVLRMVDDLL PAPGEVLMQE SEFRKALRTC QAERRHNMLF DHVSALVAKA
     1441 MGLLPAETLD PSAGFFQLGM DSLMSVTLQR ALSDSLGEFL PASVVFDYPT VYSLTDYLAT
     1501 ILPEFLEPEA PSGRTTADAY DEFTESELLD QLSERLRGTQ
//