LOCUS       WQZ29404.1              2398 aa    PRT              BCT 29-DEC-2023
DEFINITION  Helicobacter pylori vacuolating cytotoxin domain-containing
            protein.
ACCESSION   CP079244-875
PROTEIN_ID  WQZ29404.1
SOURCE      Helicobacter pylori
  ORGANISM  Helicobacter pylori
            Bacteria; Campylobacterota; Epsilonproteobacteria;
            Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE   1  (bases 1 to 1570870)
  AUTHORS   Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S., Boscolo
            Agostini,R., Ghirotto,S., Torres,R.C., Falush,D., Camargo,M.C. and
            Rabkin,C.S.
  CONSRTM   HpGP Research Network
  TITLE     The Helicobacter pylori Genome Project: insights into H. pylori
            population structure from analysis of a worldwide collection of
            complete genomes
  JOURNAL   Nat Commun 14 (1), 8184 (2023)
   PUBMED   38081806
  REMARK    Publication Status: Online-Only
REFERENCE   2  (bases 1 to 1570870)
  AUTHORS   Camargo,M.C. and Rabkin,C.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609
            Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT     The annotation was added by the NCBI Prokaryotic Genome Annotation
            Pipeline (PGAP). Information about PGAP can be found here:
            https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
            
            ##Genome-Assembly-Data-START##
            Assembly Method        :: HGAP v. 4
            Assembly Name          :: HpGP-TWN-021
            Genome Representation  :: Full
            Expected Final Version :: Yes
            Genome Coverage        :: 2661x
            Sequencing Technology  :: PacBio Sequel II
            ##Genome-Assembly-Data-END##
            
            ##Genome-Annotation-Data-START##
            Annotation Provider               :: NCBI
            Annotation Date                   :: 07/16/2021 08:12:09
            Annotation Pipeline               :: NCBI Prokaryotic Genome
                                                 Annotation Pipeline (PGAP)
            Annotation Method                 :: Best-placed reference protein
                                                 set; GeneMarkS-2+
            Annotation Software revision      :: 5.2
            Features Annotated                :: Gene; CDS; rRNA; tRNA; ncRNA;
                                                 repeat_region
            Genes (total)                     :: 1,497
            CDSs (total)                      :: 1,452
            Genes (coding)                    :: 1,391
            CDSs (with protein)               :: 1,391
            Genes (RNA)                       :: 45
            rRNAs                             :: 2, 2, 2 (5S, 16S, 23S)
            complete rRNAs                    :: 2, 2, 2 (5S, 16S, 23S)
            tRNAs                             :: 36
            ncRNAs                            :: 3
            Pseudo Genes (total)              :: 61
            CDSs (without protein)            :: 61
            Pseudo Genes (ambiguous residues) :: 0 of 61
            Pseudo Genes (frameshifted)       :: 44 of 61
            Pseudo Genes (incomplete)         :: 11 of 61
            Pseudo Genes (internal stop)      :: 18 of 61
            Pseudo Genes (multiple problems)  :: 12 of 61
            ##Genome-Annotation-Data-END##
FEATURES             Qualifiers
     source          /organism="Helicobacter pylori"
                     /mol_type="genomic DNA"
                     /strain="HpGP-TWN-021"
                     /isolation_source="Biopsy"
                     /host="Homo sapiens"
                     /db_xref="taxon:210"
                     /geo_loc_name="Taiwan"
                     /lat_lon="23.30 N 121.00 E"
                     /collected_by="Maria Camargo and Charles Rabkin"
     protein         /locus_tag="E5P95_04470"
                     /inference="COORDINATES: similar to AA
                     sequence:RefSeq:WP_000874657.1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Protein Homology."
                     /transl_table=11
BEGIN
        1 MAFKKAGLIS KFISKGSFKL NKISKKIFKL NLILKREKPL KRHKKTKSIK KPFNKNKSFL
       61 KASVLLIGAL GGLSHLRASE CRYWSWSSWS YHDNIESGSN SPTHNSYCLF NSTQGSGTYY
      121 LNTLTTYSPG GASFTQKFNN GTLDVGGNIR FGGMGVNGGN VGYITGTYDA QTINFNSSRI
      181 TTGNSFSTGG GATLNFNATN RITINQASFN NGDAGTQHSY MNFSGSNINV ISSSFTDDTD
      241 GGFSFSGNGT NSNLSFDKTS FNQGTYKFTN SANLNFNNSA FNQGTYNFNS AQSVFENSNF
      301 NQGTYSFTDN TGLNFNNDTF SQGTYNFNTS KVSFSGANTL NSSSPFASLK GSVSFGSDAI
      361 FNLNQTLNSN QTYDILTTNG TIQYGVYQSY LWHLINYKGD KAISHVEVGN NTYDVTFDIN
      421 GQDETLQETF NNQSIITQFL GDDLQAKAQK TYQQDLSNSQ SALNNAADDN KIANSDTDYT
      481 KSSNPTIKKD AQNLENTDQT IQQDKQALEK DLANVKQLAN APTGFNEQAF NQAQNKEQQD
      541 EQTLQENEKT FSSEQEGLEK AIANAKPASP TPSPTPTPTK HTAPNTPPNK VPPTPPTQNL
      601 PTTNVWNGVY NLQNQTYSQK GVYYIDPNLS GQSGQSGNTL STYTANLFGR SFGVNIQNGT
      661 LIIGNNTESA NDNGLIWIGH GGFGYITGTF NATNIYLTNN FKTGEGVSGS DGGGANITFK
      721 ASDNITIDGL NYNDAETVTK MIQTGASQHS YAAFDATNNI SVTNSSFSDM TWGKFSFSAK
      781 NISFSNASFS GFTNPGGSSV ISANAANSLS FVNSRLNGGV VYNLWANSLI FNNTQAVFNV
      841 LYSRGTSNFN ATTQLLGNTS FTLSSQSLLN FNGDTTLQDN ANITLGNKSQ ATFKNSLTLD
      901 NNSDLSLDNQ SVLNANGASA FNNQASLNIY NGSQATFNSL FFNGGILSLN ASSKLNASSA
      961 SFSNNTTINL DDSVLSANNT SSLNANINFQ GASQANFGGN TTIDTASFNF DSASSLSFNN
     1021 LTANGALNFN GYAPSLTKAL MSVSGQFVLG NNGDINLSDI NIFDNITKSV TYNILNAQKG
     1081 ITGISGANGY EKILFYGMKI QNATYSDNNN IQTWSFINPL NSSQIIQESI KNGDLTIEVL
     1141 NNPNSASNTI FNIAPELYNY QASKQNPTGY SYDYSDNQAG TYYLTSSIKG LFTPKGSQTP
     1201 QTPGTYSPFN QPLNSLNIYN KGFSSGNLKT LLGILSQNSA TLKEMIESNQ LDNTTSINEV
     1261 LQLLDEIKIT PAQKQALLET INHLTDNINQ TFNNGNLVIG ATQDNVTNST SSIWFGGNGY
     1321 SSPCALDSAT CSSFRNTYLG QLLGSTSPYL GYINADFKAK SIYITGTLGS ANAFESGGSA
     1381 DVTFQSANNL VLNKANIEAQ ATDNIFNLLG QEGIDKIFNQ GNLANVLSQV AMEKIKQAGG
     1441 LGNFVENALS PLSKELSASL QNETLGQLIG QNNLDNLLNN SGVMNAIQNI ISKKLSIFGN
     1501 FVTPSIIENY LAKQSLKSML DDKGLLNFIG RYIDASELSS ILSVILKDIT NPPTSLQKDI
     1561 GVVANDLLNE FLGQDVVKKL ESQGLVSNII NNIISQGGLS GVYNQGLGSV LPPSLQNALK
     1621 ENDLGTLLSP RGLHDFWQKG YFNFLSNGYV FVNNSSFSNA TGGSLNFVTN KSIIFNGDNT
     1681 IDFSKYQGAL IFASNGVSNI NITTLNATNG LSLNAGLNNV SVQKGEICVN LANCPTTKNS
     1741 SSTNSSVTPT NESLSVRANN FTFLGTIASN GAIDLSQVKN NSVIGTLNLN ENATLQANNL
     1801 TITNAFNNAS NSTANINGNF TLNQQATLST NASGLNVMGN FNSYGDLVFN LSHSVSHAII
     1861 NAQGTATIMA NNNNPLIQFN TSSKEIGTYT LIDSAKAIYY GYNDQITGGS SLADYLKLYT
     1921 LIDINGKHMV MTDNGLTYNG QAVNIKDGGL IVGFKDSQNQ YIYTSILYNK VKIAVSNDPI
     1981 NNLQAPTLKQ YIAQIQGTQG VDSIDQAGGT QAVNWLNKIF ETKGSPLFAP YYLESHSTKD
     2041 LTTIAGDIAN TLEVIANPDF KNDATNILQI NTYTQQMSRL AKLSDTSTFA SADFHERLEA
     2101 LKNKRFADAI PNAMDVILKY SQRNRVKNNV WSTGVGGASF INGGTGTLYG INVGYDRFIK
     2161 GVIVGGYAAY GYSGFHANIT QSGSSNVNIG VYSRAFIKRS ELTMSLNETW GYNKTFINSY
     2221 DPLLSIINQS YKYNTWTTDA KINYGYDFMF KDKSVIFKPQ IGLAYYYIGL SDLRGIMDDP
     2281 IYNQFRANAD PNKKSVLTIN FALESRHYFN KNSYYFVIAD VGRDLFINSM GDKMVRFIGN
     2341 NTLSYRDGDR YNTFASIITG GEIRLFKTFY VNAGIGARFG LDYKDINITG NIGMRYAF
//