LOCUS       WQZ30623.1 3165 aa PRT BCT 29-DEC-2023
DEFINITION  Helicobacter pylori vacuolating cytotoxin domain-containing protein.
ACCESSION   CP079244-582
PROTEIN_ID  WQZ30623.1
SOURCE      Helicobacter pylori
  ORGANISM  Helicobacter pylori
            Bacteria; Campylobacterota; Epsilonproteobacteria;
            Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE   1 (bases 1 to 1570870)
  AUTHORS   Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S.,
            Boscolo Agostini,R., Ghirotto,S., Torres,R.C., Falush,D.,
            Camargo,M.C. and Rabkin,C.S.
  CONSRTM   HpGP Research Network
  TITLE     The Helicobacter pylori Genome Project: insights into H. pylori
            population structure from analysis of a worldwide collection of
            complete genomes
  JOURNAL   Nat Commun 14 (1), 8184 (2023)
  PUBMED    38081806
  REMARK    Publication Status: Online-Only
REFERENCE   2 (bases 1 to 1570870)
  AUTHORS   Camargo,M.C. and Rabkin,C.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609
            Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT     The annotation was added by the NCBI Prokaryotic Genome
            Annotation Pipeline (PGAP). Information about PGAP can be found
            here: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
            ##Genome-Assembly-Data-START##
            Assembly Method :: HGAP v. 4
            Assembly Name :: HpGP-TWN-021
            Genome Representation :: Full
            Expected Final Version :: Yes
            Genome Coverage :: 2661x
            Sequencing Technology :: PacBio Sequel II
            ##Genome-Assembly-Data-END##
            ##Genome-Annotation-Data-START##
            Annotation Provider :: NCBI
            Annotation Date :: 07/16/2021 08:12:09
            Annotation Pipeline :: NCBI Prokaryotic Genome Annotation Pipeline (PGAP)
            Annotation Method :: Best-placed reference protein set; GeneMarkS-2+
            Annotation Software revision :: 5.2
            Features Annotated :: Gene; CDS; rRNA; tRNA; ncRNA; repeat_region
            Genes (total) :: 1,497
            CDSs (total) :: 1,452
            Genes (coding) :: 1,391
            CDSs (with protein) :: 1,391
            Genes (RNA) :: 45
            rRNAs :: 2, 2, 2 (5S, 16S, 23S)
            complete rRNAs :: 2, 2, 2 (5S, 16S, 23S)
            tRNAs :: 36
            ncRNAs :: 3
            Pseudo Genes (total) :: 61
            CDSs (without protein) :: 61
            Pseudo Genes (ambiguous residues) :: 0 of 61
            Pseudo Genes (frameshifted) :: 44 of 61
            Pseudo Genes (incomplete) :: 11 of 61
            Pseudo Genes (internal stop) :: 18 of 61
            Pseudo Genes (multiple problems) :: 12 of 61
            ##Genome-Annotation-Data-END##
FEATURES    Qualifiers
  source
            /organism="Helicobacter pylori"
            /mol_type="genomic DNA"
            /strain="HpGP-TWN-021"
            /isolation_source="Biopsy"
            /host="Homo sapiens"
            /db_xref="taxon:210"
            /geo_loc_name="Taiwan"
            /lat_lon="23.30 N 121.00 E"
            /collected_by="Maria Camargo and Charles Rabkins"
  protein
            /locus_tag="E5P95_02980"
            /inference="COORDINATES: similar to AA sequence:RefSeq:WP_001919785.1"
            /note="Derived by automated computational analysis using gene prediction method: Protein Homology."
            /transl_table=11
BEGIN
   1 MMDKNDKTDL KNKRLKNRSF KGVKKKIAKK YKIKNSSLTI YPLKTRSNFS ASFNKKIFLG
  61 LGFVSALSAE DYNSSVYWLN SVNENNSNKS YYVSPLRTWA GGNRSFTQNY NNSKLYIGTK
 121 NASATPNNSS VWFGEKGYIG FITGVFKARD IFITGAVGSG NEFKTGGGAI LVFESSNDLT
 181 TDGAHFKNDK AGTQTSWINL ISNNSVNLTN TDFGNQTPNG GFNVMGREIT YNGGIVNGGN
 241 FGFDNVDSNG TTTISGVTFN NNGALTYKGG NGIGGSITFT NSNINHYKLN LNANSVTFNN
 301 SALGSMPNGS ANTVGNAYIL NASNITFNNL TFNGGWFVFM RPDSKIDFQG TTTINNPTSP
 361 FVNMSAKVTI NPNAIFNIQN YTPTIGSTYT LFSMKNGSIT YNDANNLWNI IRLKNTQATK
 421 DNSKNATSNN NTHTYYVTYN LGGTLYNFRQ IFSPDSIVLQ SVYYGANNIY YTNSVNIYDN
 481 VFNLKNINDD RADAIFYLNG LNTWNYTNVR FSQTYGGKNS ALVFNATTPW ANGSIPKSNS
 541 TVRFGGYEGV NWGKTGYITG TFTADRVYIT GNMMSGNGAQ TGGGATLNFV GATEINIAGA
 601 TFKNLKTTSQ NSYMTFMALG DSSRSGKINV SQSDFYDWTG GGYDFTGNGA FDSVNFNKAY
 661 YKFQGAKNSY TFKNTNFLAG NFKFQGKTTI EKSVLDDASY TFDGVNNAFN EDKFNGGSFS
 721 FNAKQVDFSG NSFNGGVFDF NNTPKVSFTD DTFNVNNQFK INGAQTTFTF NKGVVFNMQG
 781 LLNSLSVGTT YQLLNAKSVD YKNNNALYQM LHWTSGENPS GKLVDENKTA PSSAKIYNVQ
 841 FIDNGLTYYI KENFNNGITL TRLCTLGYTH CVNINNDVFH LKNINNNASN TVFYLNGMTT
 901 WKNAGTGVFT QDYSGANSVL VFNQTTPFLN GANPTSNSVV SFGKTSGAEW GLVGYIKGVF
 961 KANQIDITGT IRSGNGAQTG GGATLVFNAQ KRLNIANASL NNDKAGLQDS WMNFIVNNGN
1021 LNATNANFSN QTPHGGFNLK ANNITWDKGS VNGGGNFGVD NANSNGTTTI SGVTFNNNGT
1081 LIYKGGENSA GNSLTLENNT FNSYNINAKV QNLIFNNNSF SGGSYSFNDT KNTTFKGTNT
1141 LINSDPFSRL QGSIAIDNNS IFNIERDLTD KTTYTLLSGN NIKYNNQALA DNAFSKNLWN
1201 LIHYGGERGT LLRTEKNTYF VQFTQSNGQK FVFEETFNSG SITYKYLTLN SSPFHTDADS
1261 KDIWSQVRKQ FDFIPGKTPV CVGVCYIAPY KNQDLIGSSA FAWSLNFGAT VVGTLLLGNA
1321 QEKANNNGGS IWFGKNNLLY LHGNFKATNI FLTNNFNVGN PNAGGGATIN FNADETLNAD
1381 GLNYTNFQTV AMGLQTSASQ HSWANFNSKF SMDIKNSNFR DFTWGGFNFN SGRITFENTT
1441 FSGWTNINGA TESGSSYVNM VANTDLIFTN SILGGGIRYD LKANNIIFNN SQMVIDVSKN
1501 VNQSSLNGNV TFNNSRLSIK PNAAINIGDS QTQTTLENAS SLSFYNNSVA NFNGTTAFNG
1561 VSYLNLNPNA QLSFNQANFN NANVTFYGIP LFGKTPDFGN SVRLINFKGN TNFNQATLNL
1621 RAKNIHINFQ GASTFENNST MNLAESSQAS FNTLIVEGET DFNLNGSSLL NFNGDSVFNA
1681 PVSFYANNSQ ISFTKLATFN ADASFDLGNN STLNFQSVLL NGTLNLLGNS ANALSVNASG
1741 NFSFGSKGVL NLSNVNLFDA KNKPLVYNIL QAQNIQGLMG NNGYEKIRFY GIQIDKADYS
1801 FNNGVYSWSF TNPLNTTETI TETLHNNRLK VQISQNGSSN NEMFNLAPSL YDYQKNPYDE
1861 SANSYNYTSG KAGTYYLTSN IKGFSQNNEI LGTYNAQNQP LQALHIYNQA ITKQDLSIIA
1921 NLGKEFLPKI ANLLSSGALD SLNLNSPNSF ETILGIFEKY GITLNQENWK SLLKIINGFS
1981 NTANYHFSQG NLVVGAIKEG QTNTNSVVWF GGDGYKEPCA VGNNTCQMFR QTNLGQLLHS
2041 TSPYLGYINA NFRAKNIYIT GTIGSGNAWG SGGSANVSFE SGTNLVLNQA NIDAQGTDKI
2101 FSYLGQGGIE KLFGEKGLGN ILSNIIYEES LNDNAIPKDL ASMIPKDFGY KTLSSLLSPT
2161 EVNNLLGVNA FKNAIMEILN SKTVGDVFGE NGLLNALDPI KRKEIDQMLL EQIQAHSSGF
2221 EKFIVKTLGI ENVENFINNW YGKQSLSSFA NNFVPGGLNQ ALDKIGSSSD AKDLQSFLDK
2281 TTFGDILNQM ISQAPLINKL ISWLGPQDLS VLVNIALNSI TNPSKELTST ISSIGEKVLN
2341 DLLGEGVVNK IMSNQVLGQM INKIIADKGF GGVYNQGLGS ILPKSLQKEL EQFGLGSLLG
2401 SRGLHNLWQK GNFNFLAKDY VFVNNSSFSN ATGGELNFVA GKSIIFNGKN TINFTQYQGR
2461 LSFISQDFSN ISLDTLNATN GLTLNAPRND ISVQKGQICV NVLNCMGEKK ANPSNTSAPT
2521 DETLEVNANN FAFLGTIKAN GLVDFSKVLQ NTTIGTLDLG SNATFKANNL IVNSAFNNNS
2581 NYRVNISGNF NVVKGATLGT NENGLNVGGD FKSEGPLIFN LNNPTHQTII NVTGASTIMS
2641 YNNQALINLN TQLKQGAYTL INAKRMVYGY DNQMILGGSL SDYLKLYTLI DFNGKRMQLN
2701 GDSLSYDNQP VNIKDGGLVV SFKDNQGQMV YSSILYDKVQ VTVSDKPINI QAPSLEYYIK
2761 YIQGSAGLNA IKSAGINSLM WLNALFVAKG GNPLFAPYYL QDNSTEHIVT LMKDITSALG
2821 MLSNSHLKNN STDVLQLNTY TQQMGRLAKL SNFASFDSTD FSERLSSLKN QRFADAIPNA
2881 MDVILKYSQR DKLKNNLWAT GVGGVSFVEN GTGTLYGINV GYDRFIKGVI VGGYAAYGYS
2941 GFYERITSSK SDNVDVGLYA RAFIKKSELT FSVNETWGAN KTQISSNDAL LSMINQSYQY
3001 STWTTNARVN YGYDFMFKNK SVIVKPQIGL RYYYIGMTGL DGVMNNALYN QFKANADPSK
3061 KSVLMIDFAF ENRHYFNKNS YFYAIGGIGR DLLVRSMGDK LVRFIGDNIL SYRKGELYNT
3121 FANITTGGEI RLFKSFYVNA GVGARFGLDY KMINITGNIG MRLAF
//