LOCUS WQZ30623.1 3165 aa PRT BCT 29-DEC-2023
DEFINITION Helicobacter pylori vacuolating cytotoxin domain-containing
protein.
ACCESSION CP079244-582
PROTEIN_ID WQZ30623.1
SOURCE Helicobacter pylori
ORGANISM Helicobacter pylori
Bacteria; Campylobacterota; Epsilonproteobacteria;
Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE 1 (bases 1 to 1570870)
AUTHORS Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S., Boscolo
Agostini,R., Ghirotto,S., Torres,R.C., Falush,D., Camargo,M.C. and
Rabkin,C.S.
CONSRTM HpGP Research Network
TITLE The Helicobacter pylori Genome Project: insights into H. pylori
population structure from analysis of a worldwide collection of
complete genomes
JOURNAL Nat Commun 14 (1), 8184 (2023)
PUBMED 38081806
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 1570870)
AUTHORS Camargo,M.C. and Rabkin,C.S.
TITLE Direct Submission
JOURNAL Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609
Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT The annotation was added by the NCBI Prokaryotic Genome Annotation
Pipeline (PGAP). Information about PGAP can be found here:
https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
##Genome-Assembly-Data-START##
Assembly Method :: HGAP v. 4
Assembly Name :: HpGP-TWN-021
Genome Representation :: Full
Expected Final Version :: Yes
Genome Coverage :: 2661x
Sequencing Technology :: PacBio Sequel II
##Genome-Assembly-Data-END##
##Genome-Annotation-Data-START##
Annotation Provider :: NCBI
Annotation Date :: 07/16/2021 08:12:09
Annotation Pipeline :: NCBI Prokaryotic Genome
Annotation Pipeline (PGAP)
Annotation Method :: Best-placed reference protein
set; GeneMarkS-2+
Annotation Software revision :: 5.2
Features Annotated :: Gene; CDS; rRNA; tRNA; ncRNA;
repeat_region
Genes (total) :: 1,497
CDSs (total) :: 1,452
Genes (coding) :: 1,391
CDSs (with protein) :: 1,391
Genes (RNA) :: 45
rRNAs :: 2, 2, 2 (5S, 16S, 23S)
complete rRNAs :: 2, 2, 2 (5S, 16S, 23S)
tRNAs :: 36
ncRNAs :: 3
Pseudo Genes (total) :: 61
CDSs (without protein) :: 61
Pseudo Genes (ambiguous residues) :: 0 of 61
Pseudo Genes (frameshifted) :: 44 of 61
Pseudo Genes (incomplete) :: 11 of 61
Pseudo Genes (internal stop) :: 18 of 61
Pseudo Genes (multiple problems) :: 12 of 61
##Genome-Annotation-Data-END##
FEATURES Qualifiers
source /organism="Helicobacter pylori"
/mol_type="genomic DNA"
/strain="HpGP-TWN-021"
/isolation_source="Biopsy"
/host="Homo sapiens"
/db_xref="taxon:210"
/geo_loc_name="Taiwan"
/lat_lon="23.30 N 121.00 E"
/collected_by="Maria Camargo and Charles Rabkin"
protein /locus_tag="E5P95_02980"
/inference="COORDINATES: similar to AA
sequence:RefSeq:WP_001919785.1"
/note="Derived by automated computational analysis using
gene prediction method: Protein Homology."
/transl_table=11
BEGIN
1 MMDKNDKTDL KNKRLKNRSF KGVKKKIAKK YKIKNSSLTI YPLKTRSNFS ASFNKKIFLG
61 LGFVSALSAE DYNSSVYWLN SVNENNSNKS YYVSPLRTWA GGNRSFTQNY NNSKLYIGTK
121 NASATPNNSS VWFGEKGYIG FITGVFKARD IFITGAVGSG NEFKTGGGAI LVFESSNDLT
181 TDGAHFKNDK AGTQTSWINL ISNNSVNLTN TDFGNQTPNG GFNVMGREIT YNGGIVNGGN
241 FGFDNVDSNG TTTISGVTFN NNGALTYKGG NGIGGSITFT NSNINHYKLN LNANSVTFNN
301 SALGSMPNGS ANTVGNAYIL NASNITFNNL TFNGGWFVFM RPDSKIDFQG TTTINNPTSP
361 FVNMSAKVTI NPNAIFNIQN YTPTIGSTYT LFSMKNGSIT YNDANNLWNI IRLKNTQATK
421 DNSKNATSNN NTHTYYVTYN LGGTLYNFRQ IFSPDSIVLQ SVYYGANNIY YTNSVNIYDN
481 VFNLKNINDD RADAIFYLNG LNTWNYTNVR FSQTYGGKNS ALVFNATTPW ANGSIPKSNS
541 TVRFGGYEGV NWGKTGYITG TFTADRVYIT GNMMSGNGAQ TGGGATLNFV GATEINIAGA
601 TFKNLKTTSQ NSYMTFMALG DSSRSGKINV SQSDFYDWTG GGYDFTGNGA FDSVNFNKAY
661 YKFQGAKNSY TFKNTNFLAG NFKFQGKTTI EKSVLDDASY TFDGVNNAFN EDKFNGGSFS
721 FNAKQVDFSG NSFNGGVFDF NNTPKVSFTD DTFNVNNQFK INGAQTTFTF NKGVVFNMQG
781 LLNSLSVGTT YQLLNAKSVD YKNNNALYQM LHWTSGENPS GKLVDENKTA PSSAKIYNVQ
841 FIDNGLTYYI KENFNNGITL TRLCTLGYTH CVNINNDVFH LKNINNNASN TVFYLNGMTT
901 WKNAGTGVFT QDYSGANSVL VFNQTTPFLN GANPTSNSVV SFGKTSGAEW GLVGYIKGVF
961 KANQIDITGT IRSGNGAQTG GGATLVFNAQ KRLNIANASL NNDKAGLQDS WMNFIVNNGN
1021 LNATNANFSN QTPHGGFNLK ANNITWDKGS VNGGGNFGVD NANSNGTTTI SGVTFNNNGT
1081 LIYKGGENSA GNSLTLENNT FNSYNINAKV QNLIFNNNSF SGGSYSFNDT KNTTFKGTNT
1141 LINSDPFSRL QGSIAIDNNS IFNIERDLTD KTTYTLLSGN NIKYNNQALA DNAFSKNLWN
1201 LIHYGGERGT LLRTEKNTYF VQFTQSNGQK FVFEETFNSG SITYKYLTLN SSPFHTDADS
1261 KDIWSQVRKQ FDFIPGKTPV CVGVCYIAPY KNQDLIGSSA FAWSLNFGAT VVGTLLLGNA
1321 QEKANNNGGS IWFGKNNLLY LHGNFKATNI FLTNNFNVGN PNAGGGATIN FNADETLNAD
1381 GLNYTNFQTV AMGLQTSASQ HSWANFNSKF SMDIKNSNFR DFTWGGFNFN SGRITFENTT
1441 FSGWTNINGA TESGSSYVNM VANTDLIFTN SILGGGIRYD LKANNIIFNN SQMVIDVSKN
1501 VNQSSLNGNV TFNNSRLSIK PNAAINIGDS QTQTTLENAS SLSFYNNSVA NFNGTTAFNG
1561 VSYLNLNPNA QLSFNQANFN NANVTFYGIP LFGKTPDFGN SVRLINFKGN TNFNQATLNL
1621 RAKNIHINFQ GASTFENNST MNLAESSQAS FNTLIVEGET DFNLNGSSLL NFNGDSVFNA
1681 PVSFYANNSQ ISFTKLATFN ADASFDLGNN STLNFQSVLL NGTLNLLGNS ANALSVNASG
1741 NFSFGSKGVL NLSNVNLFDA KNKPLVYNIL QAQNIQGLMG NNGYEKIRFY GIQIDKADYS
1801 FNNGVYSWSF TNPLNTTETI TETLHNNRLK VQISQNGSSN NEMFNLAPSL YDYQKNPYDE
1861 SANSYNYTSG KAGTYYLTSN IKGFSQNNEI LGTYNAQNQP LQALHIYNQA ITKQDLSIIA
1921 NLGKEFLPKI ANLLSSGALD SLNLNSPNSF ETILGIFEKY GITLNQENWK SLLKIINGFS
1981 NTANYHFSQG NLVVGAIKEG QTNTNSVVWF GGDGYKEPCA VGNNTCQMFR QTNLGQLLHS
2041 TSPYLGYINA NFRAKNIYIT GTIGSGNAWG SGGSANVSFE SGTNLVLNQA NIDAQGTDKI
2101 FSYLGQGGIE KLFGEKGLGN ILSNIIYEES LNDNAIPKDL ASMIPKDFGY KTLSSLLSPT
2161 EVNNLLGVNA FKNAIMEILN SKTVGDVFGE NGLLNALDPI KRKEIDQMLL EQIQAHSSGF
2221 EKFIVKTLGI ENVENFINNW YGKQSLSSFA NNFVPGGLNQ ALDKIGSSSD AKDLQSFLDK
2281 TTFGDILNQM ISQAPLINKL ISWLGPQDLS VLVNIALNSI TNPSKELTST ISSIGEKVLN
2341 DLLGEGVVNK IMSNQVLGQM INKIIADKGF GGVYNQGLGS ILPKSLQKEL EQFGLGSLLG
2401 SRGLHNLWQK GNFNFLAKDY VFVNNSSFSN ATGGELNFVA GKSIIFNGKN TINFTQYQGR
2461 LSFISQDFSN ISLDTLNATN GLTLNAPRND ISVQKGQICV NVLNCMGEKK ANPSNTSAPT
2521 DETLEVNANN FAFLGTIKAN GLVDFSKVLQ NTTIGTLDLG SNATFKANNL IVNSAFNNNS
2581 NYRVNISGNF NVVKGATLGT NENGLNVGGD FKSEGPLIFN LNNPTHQTII NVTGASTIMS
2641 YNNQALINLN TQLKQGAYTL INAKRMVYGY DNQMILGGSL SDYLKLYTLI DFNGKRMQLN
2701 GDSLSYDNQP VNIKDGGLVV SFKDNQGQMV YSSILYDKVQ VTVSDKPINI QAPSLEYYIK
2761 YIQGSAGLNA IKSAGINSLM WLNALFVAKG GNPLFAPYYL QDNSTEHIVT LMKDITSALG
2821 MLSNSHLKNN STDVLQLNTY TQQMGRLAKL SNFASFDSTD FSERLSSLKN QRFADAIPNA
2881 MDVILKYSQR DKLKNNLWAT GVGGVSFVEN GTGTLYGINV GYDRFIKGVI VGGYAAYGYS
2941 GFYERITSSK SDNVDVGLYA RAFIKKSELT FSVNETWGAN KTQISSNDAL LSMINQSYQY
3001 STWTTNARVN YGYDFMFKNK SVIVKPQIGL RYYYIGMTGL DGVMNNALYN QFKANADPSK
3061 KSVLMIDFAF ENRHYFNKNS YFYAIGGIGR DLLVRSMGDK LVRFIGDNIL SYRKGELYNT
3121 FANITTGGEI RLFKSFYVNA GVGARFGLDY KMINITGNIG MRLAF
//