LOCUS WQZ29404.1 2398 aa PRT BCT 29-DEC-2023
DEFINITION Helicobacter pylori vacuolating cytotoxin domain-containing
protein.
ACCESSION CP079244-875
PROTEIN_ID WQZ29404.1
SOURCE Helicobacter pylori
ORGANISM Helicobacter pylori
Bacteria; Campylobacterota; Epsilonproteobacteria;
Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE 1 (bases 1 to 1570870)
AUTHORS Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S., Boscolo
Agostini,R., Ghirotto,S., Torres,R.C., Falush,D., Camargo,M.C. and
Rabkin,C.S.
CONSRTM HpGP Research Network
TITLE The Helicobacter pylori Genome Project: insights into H. pylori
population structure from analysis of a worldwide collection of
complete genomes
JOURNAL Nat Commun 14 (1), 8184 (2023)
PUBMED 38081806
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 1570870)
AUTHORS Camargo,M.C. and Rabkin,C.S.
TITLE Direct Submission
JOURNAL Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609
Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT The annotation was added by the NCBI Prokaryotic Genome Annotation
Pipeline (PGAP). Information about PGAP can be found here:
https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
##Genome-Assembly-Data-START##
Assembly Method :: HGAP v. 4
Assembly Name :: HpGP-TWN-021
Genome Representation :: Full
Expected Final Version :: Yes
Genome Coverage :: 2661x
Sequencing Technology :: PacBio Sequel II
##Genome-Assembly-Data-END##
##Genome-Annotation-Data-START##
Annotation Provider :: NCBI
Annotation Date :: 07/16/2021 08:12:09
Annotation Pipeline :: NCBI Prokaryotic Genome
Annotation Pipeline (PGAP)
Annotation Method :: Best-placed reference protein
set; GeneMarkS-2+
Annotation Software revision :: 5.2
Features Annotated :: Gene; CDS; rRNA; tRNA; ncRNA;
repeat_region
Genes (total) :: 1,497
CDSs (total) :: 1,452
Genes (coding) :: 1,391
CDSs (with protein) :: 1,391
Genes (RNA) :: 45
rRNAs :: 2, 2, 2 (5S, 16S, 23S)
complete rRNAs :: 2, 2, 2 (5S, 16S, 23S)
tRNAs :: 36
ncRNAs :: 3
Pseudo Genes (total) :: 61
CDSs (without protein) :: 61
Pseudo Genes (ambiguous residues) :: 0 of 61
Pseudo Genes (frameshifted) :: 44 of 61
Pseudo Genes (incomplete) :: 11 of 61
Pseudo Genes (internal stop) :: 18 of 61
Pseudo Genes (multiple problems) :: 12 of 61
##Genome-Annotation-Data-END##
FEATURES Qualifiers
source /organism="Helicobacter pylori"
/mol_type="genomic DNA"
/strain="HpGP-TWN-021"
/isolation_source="Biopsy"
/host="Homo sapiens"
/db_xref="taxon:210"
/geo_loc_name="Taiwan"
/lat_lon="23.30 N 121.00 E"
/collected_by="Maria Camargo and Charles Rabkin"
protein /locus_tag="E5P95_04470"
/inference="COORDINATES: similar to AA
sequence:RefSeq:WP_000874657.1"
/note="Derived by automated computational analysis using
gene prediction method: Protein Homology."
/transl_table=11
BEGIN
1 MAFKKAGLIS KFISKGSFKL NKISKKIFKL NLILKREKPL KRHKKTKSIK KPFNKNKSFL
61 KASVLLIGAL GGLSHLRASE CRYWSWSSWS YHDNIESGSN SPTHNSYCLF NSTQGSGTYY
121 LNTLTTYSPG GASFTQKFNN GTLDVGGNIR FGGMGVNGGN VGYITGTYDA QTINFNSSRI
181 TTGNSFSTGG GATLNFNATN RITINQASFN NGDAGTQHSY MNFSGSNINV ISSSFTDDTD
241 GGFSFSGNGT NSNLSFDKTS FNQGTYKFTN SANLNFNNSA FNQGTYNFNS AQSVFENSNF
301 NQGTYSFTDN TGLNFNNDTF SQGTYNFNTS KVSFSGANTL NSSSPFASLK GSVSFGSDAI
361 FNLNQTLNSN QTYDILTTNG TIQYGVYQSY LWHLINYKGD KAISHVEVGN NTYDVTFDIN
421 GQDETLQETF NNQSIITQFL GDDLQAKAQK TYQQDLSNSQ SALNNAADDN KIANSDTDYT
481 KSSNPTIKKD AQNLENTDQT IQQDKQALEK DLANVKQLAN APTGFNEQAF NQAQNKEQQD
541 EQTLQENEKT FSSEQEGLEK AIANAKPASP TPSPTPTPTK HTAPNTPPNK VPPTPPTQNL
601 PTTNVWNGVY NLQNQTYSQK GVYYIDPNLS GQSGQSGNTL STYTANLFGR SFGVNIQNGT
661 LIIGNNTESA NDNGLIWIGH GGFGYITGTF NATNIYLTNN FKTGEGVSGS DGGGANITFK
721 ASDNITIDGL NYNDAETVTK MIQTGASQHS YAAFDATNNI SVTNSSFSDM TWGKFSFSAK
781 NISFSNASFS GFTNPGGSSV ISANAANSLS FVNSRLNGGV VYNLWANSLI FNNTQAVFNV
841 LYSRGTSNFN ATTQLLGNTS FTLSSQSLLN FNGDTTLQDN ANITLGNKSQ ATFKNSLTLD
901 NNSDLSLDNQ SVLNANGASA FNNQASLNIY NGSQATFNSL FFNGGILSLN ASSKLNASSA
961 SFSNNTTINL DDSVLSANNT SSLNANINFQ GASQANFGGN TTIDTASFNF DSASSLSFNN
1021 LTANGALNFN GYAPSLTKAL MSVSGQFVLG NNGDINLSDI NIFDNITKSV TYNILNAQKG
1081 ITGISGANGY EKILFYGMKI QNATYSDNNN IQTWSFINPL NSSQIIQESI KNGDLTIEVL
1141 NNPNSASNTI FNIAPELYNY QASKQNPTGY SYDYSDNQAG TYYLTSSIKG LFTPKGSQTP
1201 QTPGTYSPFN QPLNSLNIYN KGFSSGNLKT LLGILSQNSA TLKEMIESNQ LDNTTSINEV
1261 LQLLDEIKIT PAQKQALLET INHLTDNINQ TFNNGNLVIG ATQDNVTNST SSIWFGGNGY
1321 SSPCALDSAT CSSFRNTYLG QLLGSTSPYL GYINADFKAK SIYITGTLGS ANAFESGGSA
1381 DVTFQSANNL VLNKANIEAQ ATDNIFNLLG QEGIDKIFNQ GNLANVLSQV AMEKIKQAGG
1441 LGNFVENALS PLSKELSASL QNETLGQLIG QNNLDNLLNN SGVMNAIQNI ISKKLSIFGN
1501 FVTPSIIENY LAKQSLKSML DDKGLLNFIG RYIDASELSS ILSVILKDIT NPPTSLQKDI
1561 GVVANDLLNE FLGQDVVKKL ESQGLVSNII NNIISQGGLS GVYNQGLGSV LPPSLQNALK
1621 ENDLGTLLSP RGLHDFWQKG YFNFLSNGYV FVNNSSFSNA TGGSLNFVTN KSIIFNGDNT
1681 IDFSKYQGAL IFASNGVSNI NITTLNATNG LSLNAGLNNV SVQKGEICVN LANCPTTKNS
1741 SSTNSSVTPT NESLSVRANN FTFLGTIASN GAIDLSQVKN NSVIGTLNLN ENATLQANNL
1801 TITNAFNNAS NSTANINGNF TLNQQATLST NASGLNVMGN FNSYGDLVFN LSHSVSHAII
1861 NAQGTATIMA NNNNPLIQFN TSSKEIGTYT LIDSAKAIYY GYNDQITGGS SLADYLKLYT
1921 LIDINGKHMV MTDNGLTYNG QAVNIKDGGL IVGFKDSQNQ YIYTSILYNK VKIAVSNDPI
1981 NNLQAPTLKQ YIAQIQGTQG VDSIDQAGGT QAVNWLNKIF ETKGSPLFAP YYLESHSTKD
2041 LTTIAGDIAN TLEVIANPDF KNDATNILQI NTYTQQMSRL AKLSDTSTFA SADFHERLEA
2101 LKNKRFADAI PNAMDVILKY SQRNRVKNNV WSTGVGGASF INGGTGTLYG INVGYDRFIK
2161 GVIVGGYAAY GYSGFHANIT QSGSSNVNIG VYSRAFIKRS ELTMSLNETW GYNKTFINSY
2221 DPLLSIINQS YKYNTWTTDA KINYGYDFMF KDKSVIFKPQ IGLAYYYIGL SDLRGIMDDP
2281 IYNQFRANAD PNKKSVLTIN FALESRHYFN KNSYYFVIAD VGRDLFINSM GDKMVRFIGN
2341 NTLSYRDGDR YNTFASIITG GEIRLFKTFY VNAGIGARFG LDYKDINITG NIGMRYAF
//