LOCUS       WQZ29404.1 2398 aa PRT BCT 29-DEC-2023
DEFINITION  Helicobacter pylori vacuolating cytotoxin domain-containing
            protein.
ACCESSION   CP079244-875
PROTEIN_ID  WQZ29404.1
SOURCE      Helicobacter pylori
  ORGANISM  Helicobacter pylori
            Bacteria; Campylobacterota; Epsilonproteobacteria;
            Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE   1 (bases 1 to 1570870)
  AUTHORS   Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S.,
            Boscolo Agostini,R., Ghirotto,S., Torres,R.C., Falush,D.,
            Camargo,M.C. and Rabkin,C.S.
  CONSRTM   HpGP Research Network
  TITLE     The Helicobacter pylori Genome Project: insights into H. pylori
            population structure from analysis of a worldwide collection of
            complete genomes
  JOURNAL   Nat Commun 14 (1), 8184 (2023)
  PUBMED    38081806
  REMARK    Publication Status: Online-Only
REFERENCE   2 (bases 1 to 1570870)
  AUTHORS   Camargo,M.C. and Rabkin,C.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609
            Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT     The annotation was added by the NCBI Prokaryotic Genome
            Annotation Pipeline (PGAP). Information about PGAP can be found
            here: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
            ##Genome-Assembly-Data-START##
            Assembly Method :: HGAP v. 4
            Assembly Name :: HpGP-TWN-021
            Genome Representation :: Full
            Expected Final Version :: Yes
            Genome Coverage :: 2661x
            Sequencing Technology :: PacBio Sequel II
            ##Genome-Assembly-Data-END##
            ##Genome-Annotation-Data-START##
            Annotation Provider :: NCBI
            Annotation Date :: 07/16/2021 08:12:09
            Annotation Pipeline :: NCBI Prokaryotic Genome Annotation Pipeline (PGAP)
            Annotation Method :: Best-placed reference protein set; GeneMarkS-2+
            Annotation Software revision :: 5.2
            Features Annotated :: Gene; CDS; rRNA; tRNA; ncRNA; repeat_region
            Genes (total) :: 1,497
            CDSs (total) :: 1,452
            Genes (coding) :: 1,391
            CDSs (with protein) :: 1,391
            Genes (RNA) :: 45
            rRNAs :: 2, 2, 2 (5S, 16S, 23S)
            complete rRNAs :: 2, 2, 2 (5S, 16S, 23S)
            tRNAs :: 36
            ncRNAs :: 3
            Pseudo Genes (total) :: 61
            CDSs (without protein) :: 61
            Pseudo Genes (ambiguous residues) :: 0 of 61
            Pseudo Genes (frameshifted) :: 44 of 61
            Pseudo Genes (incomplete) :: 11 of 61
            Pseudo Genes (internal stop) :: 18 of 61
            Pseudo Genes (multiple problems) :: 12 of 61
            ##Genome-Annotation-Data-END##
FEATURES             Qualifiers
     source          /organism="Helicobacter pylori"
                     /mol_type="genomic DNA"
                     /strain="HpGP-TWN-021"
                     /isolation_source="Biopsy"
                     /host="Homo sapiens"
                     /db_xref="taxon:210"
                     /geo_loc_name="Taiwan"
                     /lat_lon="23.30 N 121.00 E"
                     /collected_by="Maria Camargo and Charles Rabkins"
     protein         /locus_tag="E5P95_04470"
                     /inference="COORDINATES: similar to AA
                     sequence:RefSeq:WP_000874657.1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Protein Homology."
                     /transl_table=11
BEGIN
        1 MAFKKAGLIS KFISKGSFKL NKISKKIFKL NLILKREKPL KRHKKTKSIK KPFNKNKSFL
       61 KASVLLIGAL GGLSHLRASE CRYWSWSSWS YHDNIESGSN SPTHNSYCLF NSTQGSGTYY
      121 LNTLTTYSPG GASFTQKFNN GTLDVGGNIR FGGMGVNGGN VGYITGTYDA QTINFNSSRI
      181 TTGNSFSTGG GATLNFNATN RITINQASFN NGDAGTQHSY MNFSGSNINV ISSSFTDDTD
      241 GGFSFSGNGT NSNLSFDKTS FNQGTYKFTN SANLNFNNSA FNQGTYNFNS AQSVFENSNF
      301 NQGTYSFTDN TGLNFNNDTF SQGTYNFNTS KVSFSGANTL NSSSPFASLK GSVSFGSDAI
      361 FNLNQTLNSN QTYDILTTNG TIQYGVYQSY LWHLINYKGD KAISHVEVGN NTYDVTFDIN
      421 GQDETLQETF NNQSIITQFL GDDLQAKAQK TYQQDLSNSQ SALNNAADDN KIANSDTDYT
      481 KSSNPTIKKD AQNLENTDQT IQQDKQALEK DLANVKQLAN APTGFNEQAF NQAQNKEQQD
      541 EQTLQENEKT FSSEQEGLEK AIANAKPASP TPSPTPTPTK HTAPNTPPNK VPPTPPTQNL
      601 PTTNVWNGVY NLQNQTYSQK GVYYIDPNLS GQSGQSGNTL STYTANLFGR SFGVNIQNGT
      661 LIIGNNTESA NDNGLIWIGH GGFGYITGTF NATNIYLTNN FKTGEGVSGS DGGGANITFK
      721 ASDNITIDGL NYNDAETVTK MIQTGASQHS YAAFDATNNI SVTNSSFSDM TWGKFSFSAK
      781 NISFSNASFS GFTNPGGSSV ISANAANSLS FVNSRLNGGV VYNLWANSLI FNNTQAVFNV
      841 LYSRGTSNFN ATTQLLGNTS FTLSSQSLLN FNGDTTLQDN ANITLGNKSQ ATFKNSLTLD
      901 NNSDLSLDNQ SVLNANGASA FNNQASLNIY NGSQATFNSL FFNGGILSLN ASSKLNASSA
      961 SFSNNTTINL DDSVLSANNT SSLNANINFQ GASQANFGGN TTIDTASFNF DSASSLSFNN
     1021 LTANGALNFN GYAPSLTKAL MSVSGQFVLG NNGDINLSDI NIFDNITKSV TYNILNAQKG
     1081 ITGISGANGY EKILFYGMKI QNATYSDNNN IQTWSFINPL NSSQIIQESI KNGDLTIEVL
     1141 NNPNSASNTI FNIAPELYNY QASKQNPTGY SYDYSDNQAG TYYLTSSIKG LFTPKGSQTP
     1201 QTPGTYSPFN QPLNSLNIYN KGFSSGNLKT LLGILSQNSA TLKEMIESNQ LDNTTSINEV
     1261 LQLLDEIKIT PAQKQALLET INHLTDNINQ TFNNGNLVIG ATQDNVTNST SSIWFGGNGY
     1321 SSPCALDSAT CSSFRNTYLG QLLGSTSPYL GYINADFKAK SIYITGTLGS ANAFESGGSA
     1381 DVTFQSANNL VLNKANIEAQ ATDNIFNLLG QEGIDKIFNQ GNLANVLSQV AMEKIKQAGG
     1441 LGNFVENALS PLSKELSASL QNETLGQLIG QNNLDNLLNN SGVMNAIQNI ISKKLSIFGN
     1501 FVTPSIIENY LAKQSLKSML DDKGLLNFIG RYIDASELSS ILSVILKDIT NPPTSLQKDI
     1561 GVVANDLLNE FLGQDVVKKL ESQGLVSNII NNIISQGGLS GVYNQGLGSV LPPSLQNALK
     1621 ENDLGTLLSP RGLHDFWQKG YFNFLSNGYV FVNNSSFSNA TGGSLNFVTN KSIIFNGDNT
     1681 IDFSKYQGAL IFASNGVSNI NITTLNATNG LSLNAGLNNV SVQKGEICVN LANCPTTKNS
     1741 SSTNSSVTPT NESLSVRANN FTFLGTIASN GAIDLSQVKN NSVIGTLNLN ENATLQANNL
     1801 TITNAFNNAS NSTANINGNF TLNQQATLST NASGLNVMGN FNSYGDLVFN LSHSVSHAII
     1861 NAQGTATIMA NNNNPLIQFN TSSKEIGTYT LIDSAKAIYY GYNDQITGGS SLADYLKLYT
     1921 LIDINGKHMV MTDNGLTYNG QAVNIKDGGL IVGFKDSQNQ YIYTSILYNK VKIAVSNDPI
     1981 NNLQAPTLKQ YIAQIQGTQG VDSIDQAGGT QAVNWLNKIF ETKGSPLFAP YYLESHSTKD
     2041 LTTIAGDIAN TLEVIANPDF KNDATNILQI NTYTQQMSRL AKLSDTSTFA SADFHERLEA
     2101 LKNKRFADAI PNAMDVILKY SQRNRVKNNV WSTGVGGASF INGGTGTLYG INVGYDRFIK
     2161 GVIVGGYAAY GYSGFHANIT QSGSSNVNIG VYSRAFIKRS ELTMSLNETW GYNKTFINSY
     2221 DPLLSIINQS YKYNTWTTDA KINYGYDFMF KDKSVIFKPQ IGLAYYYIGL SDLRGIMDDP
     2281 IYNQFRANAD PNKKSVLTIN FALESRHYFN KNSYYFVIAD VGRDLFINSM GDKMVRFIGN
     2341 NTLSYRDGDR YNTFASIITG GEIRLFKTFY VNAGIGARFG LDYKDINITG NIGMRYAF
//