LOCUS WQZ30391.1 1796 aa PRT BCT 29-DEC-2023
DEFINITION Helicobacter pylori type IV secretion system apparatus protein CagY protein.
ACCESSION CP079244-501
PROTEIN_ID WQZ30391.1
SOURCE Helicobacter pylori
ORGANISM Helicobacter pylori
Bacteria; Campylobacterota; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; Helicobacter.
REFERENCE 1 (bases 1 to 1570870)
AUTHORS Thorell,K., Munoz-Ramirez,Z.Y., Wang,D., Sandoval-Motta,S., Boscolo Agostini,R., Ghirotto,S., Torres,R.C., Falush,D., Camargo,M.C. and Rabkin,C.S.
CONSRTM HpGP Research Network
TITLE The Helicobacter pylori Genome Project: insights into H. pylori population structure from analysis of a worldwide collection of complete genomes
JOURNAL Nat Commun 14 (1), 8184 (2023)
PUBMED 38081806
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 1570870)
AUTHORS Camargo,M.C. and Rabkin,C.S.
TITLE Direct Submission
JOURNAL Submitted (15-JUL-2021) IIB, National Cancer Institute, 9609 Medical Center Dr., Rm. 6E110, Bethesda, MD 20892, USA
COMMENT The annotation was added by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). Information about PGAP can be found here: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
##Genome-Assembly-Data-START##
Assembly Method :: HGAP v. 4
Assembly Name :: HpGP-TWN-021
Genome Representation :: Full
Expected Final Version :: Yes
Genome Coverage :: 2661x
Sequencing Technology :: PacBio Sequel II
##Genome-Assembly-Data-END##
##Genome-Annotation-Data-START##
Annotation Provider :: NCBI
Annotation Date :: 07/16/2021 08:12:09
Annotation Pipeline :: NCBI Prokaryotic Genome Annotation Pipeline (PGAP)
Annotation Method :: Best-placed reference protein set; GeneMarkS-2+
Annotation Software revision :: 5.2
Features Annotated :: Gene; CDS; rRNA; tRNA; ncRNA; repeat_region
Genes (total) :: 1,497
CDSs (total) :: 1,452
Genes (coding) :: 1,391
CDSs (with protein) :: 1,391
Genes (RNA) :: 45
rRNAs :: 2, 2, 2 (5S, 16S, 23S)
complete rRNAs :: 2, 2, 2 (5S, 16S, 23S)
tRNAs :: 36
ncRNAs :: 3
Pseudo Genes (total) :: 61
CDSs (without protein) :: 61
Pseudo Genes (ambiguous residues) :: 0 of 61
Pseudo Genes (frameshifted) :: 44 of 61
Pseudo Genes (incomplete) :: 11 of 61
Pseudo Genes (internal stop) :: 18 of 61
Pseudo Genes (multiple problems) :: 12 of 61
##Genome-Annotation-Data-END##
FEATURES Qualifiers
source
/organism="Helicobacter pylori"
/mol_type="genomic DNA"
/strain="HpGP-TWN-021"
/isolation_source="Biopsy"
/host="Homo sapiens"
/db_xref="taxon:210"
/geo_loc_name="Taiwan"
/lat_lon="23.30 N 121.00 E"
/collected_by="Maria Camargo and Charles Rabkins"
protein
/gene="cagY"
/locus_tag="E5P95_02570"
/inference="COORDINATES: similar to AA sequence:RefSeq:WP_015642692.1"
/note="Derived by automated computational analysis using gene prediction method: Protein Homology."
/transl_table=11 BEGIN 1 MNEENDKFET SKKTQQHSPQ DLSNEEATEA NHFEDSSKES KESSDHHLDN PTETKTNFDE 61 YESEETQTQM DSGGNETSES SNGSLADKLF KKARKLVDNK KPFTQQKNLD EEIQEPNEED 121 DQENNGYQEE TQTDLIDDET SKKTQQHSPQ DLSNEEITEA NHFEDSSKES KESSDHLDNP 181 TETKTNFDEY ESEEITNDSN DQEIIKGSKK KYIIGGIVVA VLIVIILFSR SIFHYFIPLE 241 DKSSRFSKDR NLYVNDEIQI RQEYNRLLKE RNEKGNMIDK NLFFNDDPNR TLYNYLNIAE 301 IEDKNPLRAF YECISNGGNY EECLKLIKDK KLQDQMKKTL EAYNDCIKNA KTEEERIKCL 361 DLIKDENLKK SLLNQQKVQV ALDCLKNAKT DEERNECLKL INDPEIREKF RKELGLQKEL 421 QEYKDCIKNA KTEAEKNECL KGLSKEAIER LKQQALDCLK NAKTDEERNE CLKNIPQDLQ 481 KELLADMSVK AYKDCVSKAR NEKEKKECEK LLTPEAKKKL EQQVLDCLKN AKTDEERKKC 541 LKDLPKDLQS DILAKESLKA YKDCVSQAKT EDEKKECEKL LTPEAKKLLE EEAKESVKAY 601 LDCVSQAKTE AEKKECEKLL TPEAKKKLEE AKKSVKAYLD CVSQAKNEAE KKECEKLLTP 661 EAKKLLEQQA LDCLKSAKTD EERKKCLKDL PKDLQKKVLA KESVKAYLDC VSQAKTEAEK 721 KECEKLLTPE AKKLLEEAKE SLKAYKDCVS RARNEKEKKE CEKLLTPEAK KLLEEEAKES 781 VKAYLDCVSQ AKTEAEKKEC EKLLTPEAKK KLEEAKKSVK AYLDCVSQAK TEAEKKECEK 841 LLTPEAKKLL EQQALDCLKS AKTEAEKKRC VKDLPKDLQK KVLAKESLKA YKDCVSRARN 901 EKEKKECEKL LTPEAKKLLE EAKESLKAYK DCVSRARNEK EKKECEKLLT PEAKKLLEEA 961 KESLKAYKDC VSRARNEKEK QECEKLLTPE AKNLLEQQAL DCLKNAKTES EKKRCVKDLP 1021 KDLQKKVLAK ESVKAYLDCV SRARNEKEKK ECEKLLTPEA KKLLEEAKES LKAYKDCLSQ 1081 ARNEEERRAC EKLLTPEARK LLEQEVKKSV KAYLDCVSRA RNEKEKQECE KLLTPEARKF 1141 LAKQVLSCLE KARNEEERKA CLKNIPKDLQ KNVLAKESLK AYKDCLSQAR NEEERRACEK 1201 LLTPEARKLL EQEVKKSVKA YLDCVSRARN EKEKQECEKL LTPEARKFLA KELQQKDKAI 1261 KDCLKNADPN DRAAIMKCLD GLSDEEKLKY LQEAREKAVL DCLKTARTDE EKRKCQNLYS 1321 DLIQEIQNKR TQSKQNQLSK TERLHQASEC LDNLDDPTDQ EAIEQCLEGL SDSERALILG 1381 IKRQADEVDL IYSDLRNRKT FDNMAAKGYP LLPMDFKNGG DIATINATNV DADKIASDNP 1441 IYASIEPDIT KQYETEKTIK DKNLEAKLAK ALGGNKKDDD KEKSKKSTAE ARVESNKIDK 1501 DVAETAKNIS EIALKNKKEK SGEFVDENGN PIDDKKKTET QDETSPVKQA FIGKSDPTFV 1561 LAQYTPIEIT LTSKVDATLT GIVSGVVAKD VWNMNGTMIL LDKGTKVYGN YQSVKGGTPI 1621 MTRLMIVFTK AITPDGVIIP LANAQAAGML GEAGVDGYVN NHFMKRIGFA VIASVVNSFL 1681 
QTAPIIALDK LIGLGKGRSE RTPEFNYALG QAINGSMQSS AQMSNQILGQ LMNIPPSFYK 1741 NEGDSIKILT MDDIDFSGVY DVKITNKSVV DEIIKQSTKT LSREHEEITT SPKGGN //