LOCUS       OON23567.1 493 aa PRT CON 24-FEB-2017
DEFINITION  Opisthorchis viverrini hypothetical protein.
ACCESSION   KV891522-9
PROTEIN_ID  OON23567.1
SOURCE      Opisthorchis viverrini
  ORGANISM  Opisthorchis viverrini
            Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Platyhelminthes;
            Trematoda; Digenea; Opisthorchiida; Opisthorchiata;
            Opisthorchiidae; Opisthorchis.
REFERENCE   1  (bases 1 to 368823)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the nematode, Opisthorchis viverrini
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 368823)
  AUTHORS   Mitreva,M., Pepin,K.H., Martin,J., Ozersky,P., Palsikar,V.B.,
            Zhang,X. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-MAR-2015) The Genome Institute, Washington
            University School of Medicine, 4444 Forest Park, St. Louis, MO
            63108, USA
COMMENT     The human liver flukes Opisthorchis viverrini, O. felineus and
            Clonorchis sinensis remain important public health problems in
            many parts of the world. Clonorchis sinensis is widespread in
            China, Korea and Vietnam, while O. viverrini is endemic in
            Southeast Asia, including Thailand, Lao People's Democratic
            Republic (Lao PDR), Cambodia and central Vietnam. Human infection
            follows the consumption of raw or undercooked cyprinoid
            (freshwater) fish harboring infective metacercariae. Recent
            reports suggest that about 35 million people are infected with
            C. sinensis globally, with up to 15 million infections in China
            alone, and that another 8-10 million people are infected with
            O. viverrini in Thailand and Lao PDR. More than 600 million
            people, mainly in Asia, are at risk of infection with these two
            liver flukes (Petney et al., 2013 Int J Parasitol 43, 1031-46).
            The infections are associated with hepatobiliary diseases
            including hepatomegaly, cholangitis, periportal fibrosis,
            cholecystitis and gallstones, and the flukes are major
            aetiological agents of bile duct cancer (cholangiocarcinoma,
            CCA). O. viverrini and C.
            sinensis are metazoan parasites classified as Group 1
            carcinogens (carcinogenic to humans) by the International Agency
            for Research on Cancer (IARC) of the World Health Organization
            (WHO) (Bouvard et al., 2009 Lancet Oncol 10, 321-2). Not only do
            these liver flukes cause pathogenic helminth infections, they are
            also carcinogenic in humans, like several better-known biological
            carcinogens, in particular the hepatitis viruses, human
            papillomavirus and Helicobacter pylori. The liver-fluke-endemic
            area of Khon Kaen, Northeast Thailand, has the highest reported
            incidence of liver cancer in the world (Shin et al., 2010 Cancer
            Sci 101, 579-85; Sripa et al., 2014 Acta Tropica 141, 361-367).
            Regarding socioeconomic impact, it was estimated 20 years ago
            that the total direct cost of O. viverrini infection to the
            workforce (aged 15 to 60 years) in Northeast Thailand was
            US$80 million per annum. More recently, liver and bile duct
            cancer, the end-stage consequence of liver fluke disease, was
            reported to rank fifth among all diseases in Thai males in terms
            of disability-adjusted life years (DALYs) (see Sripa et al., 2012
            Trends Parasitol 28, 395-407). The liver flukes used as the
            source of genomic DNA for whole-genome sequencing (WGS) were
            obtained by collecting O. viverrini metacercariae from
            wild-caught freshwater fish in the vicinity of Khon Kaen City,
            Khon Kaen province, Thailand. Hamsters were experimentally
            infected with these metacercariae at Khon Kaen University, and
            adult O. viverrini worms were recovered from the biliary tract of
            the hamsters at euthanasia six weeks after infection. [The
            outbred, male Syrian (golden) hamsters (Mesocricetus auratus)
            were reared at the animal facilities of the Faculty of Medicine,
            Khon Kaen University, Khon Kaen, Thailand.
            Protocols for the experiments were approved by the Animal Ethics
            Committee of Khon Kaen University (approval number AEKKU25/2554)
            in accordance with the Ethics of Animal Experimentation of the
            National Research Council of Thailand.] Genomic DNA was recovered
            from pools of 10 to 20 adult (hermaphroditic) flukes and provided
            by Dr. Banchob Sripa (banchob@kku.ac.th) and Dr. Paul Brindley
            (pbrindley@email.gwu.edu). This assembly was built from fragment,
            3 kb and 8 kb insert whole-genome shotgun libraries sequenced on
            the Illumina platform. An initial assembly was generated using
            ALLPATHS-LG. To improve scaffolding and contiguity, we used our
            in-house gap-closure tool Pygap, which uses the Pyramid assembler
            with Illumina paired reads to close gaps and extend contigs. An
            alternate assembly was generated by first using FLASH (Fast
            Length Adjustment of SHort reads; Bioinformatics 27:21 (2011),
            2957-63) to merge overlapping read pairs from the Illumina
            fragment library. The merged reads were then assembled with
            Newbler (Roche). Newbler contigs longer than 500 bases were used
            to fill gaps in the ALLPATHS-LG scaffolds with PBJelly (PLoS ONE
            7(11): e47768. doi:10.1371/journal.pone.0047768). In a final
            step, L_RNA_scaffolder (BMC Genomics 2013, 14:604) used
            transcript alignments to further improve contiguity. The repeat
            library was generated using RepeatModeler (A. Smit, R. Hubley;
            http://www.systemsbiology.org/). Ribosomal RNA genes were
            identified using RNAmmer
            (http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer) and
            transfer RNAs with tRNAscan-SE (Lowe and Eddy, 1997). Non-coding
            RNAs such as microRNAs were identified by sequence-homology
            searches against the Rfam database
            (http://selab.janelia.org/software.html). Repeats and predicted
            RNAs were then masked using RepeatMasker (A. Smit, R. Hubley &
            P. Green; http://repeatmasker.org). Protein-coding genes were
            predicted using a combination of the ab initio programs SNAP
            (I. Korf, 2004), Fgenesh (Softberry, Corp.) and Augustus (M.
            Stanke et al., 2008) and the annotation pipeline MAKER
            (M. Yandell et al., 2007), which aligns mRNA, EST and protein
            evidence from the same or other species to aid in determining and
            refining gene structures. A consensus gene set was then derived
            from these predictions using a hierarchical approach developed
            at The Genome Institute. Gene products were named using BER
            (JCVI; http://ber.sourceforge.net). Our goal is to explore this
            WGS draft sequence of O. viverrini to better define proteins and
            metabolites that are involved in parasitism, affect health and
            disease, and are relevant to host-parasite relationships,
            carcinogenesis and other biological and pathological processes.
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the
            O. viverrini genome project, contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) funded this project.
            ##Genome-Assembly-Data-START##
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: allpaths LG v. 43357 (2012-12-28)
            Assembly Name            :: O_viverrini_1.0.pg.lrna
            Genome Coverage          :: 23x
            Sequencing Technology    :: Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Qualifiers
     source
                     /organism="Opisthorchis viverrini"
                     /mol_type="genomic DNA"
                     /submitter_seqid="O_viverrini-1.0_Cont67"
                     /isolate="Khon Kaen"
                     /isolation_source="wild caught freshwater fish"
                     /db_xref="taxon:6198"
                     /chromosome="Unknown"
                     /sex="hermaphrodite"
                     /dev_stage="adult"
                     /lab_host="hamster"
                     /country="Thailand: Khon Kaen province"
                     /note="pooled from 10-20 individuals"
     protein
                     /locus_tag="X801_00523"
     intron_pos      20:0 (1/3)
     intron_pos      191:2 (2/3)
     intron_pos      408:1 (3/3)
BEGIN
        1 MNEQLVGASQ TYINQQFVQP RSEEPPVKVT RRSHEPPHQA KKAELTDIPP IQLGPESFNN
       61 FSSVPPKQNT GFWNPTTQKA YLEKEYPGQM PTAPQFTQYL HARRLQEECN NRKLYIPQHL
      121 MVPNVNGKRS FCQAFAERVP CAVKGTPCCV KPHAILPNPL ACMCGVQKCR AFVQQYCCSG
      181 TSHSQTTPGG PCATCSLDQC ASHQRASIIC ELHPHVNQIW IMKNNWHSSN TCPHNESTNT
      241 TGTQPAANLL SGSNGSGSNG VTPGSCYHHA CLYHSNRCRV ENTNPFPTCK CVSSVSSIPC
      301 CCSGNPTTPL MNTFNYAISN WCNNQQMLPQ YPLNHLNENG ISQRGGEGAP NVFFSAAIKP
      361 ATIPMVSNPI PNFAADGGRV CTGIQAQSNY QGPLAGDFRN CGEPMENVVL PLEVYETKFL
      421 AHPHSSLPPE IITNTPAKPT TQPKQPYPST TEGLLKQTHR FAKFEHLQPV SSKVLRPPSS
      481 HNESAEGQTN DLT
//
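The BEGIN ... // block of this record lists residues in ten-residue groups, each run of groups prefixed with the 1-based position of its first residue, and the LOCUS line declares the length as 493 aa. A minimal Python sketch for recovering the plain sequence and cross-checking it against that declared length follows. The helper names (parse_sequence_block, declared_length) are hypothetical, not part of any GenBank toolkit, and the parser assumes only the layout used in this record.

```python
import re

# Copy of this record's LOCUS line and its BEGIN ... // sequence block.
LOCUS_LINE = "LOCUS OON23567.1 493 aa PRT CON 24-FEB-2017"
SEQUENCE_BLOCK = """BEGIN
        1 MNEQLVGASQ TYINQQFVQP RSEEPPVKVT RRSHEPPHQA KKAELTDIPP IQLGPESFNN
       61 FSSVPPKQNT GFWNPTTQKA YLEKEYPGQM PTAPQFTQYL HARRLQEECN NRKLYIPQHL
      121 MVPNVNGKRS FCQAFAERVP CAVKGTPCCV KPHAILPNPL ACMCGVQKCR AFVQQYCCSG
      181 TSHSQTTPGG PCATCSLDQC ASHQRASIIC ELHPHVNQIW IMKNNWHSSN TCPHNESTNT
      241 TGTQPAANLL SGSNGSGSNG VTPGSCYHHA CLYHSNRCRV ENTNPFPTCK CVSSVSSIPC
      301 CCSGNPTTPL MNTFNYAISN WCNNQQMLPQ YPLNHLNENG ISQRGGEGAP NVFFSAAIKP
      361 ATIPMVSNPI PNFAADGGRV CTGIQAQSNY QGPLAGDFRN CGEPMENVVL PLEVYETKFL
      421 AHPHSSLPPE IITNTPAKPT TQPKQPYPST TEGLLKQTHR FAKFEHLQPV SSKVLRPPSS
      481 HNESAEGQTN DLT
//"""


def declared_length(locus_line: str) -> int:
    """Return the residue count declared as '<n> aa' on a LOCUS line."""
    match = re.search(r"\b(\d+)\s+aa\b", locus_line)
    if match is None:
        raise ValueError("no '<n> aa' length found on LOCUS line")
    return int(match.group(1))


def parse_sequence_block(text: str) -> str:
    """Join the residues between BEGIN and //, checking every position label."""
    body = text.split("BEGIN", 1)[1].split("//", 1)[0]
    residues = []
    count = 0
    for token in body.split():
        if token.isdigit():
            # A label gives the 1-based position of the residue that follows it.
            if int(token) != count + 1:
                raise ValueError(f"label {token} != expected position {count + 1}")
        elif token.isalpha():
            residues.append(token.upper())
            count += len(token)
        else:
            raise ValueError(f"unexpected token {token!r}")
    return "".join(residues)


if __name__ == "__main__":
    sequence = parse_sequence_block(SEQUENCE_BLOCK)
    expected = declared_length(LOCUS_LINE)
    print(len(sequence), expected, len(sequence) == expected)  # prints: 493 493 True
```

Checking each position label, rather than simply stripping digits, means a dropped or duplicated ten-residue group is reported instead of silently producing a shorter sequence.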
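The alternate assembly described in the COMMENT begins by using FLASH to merge overlapping read pairs from the Illumina fragment library into single longer reads. The Python sketch below illustrates that idea under simplifying assumptions: reads carry no quality scores, the fragment is at least as long as each read, and the minimum overlap (10) and maximum mismatch fraction (0.25) are illustrative values chosen here. It is not FLASH's implementation; FLASH's own scoring and quality-aware consensus are not reproduced, and merge_pair and reverse_complement are hypothetical names.

```python
from typing import Optional, Tuple

# Complement table for unambiguous DNA bases plus N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (uppercase A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.25) -> Optional[str]:
    """Merge a read pair into one fragment if their ends overlap well enough.

    read2 is taken as sequenced from the opposite strand, so it is
    reverse-complemented before the suffix of read1 is compared with its prefix.
    Returns the merged sequence, or None when no overlap passes the threshold.
    """
    r2 = reverse_complement(read2)
    best: Optional[Tuple[float, int]] = None  # (mismatch fraction, overlap length)
    for overlap in range(min_overlap, min(len(read1), len(r2)) + 1):
        tail = read1[len(read1) - overlap:]
        head = r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        frac = mismatches / overlap
        # Keep the overlap with the lowest mismatch fraction (shortest wins ties).
        if frac <= max_mismatch_frac and (best is None or frac < best[0]):
            best = (frac, overlap)
    if best is None:
        return None
    # Simplification: without qualities, read1's bases are kept in the overlap.
    return read1 + r2[best[1]:]


if __name__ == "__main__":
    fragment = "ACGTACGTTAGCCGATAGGCTTACGATCGA"  # made-up 30 bp fragment
    read1 = fragment[:20]
    read2 = reverse_complement(fragment[-20:])  # mate read from the other strand
    print(merge_pair(read1, read2) == fragment)  # prints True
```

In the usage example, two 20 bp reads from a 30 bp fragment overlap by 10 bases, so the merged read reconstructs the whole fragment; pairs whose ends do not overlap within the mismatch limit are left unmerged (None).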