LOCUS       KV894352               42007 bp    DNA     linear   CON 24-FEB-2017
DEFINITION  Opisthorchis viverrini isolate Khon Kaen unplaced genomic scaffold
            O_viverrini-1.0_Cont2898, whole genome shotgun sequence.
ACCESSION   KV894352 LASN01000000
VERSION     KV894352.1
DBLINK      BioProject: PRJNA230518
            BioSample: SAMN03378119
KEYWORDS    WGS; HIGH_QUALITY_DRAFT.
SOURCE      Opisthorchis viverrini
  ORGANISM  Opisthorchis viverrini
            Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Platyhelminthes;
            Trematoda; Digenea; Opisthorchiida; Opisthorchiata;
            Opisthorchiidae; Opisthorchis.
REFERENCE   1  (bases 1 to 42007)
  AUTHORS   Mitreva,M.
  TITLE     Draft genome of the nematode, Opisthorchis viverrini
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 42007)
  AUTHORS   Mitreva,M., Pepin,K.H., Martin,J., Ozersky,P., Palsikar,V.B.,
            Zhang,X. and Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-MAR-2015) The Genome Institute, Washington University
            School of Medicine, 4444 Forest Park, St. Louis, MO 63108, USA
COMMENT     The human liver flukes, Opisthorchis viverrini, O. felineus and C.
            sinensis remain important public health problems in many parts of
            the world. Clonorchis sinensis is widespread in China, Korea and
            Vietnam, while O. viverrini is endemic in Southeast Asia, including
            Thailand, Lao People's Democratic Republic (Lao PDR), Cambodia and
            central Vietnam. Human infection follows the consumption of raw or
            undercooked cyprinoid (freshwater) fish harboring infective
            metacercariae. Recent reports suggested that about 35 million
            people are infected with C. sinensis globally, with up to 15
            million human infections in China alone and another 8-10 million
            individuals infected with O. viverrini in Thailand and Lao PDR.
            More than 600 million people, mainly in Asia, are at risk of
            infection with these two liver flukes (Petney et al 2013 Int J
            Parasitol 43, 1031-46).
            
            The infections are associated with hepatobiliary diseases including
            hepatomegaly, cholangitis, fibrosis of the periportal system,
            cholecystitis, gallstones and are major aetiological agents of bile
            duct cancer, cholangiocarcinoma (CCA). O. viverrini and C. sinensis
            are classified as Group 1 carcinogens (metazoan parasites that are
            carcinogenic to humans) by the International Agency for
            Research on Cancer, World Health Organization (WHO) (Bouvard et
            al., 2009 Lancet Oncol 10, 321-2). Therefore, not only do these
            liver flukes cause pathogenic helminth infections, they are also
            carcinogenic in humans, in a fashion similar to several other
            better-known biological carcinogens, in particular hepatitis
            viruses, human papilloma virus and Helicobacter pylori.
            
            The liver fluke endemic area of Khon Kaen, Northeast Thailand has
            reported the highest incidence of liver cancer in the world (Shin
            et al 2010 Cancer Sci. 101, 579-85; Sripa et al. 2014 Acta Tropica
            141, 361-367). In regard to socioeconomic impact, it was estimated
            20 years ago that the total direct cost of O. viverrini infection
            to the work force (between the age of 15 and 60 years) in Northeast
            Thailand was US$ 80 million per annum. More recently, it has been
            reported that liver and bile duct cancer, the end-stage consequence
            of liver fluke disease, ranks number five in Thai males among all
            diseases with highest number of disability-adjusted life years
            (DALYs) (see Sripa et al., 2012 Trends Parasitol 28, 395-407).
            
            The liver flukes used here as a source of genomic DNA for the WGS
            were obtained by collecting the metacercariae of O. viverrini from
            wild-caught freshwater fish in the vicinity of Khon Kaen City,
            Khon Kaen province, Thailand. Hamsters were experimentally infected
            with these metacercariae at Khon Kaen University, after which adult
            O. viverrini worms were recovered from the biliary tract of the
            hamsters at euthanasia six weeks after infection. [The outbred,
            male Syrian (golden) hamsters (Mesocricetus auratus) were reared at
            the animal facilities of the Faculty of Medicine, Khon Kaen
            University, Khon Kaen, Thailand. Protocols for the experiments were
            approved by the Animal Ethics Committee of Khon Kaen University,
            approval number AEKKU25/2554, according to the Ethics of Animal
            Experimentation of the National Research Council of Thailand.]
            Genomic DNAs were recovered from pools of 10 to 20 adult
            (hermaphroditic) flukes and provided by Dr. Banchob Sripa
            (banchob@kku.ac.th) and Dr. Paul Brindley
            (pbrindley@email.gwu.edu).
            
            This assembly was built from fragment, 3 kb and 8 kb insert whole
            genome shotgun libraries. The sequences were generated on the
            Illumina platform. An initial assembly was generated using
            Allpaths_LG. To improve scaffolding and contiguity, we used our
            in-house tool Pygap (Gap closure tool), which uses the Pyramid
            assembler with Illumina paired reads to close gaps and extend
            contigs. An alternate assembly was generated by first using flash
            (fast length adjustment of short reads; Bioinformatics 27:21
            (2011), 2957-63) to merge the Illumina fragments. These reads were
            then fed to the Newbler assembler (Roche). Newbler contigs > 500
            bases were used to fill gaps in the Allpaths_LG scaffolds using
            PBJelly (PLoS ONE 7(11): e47768. doi:10.1371/journal.pone.0047768).
            The final step used L_RNA_scaffolder (BMC Genomics 2013, 14:604),
            which uses transcript alignments, to improve contiguity.
            
            The repeat library was generated using Repeatmodeler (A. Smit, R.
            Hubley http://www.systemsbiology.org/). The ribosomal RNA genes
            were identified using RNAmmer
            (http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer) and transfer
            RNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997).
            Non-coding RNAs, such as microRNAs, were identified by sequence
            homology search of the Rfam database
            (http://selab.janelia.org/software.html). Repeats and predicted
            RNAs were then masked using RepeatMasker (A. Smit, R. Hubley & P.
            Green http://repeatmasker.org). Protein-coding genes were predicted
            using a combination of the ab initio programs Snap (I. Korf, 2004),
            Fgenesh (Softberry, Corp) and Augustus (M. Stanke et al., 2008) and
            the annotation pipeline tool Maker (M. Yandell et al., 2007), which
            aligns mRNA, EST and protein information from the same species or
            cross-species to aid in gene structure determination and
            modifications. A consensus gene set from the above prediction
            algorithms was generated, using a logical, hierarchical approach
            developed at the Genome Institute. Gene product naming was
            determined by BER (JCVI: http://ber.sourceforge.net).
            
            Our goal is to explore this WGS draft sequence of O. viverrini to
            better define proteins and other metabolites involved in parasitism
            that impact health and disease and are relevant to host-parasite
            relationships, parasitism, carcinogenesis and other biological and
            pathological processes.
            
            For information regarding this assembly or project, or any other
            GSC genome project, please visit our Genome Groups web page
            (http://genome.wustl.edu/genome_group_index.cgi) and email the
            designated contact person. For specific questions regarding the O.
            viverrini genome project contact Makedonka Mitreva
            (mmitreva@genome.wustl.edu) at Washington University School of
            Medicine. The National Human Genome Research Institute (NHGRI) of
            the National Institutes of Health (NIH) provided funds for this
            project.
            
            ##Genome-Assembly-Data-START##
            Current Finishing Status :: High-Quality Draft
            Assembly Method          :: allpaths LG v. 43357 (2012-12-28)
            Assembly Name            :: O_viverrini_1.0.pg.lrna
            Genome Coverage          :: 23x
            Sequencing Technology    :: Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..42007
                     /organism="Opisthorchis viverrini"
                     /mol_type="genomic DNA"
                     /submitter_seqid="O_viverrini-1.0_Cont2898"
                     /isolate="Khon Kaen"
                     /isolation_source="wild caught freshwater fish"
                     /db_xref="taxon:6198"
                     /chromosome="Unknown"
                     /sex="hermaphrodite"
                     /dev_stage="adult"
                     /lab_host="hamster"
                     /country="Thailand: Khon Kaen province"
                     /note="pooled from 10-20 individuals"
     assembly_gap    5027..6327
                     /estimated_length=1301
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     gene            complement(<17739..33082)
                     /locus_tag="X801_05866"
     mRNA            complement(join(<17739..17844,19003..19155,19900..20183,
                     22379..22670,24545..24801,25968..26151,26588..26759,
                     27542..27676,29433..29569,29694..29875,31425..31548,
                     32529..33082))
                     /locus_tag="X801_05866"
                     /product="piwi domain protein"
     CDS             complement(join(<17739..17844,19003..19155,19900..20183,
                     22379..22670,24545..24801,25968..26151,26588..26759,
                     27542..27676,29433..29569,29694..29875,31425..31548,
                     32529..33082))
                     /locus_tag="X801_05866"
                     /inference="protein motif:HMMPfam:IPR003100"
                     /inference="protein motif:HMMPfam:IPR003165"
                     /inference="protein motif:HMMPfam:IPR014811"
                     /note="KEGG: scl:sce7039 1.1e-08 putative 5'-nucleotidase
                     family protein; K01081 5'-nucleotidase"
                     /codon_start=1
                     /product="piwi domain protein"
                     /protein_id="OON18282.1"
                     /db_xref="InterPro:IPR003100"
                     /db_xref="InterPro:IPR003165"
                     /db_xref="InterPro:IPR014811"
                     /translation="MASSTPATINQPVMVAPIGSPPGSAVLNNGCTISPNCGVTSNTT
                     NASMTYSNGPTGMASLSTGNGNGGNQCGNGSSPPAVGSGGDVPSIGGGGGSASSAGAG
                     SNSSGSGGAGGSTGGSGSCSGMGHSASSQHLIQFEPPARPGRGSDGRAISLRANHFEI
                     RMPKGFLHHYDVSIVPEKCPRRVNREIIETMVNSMHYQKYFYNQKPVFDGRRNMYTRD
                     PLPISKEKVELEVTLPGEGKDRVFRVAIKHVSEVSLFALEEALGGHIRHIPNDAVVSL
                     DVIMRHLPSMSYTPVGRSFFQNPDGYENPLGGGREVWFGFHQSVRPSQWRMMLNIDVS
                     ATAFYKAQSVIDFMCEVLDISDKSEQRRALTDSQRVKFTKEIKGLKVEITHCGTMRRK
                     YRVCNVTRRPAHTQSFPLQLETGATMECTVAKYFQERYNIRLEYPHLPCLQVGQEQKH
                     TYLPLEVCNMVAGQRCIKKLTDMQTSTMIKATARSAPDREKEINNLVKRANFNADPHL
                     QMFGINVNTRMAEIQGRVIPAPKIQYGGRTKAQASPQLGVWDMRGKQFFSGIEIKVWA
                     IACFAPQRIVREDSLRVRPAVFREPVIFLGADVTHPPAGDKTKPSIAAVVASMDAHPS
                     RYSATVRVQPHRQEIIQDLYPMVKDLLLQFYRATRFKPTRIIYYRDGVSEGQFLNVLN
                     HELRAIREACVKLELGYQPGITFIVVQKRHHTRLFCADKKDQMGKSGNIPAGTTVDQV
                     ITHPTEFDFYLCSHAGIQGTSRPSHYHVLWDDNRFSADDIQNLTYQLCHTYVRCTRSV
                     SIPAPAYYAHLVAFRARYHLVEKEIDSGEGSQKSGNSDERTPTAMMRAVTVHPETLRV
                     MYFA"
     assembly_gap    37233..38834
                     /estimated_length=1602
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
CONTIG      join(LASN01017599.1:1..5026,gap(1301),LASN01017600.1:1..30905,
            gap(1602),LASN01017601.1:1..3173)
//