Assessing the utility of metabarcoding for diet analyses of the omnivorous wild pig (Sus scrofa)

http://onlinelibrary.wiley.com/doi/10.1002/ece3.3638/full

Ecology and Evolution, (2018), 8:185-196. DOI: 10.1002/ece3.3638

Michael S. Robeson II, Kamil Khanipov, George Golovko, Samantha M. Wisely, Michael D. White, Michael Bodenchuck, Timothy J. Smyser, Yuriy Fofanov, Noah Fierer, Antoinette J. Piaggio

Abstract

Wild pigs (Sus scrofa) are an invasive species descended from both domestic swine and Eurasian wild boar that was introduced to North America during the early 1500s. Wild pigs have since become the most abundant free-ranging exotic ungulate in the United States. Large and ever-increasing populations of wild pigs negatively impact agriculture, sport hunting, and native ecosystems with costs estimated to exceed $1.5 billion/year within the United States. Wild pigs are recognized as generalist feeders, able to exploit a broad array of locally available food resources, yet their feeding behaviors remain poorly understood as partially digested material is often unidentifiable through traditional stomach content analyses. To overcome the limitation of stomach content analyses, we developed a DNA sequencing-based protocol to describe the plant and animal diet composition of wild pigs. Additionally, we developed and evaluated blocking primers to reduce the amplification and sequencing of host DNA, thus providing greater returns of sequences from diet items. We demonstrate that the use of blocking primers produces significantly more sequencing reads per sample from diet items, which increases the robustness of ascertaining animal diet composition with molecular tools. Further, we show that the overall plant and animal diet composition is significantly different between the three areas sampled, demonstrating this approach is suitable for describing differences in diet composition among the locations.

 

KEYWORDS

blocking primer, CO1, diet, feral swine, metabarcoding, trnL

dBBQs: dataBase of Bacterial Quality scores

Visanu Wanchai, Preecha Patumcharoenpol, Intawat Nookaew and David Ussery
From The 14th Annual MCBIOS Conference
Little Rock, AR, USA. 23-25 March 2017

Abstract

Background:
It is well known that genome sequencing technologies are becoming significantly cheaper and faster. As a result, the exponential growth of sequencing data in public databases allows us to explore ever-growing collections of genome sequences. However, it is less well known that the majority of sequenced genomes available in public databases are not complete, but are drafts of varying quality. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database.

Suggested mechanisms for Zika virus causing microcephaly: what do the genomes tell us?

Se-Ran Jun, Trudy M. Wassenaar, Visanu Wanchai, Preecha Patumcharoenpol, Intawat Nookaew and David W. Ussery.

Published: 28 December 2017

Abstract

Background

Zika virus (ZIKV) is an emerging human pathogen. Since its arrival in the Western hemisphere, from Africa via Asia, it has become a serious threat to pregnant women, causing microcephaly and other neuropathies in developing fetuses. The mechanisms behind these teratogenic effects are unknown, although epidemiological evidence suggests that microcephaly is not associated with the original, African lineage of ZIKV. The sequences of 196 published ZIKV genomes were used to assess whether recently proposed mechanistic explanations for microcephaly are supported by molecular level changes that may have increased its virulence since the virus left Africa. For this we performed phylogenetic, recombination, adaptive evolution and tetramer frequency analyses, and compared protein sequences for the presence of protease cleavage sites, Pfam domains, glycosylation sites, signal peptides, trans-membrane protein domains, and phosphorylation sites.

Read more: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1894-3

Visual Representation of Mumps RNA

This particular visualization of Mumps RNA is based on the idea that all RNA/DNA bases could be represented as five discrete values of the gray scale: U=black (100%), C=dark (75%), A=gray (50%), G=light (25%) and T=white (0%), while the linear RNA structure could be converted into 2D images built of 3×4 basic matrices (fig. 1). In this representation the value difference between neighboring bases on this scale is 25%, and between the bases of a base pair it is 50%.

The next two images illustrate how this algorithm is applied to the first 192 positions of the Mumps RNA, first as a set of individual 3×4 matrices: micro-visualization (fig. 2), and then as one integrated image: macro-visualization (fig. 3). In this way it is possible to compare patterns in the images representing different sequences, or to analyze the distribution of light and dark areas.
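
The mapping described above can be sketched in Python. This is only a minimal illustration, not the procedure actually used to produce the images: it assumes that each 3×4 matrix holds 12 consecutive bases filled row by row (which would give 16 matrices for 192 positions), and the names DARKNESS and to_tiles are hypothetical.

```python
# Gray-scale darkness (percent black) for each base, as listed in the text.
DARKNESS = {"U": 100, "C": 75, "A": 50, "G": 25, "T": 0}

ROWS, COLS = 3, 4      # size of one basic matrix
TILE = ROWS * COLS     # assumption: 12 consecutive bases fill one matrix


def to_tiles(sequence):
    """Split a sequence into 3x4 matrices of darkness values (row-major)."""
    values = [DARKNESS[base] for base in sequence.upper()]
    tiles = []
    # Only complete tiles are kept; a trailing partial tile is dropped.
    for start in range(0, len(values) - TILE + 1, TILE):
        chunk = values[start:start + TILE]
        tiles.append([chunk[r * COLS:(r + 1) * COLS] for r in range(ROWS)])
    return tiles


# Hypothetical 24-base fragment, not taken from the Mumps genome.
example = "ACGUACGUACGUUUUUCCCCAAAA"
tiles = to_tiles(example)
for row in tiles[0]:
    print(row)
```

Each inner list is one row of a matrix; rendering the values as gray levels (0 = white, 100 = black) and placing the matrices side by side would give something like the integrated image.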

Gregor Mobius | gregor.mobius@jps.net

Genomic Basis for Microcephaly in Brazilian strains of Zika Virus

Se-Ran Jun1, Trudy M. Wassenaar2, Visanu Wanchai1, Preecha Patumcharoenpol1, Intawat Nookaew1, David W. Ussery1

Background: The Zika virus (ZIKV), mainly transmitted by mosquitoes, is an emerging human-pathogenic flavivirus, and has shown a spread pattern and clinical characteristics similar to those of Dengue virus. However, the ZIKV pandemic in South America is a serious threat to pregnant women, causing microcephaly in developing fetuses. The mechanism is unknown, although epidemiological evidence suggests that microcephaly is not associated with the African lineage of ZIKV.

Results: We examined 105 ZIKV complete genomes and complete coding sequences for genomic understanding of Zika virus epidemiology, based on phylogenetic comparative analysis, adaptive evolution analysis, recombination analysis, and protein properties, including protease cleavage sites, Pfam domains, glycosylation sites, signal peptides, trans-membrane protein domains, and phosphorylation sites. Recombination events within or between Asian and Brazil lineages were not observed, nor were changes in protease cleavage, glycosylation sites, signal peptides or trans-membrane domains between African and Brazil strains. Selection pressure was recognized at several polymorphic sites, mainly in the protein NS4B for the Brazil lineage. Importantly, positively selected mutations in NS4B resulted in an increased potential to be phosphorylated in Brazil strains.

Conclusion: ZIKV protein NS4B, together with NS4A, has recently been shown to inhibit Akt-mTOR signaling in human fetal neural stem cells, a key pathway for brain development. We hypothesize that positive selection of novel phosphorylation sites in the protein NS4B of Brazil strains could interfere with phosphorylation of Akt and mTOR, impairing Akt-mTOR signaling and resulting in an increased risk for the development of neuropathies.

This work is funded in part by the Arkansas Research Alliance and the Helen Adams & Arkansas Research Alliance Professor & Chair. This research is supported by the Arkansas High Performance Computing Center, which is funded through multiple National Science Foundation grants and the Arkansas Economic Development Commission.


1Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, 72205
2Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany

R-loop Forming Structure Prediction in Viral Genomes

Thidathip Wongsurawat1, Piroon Jenjaroenpun1, Preecha Patumcharoenpol1, David Ussery1, and Intawat Nookaew1

Background: An R-loop is a triple-stranded nucleic acid structure comprising nascent RNA hybridized with its corresponding DNA template strand, while leaving the non-template DNA single-stranded. R-loop formation has been observed in a wide range of organisms, from bacteria to mammals. Possible roles of R-loops in transcription, telomere maintenance, genome instability and epigenetic regulation, as well as disease involvement, have been demonstrated. In viruses, R-loop detection is rare and the functional importance of R-loops is poorly understood. Thus, we aim to investigate the prevalence and distribution of R-loops in viral genomes.

Results: We use 6,153 complete viral genomes collected from NCBI as a reference set. R-loop prediction by QmRLFS-finder (http://rloop.bii.a-star.edu.sg/?pg=qmrlfs-finder) is performed on these genomes. A total of 1,586 of the 6,153 genomes (about 26%) contain at least one R-loop. The number of R-loops and the ratio of R-loop length per kb of the viral genome are presented. We find that herpesviruses are enriched with R-loops, especially human herpesvirus. In addition, the distribution of these R-loops throughout the genome is not uniform.

Conclusion: We report here the results of a search for the existence and prevalence of R-loops in viral genomes. The pervasiveness of R-loops and their enrichment at specific genomic locations suggest that these structural entities may represent a novel class of functional elements in herpesviruses. Future analysis will focus on the R-loop-positive genes and regulatory elements of these viruses.

This work is funded in part by the Arkansas Research Alliance and the Helen Adams & Arkansas Research Alliance Professor & Chair.


1Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, 72205

Genome-Based Phylogeny of Clostridioides difficile

David W. Ussery, Se-Ran Jun, Duah Alkam, Joyce J Johnsrud, Visanu Wanchai, and Trudy Wassenaar

Background: Clostridioides difficile infections are a major problem in hospitals. Some strains of C. difficile can spread as community acquired infections, whilst other strains are unique to individuals. Genome sequences from clinical samples can be used for epidemiological monitoring of C. difficile infections.

Results: Using Average Amino acid Identity (AAI) of more than 500 Clostridioides difficile genomes, we find several distinct clusters, some of which reflect known nosocomial infections from hospitals. Further, we find additional genomes in GenBank that are likely to be in the C. difficile group, but have different names, and some of the C. difficile genomes are likely to belong to different genera.

Conclusion: Our analysis of all the currently available C. difficile genomes provides a framework for placing newly sequenced clinical isolates and for quickly identifying novel strains, as well as potential community-outbreak strains. This work is funded in part by the Arkansas Research Alliance and the Helen Adams & Arkansas Research Alliance Professor & Chair.

Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205

Department of Infectious Diseases, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205

Molecular Microbiology and Genomics Consultants, Tannenstrasse 7, D-55576 Zotzenheim, Germany

dBBQs: dataBase of Bacterial Quality scores

Visanu Wanchai, Preecha Patumcharoenpol, Intawat Nookaew, and David Ussery

Background: It is well known that genome sequencing technologies are becoming significantly cheaper and faster. As a result, the exponential growth of sequencing data in public databases allows us to explore ever-growing collections of genome sequences. However, it is less well known that the majority of sequenced genomes available in public databases are not complete, but are drafts of varying quality. We have calculated quality scores for more than 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database.

Results: Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial and archaeal genomes. The genome quality score for each was calculated from four different measurements: assembly quality, the number of rRNA genes, the number of tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers a search function built on Elasticsearch, a fast and scalable search and analytics engine for large-scale databases. In addition, the search results are shown in interactive JavaScript charts using dc.js. The analysis of quality scores across major public genome databases finds that most (perhaps 80% or more) of the genomes are of acceptable quality for many uses. However, some genome sequences are of very low quality, in a few cases even for ‘complete’ genomes.

Conclusion: dBBQs provides genome quality scores for all available prokaryotic genome sequences through a user-friendly web interface. These scores can be used as cut-offs to obtain a high-quality set of genomes for testing bioinformatics tools or improving analyses. Moreover, all data from the four measurements that were combined to make the quality score for each genome are also provided and can potentially be used for further analysis. dBBQs will be updated regularly, as the number of genomes in public databases is growing rapidly, and is free to use for non-commercial purposes. This work is funded in part by the Arkansas Research Alliance and the Helen Adams & Arkansas Research Alliance Professor & Chair.


Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205

KSiga: K-mer Signal analysis tool for virus genome analysis

Preecha Patumcharoenpol, Se-Ran Jun, David Ussery, and Intawat Nookaew

Background: Genomic DNA sequences are the best ‘unique identifiers’ for an organism. Often biologists will want to know “where does this DNA sequence come from?” However, matching a given DNA sequence with a likely genome can be difficult. For long DNA sequences, finding the best match in a large database can be time-consuming and computationally challenging. An easy and fast method for this involves looking at the distribution of smaller pieces (substrings) of DNA of the same length (“k”). The k-mer based approach has been explored as a basis for sequence analysis applications, including assembly, phylogenetic tree inference, and microbial classification. Although the k-mer based approach is not novel, selecting the appropriate k-mer length to obtain the best resolution in applications is rather arbitrary.
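
As a rough illustration of what "looking at the distribution of substrings of length k" means, the following Python sketch counts overlapping k-mers in a sequence and converts the counts to relative frequencies. It is a generic example, not the KSiga implementation; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_frequencies(sequence, k):
    """Convert k-mer counts to relative frequencies that sum to 1."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy sequence for illustration only.
toy = "ATGATGCA"
print(kmer_counts(toy, 3))
print(kmer_frequencies(toy, 3))
```

Two genomes can then be compared through their frequency profiles. With small k, most k-mers occur in almost every genome and the profiles carry little distinguishing information; with large k, most k-mers occur only once, which is why the choice of k matters.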

Results: KSiga is a computational tool which investigates k-mer content to assess an optimal k-mer length for virus genome datasets of interest using a three-step approach: (1) Cumulative Relative Entropy (CRE), (2) Average number of Common Features (ACF), and (3) Observed Feature Occurrence (OFC). Using the KSiga package, we demonstrate the reliability of these measurements by identifying an optimal k-mer length for a reference set of 6153 viral genomes. We are able to identify the optimal range of k-mer lengths that can be used to group viral genomes, as visualized in a dendrogram.

Conclusion: KSiga provides a systematic way to determine the optimal k-mer length for virus genome sequence analysis. Our three-step approach for finding an optimal k-mer length produces clusters in agreement with the International Committee on Taxonomy of Viruses (ICTV) and with the Baltimore virus classification system, suggesting that our approach could potentially be used to improve virus genome classification. KSiga is available at https://github.com/yumyai/ksiga. This work is funded in part by the Arkansas Research Alliance and the Helen Adams & Arkansas Research Alliance Professor & Chair.


Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205