/tag/bioinformatics

  • Workshops

    UVA Research Computing provides training opportunities covering a variety of data analysis, basic programming, and computational topics. All of the classes listed below are taught by experts and are freely available to UVA faculty, staff, and students.
    New to High-Performance Computing? We offer orientation sessions to introduce you to the Afton & Rivanna HPC systems on Wednesdays, 3:00-4:00pm (appointment required). Sign up for an “Intro to HPC” session.
    Upcoming Workshops
    There are currently no training events scheduled. Please check back soon!
    Research Computing is partnering with the Research Library and the Health Sciences Library to deliver workshops covering a variety of research computing topics.

  • Bioinformatics & Genomics

    UVA Research Computing (RC) can help with your bioinformatics project.
    Next-generation sequence data analysis
    RC staff can help you start to use popular bioinformatics software for tasks such as:

    • Genome assembly, reference-based and/or de novo
    • Whole-genome/exome sequence analysis for variant calling and annotation
    • RNA-Seq data analysis to quantify, discover, and profile RNAs
    • Microbiome data analysis, including 16S rRNA surveys, OTU clustering, microbial profiling, and taxonomic and functional analysis from whole shotgun metagenomic/metatranscriptomic datasets
    • Epigenetic analysis from BSAS/ChIP-Seq/ATAC-Seq

    Computing Platforms
    UVA has three computing facilities available to researchers: Rivanna and Afton, for non-sensitive data, and Ivy, for sensitive data. In addition, cloud-based services offer a computing environment for running flexible, scalable on-demand applications.

  • COVID Saliva Testing

    In cooperation with the UVA Saliva Testing Lab, the UVA Health System, and the Virginia Department of Health, the “Be SAFE” saliva
    testing program was launched in late 2020. Now a retired project, Be SAFE used saliva samples to detect the COVID-19 virus through a diagnostic PCR test.
    Research Computing provided computational, storage, and data integration expertise to this project.

  • Bioinformatics Resources and UVA HPC

    The UVA research community has access to numerous bioinformatics software packages, either installed directly or available through the bioconda Python modules.
    Click here for a comprehensive list of currently-installed bioinformatics software.
    Popular Bioinformatics Software
    Below are some popular tools and useful links for their documentation and usage:

    • BEDTools (version 2.26.0): BEDTools utilities allow one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. Useful links: Homepage, Tutorial.
    • BLAST+ (version 2.7.1): BLAST+ is a suite of command-line tools that offers applications for BLAST search, BLAST database creation/examination, and sequence filtering.
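As a minimal sketch of the interval operations described for BEDTools, the snippet below writes two tiny BED files and asks which query intervals overlap an annotation. The file names, coordinates, and temporary directory are made-up examples, and the bedtools call only runs if the tool is already on your PATH (for instance, after loading the corresponding module).

```shell
# Sketch: intersect two small BED files with BEDTools.
# All file names and coordinates below are illustrative assumptions.
WORKDIR="$(mktemp -d)"

# Query intervals (tab-separated: chromosome, start, end).
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$WORKDIR/query.bed"

# Annotation intervals to compare against.
printf 'chr1\t150\t250\n' > "$WORKDIR/annotations.bed"

if command -v bedtools >/dev/null 2>&1; then
    # -u reports each interval from -a once if it overlaps anything in -b.
    bedtools intersect -u -a "$WORKDIR/query.bed" -b "$WORKDIR/annotations.bed"
else
    echo "bedtools not found on PATH; load the BEDTools module first." >&2
fi
```

With these example files, only the first query interval overlaps the annotation, so it is the only one reported.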

  • COVID-19 Surveillance Dashboard

    The Biocomplexity Institute at the University of Virginia has been at the forefront of epidemiological modeling to track the COVID-19 pandemic and has developed a suite of COVID-19 epidemic response resources, including a series of dashboards to help the public and the government better understand the pandemic. This is a static view of the Institute’s interactive COVID-19 Surveillance Dashboard, which provides a visualization of COVID-19 cases, recoveries, and deaths across the globe. In an effort to support the planning and response efforts for the recent Coronavirus pandemic, researchers prepared this visualization tool that provides a unique way of examining data curated by different data sources.

  • LOLAweb

    The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data resources and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, annotations from external data sources can be easily connected to new genomic data.
    SOM Research Computing is working with faculty in the UVA Center for Public Health Genomics to implement LOLAweb, an online tool for performing genomic locus overlap annotations and analyses. This project, written in the statistical programming language R, allows users to specify region set data in BED format for automated enrichment analysis.

  • Refgenie: A Reference Genome Resource Manager

    Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool to do this.
    Refgenie is a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.
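The command line interface mentioned above can be sketched as follows. This is an illustration rather than a recommended setup: the configuration file location and the `hg38/bowtie2_index` asset are example choices, and whether that asset is offered depends on the refgenie server your configuration points to.

```shell
# Sketch: fetch a prebuilt genome asset with the refgenie CLI.
# The config path and the genome/asset name are example assumptions.
export REFGENIE="${REFGENIE:-$HOME/genome_config.yaml}"
ASSET="hg38/bowtie2_index"

if command -v refgenie >/dev/null 2>&1; then
    # Create the genome configuration file once if it does not exist yet.
    [ -f "$REFGENIE" ] || refgenie init -c "$REFGENIE"
    # Download the asset from the configured refgenie server.
    refgenie pull "$ASSET"
    # Print the local path of the asset for use in downstream tools.
    refgenie seek "$ASSET"
else
    echo "refgenie not found on PATH; install it before running this sketch." >&2
fi
```

The path printed by `refgenie seek` can then be passed to an aligner instead of a hand-maintained index location.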

  • NCBI Blast and UVA HPC

    Description
    Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
    Software Category: bio
    For detailed information, visit the NCBI Blast website.
    Available Versions
    The current installation of NCBI Blast incorporates the most popular packages. To find the available versions and learn how to load them, run:
        module spider blast
    The output of the command shows the available NCBI Blast module versions.
    For detailed information about a particular NCBI Blast module, including how to load the module, run the module spider command with the module’s full version label.
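The two lookups described above can be sketched as a short script. `BLAST_MODULE` is a placeholder variable introduced here for illustration; set it to a full version label copied from the output of the first command, since no particular version is assumed.

```shell
# Sketch: look up BLAST modules on a system that provides "module spider".
# BLAST_MODULE is a hypothetical placeholder; fill it with a full version
# label taken from the output of "module spider blast".
BLAST_MODULE="${BLAST_MODULE:-}"

if command -v module >/dev/null 2>&1; then
    # List every available BLAST module version.
    module spider blast
    if [ -n "$BLAST_MODULE" ]; then
        # Show details for one version, including how to load it.
        module spider "$BLAST_MODULE"
    fi
else
    echo "The module command is not available in this shell." >&2
fi
```

The second call usually lists any prerequisite modules that must be loaded before the chosen version.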

  • Center for Diabetes Technology PriMed

    In their research on continuous glucose monitoring and the automated maintenance of insulin for patients, the CDT is exploring data drawn from external data sources such as DexCom and FitBit. RC has assisted the CDT by designing a secure computing footprint in Amazon Web Services that pulls in these data, then parses and processes them for deeper analytics through machine learning. In January 2018, CDT sponsored a ski camp at Wintergreen Resort for a group of youth diagnosed with Type 1 diabetes, with the goal of importing glucose, insulin, and exercise metrics at the end of each day through remote web APIs.

  • epihet

    RC is working with researchers in the Center for Public Health Genomics to write an R package to calculate Relative Proportion of Sites with Intermediate Methylation (RPIM) scores, which represent the epigenetic heterogeneity in a bisulfite sequencing sample.
    https://github.com/databio/epihet
    PI: Nathan Sheffield (Center for Public Health Genomics)

  • Microbiome Analysis of Hospital Sink Drains

    Sink drains are notorious reservoirs of pathogens that cause nosocomial transmissions in hospitals worldwide. Outbreaks in which sinks have been implicated as the source of antibiotic-resistant bacteria have increased over the last few years. To understand transmission dynamics, the University of Virginia School of Medicine has established a unique “Sink Lab” for this research. This one-of-a-kind laboratory establishes UVA as a worldwide frontrunner in investigating sink-related antibiotic-resistant bacteria and how they spread. RC is working with the UVA Sink Lab on genomic analysis of the sink biomass.
    RC is contributing to:
    Comparative genomic analysis of gram-negative bacterial isolates:
    The analysis aims to track the blaKPC gene, which is carried on mobile genetic elements and encodes the Klebsiella pneumoniae carbapenemase (KPC) enzyme. KPC confers resistance to all beta-lactam agents, including penicillins, cephalosporins, monobactams, and carbapenems.

  • simpleCache

    In partnership with researchers in the Center for Public Health Genomics, School of Medicine Research Computing has contributed to the development of a novel package for computationally efficient caching and loading of data in R. simpleCache provides an interface to a series of functions to store and retrieve cached objects, including in the context of batch processing or HPC environments. The package further extends the base R functionality for saving and loading external representations of objects by enabling caching to pre-defined directories and timed cache operations.
    RC helped document and develop new functions for the package ahead of its release to the Comprehensive R Archive Network (CRAN).

  • Bioinformatics Packages on Ivy Linux VM

    Available Packages
    The following bioinformatics packages are available on the Ivy Linux Virtual Machines:
    Bowtie2
    Bowtie2 is a memory-efficient tool for aligning short sequences to long reference genomes.
    For bowtie2 usage information, please click here (/userinfo/ivy/ivy-linux-sw/bioinformatics/bowtie2).
    HISAT2
    HISAT2 is a fast and sensitive tool for aligning short reads against the general human population (as well as against a single reference genome).

    • Requires approval before installation
      For HISAT2 usage information, please click here

  • Bioinformatics Packages on Windows VM

    Available Packages
    The following bioinformatics packages are available on the Windows Virtual Machines:
    Bowtie2
    For more information on bowtie2, please click here (/userinfo/ivy/ivy-windows-sw/bioinformatics/bowtie2).
    HISAT2
    Requires approval before installation. For more information on HISAT2, please click here.

  • Bowtie2 on Ivy Linux VM

    Bowtie2 is a memory-efficient tool for aligning short sequences to long reference genomes.
    It indexes the genome using an FM index, which is based on the Burrows-Wheeler transform,
    to keep its memory footprint small. Bowtie2 supports gapped, local, and paired-end alignment modes.
    Alignment to a known reference using Bowtie2 is often an essential first step in many NGS analysis workflows.
    Bowtie2 Usage
    Alignment using bowtie2 is a two-step process: indexing the reference genome, followed by aligning the sequence data.
    Create indexes of your reference genome of interest stored in the reference.fasta file:
        bowtie2-build [option(s)] <reference.fasta> <bt2-index-basename>
    This will create new files with the provided basename and extensions .
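The two-step process (index, then align) can be sketched as below, assuming Bowtie2 is installed and single-end reads are being aligned. Only `reference.fasta` comes from the text; the index basename, reads file, and output file are placeholder names chosen for illustration.

```shell
# Sketch: index a reference and align single-end reads with Bowtie2.
# reference.fasta is named in the text; the other names are placeholders.
REFERENCE="reference.fasta"
INDEX_BASENAME="reference_index"
READS="reads.fastq"
OUTPUT_SAM="aligned.sam"

if command -v bowtie2-build >/dev/null 2>&1 && [ -f "$REFERENCE" ] && [ -f "$READS" ]; then
    # Step 1: build the index files from the reference sequence.
    bowtie2-build "$REFERENCE" "$INDEX_BASENAME"
    # Step 2: align unpaired reads (-U) against the index (-x), writing SAM (-S).
    bowtie2 -x "$INDEX_BASENAME" -U "$READS" -S "$OUTPUT_SAM"
else
    echo "Skipping: bowtie2 or the input files were not found." >&2
fi
```

For paired-end data, the `-U` option would be replaced by `-1` and `-2` with the two mate files.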

  • Bowtie2 on Ivy Windows VM

    Bowtie2 is a memory-efficient tool for aligning short sequences to long reference genomes.
    It indexes the genome using an FM index, which is based on the Burrows-Wheeler transform,
    to keep its memory footprint small. Bowtie2 supports gapped, local, and paired-end alignment modes.
    Alignment to a known reference using Bowtie2 is often an essential first step in many NGS analysis workflows.
    Bowtie2 Usage
    Alignment using bowtie2 is a two-step process: indexing the reference genome, followed by aligning the sequence data.
    Create indexes of your reference genome of interest stored in the reference.fasta file:
        bowtie2-build [option(s)] <reference.fasta> <bt2-index-basename>
    This will create new files with the provided basename and extensions .

  • HISAT2 on Ivy Linux VM

    • Please note that HISAT2 requires approval prior to installation on the VM
      HISAT2 is a fast and sensitive tool for aligning short reads against the general human population
      (as well as against a single reference genome). It indexes the genome using a Hierarchical Graph FM Index
      (HGFM) strategy, i.e. a large set of small indexes that collectively cover the whole genome
      (each index representing a genomic region of 56 Kbp).
      HISAT2 Usage
      Alignment using HISAT2 is a two-step process: indexing the reference genome, followed by aligning the sequence data.
      Create indexes of your reference genome of interest stored in reference.fasta file:
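As a minimal sketch of the two HISAT2 steps, assuming HISAT2 has been approved and installed on the VM and the reads are single-end: only `reference.fasta` comes from the text, and the index basename, reads file, and output file are placeholder names.

```shell
# Sketch: index a reference and align single-end reads with HISAT2.
# reference.fasta is named in the text; the other names are placeholders.
REFERENCE="reference.fasta"
HISAT2_INDEX="reference_hisat2"
READS="reads.fastq"
OUTPUT_SAM="aligned_hisat2.sam"

if command -v hisat2-build >/dev/null 2>&1 && [ -f "$REFERENCE" ] && [ -f "$READS" ]; then
    # Step 1: build the HISAT2 index from the reference sequence.
    hisat2-build "$REFERENCE" "$HISAT2_INDEX"
    # Step 2: align unpaired reads (-U) against the index (-x), writing SAM (-S).
    hisat2 -x "$HISAT2_INDEX" -U "$READS" -S "$OUTPUT_SAM"
else
    echo "Skipping: hisat2 or the input files were not found." >&2
fi
```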

  • HISAT2 on Ivy Windows VM

    • Please note that HISAT2 requires approval prior to installation on the VM
      HISAT2 is a fast and sensitive tool for aligning short reads against the general human population
      (as well as against a single reference genome). It indexes the genome using a Hierarchical Graph FM Index
      (HGFM) strategy, i.e. a large set of small indexes that collectively cover the whole genome
      (each index representing a genomic region of 56 Kbp).
      HISAT2 Usage
      Alignment using HISAT2 is a two-step process: indexing the reference genome, followed by aligning the sequence data.
      Create indexes of your reference genome of interest stored in reference.fasta file: