The Jackson Laboratory

Oct 12, 2018

9:00 am - 4:30 pm

Instructors: Sue McClatchy, Asli Uyar, Dan Gatti

Helpers: Yuka Takemon, Duy Pham

General Information

This workshop is open to members of the Jackson Laboratory and neighboring institutions who have met the prerequisite, either by taking a 2-day R workshop or by otherwise being competent in R.

Where: Breezeway Bioinformatics Training Room, Bldg 1, Room 1540, 600 Main Street, Bar Harbor, Maine. Get directions with OpenStreetMap or Google Maps.

When: Oct 12, 2018. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by the Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email susan.mcclatchy@jax.org for more information.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

  1. Find, install, and learn how to use Bioconductor packages
  2. Import and manipulate genomic files and Bioconductor data objects
  3. Visually assess quality of RNA-seq data
  4. Perform basic differential analysis of RNA-seq data
  5. Understand how to apply the GenomicRanges infrastructure to real-world problems
  6. Gain insight into the design principles of the GenomicRanges infrastructure and how it was meant to be used
  7. Learn about various annotation packages and public data resources

Reference...


Setup

To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio. If you already have R and RStudio installed on your machine, please upgrade to the latest versions of each.

Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

macOS

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo dnf install R). Also, please install the RStudio IDE.

Bioconductor

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development. The current release of Bioconductor is version 3.7; it works with R version 3.5.0. Users of older versions of R and Bioconductor must update their installation to take advantage of new features and to access packages that have been added to Bioconductor since the last release.
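
If you are unsure whether your R installation is recent enough, a quick check in the console can help. The sketch below uses only base R and assumes that 3.5.0, the version named above, is the minimum you need; the variable names are illustrative.

```r
# Minimum R version named in the text above (Bioconductor 3.7 pairs with R 3.5.0).
required_r <- "3.5.0"

# getRversion() returns a version object, so the comparison is done component
# by component (for example, 3.10.0 counts as newer than 3.5.0).
r_ok <- getRversion() >= required_r

if (r_ok) {
  message("R ", getRversion(), " meets the workshop requirement (>= ", required_r, ").")
} else {
  message("R ", getRversion(), " is older than ", required_r, "; please upgrade before the workshop.")
}
```

Comparing version objects rather than plain strings avoids the trap where "3.10.0" sorts before "3.5.0" alphabetically.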

Packages

Packages available in Bioconductor are summarized at https://bioconductor.org/packages. The widget on the left summarizes four distinct types of Bioconductor packages: 1) software, 2) annotation, 3) experiment data, and 4) workflow. Like CRAN (R) packages, Bioconductor packages need to be installed only once per R installation, and then attached to each session where they are going to be used. Bioconductor packages are installed slightly differently from CRAN packages. The first step is to install the BiocManager package from CRAN. Open RStudio, then copy and paste the following code into the console:

    if (!"BiocManager" %in% rownames(installed.packages()))
      install.packages("BiocManager", repos="https://cran.r-project.org")

The next step is to install the desired Bioconductor packages. The syntax to install the packages is

    BiocManager::install(c("rtracklayer", "GenomicRanges", "SummarizedExperiment", "DESeq2", "tximportData", "airway", "apeglm", "AnnotationHub", "ReportingTools", "Glimma", "splatter"))

A convenient function in BiocManager is available(), which accepts a regular expression to find matching packages. The following finds all TxDb packages (describing exon, transcript, and gene coordinates).

    BiocManager::available("TxDb")

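
To get a feel for how a regular expression narrows a search like this, the sketch below applies base R's grep() to a handful of package names taken from this page. It is only an illustration: the short pkg_names vector stands in for the full Bioconductor package list, and it assumes that available() treats its pattern the way grep() does.

```r
# A few package names from this page, used as a stand-in for the full
# Bioconductor package list.
pkg_names <- c("DESeq2", "GenomicRanges",
               "TxDb.Hsapiens.UCSC.hg38.knownGene",
               "TxDb.Mmusculus.UCSC.mm10.knownGene")

# Plain pattern: matches any name that contains "TxDb".
txdb <- grep("TxDb", pkg_names, value = TRUE)

# Anchored pattern: names that start with "TxDb" and end with "knownGene".
known_gene <- grep("^TxDb.*knownGene$", pkg_names, value = TRUE)

# Narrower pattern: mouse TxDb packages only (the dot is escaped so it
# matches a literal ".").
mouse <- grep("^TxDb\\.Mmusculus", pkg_names, value = TRUE)

print(txdb)
print(known_gene)
print(mouse)
```

Under the same assumption, a narrower pattern such as "^TxDb\\.Mmusculus" could be passed to BiocManager::available() to shorten the list it returns.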
Use the BiocManager::install() function above to install the UCSC knownGene annotation packages for human (hg38) and mouse (mm10).

    BiocManager::install(c("TxDb.Hsapiens.UCSC.hg38.knownGene", "TxDb.Mmusculus.UCSC.mm10.knownGene"))

Bioconductor packages tend to depend on one another quite a lot, so it is important that the correct versions of all packages are installed. Validate your installation with

    BiocManager::valid()

In addition to the Bioconductor packages named above, we'll use some of the R packages from the tidyverse. Run the following code in the console, or install packages from the RStudio Packages tab.

    install.packages("tidyverse")
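
Before the workshop, it may be worth confirming that every package named on this page actually installed. The following is a minimal sketch using the same installed.packages() idiom as above; the variable names are illustrative, and it only checks that each package is present, not that its version is correct (BiocManager::valid() covers that).

```r
# Every package this page asks you to install.
workshop_pkgs <- c("BiocManager", "rtracklayer", "GenomicRanges",
                   "SummarizedExperiment", "DESeq2", "tximportData",
                   "airway", "apeglm", "AnnotationHub", "ReportingTools",
                   "Glimma", "splatter",
                   "TxDb.Hsapiens.UCSC.hg38.knownGene",
                   "TxDb.Mmusculus.UCSC.mm10.knownGene",
                   "tidyverse")

# TRUE for each package found in the current R library paths.
is_installed <- workshop_pkgs %in% rownames(installed.packages())
missing_pkgs <- workshop_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All ", length(workshop_pkgs), " workshop packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported as missing can be passed to BiocManager::install(), which should also handle CRAN packages such as tidyverse.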

Project organization

  1. Make a new folder on your Desktop called bioconductor.
  2. Move into this new folder.
  3. Create a data folder to hold the data, a scripts folder to house your scripts, and a results folder to hold results.

Alternatively, you can run the following commands in the R console to carry out steps 1-3.

    setwd("~/Desktop")
    dir.create("./bioconductor")
    setwd("~/Desktop/bioconductor")
    dir.create("./data")
    dir.create("./scripts")
    dir.create("./results")
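
The commands above warn if a folder already exists and rely on changing the working directory twice. As an alternative, here is a sketch of a small helper that can be re-run safely. The function name create_project and its arguments are hypothetical, not part of the workshop materials, and the example runs in a temporary directory so that nothing on your Desktop is touched.

```r
# Hypothetical helper: create the project folder and its three subfolders
# under `parent`, then return the project path.
create_project <- function(parent, name = "bioconductor") {
  project <- file.path(parent, name)
  for (sub in c("data", "scripts", "results")) {
    # recursive = TRUE also creates `project` itself if needed;
    # showWarnings = FALSE keeps re-runs quiet when a folder already exists.
    dir.create(file.path(project, sub), recursive = TRUE, showWarnings = FALSE)
  }
  project
}

# Dry run in a temporary directory.
demo_project <- create_project(tempdir())
print(list.files(demo_project))
```

For the layout described in steps 1-3, the call would be create_project("~/Desktop"), followed by setwd("~/Desktop/bioconductor") to move into the new folder.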

Data

Please download the following large files before the workshop, and place them in your data folder. Download them from the URLs below and move them into the data folder as you would any other file.