Summary and Schedule
Quantitative trait mapping is used in biomedical, agricultural, and evolutionary studies to find causal genes for quantitative traits, to aid crop and breed selection in agriculture, and to shed light on natural selection. Examples of quantitative traits include cholesterol level, plant yield, and egg size, all of which are continuous variables. The goal of quantitative trait locus (QTL) analysis is to identify genomic regions linked to a phenotype, to map these regions precisely, and to define the effects, number, and interactions of QTL.
QTL analysis can be performed in natural populations or in experimental crosses, and can be studied in humans and non-human species. Human studies, however, are very expensive, lack environmental control, and can be confounded by population structure such that associations between genotype and phenotype are not necessarily causal.
QTL analysis in experimental crosses requires two or more strains that differ genetically with regard to a phenotype of interest. Genetic markers, such as SNPs or microsatellites, distinguish between the parental strains in the experimental cross. Markers that are genetically linked to a QTL influencing the phenotype will co-segregate with phenotype values (high or low values, for example), while unlinked markers will not be significantly associated with the phenotype. Markers that are associated with the phenotype are usually not causal themselves; rather, they are associated with it through linkage to nearby QTL. They serve as signposts indicating the neighborhood of a QTL that influences a phenotype. Covariates such as sex or diet can also influence the phenotype.
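The idea that a linked marker tracks a QTL while an unlinked marker does not can be illustrated with a small simulation. The sketch below is a hypothetical base-R example, not the lesson's data or code: it assumes a backcross with a single QTL of additive effect 1 (residual standard deviation 1), one marker at an assumed recombination fraction of 0.05 from the QTL, and one unlinked marker. Each marker is scored with a LOD score from linear regression, LOD = (n/2) log10(RSS_null / RSS_marker).

```r
# Hypothetical simulated backcross: NOT the lesson's data.
set.seed(20)
n <- 400                         # number of simulated mice
qtl <- rbinom(n, 1, 0.5)         # QTL genotype (0 = AA, 1 = AB)
pheno <- 1.0 * qtl + rnorm(n)    # additive QTL effect of 1 plus N(0, 1) noise

# Linked marker: differs from the QTL genotype only where a recombination
# occurred between them (assumed recombination fraction r = 0.05).
r <- 0.05
linked <- ifelse(runif(n) < r, 1 - qtl, qtl)

# Unlinked marker: segregates independently of the QTL.
unlinked <- rbinom(n, 1, 0.5)

# Single-marker LOD score: compare the residual sum of squares of a
# null model (intercept only) with a model that includes the marker.
lod_score <- function(y, marker) {
  rss0 <- sum(resid(lm(y ~ 1))^2)
  rss1 <- sum(resid(lm(y ~ marker))^2)
  (length(y) / 2) * log10(rss0 / rss1)
}

lod_linked <- lod_score(pheno, linked)
lod_unlinked <- lod_score(pheno, unlinked)
print(round(c(linked = lod_linked, unlinked = lod_unlinked), 2))
```

The linked marker should give a much larger LOD score than the unlinked one, even though neither marker is causal. In the lesson itself, qtl2's scan1() performs this kind of regression genome-wide, using genotype probabilities rather than a single simulated marker.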
R/qtl2 (aka qtl2) is a reimplementation of the QTL analysis software R/qtl, designed to better handle high-dimensional data and complex cross designs such as the Diversity Outbred (DO) mouse population. R/qtl2 is typically run in “batch” mode (for example, on a computing cluster) rather than interactively.
This lesson will focus on the R/qtl2 package in R. A workflow for quantitative trait mapping with R/qtl2 is shown here.
To cite R/qtl in publications: Broman KW, Wu H, Sen S, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19:889-89
| Elapsed time | Episode | Questions |
| --- | --- | --- |
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction to the Data Set | What data will we be using in this workshop? |
| 00h 20m | 2. Input File Format | How are the data files formatted for qtl2? Which data files are required for qtl2? Where can I find sample data for mapping with the qtl2 package? |
| 00h 45m | 3. Calculating Genotype Probabilities | How do I calculate QTL at positions between genotyped markers? How do I calculate QTL genotype probabilities? How do I calculate allele probabilities? How can I speed up calculations if I have a large data set? |
| 01h 20m | 4. Performing a Genome Scan | How do I perform a genome scan? How do I plot a genome scan? How do additive covariates differ from interactive covariates? |
| 02h 50m | 5. Calculating A Kinship Matrix | Why would I calculate kinship between individuals? How do I calculate kinship between individuals? What does a kinship matrix look like? |
| 03h 20m | 6. Performing a genome scan with a linear mixed model | How do I use a linear mixed model in a genome scan? How do different mapping and kinship calculation methods differ? |
| 04h 05m | 7. Performing a Genome Scan with Binary Traits | How do I perform a genome scan for binary traits? |
| 04h 55m | 8. Finding Significant Peaks via Permutation | How can I evaluate the statistical significance of genome scan results? |
| 05h 25m | 9. Finding QTL peaks | How do I locate QTL peaks above a certain LOD threshold value? |
| 06h 25m | 10. Estimating QTL effects | How do I find the founder allele effects at a QTL peak? |
| 06h 55m | 11. Integrating Gene Expression Data | How can I use gene expression data to identify candidate genes? What is expression QTL mapping? |
| 08h 15m | 12. QTL Mapping in Diversity Outbred Mice | How do I map traits in Diversity Outbred mice? How do I interpret the founder allele effects at a QTL peak? How do I perform association mapping in Diversity Outbred mice? How do I narrow down the set of candidate genes under a QTL? |
| 10h 45m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Software Setup
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
1. Install the latest version of R from CRAN.
2. Install the latest version of RStudio. Choose the free RStudio Desktop version for Windows, Mac, or Linux.
3. Start RStudio.
4. Install packages.
   - The qtl2 package contains code for haplotype reconstruction, QTL mapping, and plotting.
   - The qtl2convert package contains code for converting data objects from one format to another.
   - Install qtl2 and qtl2convert, along with the tidyverse and ggbeeswarm packages used in this lesson, by copying and pasting the following code into the R console.
R
install.packages(c("tidyverse", "ggbeeswarm", "qtl2", "qtl2convert"))
Once the installation is complete, load the libraries to make sure that they installed correctly.
R
library(tidyverse)
library(ggbeeswarm)
library(qtl2)
library(qtl2convert)
If the libraries don’t load and you received errors during the installation, please contact the workshop instructors for help before the workshop.
Project organization
1. Create a new project on your Desktop called qtl_mapping.
   - Click the File menu button, then New Project.
   - Click New Directory.
   - Click New Project.
   - Type qtl_mapping as the directory name. Browse to your Desktop to create the project there.
   - Click the Create Project button.
2. Use the Files tab to create a data folder to hold the data, a scripts folder to house your scripts, and a results folder to hold results. Alternatively, you can run the following commands in the R console to complete step 2 only. You still need to create the project in step 1.
R
dir.create("./data")
dir.create("./scripts")
dir.create("./results")
Data Sets
For this course, you will need to download several data files to the data directory in the project folder on your Desktop. Copy, paste, and run the following code in the RStudio console.
The first file contains the data that we will use for QTL mapping in an F2 population. Download it using the code below.
R
download.file(url = "https://thejacksonlaboratory.box.com/shared/static/svw7ivp5hhmd7vb8fy26tc53h7r85wez.zip",
destfile = "data/attie_b6btbr_grcm39.zip",
mode = "wb")
unzip(zipfile = "data/attie_b6btbr_grcm39.zip",
exdir = "./data/")
The second file contains the Diversity Outbred mapping data.
R
download.file(url = "https://thejacksonlaboratory.box.com/shared/static/wspizp2jgrtngvvw5ixredpu7627mh5w.rdata",
destfile = "data/qtl2_demo_grcm39.Rdata",
mode = "wb")
Next, download the MUGA marker positions from Karl Broman’s GitHub page.
R
download.file(url = "https://raw.githubusercontent.com/kbroman/MUGAarrays/main/UWisc/muga_uwisc_v4.csv",
destfile = "data/muga_uwisc_v4.csv",
mode = "wb")
Next, we need a database of the DO founder SNPs and gene positions. This file is 10 GB, so it will take a while to download.
R
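# download.file() stops after the "timeout" option (usually 60 seconds),
# which is too short for a file this large, so raise it (in seconds) first.
options(timeout = 10000)
download.file(url = "https://figshare.com/ndownloader/files/40157572",
              destfile = "data/fv.2021.snps.db3",
              mode = "wb")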
If you get an error message when downloading this file from figshare, use a web browser to download it instead. Go to https://figshare.com/ndownloader/files/40157572 to start the download. Then move the file from wherever your downloads go (e.g., the Downloads folder) to the data directory in the qtl_mapping project. You can use a graphical user interface (e.g., Windows File Explorer or Mac Finder) to move the file.
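Before the workshop, it may help to confirm that every download landed in the data directory. The sketch below is one possible check, run from the qtl_mapping project directory; missing_files() is a hypothetical helper, not part of qtl2, and the paths are the destfile values used in the commands above.

```r
# Hypothetical helper (not part of qtl2): return the expected paths that
# do not exist on disk.
missing_files <- function(paths) {
  paths[!file.exists(paths)]
}

# Paths written by the download commands in this setup, relative to the
# qtl_mapping project directory.
expected <- c("data/attie_b6btbr_grcm39.zip",
              "data/qtl2_demo_grcm39.Rdata",
              "data/muga_uwisc_v4.csv",
              "data/fv.2021.snps.db3")

missing <- missing_files(expected)
if (length(missing) > 0) {
  message("Missing files: ", paste(missing, collapse = ", "))
} else {
  message("All lesson data files are present.")
}
```

If a file is reported as missing, rerun its download command or, for the SNP database, follow the web-browser instructions above.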
Development of this lesson was funded by NIH award GM070683 to Dr. Gary Churchill at The Jackson Laboratory.