Mapping Many Gene Expression Traits
Last updated on 2024-11-12 | Edit this page
Estimated time: 60 minutes
Overview
Questions
- How do I map many genes?
Objectives
- To map several genes at the same time
Load Libraries
For this lesson, we need to install two more libraries and load them. You can do this by typing the following code:
R
BiocManager::install(c("AnnotationHub","rtracklayer"))
Let’s install our libraries, and source two other R scripts.
R
library(tidyverse)
library(knitr)
library(broom)
library(qtl2)
library(qtl2ggplot)
library(RColorBrewer)
#source("../code/gg_transcriptome_map.R")
#source("../code/qtl_heatmap.R")
Before we begin this lesson, we need to create another directory
called results
in our main directory. You can do this by
clicking on the “Files” tab and navigate into the main directory. Then
select “New Folder” and name it “results”.
Load Data
R
# expression data
load("../data/attie_DO500_expr.datasets.RData")
# data from paper
load("../data/dataset.islet.rnaseq.RData")
# phenotypes
load("../data/attie_DO500_clinical.phenotypes.RData")
# mapping data
load("../data/attie_DO500_mapping.data.RData")
# genotype probabilities
probs = readRDS("../data/attie_DO500_genoprobs_v5.rds")
Data Selection
For this lesson, lets choose a random set of 50 gene expression phenotypes.
R
genes = colnames(norm)
sams <- sample(length(genes), 50, replace = FALSE, prob = NULL)
ERROR
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
R
genes <- genes[sams]
ERROR
Error: object 'sams' not found
R
gene.info <- dataset.islet.rnaseq$annots[genes,]
ERROR
Error: object 'dataset.islet.rnaseq' not found
R
rownames(gene.info) = NULL
ERROR
Error: object 'gene.info' not found
R
kable(gene.info[1:10,])
ERROR
Error: object 'gene.info' not found
Expression Data
Lets check the distribution for the first 20 gene expression
phenotypes. If you would like to check the distribution of all 50 genes,
change for(gene in genes[1:20])
in the code below to
for(gene in genes)
.
R
par(mfrow=c(3,4))
for(gene in genes[1:20]){
hist(norm[,gene], main = gene)
}
Check the distributions. Do they all have a normal distribution?
You will notice that the distribtion of some genes are skewed to the left. This means that that only a small amount of samples have data and therefore, will need to be removed. A suitable qc would be keeping expression data that have at least 5% of the samples with more than 10 reads.
R
genes_qc <- which(as.numeric(colSums(counts[ , genes] > 10)) >= 0.05 * nrow(counts[,genes]))
ERROR
Error: object 'counts' not found
R
genes <- genes[genes_qc]
ERROR
Error: object 'genes_qc' not found
The Marker Map
We are using the same marker map as in the previous lesson
Genotype probabilities
We have explored this earlier in th previous lesson.
But, as a reminder, we have already calculated genotype probabilities
which we loaded above called probs
. This contains the 8
state g enotype probabilities using the 69k grid map of the same 500 DO
mice that also have clinical phenotypes.
Kinship Matrix
We have explored the kinship matrix in the previous lesson. It has already been calculated and loaded in above.
Covariates
Now let’s add the necessary covariates. For these 50 gene expression
data, we will correct for DOwave
,sex
and
diet_days
.
R
# convert sex and DO wave (batch) to factors
pheno_clin$sex = factor(pheno_clin$sex)
ERROR
Error: object 'pheno_clin' not found
R
pheno_clin$DOwave = factor(pheno_clin$DOwave)
ERROR
Error: object 'pheno_clin' not found
R
pheno_clin$diet_days = factor(pheno_clin$DOwave)
ERROR
Error: object 'pheno_clin' not found
R
covar = model.matrix(~sex + DOwave + diet_days, data = pheno_clin)[,-1]
ERROR
Error: object 'pheno_clin' not found
Performing a genome scan
Now lets perform the genome scan! We are also going to save our qtl
results in an Rdata
file to be used in further lessons. We
will not perform permutations in this lesson as it will take too long.
Instead we will use 6, which is the LOD score used in the paper to
determine significance.
QTL Scans
R
qtl.file = "../results/gene.norm_qtl_cis.trans.Rdata"
if(file.exists(qtl.file)) {
load(qtl.file)
} else {
qtl = scan1(genoprobs = probs,
pheno = norm[,genes, drop = FALSE],
kinship = K,
addcovar = covar,
cores = 2)
save(qtl, file = qtl.file)
}
QTL plots
Let’s plot the first 20 gene expression phenotypes. If you would like
to plot all 50, change for(i in 1:20)
in the code below to
for(i in 1:ncol(qtl))
.
R
par(mfrow=c(3,4))
for(i in 1:20) {
plot_scan1(x = qtl,
map = map,
lodcolumn = i,
main = colnames(qtl)[i])
abline(h = 6, col = 2, lwd = 2)
}
ERROR
Error: object 'qtl' not found
QTL Peaks
We are also going to save our peak results so we can use these again
else where.
First, lets get out peaks with a LOD score greater than 6.
R
lod_threshold = 6
peaks = find_peaks(scan1_output = qtl,
map = map,
threshold = lod_threshold,
peakdrop = 4,
prob = 0.95)
ERROR
Error: object 'qtl' not found
We will save these peaks into a csv file.
R
kable(peaks[1:10,] %>%
dplyr::select(-lodindex) %>%
arrange(chr, pos), caption = "Expression QTL (eQTL) Peaks with LOD >= 6")
# write_csv(peaks, "../results/gene.norm_qtl_peaks_cis.trans.csv")
ERROR
Error: object 'peaks' not found
QTL Peaks Figure
R
qtl_heatmap(qtl = qtl, map = map, low.thr = 3.5)
ERROR
Error in qtl_heatmap(qtl = qtl, map = map, low.thr = 3.5): could not find function "qtl_heatmap"
Challenge 1:
What do the qtl scans for all gene exression traits look like?
Note: Don’t worry, we’ve done the qtl scans for you!!! You can
read in this file, ../data/gene.norm_qtl_all.genes.Rdata
,
which are the scan1
results for all gene expression
traits.
R
load("../data/gene.norm_qtl_all.genes.Rdata")
lod_threshold = 6
peaks = find_peaks(scan1_output = qtl.all,
map = map,
threshold = lod_threshold,
peakdrop = 4,
prob = 0.95)
write_csv(peaks, "../results/gene.norm_qtl_all.genes_peaks.csv")
## Heat Map
qtl_heatmap(qtl = qtl, map = map, low.thr = 3.5)
Key Points
- Use
.md
files for episodes when you want static content - Use
.Rmd
files for episodes when you need to generate output - Run
sandpaper::check_lesson()
to identify any issues with your lesson - Run
sandpaper::build_lesson()
to preview your lesson locally