Finding QTL peaks
Last updated on 2024-11-15 | Edit this page
Overview
Questions
- How do I locate QTL peaks above a certain LOD threshold value?
Objectives
- Locate QTL peaks above a LOD threshold value throughout the genome.
- Identify the Bayesian credible interval for a QTL peak.
Once we have LOD scores from a genome scan and a significance threshold, we can look for QTL peaks associated with the phenotype. High LOD scores indicate the neighborhood of a QTL but don’t give its precise position. To find the exact position of a QTL, we define an interval that is likely to hold the QTL.
We’ll use the Bayesian credible interval, which is a method for defining QTL intervals. It describes the probability that the interval contains the true value. Credible intervals make a probabilistic statement about the true value, for example, a 95% credible interval states that there is a 95% chance that the true value lies within the interval.
Let’s remind ourselves how the genome scan for insulin looks.
R
thr <- summary(perm_add_loco)
plot_scan1(x = lod_add_loco,
map = cross$pmap,
main = "Insulin")
add_threshold(map = cross$pmap,
thresholdA = thr,
col = 'red')
To find peaks above a given threshold LOD value, use the function
find_peaks()
. It can also provide Bayesian credible
intervals by using the argument prob
(the nominal coverage
for the Bayes credible intervals). Set the argument
expand2markers = FALSE
to keep from expanding the interval
out to genotyped markers, or exclude this argument if you’d like to
include flanking markers.
You need to provide both the scan1()
output, the marker
map and a threshold. We will use the 95% threshold from the permutations
in the previous lesson.
R
find_peaks(scan1_output = lod_add_loco,
map = cross$pmap,
threshold = thr,
prob = 0.95,
expand2markers = FALSE)
OUTPUT
lodindex lodcolumn chr pos lod ci_lo ci_hi
1 1 log10_insulin_10wk 2 138.94475 7.127351 64.949395 149.57739
2 1 log10_insulin_10wk 7 144.18230 5.724018 139.368290 144.18230
3 1 log10_insulin_10wk 12 25.14494 4.310493 15.834815 29.05053
4 1 log10_insulin_10wk 14 22.24292 3.974322 6.241951 45.93876
5 1 log10_insulin_10wk 16 80.37433 4.114024 10.238134 80.37433
6 1 log10_insulin_10wk 19 54.83012 5.476587 48.370980 55.15007
In the table above, we have one peak per chromosome because that is
the default behavior of find_peaks()
. The
find_peaks()
function can also pick out multiple peaks on a
chromosome: each peak must exceed the chosen threshold, and the argument
peakdrop
indicates the amount that the LOD curve must drop
between the lowest of two adjacent peaks. Use this feature with
caution.
R
find_peaks(scan1_output = lod_add_loco,
map = cross$pmap,
threshold = thr,
peakdrop = 1.8,
prob = 0.95,
expand2markers = FALSE)
OUTPUT
lodindex lodcolumn chr pos lod ci_lo ci_hi
1 1 log10_insulin_10wk 2 138.94475 7.127351 64.949395 149.57739
2 1 log10_insulin_10wk 7 144.18230 5.724018 139.368290 144.18230
3 1 log10_insulin_10wk 12 25.14494 4.310493 15.834815 29.05053
4 1 log10_insulin_10wk 14 22.24292 3.974322 6.241951 45.93876
5 1 log10_insulin_10wk 16 17.48123 3.995627 5.604167 37.99110
6 1 log10_insulin_10wk 16 80.37433 4.114024 74.250773 80.37433
7 1 log10_insulin_10wk 19 54.83012 5.476587 48.370980 55.15007
Each row shows a different peak; the columns show the peak location, LOD score and the lower and upper interval endpoints. Note that we now have two peaks on chromosome 16, one at 17.5 Mb and one at 80.4 Mb.
Challenge 1
Find peaks in the genome scan object called lod_add_loco
that meet a threshold of 3 and are in the interval described by a 2
point LOD drop from the peak. How many peaks meet the LOD threshold of 3
and lie within the interval defined by a 2 point LOD drop from the
maximum peaks on each chromosome?
R
find_peaks(scan1_output = lod_add_loco,
map = cross$pmap,
threshold = 3,
drop = 2)
OUTPUT
lodindex lodcolumn chr pos lod ci_lo ci_hi
1 1 log10_insulin_10wk 2 138.94475 7.127351 63.943187 156.83772
2 1 log10_insulin_10wk 5 103.41486 3.130862 41.967549 132.28428
3 1 log10_insulin_10wk 7 144.18230 5.724018 129.414016 144.18230
4 1 log10_insulin_10wk 9 83.67606 3.865635 17.504307 111.02206
5 1 log10_insulin_10wk 12 25.14494 4.310493 9.998200 34.23274
6 1 log10_insulin_10wk 14 22.24292 3.974322 6.241951 68.04655
7 1 log10_insulin_10wk 16 80.37433 4.114024 3.804882 96.52406
8 1 log10_insulin_10wk 19 54.83012 5.476587 47.361847 56.37100
This produces 8 peaks on 8 different chromosomes that meet a LOD threshold of 3 and are within the interval defined by a 2-LOD drop from the maximum peak on each chromosome.
Challenge 2
1). Calculate the 90% Bayes credible interval. For chromosome 2, what
is the range of the interval that has a 90% chance of containing the
true QTL position?
2). Compare with the 95% Bayes credible interval calculated earlier. How
does the interval change as you increase the probability? Why?
1). This produces a range from 118.0 to 149.6 Mb.
R
pks = find_peaks(scan1_output = lod_add_loco,
map = cross$pmap,
prob = 0.90,
expand2markers = FALSE)
subset(pks, chr == '2')
OUTPUT
lodindex lodcolumn chr pos lod ci_lo ci_hi
1 1 log10_insulin_10wk 2 138.9448 7.127351 117.9557 149.5774
2). This produces a range from 64.9 to 149.6 Mb, which is much broader than the 90% interval. The interval widens because the probability that the interval contains the true QTL position has increased.
R
pks = find_peaks(scan1_output = lod_add_loco,
map = cross$pmap,
prob = 0.95,
expand2markers = FALSE)
subset(pks, chr == '2')
OUTPUT
lodindex lodcolumn chr pos lod ci_lo ci_hi
1 1 log10_insulin_10wk 2 138.9448 7.127351 64.94939 149.5774
Key Points
- LOD peaks and support intervals can be identified with
find_peaks()
. - The Bayesian Credible Interval estimates the width of the support interval around a QTL peak.
- Using a higher
prob
value for the Bayesian Credible Interval results in a wider support interval.