Performing a genome scan with binary traits

Last updated on 2024-10-21

Overview

Questions

  • How do I create a genome scan for binary traits?

Objectives

  • Convert phenotypes to binary values.
  • Use logistic regression for genome scans with binary traits.
  • Plot and compare genome scans for binary traits.

The genome scans above were performed assuming that the residual variation followed a normal distribution. This will often provide reasonable results even if the residuals are not normal, but an important special case is that of a binary trait, with values 0 and 1, which is best treated differently. The scan1 function can perform a genome scan with binary traits by logistic regression, using the argument model="binary". (The default value for the model argument is "normal".) At present, we cannot account for relationships among individuals in this analysis.
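To make the idea of a logistic-regression LOD score concrete, here is a minimal sketch in base R at a single simulated marker. It is not qtl2's implementation: the data are simulated, the variable names are hypothetical, and it uses called genotypes, whereas scan1 regresses on genotype probabilities. The LOD score is taken as the log10 likelihood ratio comparing a model with the genotype to a model without it.

```r
# Simulated data (hypothetical; not the lesson's cross)
set.seed(1)
n        <- 200
genotype <- factor(sample(c("AA", "AB", "BB"), n, replace = TRUE,
                          prob = c(0.25, 0.5, 0.25)))
sex      <- rbinom(n, 1, 0.5)   # stands in for an additive covariate

# The probability that the trait equals 1 depends on genotype
p     <- c(AA = 0.2, AB = 0.5, BB = 0.8)[as.character(genotype)]
trait <- rbinom(n, 1, p)

# Null model: covariate only. Alternative model: covariate plus genotype.
null_fit <- glm(trait ~ sex,            family = binomial)
alt_fit  <- glm(trait ~ sex + genotype, family = binomial)

# LOD = log10 likelihood ratio = difference in log-likelihoods / log(10)
lod <- (as.numeric(logLik(alt_fit)) - as.numeric(logLik(null_fit))) / log(10)
lod
```

A genome scan repeats this comparison at every position along the genome, which is why the result can be plotted as a LOD curve.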

Let’s look at the phenotypes in the cross again.

R

head(cross$pheno)

OUTPUT

          log10_insulin_10wk agouti_tan tufted
Mouse3051              1.399          1      0
Mouse3551              0.369          1      1
Mouse3430              0.860          0      1
Mouse3476              0.800          1      0
Mouse3414              1.370          0      0
Mouse3145              1.783          1      0

There are two binary traits, “agouti_tan” and “tufted”, which are related to coat color and shape.
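These two traits are already coded as 0 and 1. If a phenotype were instead recorded as text labels or as a continuous measurement, it would need to be converted first. The sketch below uses a small made-up data frame (the mouse IDs, column names, and the cutoff of 1 are all hypothetical) to show one way this might be done in base R.

```r
# Hypothetical raw phenotypes (not from the lesson's data)
pheno_raw <- data.frame(
  coat      = c("agouti", "black", "agouti", NA, "black"),
  insulin   = c(1.40, 0.37, 0.86, 0.80, 1.37),
  row.names = paste0("Mouse", 1:5)
)

# Text label to 0/1: black = 0, agouti = 1; missing values stay NA
coat_bin <- as.integer(pheno_raw$coat == "agouti")

# Continuous value to 0/1 using an arbitrary, illustrative cutoff
high_insulin <- as.integer(pheno_raw$insulin > 1)

# Keep the mouse IDs as row names so individuals can be matched to genotypes
pheno_bin <- cbind(coat_bin, high_insulin)
rownames(pheno_bin) <- rownames(pheno_raw)

# Sanity check: only 0, 1, or NA should remain
stopifnot(all(pheno_bin %in% c(0, 1, NA)))
pheno_bin
```

Note that dichotomizing a continuous trait discards information, so it is usually better to map such a trait with the default normal model unless there is a specific reason to threshold it.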

We perform a binary genome scan in much the same way as we mapped continuous traits, by using scan1. When we mapped insulin, we left an argument called model at its default value; this argument tells qtl2 which mapping model to use. There are two options: normal, the default, and binary. The normal option tells qtl2 to use a "normal" (least squares) linear model. To map a binary trait, we will include model = "binary" to indicate that the phenotype is a binary trait with values 0 and 1.

R

lod_agouti <- scan1(genoprobs = probs, 
                    pheno     = cross$pheno[,'agouti_tan'], 
                    addcovar  = addcovar, 
                    model     = "binary")

Let’s plot the result and see if there is a peak.

R

plot_scan1(x    = lod_agouti, 
           map  = cross$pmap, 
           main = 'Agouti')

Yes! There is a big peak on chromosome 2. Let’s zoom in on chromosome 2.

R

plot_scan1(x    = lod_agouti, 
           map  = cross$pmap, 
           chr  = 2,
           main = 'Agouti')

We can use find_peaks to find the position of the highest LOD score.

R

find_peaks(scan1_output = lod_agouti, 
           map          = cross$pmap)

This turns out to be a well-known coat color locus that contains the nonagouti gene. Mice carrying two black alleles will have a black coat, and mice carrying one or no black alleles will have agouti coats.

Challenge 1: How many mice have black coats?

Look at the frequency of the black (0) and agouti (1) phenotypes. What proportion of the mice are black? Can you use what you learned about how the nonagouti locus works and the cross design to explain the frequency of black mice?

First, get the number of black and agouti mice.

R

tbl <- table(cross$pheno[,"agouti_tan"])
tbl

OUTPUT


  0   1
125 356 

Then use the number of mice to calculate the proportion with each coat color.

R

tbl / sum(tbl)

OUTPUT


   0    1
0.26 0.74 

We can see that the black (0) mice occur about 26% of the time, which is close to 25%. If A is the recessive allele that produces a black coat when homozygous, and a is the agouti allele, then when two heterozygous (Aa) mice are bred together we expect the following genotype frequencies:

Genotype  Frequency  Coat color
AA        0.25       black
Aa        0.50       agouti
aa        0.25       agouti

From this, we can see that about 25% of the mice should have black coats.
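As an optional check, a chi-square goodness-of-fit test can compare the observed counts with the expected 1:3 ratio. This sketch uses the counts from the table above (125 black, 356 agouti) and assumes the intercross design described here; the variable names are only illustrative.

```r
# Observed coat color counts from the table above
observed <- c(black = 125, agouti = 356)

# Expected proportions when two heterozygotes are crossed and black is recessive
expected_prop <- c(black = 0.25, agouti = 0.75)

# Goodness-of-fit test against the 1:3 ratio
fit_test <- chisq.test(x = observed, p = expected_prop)

fit_test$expected   # expected counts under the 1:3 ratio
fit_test$p.value    # a large p-value means no evidence against 1:3
```

With 481 mice, the expected counts are 120.25 black and 360.75 agouti, so the observed counts differ from expectation by fewer than five mice in each class.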

Challenge 2: Map the “tufted” phenotype.

Map the tufted phenotype and determine if there are any tall peaks for this trait.

First, map the trait.

R

lod_tufted <- scan1(genoprobs = probs, 
                    pheno     = cross$pheno[,"tufted"], 
                    addcovar  = addcovar, 
                    model     = "binary")

Then, plot the LOD score.

R

plot_scan1(x    = lod_tufted, 
           map  = cross$pmap, 
           main = "Tufted")

There is a large peak on chromosome 17. This is a known locus associated with the Itpr3 gene near 27.3 Mb on chromosome 17.

Key Points

  • A genome scan for binary traits (0 and 1) requires special handling; scans for non-binary traits assume normal variation of the residuals.
  • A genome scan for binary traits is performed with logistic regression.