Example Gene Expression Datasets
Explore a Gene Expression Dataset


Basic inference for high-throughput data


Figure 1


Figure 2

This implies that:


Figure 3


Figure 4

Normal qq-plots for one gene. Left plot shows first group and right plot shows second group.

Procedures for Multiple Comparisons


Figure 1

Confusion matrix showing specificity, sensitivity, and Type I and Type II errors

Error Rates


Figure 1

Null distribution showing type I error as alpha

Figure 2

Alternative hypothesis showing type II error as beta

Figure 3

Confusion matrix showing error rates

The Bonferroni Correction


False Discovery Rate


Figure 1

Confusion matrix showing error rates

Figure 2

Q (false positives divided by number of features called significant) is a random variable. Here we generated a distribution with a Monte Carlo simulation.

Figure 3

Histogram of p-values. Monte Carlo simulation was used to generate data with m_1 genes having differences between groups.

Figure 4

Histogram of p-values with breaks at every 0.01. Monte Carlo simulation was used to generate data with m_1 genes having differences between groups.

Figure 5

Plotting p-values against their rank illustrates the Benjamini-Hochberg procedure. The plot on the right is a close-up of the plot on the left.
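The rank-based rule that Figure 5 illustrates can be sketched in a few lines. This is a minimal illustration of the standard Benjamini-Hochberg step-up rule, not the lesson's own code: the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(example, alpha=0.05)
print(sum(calls), "features called significant")
```

Because the rule takes the largest rank that falls under the line `alpha * k / m`, a p-value can be called significant even when some smaller-ranked p-values sit above the line.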

Figure 6

FDR estimates plotted against p-value.

Figure 7

Histogram of Q (false positives divided by number of features called significant) when the alternative hypothesis is true for some features.

Direct Approach to FDR and q-values


Figure 1

p-value histogram with pi0 estimate.
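One common way to obtain a pi0 estimate from a p-value histogram is to count the p-values above a cutoff, where almost all of them should come from true nulls. The sketch below assumes that estimator; the caption does not say which one the figure uses, and the function name `estimate_pi0`, the cutoff `lam`, and the simulated p-values are all hypothetical.

```python
import random


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    # Null p-values are uniform, so a fraction (1 - lam) of them lies above lam.
    return min(1.0, above / (m * (1.0 - lam)))


# Simulated data (an assumption for illustration): 900 null features with
# uniform p-values and 100 non-null features with very small p-values.
random.seed(1)
pvals = [random.random() for _ in range(900)]
pvals += [random.uniform(0.0, 0.01) for _ in range(100)]
pi0_hat = estimate_pi0(pvals, lam=0.5)
print(round(pi0_hat, 3))
```

The choice of `lam` trades bias against variance: a larger cutoff excludes more non-null p-values but leaves fewer observations for the estimate.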

Figure 2

q-values versus p-values.

Basic EDA for high-throughput data


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Figure 6


Figure 7


Figure 8


Principal Components Analysis
What is a principal component?
How many principal components do we need?
Using PCA to analyse gene expression data
Using PCA output in further analysis


Figure 1


Figure 2

Simulated twin pair heights.

Figure 3

Twin height scatterplot (left) and MA-plot (right).

Figure 4

Histograms comparing variances between log-transformed prostate weight and benign enlargement.

Figure 5

Scree plot showing the proportion of variance explained by each principal component.
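The quantity on the vertical axis of a scree plot is each component's variance divided by the total variance. A minimal sketch of that arithmetic follows; the variances are made-up numbers, not values from the lesson's data, and `proportion_of_variance` is a hypothetical helper name.

```python
from itertools import accumulate


def proportion_of_variance(component_variances):
    """Divide each component's variance by the total variance."""
    total = sum(component_variances)
    return [v / total for v in component_variances]


# Hypothetical variances of four principal components, largest first.
variances = [4.0, 2.0, 1.0, 1.0]
proportions = proportion_of_variance(variances)
# The running total shows how much variance the first k components capture.
cumulative = list(accumulate(proportions))
print(proportions)
print(cumulative)
```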

Figure 6

Biplot of first two principal components.

Figure 7

Biplot of first two principal components.

Figure 8

Biplot of principal components one and two showing two groups for PC1 according to gene expression.

Figure 9

Plot of principal component loadings shows the magnitude and direction of gene probes.

Figure 10

Pairs plot of principal components shows clusters on PC1 only.

Statistical Models


Figure 1

Number of people that win the lottery obtained from Monte Carlo simulation.

Figure 2

MA plot of simulated RNA-seq data. Replicated measurements follow a Poisson distribution.

Figure 3

MA plot of replicated RNA-seq data.

Figure 4

Variance versus mean plot. Summaries were obtained from the RNA-seq data.

Figure 5

Palindrome count histogram.

Figure 6

Likelihood versus lambda.
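A likelihood-versus-lambda curve of this kind can be reproduced by evaluating the Poisson log-likelihood over a grid of candidate rates. The sketch below is an illustration under assumptions: the counts are hypothetical stand-ins for the palindrome counts, and the grid limits are arbitrary.

```python
import math


def poisson_log_likelihood(lam, counts):
    """Log-likelihood of independent Poisson counts with rate lam."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


# Hypothetical counts, for illustration only.
counts = [2, 3, 1, 4, 0, 2, 3, 5, 2, 3]
# Candidate rates from 0.5 to 6.4 in steps of 0.1.
grid = [0.5 + 0.1 * i for i in range(60)]
best = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))
print(round(best, 1), sum(counts) / len(counts))
```

For the Poisson model the curve peaks at the sample mean, so the grid search and the direct average should agree up to the grid spacing.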

Figure 7

Observed counts versus theoretical Poisson counts.

Figure 8

Histograms of biological variance and technical variance.

Figure 9

Normal qq-plot for sample standard deviations.

Figure 10

Histograms of sample standard deviations and densities of estimated distributions.

Figure 11

qq-plot (left) and density (right) demonstrate that the model fits the data well.