Inference for High-dimensional Data: All Images


Inference for High-dimensional Data

Example Gene Expression Datasets

Explore a Gene Expression Dataset

Basic inference for high-throughput data

Figure 1

Figure 2

This implies that:

Figure 3

Figure 4

Normal qq-plots for one gene. The left plot shows the first group and the right plot shows the second group.

Procedures for Multiple Comparisons

Figure 1

Confusion matrix showing specificity, sensitivity, and Type I and Type II errors.

Error Rates

Figure 1

Null distribution showing the Type I error rate as alpha.

Figure 2

Distribution under the alternative hypothesis showing the Type II error rate as beta.

Figure 3

Confusion matrix showing error rates.

The Bonferroni Correction

False Discovery Rate

Figure 1

Confusion matrix showing error rates.

Figure 2

Q (the number of false positives divided by the number of features called significant) is a random variable. Here its distribution was generated with a Monte Carlo simulation.
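A minimal Monte Carlo sketch of why Q is random, assuming p-values are simulated directly instead of computed from t-tests: null p-values are drawn as Uniform(0, 1), and the p-values of the m_1 truly different features come from a Beta(0.1, 1) distribution, a hypothetical choice that piles them up near zero. Features with p < 0.05 are called significant, and Q = V/R is recorded for each replicate (Q is set to 0 when nothing is called). The values of m, m_1, the cutoff, and the helper name `simulate_q` are illustrative assumptions.

```python
import random
import statistics

def simulate_q(m=1000, m1=100, alpha=0.05, reps=200, seed=1):
    """Monte Carlo draws of Q = V / R for a fixed p-value cutoff.

    Assumption: null p-values are Uniform(0, 1); p-values for the m1
    truly different features are drawn from Beta(0.1, 1), a hypothetical
    choice that concentrates them near zero.
    """
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        null_p = [rng.random() for _ in range(m - m1)]
        alt_p = [rng.betavariate(0.1, 1.0) for _ in range(m1)]
        v = sum(p < alpha for p in null_p)   # false positives
        s = sum(p < alpha for p in alt_p)    # true positives
        r = v + s                            # features called significant
        qs.append(v / r if r > 0 else 0.0)   # convention: Q = 0 when R = 0
    return qs

qs = simulate_q()
print(f"mean of Q (Monte Carlo estimate of the FDR): {statistics.mean(qs):.3f}")
print(f"sd of Q: {statistics.stdev(qs):.3f}")
```

The mean of the simulated Q values approximates the FDR for this cutoff, while the spread shows how much the realized proportion of false discoveries varies from one experiment to the next.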

Figure 3

Histogram of p-values. Monte Carlo simulation was used to generate data in which m_1 genes differ between groups.

Figure 4

Histogram of p-values with breaks every 0.01. Monte Carlo simulation was used to generate data in which m_1 genes differ between groups.

Figure 5

P-values plotted against their rank, illustrating the Benjamini-Hochberg procedure. The plot on the right is a close-up of the plot on the left.
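The Benjamini-Hochberg rule read off this kind of plot can be sketched in a few lines: sort the m p-values, find the largest rank i whose p-value lies at or below the line i * alpha / m, and call every feature up to that rank significant. The function name and the example p-values below are made up for illustration.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the indices of features called significant by Benjamini-Hochberg.

    Sort the p-values, find the largest rank i with p_(i) <= i * alpha / m,
    and call every feature with rank <= i significant.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0  # largest rank whose p-value falls on or under the line
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])

# Hypothetical p-values for ten features.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))  # → [0, 1]
```

For comparison, a Bonferroni cutoff of 0.05 / 10 = 0.005 would keep only the first feature here. Note that the rule scans all ranks and keeps the largest one under the line, so a p-value above the line at a smaller rank does not stop the search.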

Figure 6

FDR estimates plotted against p-values.

Figure 7

Histogram of Q (the number of false positives divided by the number of features called significant) when the alternative hypothesis is true for some features.

Direct Approach to FDR and q-values

Figure 1

P-value histogram with the estimate of pi0, the proportion of features for which the null hypothesis is true.

Figure 2

q-values versus p-values.
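The direct approach can be sketched as follows, assuming a single tuning value lambda = 0.5 for the pi0 estimate (the Bioconductor qvalue package by default smooths the estimate over a grid of lambda values, so its numbers will differ). Because null p-values are uniform, roughly pi0 * m * (1 - lambda) of them should exceed lambda, which gives the estimate; each q-value is then the smallest value of pi0 * m * p_(j) / j over ranks j at or above the feature's own rank. The function names and p-values are hypothetical.

```python
def estimate_pi0(pvals, lam=0.5):
    """Estimate pi0, the proportion of true nulls, from p-values above lam.

    Null p-values are uniform, so about pi0 * m * (1 - lam) of them exceed lam.
    """
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))

def q_values(pvals, lam=0.5):
    """q-value for each feature: min over ranks j >= i of pi0 * m * p_(j) / j."""
    m = len(pvals)
    pi0 = estimate_pi0(pvals, lam)
    order = sorted(range(m), key=lambda j: pvals[j])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is a running minimum.
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[j] / rank)
        q[j] = running_min
    return pi0, q

# Hypothetical p-values for ten features.
pvals = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.4, 0.6, 0.8, 0.9]
pi0, q = q_values(pvals)
print(round(pi0, 3), [round(v, 3) for v in q])
```

Fixing pi0 = 1 turns these q-values into Benjamini-Hochberg adjusted p-values, so the direct approach calls more features significant whenever the estimated pi0 is below 1.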

Basic EDA for high-throughput data

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Principal Components Analysis

What is a principal component?

How many principal components do we need?

Using PCA to analyse gene expression data

Using PCA output in further analysis

Figure 1

Figure 2

Simulated twin pair heights.

Figure 3

Twin height scatterplot (left) and MA-plot (right).
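To see what the MA transformation does, here is a sketch on simulated twin heights; the mean, standard deviation, correlation rho = 0.9, and the helper names are assumptions, not the values behind the figure. A is the pair average and M is the within-pair difference. For highly correlated pairs, most of the variability moves into A, which is the rotation idea that motivates principal components.

```python
import random
import statistics

def simulate_twins(n=100, mean=68.0, sd=3.0, rho=0.9, seed=2):
    """Hypothetical twin-pair heights (inches) with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        h1 = mean + sd * z1
        # Mixing z1 and z2 this way gives corr(h1, h2) = rho.
        h2 = mean + sd * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        pairs.append((h1, h2))
    return pairs

def ma_transform(pairs):
    """A = average of the pair (x-axis), M = difference (y-axis)."""
    a = [(x + y) / 2 for x, y in pairs]
    m = [x - y for x, y in pairs]
    return a, m

pairs = simulate_twins()
a, m = ma_transform(pairs)
print(f"sd of A: {statistics.stdev(a):.2f}, sd of M: {statistics.stdev(m):.2f}")
```

Under these assumptions the population variances are sd^2 (1 + rho) / 2 for A and 2 sd^2 (1 - rho) for M, so the spread of A should clearly exceed that of M.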

Figure 4

Histograms comparing the variances of log-transformed prostate weight and benign enlargement.

Figure 5

Scree plot showing the proportion of variance explained by each principal component.
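The proportions in a scree plot can be computed from the singular values of the column-centered data matrix, since the variance of each principal component is proportional to its squared singular value. This sketch assumes NumPy and a hypothetical matrix with samples in rows and features in columns (a genes-by-samples matrix would be transposed first); the simulated data contain one strong shared direction of variation, so the first proportion should dominate.

```python
import numpy as np

def variance_explained(x):
    """Proportion of total variance captured by each principal component.

    x: samples in rows, features in columns. Columns are centered, then the
    squared singular values of the centered matrix give the PC variances.
    """
    xc = x - x.mean(axis=0)
    s = np.linalg.svd(xc, compute_uv=False)  # singular values, largest first
    return s ** 2 / np.sum(s ** 2)

# Hypothetical data: 20 samples x 50 features, a rank-one signal plus noise.
rng = np.random.default_rng(0)
signal = np.outer(rng.normal(size=20), rng.normal(size=50)) * 3
x = signal + rng.normal(size=(20, 50))
pve = variance_explained(x)
print(np.round(pve[:5], 3))
```

Plotting `pve` as bars against component number gives the scree plot; the elbow after the first component reflects the single built-in signal.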

Figure 6

Biplot of first two principal components.

Figure 7

Biplot of first two principal components.

Figure 8

Biplot of principal components one and two, showing two groups separated along PC1 according to gene expression.

Figure 9

Plot of principal component loadings showing the magnitude and direction of the gene probes.

Figure 10

Pairs plot of principal components showing clusters on PC1 only.

Statistical Models

Figure 1

Number of people who win the lottery, obtained from a Monte Carlo simulation.
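A sketch of the lottery simulation, assuming NumPy and made-up values for the number of tickets and the chance of winning. Each simulated lottery draws its number of winners from a binomial distribution; because n is large and p is tiny, the simulated frequencies should be close to Poisson probabilities with rate lambda = n * p.

```python
import math
import numpy as np

# Hypothetical numbers, chosen only for illustration.
n_tickets = 10_000_000
p_win = 1 / 5_000_000
lam = n_tickets * p_win          # expected number of winners (Poisson rate)

rng = np.random.default_rng(1)
winners = rng.binomial(n_tickets, p_win, size=10_000)  # one draw per simulated lottery

for k in range(6):
    observed = np.mean(winners == k)
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    print(f"{k} winners: simulated {observed:.3f}, Poisson {poisson:.3f}")
```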

Figure 2

MA plot of simulated RNA-seq data in which replicated measurements follow a Poisson distribution.

Figure 3

MA plot of replicated RNA-seq data.

Figure 4

Variance versus mean plot. Summaries were obtained from the RNA-seq data.

Figure 5

Palindrome count histogram.

Figure 6

Likelihood versus lambda.
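A sketch of the likelihood curve, using hypothetical counts rather than the palindrome data: the Poisson log-likelihood is evaluated over a grid of lambda values, and its maximum lands on the sample mean, which is the maximum likelihood estimate of lambda for independent Poisson counts.

```python
import math

def poisson_loglik(lam, counts):
    """Log-likelihood of independent Poisson(lam) counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

# Hypothetical counts per segment (not the dataset behind the figure).
counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]
grid = [i / 100 for i in range(100, 1001)]   # lambda from 1.00 to 10.00
best = max(grid, key=lambda lam: poisson_loglik(lam, counts))
print(best, sum(counts) / len(counts))  # → 4.0 4.0
```

Plotting `poisson_loglik` over `grid` reproduces the shape of a likelihood-versus-lambda curve: a single peak at the sample mean.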

Figure 7

Observed counts versus theoretical Poisson counts.

Figure 8

Histograms of biological variance and technical variance.

Figure 9

Normal qq-plot for sample standard deviations.

Figure 10

Histograms of sample standard deviations and densities of estimated distributions.

Figure 11

qq-plot (left) and density (right) demonstrate that model fits data well.