Putting it all Together
Last updated on 2024-06-20 | Edit this page
Estimated time: 85 minutes
Overview
Questions
- How do I put the whole spatial transcriptomics analysis together?
Objectives
- Understand the steps in spatial transcriptomics analysis.
- Perform a spatial transcriptomics analysis on another sample.
Introduction
Over the past day and a half, we have reviewed the steps in a spatial transcriptomics analysis. There were many steps and many lines of code. In this final lesson, you will run through the spatial transcriptomics analysis on your own with another sample.
R
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(spacexr))
source("https://raw.githubusercontent.com/smcclatchy/spatial-transcriptomics/main/code/spatial_utils.R")
R
st_obj <- Load10X_Spatial(data.dir = "./data/151508",
filename = "151508_raw_feature_bc_matrix.h5")
R
# Read in the tissue spot positions.
tissue_position <- read_csv("./data/151508/spatial/tissue_positions_list.csv",
col_names = FALSE, show_col_types = FALSE) %>%
column_to_rownames('X1')
colnames(tissue_position) <- c("in_tissue",
"array_row",
"array_col",
"pxl_row_in_fullres",
"pxl_col_in_fullres")
# Align the spot barcodes to match between the Seurat object and the new
# metadata.
tissue_position <- tissue_position[Cells(st_obj),]
stopifnot(rownames(tissue_position) == Cells(st_obj))
# Add the metadata to the Seurat object.
st_obj <- AddMetaData(object = st_obj, metadata = tissue_position)
R
SpatialFeaturePlot(st_obj, features = "nCount_Spatial")
![](../fig/final-exercise-rendered-unnamed-chunk-4-1.png)
R
# Get the counts from the Seurat object.
counts <- LayerData(st_obj, 'counts')
# Sum total counts across all samples for each gene.
gene_sums <- rowSums(counts)
# Select genes with total counts above our threshold.
keep_genes <- which(gene_sums > 10)
# Filter the Seurat object.
st_obj <- st_obj[keep_genes,]
# Print out the number of genes that we retained.
print(length(keep_genes))
OUTPUT
[1] 15311
Challenge 6: Normalize data using SCTransform
In the Normalization lesson, we used the SCTransform to normalize the counts. Normalize the counts in your Seurat object an plot the mean expression versus the residual variance for each gene. Use the built in Seurat function to do this. How many variable genes did the method select?
R
# Perform SCTransform on the counts.
st_obj <- SCTransform(st_obj,
assay = "Spatial")
OUTPUT
Running SCTransform on assay: Spatial
OUTPUT
Running SCTransform on layer: counts
OUTPUT
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
WARNING
Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
OUTPUT
Variance stabilizing transformation of count matrix of size 15311 by 4992
OUTPUT
Model formula is y ~ log_umi
OUTPUT
Get Negative Binomial regression parameters per gene
OUTPUT
Using 2000 genes, 4992 cells
OUTPUT
Found 91 outliers - those will be ignored in fitting/regularization step
OUTPUT
Second step: Get residuals using fitted parameters for 15311 genes
OUTPUT
Computing corrected count matrix for 15311 genes
OUTPUT
Calculating gene attributes
OUTPUT
Wall clock passed: Time difference of 27.08926 secs
OUTPUT
Determine variable features
OUTPUT
Centering data matrix
OUTPUT
Set default assay to SCT
R
# Plot the mean veruss the variance for each gene.
VariableFeaturePlot(st_obj, log = NULL)
![](../fig/final-exercise-rendered-unnamed-chunk-6-1.png)
OUTPUT
Running SCTransform on assay: Spatial
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 15311 by 4992
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 4992 cells
Found 91 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 15311 genes
Computing corrected count matrix for 15311 genes
Calculating gene attributes
Wall clock passed: Time difference of 36.59206 secs
Determine variable features
Centering data matrix
|===================================================================================================================================================================| 100%
Set default assay to SCT
R
num_pcs <- 75
st_obj <- st_obj %>%
ScaleData() %>%
RunPCA(npcs = num_pcs, verbose = FALSE)
OUTPUT
Centering and scaling data matrix
R
ElbowPlot(st_obj, ndims = num_pcs)
![](../fig/final-exercise-rendered-unnamed-chunk-7-1.png)
In the next Challenge, you can assign a certain number of principal components and cluster resolution to each student or group of students and compare the clustering results.
Challenge 8: Cluster the spots in your data
In the Feature Selection lesson, we made a call about how many PCs to use. Make some decision about how many PCs you will use and assign this to a variable called “num_pcs”. Also, select a cluster resolution for the clustering function. It may be instructive for each person in the class to use a different number of PCs and compare results. What new column was added to the Seurat object metadata by the clustering algorithm?
We selected 50 PCs and a cluster resoltion of 1 for this exercise. This may not be optimal. You will have a chance to vary these parameters in a later exercise.
R
num_pcs <- 50
st_obj <- st_obj %>%
FindNeighbors(reduction = "pca",
dims = 1:num_pcs) %>%
FindClusters(resolution = 1)
OUTPUT
Computing nearest neighbor graph
OUTPUT
Computing SNN
OUTPUT
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 4992
Number of edges: 226317
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.7726
Number of communities: 10
Elapsed time: 0 seconds
R
colnames(st_obj[[]])[ncol(st_obj[[]])]
OUTPUT
[1] "seurat_clusters"
R
st_obj <- RunUMAP(st_obj,
reduction = 'pca',
dims = 1:num_pcs,
verbose = FALSE)
WARNING
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
R
UMAPPlot(st_obj,
label = TRUE,
pt.size = 2,
group.by = "seurat_clusters",
label.size = 6)
![](../fig/final-exercise-rendered-unnamed-chunk-9-1.png)
Challenge 10: Plot spot clusters over tissue section
Look in the Feature Selection lesson and find the code to plot the spots in the tissue section, colored by cluster. Compare this to the tissue section Supplemental Figure 5 in Maynard et al..
R
SpatialDimPlot(st_obj,
group.by = "seurat_clusters") +
ggtitle(label = 'Sample 151508')
![](../fig/final-exercise-rendered-unnamed-chunk-10-1.png)
Challenge 11: Varying number of PCs and cluster resolution.
Return to Challenge 8 and change either the number of PCs or the cluster resolution and re-run the subsequent Challenges. How does this change the output? Does a certain set of values make the SpatialDimPlot look more like the results in the manuscript?
For class discussion.