Putting it all Together
Last updated on 2024-10-01 | Edit this page
Estimated time: 85 minutes
Overview
Questions
- How do I put the whole spatial transcriptomics analysis together?
Objectives
- Understand the steps in spatial transcriptomics analysis.
- Perform a spatial transcriptomics analysis on another sample.
Introduction
Over the past day and a half, we have reviewed the steps in a spatial transcriptomics analysis. There were many steps and many lines of code. In this final lesson, you will run through the spatial transcriptomics analysis on your own with another sample.
Challenge 1: Reading in a new sample.
Look back through the lessons. What libraries and auxiliary functions did we use? Load those in now.
R
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(spacexr))
source("https://raw.githubusercontent.com/smcclatchy/spatial-transcriptomics/main/code/spatial_utils.R")
Challenge 2: Reading in a new sample.
The publication which used this data has several tissue sections. As part of the setup, you should have downloaded sample 151508. Look in the Data Preprocessing lesson and read in the filtered sample file for this sample.
R
st_obj <- Load10X_Spatial(data.dir = "./data/151508",
filename = "151508_raw_feature_bc_matrix.h5")
Challenge 3: Reading in sample metadata.
In the Data Preprocessing lesson, we showed you how to read in sample metadata and add it to the Seurat object. Find that code and add the new sample’s metadata to your Seurat object.
R
# Read in the tissue spot positions.
tissue_position <- read_csv("./data/151508/spatial/tissue_positions_list.csv",
col_names = FALSE, show_col_types = FALSE) %>%
column_to_rownames('X1')
colnames(tissue_position) <- c("in_tissue",
"array_row",
"array_col",
"pxl_row_in_fullres",
"pxl_col_in_fullres")
# Align the spot barcodes to match between the Seurat object and the new
# metadata.
tissue_position <- tissue_position[Cells(st_obj),]
stopifnot(rownames(tissue_position) == Cells(st_obj))
# Add the metadata to the Seurat object.
st_obj <- AddMetaData(object = st_obj, metadata = tissue_position)
Challenge 4: Plot the spot positions over the tissue section.
Look for the code to plot the number of counts in each spot in the Data Preprocessing lesson and adapt it to your tissue sample.
R
SpatialFeaturePlot(st_obj, features = "nCount_Spatial")
Challenge 5: Filter counts to retain expressed genes.
Look in the Data Preprocessing lesson and filter the counts in your Seurat object to retain genes which have a total sum across all samples of at least 10 counts. How many genes did you end up with?
R
# Get the counts from the Seurat object.
counts <- LayerData(st_obj, 'counts')
# Sum total counts across all samples for each gene.
gene_sums <- rowSums(counts)
# Select genes with total counts above our threshold.
keep_genes <- which(gene_sums > 10)
# Filter the Seurat object.
st_obj <- st_obj[keep_genes,]
# Print out the number of genes that we retained.
print(length(keep_genes))
OUTPUT
[1] 15311
Challenge 6: Normalize data using SCTransform
In the Normalization lesson, we used the SCTransform to normalize the counts. Normalize the counts in your Seurat object an plot the mean expression versus the residual variance for each gene. Use the built in Seurat function to do this. How many variable genes did the method select?
R
# Perform SCTransform on the counts.
st_obj <- SCTransform(st_obj,
assay = "Spatial")
OUTPUT
Running SCTransform on assay: Spatial
OUTPUT
Running SCTransform on layer: counts
OUTPUT
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
WARNING
Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
OUTPUT
Variance stabilizing transformation of count matrix of size 15311 by 4992
OUTPUT
Model formula is y ~ log_umi
OUTPUT
Get Negative Binomial regression parameters per gene
OUTPUT
Using 2000 genes, 4992 cells
OUTPUT
Found 91 outliers - those will be ignored in fitting/regularization step
OUTPUT
Second step: Get residuals using fitted parameters for 15311 genes
OUTPUT
Computing corrected count matrix for 15311 genes
OUTPUT
Calculating gene attributes
OUTPUT
Wall clock passed: Time difference of 24.90132 secs
OUTPUT
Determine variable features
OUTPUT
Centering data matrix
OUTPUT
Set default assay to SCT
R
# Plot the mean veruss the variance for each gene.
VariableFeaturePlot(st_obj, log = NULL)
OUTPUT
Running SCTransform on assay: Spatial
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 15311 by 4992
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 4992 cells
Found 91 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 15311 genes
Computing corrected count matrix for 15311 genes
Calculating gene attributes
Wall clock passed: Time difference of 36.59206 secs
Determine variable features
Centering data matrix
|===================================================================================================================================================================| 100%
Set default assay to SCT
Challenge 7: Apply dimensionality reduction to your data.
In the Feature Selection lesson, we used the variable genes to calculate principal components of the data. Scale the data, calculate 75 PCs and plot the “Elbow Plot” of PCs versus standard deviation accounted for by each PC.
R
num_pcs <- 75
st_obj <- st_obj %>%
ScaleData() %>%
RunPCA(npcs = num_pcs, verbose = FALSE)
OUTPUT
Centering and scaling data matrix
R
ElbowPlot(st_obj, ndims = num_pcs)
In the next Challenge, you can assign a certain number of principal components and cluster resolution to each student or group of students and compare the clustering results.
Challenge 8: Cluster the spots in your data
In the Feature Selection lesson, we made a call about how many PCs to use. Make some decision about how many PCs you will use and assign this to a variable called “num_pcs”. Also, select a cluster resolution for the clustering function. It may be instructive for each person in the class to use a different number of PCs and compare results. What new column was added to the Seurat object metadata by the clustering algorithm?
We selected 50 PCs and a cluster resoltion of 1 for this exercise. This may not be optimal. You will have a chance to vary these parameters in a later exercise.
R
num_pcs <- 50
st_obj <- st_obj %>%
FindNeighbors(reduction = "pca",
dims = 1:num_pcs) %>%
FindClusters(resolution = 1)
OUTPUT
Computing nearest neighbor graph
OUTPUT
Computing SNN
OUTPUT
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 4992
Number of edges: 226317
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.7726
Number of communities: 10
Elapsed time: 0 seconds
R
colnames(st_obj[[]])[ncol(st_obj[[]])]
OUTPUT
[1] "seurat_clusters"
Challenge 9: Plot UMAP of spot clusters.
Look in the Feature Selection lesson and find the code to run UMAP and plot the spot clusters in UMAP space. You can use the default colors in Seurat.
R
st_obj <- RunUMAP(st_obj,
reduction = 'pca',
dims = 1:num_pcs,
verbose = FALSE)
WARNING
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
R
UMAPPlot(st_obj,
label = TRUE,
pt.size = 2,
group.by = "seurat_clusters",
label.size = 6)
Challenge 10: Plot spot clusters over tissue section
Look in the Feature Selection lesson and find the code to plot the spots in the tissue section, colored by cluster. Compare this to the tissue section Supplemental Figure 5 in Maynard et al..
R
SpatialDimPlot(st_obj,
group.by = "seurat_clusters") +
ggtitle(label = 'Sample 151508')
Challenge 11: Varying number of PCs and cluster resolution.
Return to Challenge 8 and change either the number of PCs or the cluster resolution and re-run the subsequent Challenges. How does this change the output? Does a certain set of values make the SpatialDimPlot look more like the results in the manuscript?
For class discussion.
Key Points
- There are many decisions which need to be made in spatial transcriptomics.
- It is essential to have an understanding of the tissue morphology before proceeding with a spatial transcriptomics analysis.