SpatialAnno for MOB • SpatialAnno

mouse olfactory bulb (MOB)

We obtained the mouse olfactory bulb ST data from the spatial transcriptomics research website (https://www.spatialresearch.org/). This data consists of gene expression levels in form of read counts which are collected for a number of spatial locations. We followed to focus on the mouse OB section 12, which contains 16,034 genes and 282 spatial locations. The gene expression of mouse OB section 12 and ground truth are both stored in the R package SpatialAnno.

load the MOB dataset

First, we load the MOB dataset and extract the position of each spot from spot name. The four anatomic layers manually annotated based on H&E staining was taken as ground truth.

set.seed(99)
library(SpatialAnno)
library(Seurat)

load(paste0(path.package("SpatialAnno"), "/extdata/Rep12_MOB_count_matrix-1.RData"))
print(MOB_raw[1:6,1:6])

## 6 x 6 sparse Matrix of class "dgCMatrix"
##         16.918x16.996 18.017x17.034 20.075x17.059 18.979x17.065 21.937x16.967
## Plekha1             5             1             .             .             1
## Cnih3               1             1             1             .             .
## Gpc5                1             .             .             .             .
## Sirt3               2             1             .             .             .
## Rab8a               5             1             .             .             1
## Nadk                1             1             .             1             1
##         20.899x17.015
## Plekha1             .
## Cnih3               1
## Gpc5                1
## Sirt3               .
## Rab8a               .
## Nadk                .

pos = cbind.data.frame(x=as.numeric(sapply(strsplit(colnames(MOB_raw),split="x"),"[",1)),y=as.numeric(sapply(strsplit(colnames(MOB_raw),split="x"),"[",2)))
head(pos)

##        x      y
## 1 16.918 16.996
## 2 18.017 17.034
## 3 20.075 17.059
## 4 18.979 17.065
## 5 21.937 16.967
## 6 20.899 17.015

load(paste0(path.package("SpatialAnno"), "/extdata/Rep12_MOB_manual_annotation.RData"))
y2 = Rep12_MOB_manual_annotation
y2[y2=="GCL"] = "GC"
y2[y2=="MCL"] = "M/TC"
y2[y2=="ONL"] = "OSNs"
y2[y2=="GL"] = "PGC"

Then we construct the Seurat object with the raw count of MOB.

MOB <- CreateSeuratObject(counts = MOB_raw, project = "MOB")

Then we perform the normalization using function SCTransform implemented in R package Seurat.

MOB = SCTransform(MOB)
X = t(MOB@assays$SCT@scale.data)

Then we read the cell-type-specific marker information and construct a cell type marker matrix. Top four DEGs of the scRNA-seq data from Gene Expression Omnibus (GEO;accession number GSE121891) are selected as markers based on log-fold change.

marker <- vector("list", 5)
names(marker) = c("GC", "PGC", "M/TC", "OSNs", "EPL-IN")
marker[[1]] = c("Gria2", "Meis2", "Prkca", "Penk")
marker[[2]] = c("Nppa", "Nrsn1", "Nxph1", "Th")
marker[[3]] = c("Cdhr1", "Slc17a7", "Olfm1", "Reln")
marker[[4]] = c("Gng13", "S100a5", "Omp", "Fam213b")
marker[[5]] = c("Kit", "Thy1",  "Dner", "Spock2")
rho <- marker_list_to_mat(marker, TRUE)
colnames(rho)[ncol(rho)] = "Unknown"
head(marker)

## $GC
## [1] "Gria2" "Meis2" "Prkca" "Penk" 
## 
## $PGC
## [1] "Nppa"  "Nrsn1" "Nxph1" "Th"   
## 
## $`M/TC`
## [1] "Cdhr1"   "Slc17a7" "Olfm1"   "Reln"   
## 
## $OSNs
## [1] "Gng13"   "S100a5"  "Omp"     "Fam213b"
## 
## $`EPL-IN`
## [1] "Kit"    "Thy1"   "Dner"   "Spock2"

Run SpatialAnno on dataset MOB

Then we find the neighborhoods using the function getneighborhhod_fast implemented in our package SpatialAnno, with specifying the cutoff as 1.2 to ensure that the median number of neighboorhoods is about 4, as MOB was sequenced on the platform ST. After obtaining the sparse neighborhoods matrix Adj_sp, we can run SpatialAnno with normalized gene expression matrix X, sparse neighborhoods matrix Adj_sp, and a list of markers marker. Note that we choose the initial value from annotation methods SCINA due to the large number of non-markers.

Adj_sp <- getneighborhood_fast(as.matrix(pos), cutoff = 1.2)
fit <- SpatialAnno(X = X, Adj_sp = Adj_sp, marker = marker, initial = "SCINA")

## [1] "Gria2" "Meis2" "Penk"  "Prkca"
## [1] "Nppa"  "Nrsn1" "Nxph1" "Th"   
## [1] "Cdhr1"   "Olfm1"   "Reln"    "Slc17a7"
## [1] "Fam213b" "Gng13"   "Omp"     "S100a5" 
## [1] "Dner"   "Kit"    "Spock2" "Thy1"  
## iter = 2, loglik= -1022883.466774, dloglik=0.999524 
## iter = 3, loglik= -1022389.735795, dloglik=0.000483 
## iter = 4, loglik= -1022343.565449, dloglik=0.000045 
## iter = 5, loglik= -1022330.201749, dloglik=0.000013 
## iter = 6, loglik= -1022317.792909, dloglik=0.000012 
## iter = 7, loglik= -1022308.740946, dloglik=0.000009 
## iter = 8, loglik= -1022297.279651, dloglik=0.000011 
## iter = 9, loglik= -1022291.161422, dloglik=0.000006 
## iter = 10, loglik= -1022287.419029, dloglik=0.000004 
## iter = 11, loglik= -1022280.593982, dloglik=0.000007 
## iter = 12, loglik= -1022277.870902, dloglik=0.000003 
## iter = 13, loglik= -1022274.239868, dloglik=0.000004 
## iter = 14, loglik= -1022271.060462, dloglik=0.000003 
## iter = 15, loglik= -1022269.511761, dloglik=0.000002 
## iter = 16, loglik= -1022268.370673, dloglik=0.000001 
## iter = 17, loglik= -1022267.370783, dloglik=0.000001

We demonstrate the output of SpatialAnno, which is a list contains many items. We will briefly explain them one by one in the following part.

str(fit)

## List of 13
##  $ R      : num [1:282, 1:6] 1.00 2.41e-22 1.80e-40 8.22e-39 6.75e-25 ...
##  $ xi     : num 2.3
##  $ type   : int [1:282, 1] 1 2 4 4 3 2 3 1 3 4 ...
##  $ alpha_m: num [1:20, 1] -0.252 -0.453 -0.594 -0.434 -0.129 ...
##  $ bet_m  : num [1:20, 1:6] 0.935 1.681 2.205 1.61 0 ...
##  $ mu_m   : num [1:20, 1:6] 0.683 1.228 1.611 1.176 -0.129 ...
##  $ sigma_m: num [1:20, 1] 1.044 0.888 0.837 0.87 0.82 ...
##  $ Ez_u   : num [1:282, 1:15] -18.23 8.72 2 1.82 13.02 ...
##  $ Mu_u   : num [1:6, 1:15] -14.93 6.49 10.53 1.11 1.21 ...
##  $ Sgm_u  : num [1:15, 1:15] 10.37 -4.65 2.53 -2.71 1.55 ...
##  $ W_u    : num [1:2980, 1:15] -0.01471 -0.01073 0.00933 -0.03249 -0.02646 ...
##  $ Lam_u  : num [1:2980, 1] 0.938 0.961 0.814 0.634 0.962 ...
##  $ loglik : num [1:16] -2.15e+09 -1.02e+06 -1.02e+06 -1.02e+06 -1.02e+06 ...

‘R’ the estimator of probability that each spot was assigned to these five cell types or “Unknown”. The order of the cell types is the same as that of cell types in gene marker matrix.
‘xi’ the estimator of smooth parameter.
‘type’ the predicated cell types for each spot.
‘alpha_m’ the estimator of base expression level.
‘bet_m’ the estimator of extra expression level.
‘mu_m’ the mean of cell types in marker group.
‘sigma_m’ the covariance matrix of cell types in marker group.
‘Ez_u’ the low-dimensional embedding.
‘Mu_u’ the mean of cell types in non-marker group.
‘Sgm_u’ the covariance matrix of cell types in non-marker group.
‘W_u’ the estimator of factor loading matrix.
‘Lam_u’ the variance of error in non-marker group.
‘loglik’ the vector of log-likelihood of SpatialAnno.

plot the annotation results

The predictions can be obtained in the following way. Then we can plot the annotation results of SpatialAnno on the spatial positions using R package ggplot2.

prediction = colnames(rho)[fit$type]
print(prediction[1:20])

##  [1] "GC"      "PGC"     "OSNs"    "OSNs"    "M/TC"    "PGC"     "M/TC"   
##  [8] "GC"      "M/TC"    "OSNs"    "OSNs"    "OSNs"    "OSNs"    "OSNs"   
## [15] "OSNs"    "Unknown" "PGC"     "Unknown" "GC"      "GC"

library(ggplot2)
library(scales)
library(RColorBrewer)
colfunc <- colorRampPalette(c("red", "white"))
fit_type = colnames(rho)[fit$type]
dat = data.frame(pos[,1], pos[,2], factor(fit_type))
names(dat)= c("imagerow", "imagecol", "Cell type")

p1 <- ggplot(dat, aes(x=imagerow, y=imagecol, color=`Cell type`)) +
  geom_point(size = 3, alpha = 0.5) +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_blank())+
  guides(colour = guide_legend(override.aes = list(size = 3))) + scale_color_manual(values=c(c("#FFD700", colfunc(100)[(7)*5], brewer.pal(9, "Greens")[5],hue_pal()(4)[4], brewer.pal(9, "Blues")[7]), "#808080"))
p1

calculate Kappa, mF1, and acc

After obtaining the predictions, we can calculate Kappa, mF1 (mean F1), and acc to evalute the performance of annotation results by SpatialAnno. The function cohen.kappa to calculate Kappa is implemented in the package psych. The function evaluate to calculate mF1 is implemented in our package SpatialAnno. The Kappa is

library(psych)
idx = which(y2!="Unknown" & prediction!="Unknown")
Kappa = cohen.kappa(x = cbind(y2[idx], prediction[idx]))$kappa
print(Kappa)

## [1] 0.7391692

The mF1 is

mF1 = mean(evaluate(y2[idx], prediction[idx])$F1)
print(mF1)

## [1] 0.8138647

The acc is

acc = mean(y2[idx] == prediction[idx])
print(acc)

## [1] 0.8

extract the embeddings

Then we extract the embeddings from the output of SpatialAnno, and plot the tSNE and calculate ARI with the function adjustedRandIndex implemented in the R package mclust

library(scater)
embedding = fit$Ez_u
print(head(embedding))

##            [,1]      [,2]       [,3]      [,4]       [,5]        [,6]      [,7]
## [1,] -18.228770 15.640597 -8.5896255  4.426338  1.0558159   7.3838150 -5.827854
## [2,]   8.717592 -7.960629 -5.5023852  1.197099  3.7835684  -0.8201792 -2.920445
## [3,]   2.001903 -3.850211  1.2476928 -1.565176 -0.4773491 -18.8862657 21.000166
## [4,]   1.823851 -3.460382  5.1927435 -1.211686  2.2126560 -13.5406390 11.973193
## [5,]  13.017165  7.302807 -0.3653341 -1.301280 -2.0256613   4.9303242 -2.443902
## [6,]   4.938710 -2.023808 -1.9841202  3.138582  1.5821686   2.0986954 -1.698943
##           [,8]        [,9]      [,10]       [,11]      [,12]       [,13]
## [1,]  2.264353 -3.69928581  1.0718249 -1.45476754 -1.0693397  0.04821921
## [2,]  5.343680 -2.33882184  0.5669334 -3.82606817  0.7485158  1.19534126
## [3,]  4.764045 -5.68965761 -5.8818437  0.84893472  3.1287520  1.31251372
## [4,]  5.097046  0.08194153 -5.1038824 -0.30378157 12.5101063 -8.82683819
## [5,] -3.284255 -2.01694464  0.6703394 -0.04856227 -1.1720375 -0.62960772
## [6,]  3.484273 -0.97269636 -0.8826282  3.87656553 -0.2722532  0.31662710
##         [,14]        [,15]
## [1,] 1.189905  2.935940083
## [2,] 2.670998 -2.053067683
## [3,] 3.041590 -5.033696367
## [4,] 7.921903 -5.806457120
## [5,] 3.238418  0.008782105
## [6,] 3.724550  1.675002258

tsne = calculateTSNE(t(as.matrix(embedding)))

The tSNE plot is

dat = as.data.frame(tsne)
colnames(dat) = c("X", "Y")
dat$cluster = prediction
library(ggplot2)
p1 <- ggplot(dat, aes(x=X, y=Y, color=cluster)) +
  geom_point(size = 1, alpha = 0.5) +
  theme_Publication()+
  theme(legend.position = "right",
        legend.text = element_text(size=14),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_blank())+
  guides(colour = guide_legend(override.aes = list(size = 3), ncol=1)) + scale_color_manual(values=c(c("#FFD700", colfunc(100)[(7)*5], brewer.pal(9, "Greens")[5],hue_pal()(4)[4], brewer.pal(9, "Blues")[7]), "#808080"))
p1

Then we perform the clustering analysis with GMM (other clustering methods are also can be used) on embeddings and calculate ARI. The ARI is

fit2 = Mclust(embedding, G = 5:10)
ARI = adjustedRandIndex(y2, fit2$classification)
print(ARI)

## [1] 0.4637656

We can also plot RGB plot with the function plot_RGB implemented in PRECAST. The RGB plot is demonstrated as follows

library(PRECAST)
tsne3dim = calculateTSNE(t(embedding), ncomponents = 3)
pList <- plot_RGB(pos, tsne3dim, pointsize = 2)
pList

gene expression plot of markers

We can also plot the expression levels of corresponding cell type-specific marker genes.

genes = c("Kit", "Penk","Cdhr1", "S100a5", "Th")
for (i in 1:5){
  if (i == 1){
    idx = match(genes[i], colnames(X))
    dat = data.frame(pos[,1], pos[,2], X[,idx])
    names(dat)= c("imagerow", "imagecol", "cluster")
    dat$gene = genes[i]
  }else{
    idx = match(genes[i], colnames(X))
    dat2 = data.frame(pos[,1], pos[,2], X[,idx])
    names(dat2)= c("imagerow", "imagecol", "cluster")
    dat2$gene = genes[i]
    dat = rbind(dat, dat2)
  }
}
cols <- c("#0571B0",  "#CA0020")
quant = 0.5
med <- apply(MOB[["SCT"]]@scale.data[c("Kit", "Penk","Cdhr1", "S100a5", "Th"),], 1, quantile, quant)
dat$gene = factor(dat$gene, levels = genes)
dat$marker = "marker"
p2 <- ggplot(dat, aes(x=imagerow, y=imagecol, color=cluster))+
  geom_point(size = 3) + 
  facet_grid(marker~gene, scales = "free")+
  scale_colour_gradient2(
    low = cols[1],
    mid = "white",
    high = cols[2], midpoint = 0.5)+
  theme_Publication()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        strip.text.x = element_text(face = "italic"))
p2