- reference resources:
- Introduction to Splatter (bioconductor.org)[1]
- Oshlack/splatter: Simple simulation of single-cell RNA sequencing data (github.com)[2]
- splatPop: simulating single-cell data for populations[3]
- Detailed explanation of splatter package parameters (qq.com)
preparation
Recently, in a paper on single-cell algorithms, I came across a package that can easily generate many kinds of simulated data on demand. That makes it very convenient for producing ground truth.
Let's try it.
Installation is straightforward:
    # p_load() comes from the pacman package
    BiocManager::install("splatter")
    pacman::p_load(splatter, scuttle, scater)
1 - function introduction
To be honest, the logo of this package looks pretty good:

- Cell group effects: Where multiple, heterogeneous cell-groups are simulated for each individual. These groups could represent different cell-types or the same cell-type before/after a treatment. Group effects include group-specific differential expression (DE) and/or group-specific expression Quantitative Trait Loci (eQTL) effects.
- Conditional effects between individuals: Where individuals are simulated as belonging to different conditional cohorts (e.g. different treatment groups or groups with different disease statuses). Conditional effects include DE and/or eQTL effects.
- Batch effect from multiplexed experimental designs: Like in splat, batch effects are simulated by assigning small batch-specific DE effects to all genes. splatPop allows for the simulation of different patterns of batch effects, such as those resulting from multiplexed sequencing designs.
How does Splatter simulate single-cell data?
The core of the Splat model is a gamma-Poisson distribution, which is used to generate a gene-by-cell count matrix. The package vignette describes it as follows:
The core of the Splat model is a gamma-Poisson distribution used to generate a gene by cell matrix of counts. Mean expression levels for each gene are simulated from a gamma distribution[4] and the Biological Coefficient of Variation is used to enforce a mean-variance trend before counts are simulated from a Poisson distribution[5]. Splat also allows you to simulate expression outlier genes (genes with mean expression outside the gamma distribution) and dropout (random knock out of counts based on mean expression). Each cell is given an expected library size (simulated from a log-normal distribution) that makes it easier to match to a given dataset.
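To make the description above more concrete, here is a minimal base R sketch of the gamma-Poisson idea. It is a simplified illustration, not splatter's actual code: it uses a constant BCV instead of the mean-dependent trend, skips outliers and dropout, and all sizes and parameter values are hypothetical.

```r
# Simplified gamma-Poisson sketch (illustrative only, not splatter's implementation)
set.seed(42)
n_genes <- 200   # hypothetical sizes
n_cells <- 50

# 1. Gene means drawn from a gamma distribution
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# 2. Expected library size of each cell drawn from a log-normal distribution
lib_sizes <- rlnorm(n_cells, meanlog = 11, sdlog = 0.2)

# 3. Scale gene proportions to each cell's library size (genes x cells)
props <- gene_means / sum(gene_means)
cell_means <- outer(props, lib_sizes)

# 4. Add gamma noise with a constant BCV (assumption: real Splat uses a trend);
#    shape = 1 / bcv^2 and scale = mean * bcv^2 keep the mean and give CV = bcv
bcv <- 0.1
noisy_means <- matrix(
    rgamma(length(cell_means), shape = 1 / bcv^2, scale = cell_means * bcv^2),
    nrow = n_genes, ncol = n_cells
)

# 5. Counts drawn from a Poisson distribution around the noisy means
counts <- matrix(
    rpois(length(noisy_means), lambda = noisy_means),
    nrow = n_genes, ncol = n_cells
)
dim(counts)
```

Drawing Poisson counts around gamma-distributed means is what makes the counts negative binomial overall, which is why the model is called gamma-Poisson.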
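2 - SplatParams object
All parameters of a Splat simulation are stored in a SplatParams object. Note that the example below uses newSimpleParams(), which creates the equivalent object (a SimpleParams) for the Simple simulation; newSplatParams() creates a SplatParams object with default values.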
    params <- newSimpleParams()
    params <- newSimpleParams(nGenes = 200, nCells = 10)
    > params
    A Params object of class SimpleParams
    Parameters can be (estimable) or [not estimable], 'Default' or 'NOT DEFAULT'
    Secondary parameters are usually set during simulation

    Global:
    (GENES)  (CELLS)   [Seed]
        200       10   977126

    3 additional parameters

    Mean:
     (Rate)  (Shape)
        0.3      0.4

    Counts:
    [Dispersion]
             0.1
Access and modify parameters
For now, we can think of the SplatParams object as the parameter object that the Splat model uses to create simulated single-cell data; it holds all the parameters of the model. Besides basic information such as the number of genes and cells, it also covers means, batches, confounding factors, outliers and so on. For details, please refer to [[SplatParams detailed parameter introduction]]
Access:
    # Start from a default SplatParams object (nGenes defaults to 10000)
    params <- newSplatParams()
    getParam(params, "nGenes")
    #> [1] 10000
Modification:
    # Set multiple parameters at once (using a list)
    params <- setParams(params, update = list(nGenes = 8000, mean.rate = 0.5))
    # Extract multiple parameters as a list
    getParams(params, c("nGenes", "mean.rate", "mean.shape"))
    #> $nGenes
    #> [1] 8000
    #>
    #> $mean.rate
    #> [1] 0.5
    #>
    #> $mean.shape
    #> [1] 0.6
    # Set multiple parameters at once (using additional arguments)
    params <- setParams(params, mean.shape = 0.5, de.prob = 0.2)
    params
    #> A Params object of class SplatParams
    #> Parameters can be (estimable) or [not estimable], 'Default' or 'NOT DEFAULT'
    #> Secondary parameters are usually set during simulation
    #>
    #> Global:
    #> (GENES)  (Cells)   [SEED]
    #>    8000      100    81261
    #>
    #> 29 additional parameters
    #>
    #> Batches:
    #>     [Batches]  [Batch Cells]     [Location]        [Scale]       [Remove]
    #>             1            100            0.1            0.1          FALSE
    #>
    #> Mean:
    #>  (RATE)  (SHAPE)
    #>     0.5      0.5
    #>
    #> Library size:
    #> (Location)     (Scale)      (Norm)
    #>         11         0.2       FALSE
    #>
    #> Exprs outliers:
    #> (Probability)     (Location)        (Scale)
    #>          0.05              4            0.5
    #>
    #> Groups:
    #>      [Groups]  [Group Probs]
    #>             1              1
    #>
    #> Diff expr:
    #> [PROBABILITY]    [Down Prob]     [Location]        [Scale]
    #>           0.2            0.5            0.1            0.4
    #>
    #> BCV:
    #> (Common Disp)          (DoF)
    #>           0.1             60
    #>
    #> Dropout:
    #>     [Type]  (Midpoint)     (Shape)
    #>       none           0          -1
    #>
    #> Paths:
    #>         [From]         [Steps]          [Skew]    [Non-linear]  [Sigma Factor]
    #>              0             100             0.5             0.1             0.8
Estimating parameters from real data
Splat can also estimate parameters directly from real single-cell data stored in a SingleCellExperiment (SCE) object.
Create mock data:
    set.seed(1)
    sce <- mockSCE(ncells = 200, ngenes = 2000, nspikes = 100)
    > sce
    class: SingleCellExperiment
    dim: 2000 200
    metadata(0):
    assays(1): counts
    rownames(2000): Gene_0001 Gene_0002 ... Gene_1999 Gene_2000
    rowData names(0):
    colnames(200): Cell_001 Cell_002 ... Cell_199 Cell_200
    colData names(3): Mutation_Status Cell_Cycle Treatment
    reducedDimNames(0):
    altExpNames(1): Spikes
Estimate with Splat:
    > params <- splatEstimate(sce)
    NOTE: Library sizes have been found to be normally distributed instead of log-normal. You may want to check this is correct.
    > params
    A Params object of class SplatParams
    Parameters can be (estimable) or [not estimable], 'Default' or 'NOT DEFAULT'
    Secondary parameters are usually set during simulation

    Global:
    (GENES)  (CELLS)   [Seed]
       2000      200   977126

    29 additional parameters

    Batches:
        [BATCHES]  [BATCH CELLS]     [Location]        [Scale]
                1            200            0.1            0.1
         [Remove]
            FALSE

    Mean:
                (RATE)            (SHAPE)
     0.002962686167343  0.496997730070513

    Library size:
          (LOCATION)           (SCALE)            (NORM)
          357331.235  11607.2332293176              TRUE

    Exprs outliers:
    (PROBABILITY)     (Location)        (Scale)
                0              4            0.5

    Groups:
         [Groups]  [Group Probs]
                1              1

    Diff expr:
    [Probability]    [Down Prob]     [Location]        [Scale]
              0.1            0.5            0.1            0.4

    BCV:
        (COMMON DISP)              (DOF)
    0.752043426792845   11211.8933424157

    Dropout:
               [Type]         (MIDPOINT)            (SHAPE)
                 none   2.71153535179343  -1.37209356733765

    Paths:
            [From]         [Steps]          [Skew]    [Non-linear]
                 0             100             0.5             0.1
    [Sigma Factor]
               0.8
Splat parameter estimation from real (here, mock) data involves the following steps:
- Mean parameters are estimated by fitting a gamma distribution to the mean expression levels
- Library size parameters are estimated by fitting a log-normal distribution to the library sizes. (As I understand it, the library size is the total count per cell.)
- Expression outlier parameters are estimated by determining the number of outliers and fitting a log-normal distribution to their difference from the median.
- BCV parameters are estimated using the estimateDisp function from the edgeR package. (BCV is the Biological Coefficient of Variation, i.e. the dispersion of expression, not a confounding variable.)
- Dropout parameters are estimated by checking if dropout is present and fitting a logistic function to the relationship between mean expression and proportion of zeros.
For details, see Splat simulation parameters (bioconductor.org)[6]
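Two of the steps above (gamma fit for the means, log-normal fit for the library sizes) can be sketched in base R. This is only a rough illustration using the method of moments on hypothetical toy data; it is not the fitting procedure that splatEstimate itself uses.

```r
# Method-of-moments sketch of two estimation steps (assumption: toy data,
# not splatter's own fitting routine)
set.seed(1)

# Toy "observed" gene means with known gamma parameters
true_shape <- 0.6
true_rate <- 0.3
gene_means <- rgamma(5000, shape = true_shape, rate = true_rate)

# For a gamma distribution: mean = shape / rate, var = shape / rate^2
m <- mean(gene_means)
v <- var(gene_means)
est_shape <- m^2 / v
est_rate <- m / v

# Toy library sizes with known log-normal parameters
lib_sizes <- rlnorm(2000, meanlog = 11, sdlog = 0.2)

# For a log-normal distribution the parameters are the mean and sd of the logs
est_loc <- mean(log(lib_sizes))
est_scale <- sd(log(lib_sizes))

round(c(shape = est_shape, rate = est_rate, loc = est_loc, scale = est_scale), 3)
```

The estimated values should land close to the ones used to generate the toy data, which is the same logic Splat relies on when matching a simulation to a real dataset.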
3 - simulating an expression matrix from the estimated Splat parameters
Once the SplatParams object is configured (that is, the parameters for the simulation are set), it can be passed to the splatSimulate function to run the simulation.
    # Two batches of 100 cells each, matching the 200 cells in the output below
    > sim <- splatSimulate(params, nGenes = 1000, batchCells = rep(100, 2))
    Getting parameters...
    Creating simulation object...
    Simulating library sizes...
    Simulating gene means...
    Simulating BCV...
    Simulating counts...
    Simulating dropout (if needed)...
    Sparsifying assays...
    Automatically converting to sparse matrices, threshold = 0.95
    Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
    Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
    Skipping 'BCV': estimated sparse size 1.5 * dense matrix
    Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix
    Skipping 'TrueCounts': estimated sparse size 2.79 * dense matrix
    Skipping 'counts': estimated sparse size 2.79 * dense matrix
    Done!
    > sim
    class: SingleCellExperiment
    dim: 1000 200
    metadata(1): Params
    assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
    rownames(1000): Gene1 Gene2 ... Gene999 Gene1000
    rowData names(4): Gene BaseGeneMean OutlierFactor GeneMean
    colnames(200): Cell1 Cell2 ... Cell199 Cell200
    colData names(3): Cell Batch ExpLibSize
    reducedDimNames(0):
    altExpNames(0):
As you can see, we created a single-cell object with 1000 genes and 200 cells.
    > head(rowData(sim))
    DataFrame with 6 rows and 4 columns
                 Gene BaseGeneMean OutlierFactor  GeneMean
          <character>    <numeric>     <numeric> <numeric>
    Gene1       Gene1    367.75129             1 367.75129
    Gene2       Gene2    183.59043             1 183.59043
    Gene3       Gene3    460.64653             1 460.64653
    Gene4       Gene4      5.16081             1   5.16081
    Gene5       Gene5     90.97209             1  90.97209
    Gene6       Gene6     81.68033             1  81.68033
    > head(colData(sim))
    DataFrame with 6 rows and 3 columns
                 Cell       Batch ExpLibSize
          <character> <character>  <numeric>
    Cell1       Cell1      Batch1     364054
    Cell2       Cell2      Batch1     337135
    Cell3       Cell3      Batch1     353075
    Cell4       Cell4      Batch1     354925
    Cell5       Cell5      Batch1     357749
    Cell6       Cell6      Batch1     356384
We can also visualize it. Here I use the visualiseDim function from the CiteFuse package:
    library(CiteFuse)  # provides visualiseDim()
    sim <- logNormCounts(sim)
    # Plot PCA
    sim <- runPCA(sim)
    visualiseDim(sim, dimNames = "PCA", colour_by = "Batch")

The output of splatSimulate is a SingleCellExperiment (SCE) object, which can be used directly for downstream single-cell analysis.
This function outputs the following information:
- Cell information (colData)
  - Cell - Unique cell identifier.
  - Group - The group or path the cell belongs to.
  - ExpLibSize - The expected library size for that cell.
  - Step (paths only) - How far along the path each cell is.
- Gene information (rowData)
  - Gene - Unique gene identifier.
  - BaseGeneMean - The base expression level for that gene.
  - OutlierFactor - Expression outlier factor for that gene (1 is not an outlier).
  - GeneMean - Expression level after applying outlier factors.
  - DEFac[Group] - The differential expression factor for each gene in a particular group (1 is not differentially expressed).
  - GeneMean[Group] - Expression level of a gene in a particular group after applying differential expression factors.
- Gene by cell information (assays)
  - BaseCellMeans - The expression of genes in each cell adjusted for expected library size.
  - BCV - The Biological Coefficient of Variation for each gene in each cell.
  - CellMeans - The expression level of genes in each cell adjusted for BCV.
  - TrueCounts - The simulated counts before dropout.
  - Dropout - Logical matrix showing which counts have been dropped in which cells.
In fact, these outputs correspond to the parameters stored in the SplatParams object described above.
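To show how the TrueCounts, Dropout and counts matrices relate to each other, here is a small base R sketch. It assumes, as I understand the Splat model, that the dropout probability is a logistic function of the log cell mean controlled by the midpoint and shape parameters; the matrices and sizes are hypothetical toy data.

```r
# Sketch of logistic dropout (assumption: simplified from the Splat description)
set.seed(7)
n_genes <- 5
n_cells <- 4

# Toy cell means and the "true" counts drawn from them
cell_means <- matrix(rgamma(n_genes * n_cells, shape = 2, rate = 0.5), nrow = n_genes)
true_counts <- matrix(rpois(n_genes * n_cells, lambda = cell_means), nrow = n_genes)

# Default values shown in the parameter printout: midpoint 0, shape -1
midpoint <- 0
shape <- -1

# Logistic curve on the log mean: a negative shape means that
# highly expressed genes are less likely to be dropped
drop_prob <- 1 / (1 + exp(-shape * (log(cell_means) - midpoint)))

# Logical matrix: TRUE where a count is dropped
dropout <- matrix(rbinom(n_genes * n_cells, size = 1, prob = drop_prob) == 1,
                  nrow = n_genes)

# Observed counts are the true counts with dropped entries set to zero
counts <- true_counts
counts[dropout] <- 0
```

Keeping both true_counts and dropout around is what gives the simulation its ground truth: you always know which zeros are technical.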
4 - two modes of simulation
In addition to the single cell population above, splatSimulate can also generate multiple cell groups (clusters) or trajectory data. This is controlled by the method argument, documented as:
Which simulation method to use. Options are "single" which produces a single population, "groups" which produces distinct groups (e.g. cell types), or "paths" which selects cells from continuous trajectories (e.g. differentiation processes).
4.1-cluster
The batchCells parameter controls both the number of batches and the size of each batch (the number of batches cannot be specified directly). In the same way, group.prob controls both the number of groups and their proportions:
    # group
    sim.groups <- splatSimulate(group.prob = c(0.5, 0.5), method = "groups",
                                verbose = FALSE, batchCells = rep(1000, 2))
    sim.groups <- logNormCounts(sim.groups)
    sim.groups <- runPCA(sim.groups)
    # visualiseDim() is the CiteFuse function used earlier
    visualiseDim(sim.groups, dimNames = "PCA", colour_by = "Group", shape_by = "Batch")
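To get a feel for what group.prob and the differential expression parameters do, here is a base R sketch. It is my own simplified reading of the model, not splatter's code: cells are assigned to groups by sampling with group.prob, and DE genes get a log-normal multiplicative factor while all other genes keep a factor of 1. The parameter values are the defaults shown in the printout earlier; the sizes are hypothetical.

```r
# Sketch of group assignment and DE factors (assumption: simplified model)
set.seed(3)
n_cells <- 200
n_genes <- 1000

# Assign each cell to a group according to group.prob
group_prob <- c(0.5, 0.5)
group <- sample(seq_along(group_prob), n_cells, replace = TRUE, prob = group_prob)

# DE parameters (defaults from the SplatParams printout)
de_prob <- 0.1
de_down_prob <- 0.5
de_loc <- 0.1
de_scale <- 0.4

# Pick DE genes, then draw a log-normal factor for each of them
is_de <- rbinom(n_genes, size = 1, prob = de_prob) == 1
fac <- rlnorm(sum(is_de), meanlog = de_loc, sdlog = de_scale)
fac <- ifelse(fac < 1, 1 / fac, fac)   # keep the magnitude >= 1

# Down-regulated genes get the inverse factor
is_down <- rbinom(sum(is_de), size = 1, prob = de_down_prob) == 1
fac[is_down] <- 1 / fac[is_down]

# Non-DE genes keep a factor of 1 (compare the DEFac[Group] column)
de_fac <- rep(1, n_genes)
de_fac[is_de] <- fac

table(group)
summary(de_fac[is_de])
```

Multiplying the base gene means by de_fac gives the group-specific means, which is why a factor of 1 means "not differentially expressed".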

4.2-path(trajectory)
Modify the corresponding method parameter to the paths mode:
    # path
    library(ggplot2)
    sim.paths <- splatSimulate(de.prob = 0.2, nGenes = 1000, method = "paths",
                               verbose = FALSE, batchCells = rep(200, 2))
    sim.paths <- logNormCounts(sim.paths)
    sim.paths <- runPCA(sim.paths)
    # Collect the cell metadata and the first two PCs in a data frame
    tmp.list <- list()
    tmp.list$sim.paths.df <- colData(sim.paths)
    tmp.list$sim.paths.df <- cbind(tmp.list$sim.paths.df,
                                   reducedDim(sim.paths, "PCA")[, 1:2])
    tmp.list$sim.paths.df <- as.data.frame(tmp.list$sim.paths.df)
    ggplot(tmp.list$sim.paths.df) +
        geom_point(aes(PC1, PC2, color = Step, shape = Batch)) +
        viridis::scale_colour_viridis()
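The idea behind the paths mode can be sketched in base R as interpolation between a start and an end expression profile. This is a deliberately simplified, linear version with uniformly sampled steps and hypothetical sizes; splatter itself also supports skewed step sampling and non-linear genes, which are not shown here.

```r
# Linear path sketch (assumption: simplified, not splatter's implementation)
set.seed(11)
n_genes <- 100
n_cells <- 40
n_steps <- 100

# Expression profile at the start of the path
start_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# End of the path: start means times hypothetical log-normal DE factors
end_means <- start_means * rlnorm(n_genes, meanlog = 0.1, sdlog = 0.4)

# Each cell is placed at a step along the path (compare the Step column)
step <- sample(seq_len(n_steps), n_cells, replace = TRUE)
frac <- (step - 1) / (n_steps - 1)

# Linear interpolation between start and end, giving a genes x cells matrix
path_means <- outer(start_means, 1 - frac) + outer(end_means, frac)
dim(path_means)
```

Cells with a small Step look like the start population and cells with a large Step look like the end population, which is what produces the gradient in the PCA plot.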

5 - other simulations
batch
In fact, in the data above I have already specified the batches through batchCells.
Because the difference between group 1 and group 2 is clearly larger than the difference between batches, the groups are also clearly separated in the PCA.

In fact, the Splat method is itself a small family of functions:
Each of the Splatter simulation methods has its own convenience function. To simulate a single population use splatSimulateSingle() (equivalent to splatSimulate(method = "single")), to simulate groups use splatSimulateGroups() (equivalent to splatSimulate(method = "groups")) or to simulate paths use splatSimulatePaths() (equivalent to splatSimulate(method = "paths")).
Other methods
The full list of 15 simulation methods:
    listSims()
    #> Splatter currently contains 15 simulations
    #>
    #> Splat (splat)
    #> DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter    Dependencies:
    #> The Splat simulation generates means from a gamma distribution, adjusts them for BCV and generates counts from a gamma-poisson. Dropout and batch effects can be optionally added.
    #>
    #> Splat Single (splatSingle)
    #> DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter    Dependencies:
    #> The Splat simulation with a single population.
    #>
    #> Splat Groups (splatGroups)
    #> DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter    Dependencies:
    #> The Splat simulation with multiple groups. Each group can have it's own differential expression probability and fold change distribution.
    #>
    #> Splat Paths (splatPaths)
    #> DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter    Dependencies:
    #> The Splat simulation with differentiation paths. Each path can have it's own length, skew and probability. Genes can change in non-linear ways.
    #>
    #> Kersplat (kersplat)
    #> DOI:    GitHub: Oshlack/splatter    Dependencies: scuttle, igraph
    #> The Kersplat simulation extends the Splat model by adding a gene network, more complex cell structure, doublets and empty cells (Experimental).
    #>
    #> splatPop (splatPop)
    #> DOI: 10.1186/s13059-021-02546-1    GitHub: Oshlack/splatter    Dependencies: VariantAnnotation, preprocessCore
    #> The splatPop simulation enables splat simulations to be generated for multiple individuals in a population, accounting for correlation structure by simulating expression quantitative trait loci (eQTL).
    #>
    #> Simple (simple)
    #> DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter    Dependencies:
    #> A simple simulation with gamma means and negative binomial counts.
    #>
    #> Lun (lun)
    #> DOI: 10.1186/s13059-016-0947-7    GitHub: MarioniLab/Deconvolution2016    Dependencies:
    #> Gamma distributed means and negative binomial counts. Cells are given a size factor and differential expression can be simulated with fixed fold changes.
    #>
    #> Lun 2 (lun2)
    #> DOI: 10.1093/biostatistics/kxw055    GitHub: MarioniLab/PlateEffects2016    Dependencies: scran, scuttle, lme4, pscl, limSolve
    #> Negative binomial counts where the means and dispersions have been sampled from a real dataset. The core feature of the Lun 2 simulation is the addition of plate effects. Differential expression can be added between two groups of plates and optionally a zero-inflated negative-binomial can be used.
    #>
    #> scDD (scDD)
    #> DOI: 10.1186/s13059-016-1077-y    GitHub: kdkorthauer/scDD    Dependencies: scDD
    #> The scDD simulation samples a given dataset and can simulate differentially expressed and differentially distributed genes between two conditions.
    #>
    #> BASiCS (BASiCS)
    #> DOI: 10.1371/journal.pcbi.1004333    GitHub: catavallejos/BASiCS    Dependencies: BASiCS
    #> The BASiCS simulation is based on a bayesian model used to deconvolve biological and technical variation and includes spike-ins and batch effects.
    #>
    #> mfa (mfa)
    #> DOI: 10.12688/wellcomeopenres.11087.1    GitHub: kieranrcampbell/mfa    Dependencies: mfa
    #> The mfa simulation produces a bifurcating pseudotime trajectory. This can optionally include genes with transient changes in expression and added dropout.
    #>
    #> PhenoPath (pheno)
    #> DOI: 10.1038/s41467-018-04696-6    GitHub: kieranrcampbell/phenopath    Dependencies: phenopath
    #> The PhenoPath simulation produces a pseudotime trajectory with different types of genes.
    #>
    #> ZINB-WaVE (zinb)
    #> DOI: 10.1038/s41467-017-02554-5    GitHub: drisso/zinbwave    Dependencies: zinbwave
    #> The ZINB-WaVE simulation simulates counts from a sophisticated zero-inflated negative-binomial distribution including cell and gene-level covariates.
    #>
    #> SparseDC (sparseDC)
    #> DOI: 10.1093/nar/gkx1113    GitHub: cran/SparseDC    Dependencies: SparseDC
    #> The SparseDC simulation simulates a set of clusters across two conditions, where some clusters may be present in only one condition.
For example, scDD handles cells from two different conditions:
The scDD simulation samples a given dataset and can simulate differentially expressed and differentially distributed genes between two conditions.
6 - comparison with real data sets
Side-by-side comparison
Splatter provides a way to inspect and compare several single-cell datasets side by side.
The compareSCEs function accepts a named list of SCE objects:
    set.seed(1)
    sce <- mockSCE(ncells = 200, ngenes = 2000, nspikes = 100)
    params <- splatEstimate(sce)
    sim1 <- splatSimulate(params, nGenes = 2000)
    sim2 <- splatSimulate(nGenes = 2000)
    sim3 <- simpleSimulate(nGenes = 2000, verbose = FALSE)
    comparison <- compareSCEs(list(sce = sce, sim1 = sim1, sim2 = sim2, sim3 = sim3))
Here I compare:
- The mock data generated by mockSCE (sce);
- Splat data simulated from the parameters estimated on that mock data (sim1);
- Data simulated with default parameters by two different models, Splat (sim2) and Simple (sim3).
    > head(comparison$ColData)
             Dataset    sum detected  total PctZero
    Cell_001     sce 392907     1502 402665   24.90
    Cell_002     sce 398904     1509 405090   24.55
    Cell_003     sce 358855     1503 365409   24.85
    Cell_004     sce 378909     1527 386163   23.65
    Cell_005     sce 384063     1532 389848   23.40
    Cell_006     sce 370596     1522 378267   23.90
    > head(comparison$RowData)
              Dataset    mean detected MeanCounts  VarCounts CVCounts
    Gene_0001     sce  11.460     48.5     11.460   950.1994 2.689817
    Gene_0002     sce  78.560     92.0     78.560  9041.7049 1.210385
    Gene_0003     sce  21.505     55.5     21.505  2641.8090 2.390074
    Gene_0004     sce  20.780     57.0     20.780  1827.4890 2.057225
    Gene_0005     sce  18.290     50.0     18.290  1979.6140 2.432633
    Gene_0006     sce 191.455     99.5    191.455 46423.7367 1.125391
              MedCounts MADCounts   MeanCPM     VarCPM    CVCPM
    Gene_0001       0.0    0.0000  29.69738   6420.309 2.698111
    Gene_0002      46.0   63.7518 205.45104  62237.453 1.214276
    Gene_0003       1.0    1.4826  56.01672  17949.158 2.391687
    Gene_0004       2.0    2.9652  54.13480  12419.975 2.058656
    Gene_0005       0.5    0.7413  47.92973  13753.695 2.446835
    Gene_0006     119.0  126.7623 500.52564 321714.428 1.133206
                  MedCPM     MADCPM MeanLogCPM VarLogCPM  CVLogCPM
    Gene_0001   0.000000   0.000000   2.183822  6.987662 1.2104552
    Gene_0002 119.251041 163.890597   6.125446  7.454117 0.4457182
    Gene_0003   2.607953   3.866551   2.792780  9.128253 1.0818252
    Gene_0004   5.085582   7.539884   2.973293  9.256661 1.0232681
    Gene_0005   1.231218   1.825403   2.613443  9.025811 1.1495559
    Gene_0006 308.946214 330.608840   8.044234  3.727668 0.2400125
              MedLogCPM MADLogCPM PctZero
    Gene_0001 0.0000000  0.000000    51.5
    Gene_0002 6.9098774  2.455602     8.0
    Gene_0003 1.8511790  2.744558    44.5
    Gene_0004 2.6051412  3.862382    43.0
    Gene_0005 0.8958936  1.328252    50.0
    Gene_0006 8.2758656  1.782199     0.5
    > table(comparison$RowData$Dataset)

     sce sim1 sim2 sim3
    2000 2000 2000 2000
    > table(comparison$ColData$Dataset)

     sce sim1 sim2 sim3
     200  200  100  100
The result contains detailed gene-level and cell-level statistics for each dataset, as well as a variety of plots:
    > names(comparison$Plots)
    [1] "Means"        "Variances"    "MeanVar"      "LibrarySizes"
    [5] "ZerosGene"    "ZerosCell"    "MeanZeros"    "VarGeneCor"
The plots look quite good. They use notched box plots:


Many-to-one comparison
Use the function diffSCEs. Here the datasets need to have the same dimensions, so I reconfigure the simulations.
The ref argument must also have length one, so this is a many-to-one comparison against a single reference. Without it, the call fails:
    Error in diffSCEs(list(sce = sce, sim1 = sim1, sim2 = sim2, sim3 = sim3), :
      Assertion on 'ref' failed: Must have length 1.
    # compare some to ref
    set.seed(1)
    sce <- mockSCE(ncells = 200, ngenes = 2000, nspikes = 100)
    params <- splatEstimate(sce)
    sim1 <- splatSimulate(params)
    sim2 <- splatSimulate(nGenes = 2000, batchCells = 200)
    sim3 <- simpleSimulate(nGenes = 2000, nCells = 200, verbose = FALSE)
    difference <- diffSCEs(list(sce = sce, sim1 = sim1, sim2 = sim2, sim3 = sim3),
                           ref = "sce")
We can then compare each simulation against the reference:


Other contents
For example, adding TPM and FPKM data. After adding gene lengths with addGeneLengths, SCE objects can use the methods from the scater package directly:
    sim <- simpleSimulate(verbose = FALSE)
    sim <- addGeneLengths(sim)
    head(rowData(sim))
    #> DataFrame with 6 rows and 3 columns
    #>              Gene  GeneMean    Length
    #>       <character> <numeric> <numeric>
    #> Gene1       Gene1 0.5641399       917
    #> Gene2       Gene2 0.0764411       765
    #> Gene3       Gene3 2.6791742      5972
    #> Gene4       Gene4 1.3782005      3491
    #> Gene5       Gene5 4.0117653     15311
    #> Gene6       Gene6 0.3536760      1190
    tpm(sim) <- calculateTPM(sim, rowData(sim)$Length)
    tpm(sim)[1:5, 1:5]
    #> 5 x 5 sparse Matrix of class "dgCMatrix"
    #>           Cell1    Cell2     Cell3     Cell4     Cell5
    #> Gene1 342.21897  .         .       169.73637 170.06608
    #> Gene2   .        .         .         .         .
    #> Gene3 131.36922  .       187.68277 182.44101  78.34089
    #> Gene4  89.89252  .       183.46630  89.17115   .
    #> Gene5  30.74405 20.50798  83.66284  81.32623  40.74211
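For reference, the TPM calculation itself is simple enough to write out in base R. This is a sketch on a hypothetical toy matrix that should correspond to what calculateTPM does when gene lengths are supplied (divide by length, then scale each cell to one million), though I have not compared the two numerically.

```r
# TPM by hand (assumption: toy counts and hypothetical gene lengths)
set.seed(5)
counts <- matrix(
    rpois(12, lambda = 20), nrow = 4,
    dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3))
)
gene_length <- c(900, 750, 6000, 3500)   # lengths in bases

# Length-normalised rate: each row (gene) is divided by its own length
rate <- counts / gene_length

# Scale every cell (column) so that its rates sum to one million
tpm <- t(t(rate) / colSums(rate)) * 1e6

colSums(tpm)
```

Because every column is rescaled to the same total, TPM values are comparable across cells but no longer carry library size information.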
If some of the contents of the SCE object created by Splat are not needed, you can remove metadata or assays to reduce the object size:
    sim <- splatSimulate()
    #> Getting parameters...
    #> Creating simulation object...
    #> Simulating library sizes...
    #> Simulating gene means...
    #> Simulating BCV...
    #> Simulating counts...
    #> Simulating dropout (if needed)...
    #> Sparsifying assays...
    #> Automatically converting to sparse matrices, threshold = 0.95
    #> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
    #> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
    #> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
    #> Skipping 'CellMeans': estimated sparse size 1.49 * dense matrix
    #> Skipping 'TrueCounts': estimated sparse size 1.65 * dense matrix
    #> Skipping 'counts': estimated sparse size 1.65 * dense matrix
    #> Done!
    minimiseSCE(sim)
    #> Minimising SingleCellExperiment...
    #> Original size: 43.9 Mb
    #> Removing all rowData columns
    #> Removing all colData columns
    #> Removing all metadata items
    #> Keeping 1 assays: counts
    #> Removing 5 assays: BatchCellMeans, BaseCellMeans, BCV, CellMeans, TrueCounts
    #> Sparsifying assays...
    #> Automatically converting to sparse matrices, threshold = 0.95
    #> Skipping 'counts': estimated sparse size 1.65 * dense matrix
    #> Minimised size: 5.3 Mb (12% of original)
    #> class: SingleCellExperiment
    #> dim: 10000 100
    #> metadata(0):
    #> assays(1): counts
    #> rownames(10000): Gene1 Gene2 ... Gene9999 Gene10000
    #> rowData names(0):
    #> colnames(100): Cell1 Cell2 ... Cell99 Cell100
    #> colData names(0):
    #> reducedDimNames(0):
    #> mainExpName: NULL
    #> altExpNames(0):
    minimiseSCE(sim, rowData.keep = "Gene", colData.keep = c("Cell", "Batch"),
                metadata.keep = TRUE)
    #> Minimising SingleCellExperiment...
    #> Original size: 43.9 Mb
    #> Keeping 1 rowData columns: Gene
    #> Removing 3 rowData columns: BaseGeneMean, OutlierFactor, GeneMean
    #> Keeping 2 colData columns: Cell, Batch
    #> Removing 1 colData columns: ExpLibSize
    #> Keeping 1 assays: counts
    #> Removing 5 assays: BatchCellMeans, BaseCellMeans, BCV, CellMeans, TrueCounts
    #> Sparsifying assays...
    #> Automatically converting to sparse matrices, threshold = 0.95
    #> Skipping 'counts': estimated sparse size 1.65 * dense matrix
    #> Minimised size: 5.9 Mb (14% of original)
    #> class: SingleCellExperiment
    #> dim: 10000 100
    #> metadata(1): Params
    #> assays(1): counts
    #> rownames(10000): Gene1 Gene2 ... Gene9999 Gene10000
    #> rowData names(1): Gene
    #> colnames(100): Cell1 Cell2 ... Cell99 Cell100
    #> colData names(2): Cell Batch
    #> reducedDimNames(0):
    #> mainExpName: NULL
    #> altExpNames(0):
rowData(sce), colData(sce) and metadata(sce) are deleted by default.
Splatter also provides functions for assembling the plots from the comparison results into a single panel:
    p1 <- makeCompPanel(comparison)
To be honest, it's ugly, so I won't show it.
PS: at first it looked as if there were a bug with the many-to-one comparison:
    makeCompPanel(difference)
    Error in makeCompPanel(difference) :
      Assertion on 'comp' failed: Must have length 3, but has length 5.
However, makeCompPanel expects the output of compareSCEs; for the output of diffSCEs, splatter provides makeDiffPanel instead.
Personally, I think the Splat suite is well worth playing with. Next, let's see whether the batch effect size of each SCE, or the amount of variation between groups, can be customized.
reference material
[1] Introduction to Splatter (bioconductor.org): https://bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splatter.html
[2] Oshlack/splatter: Simple simulation of single-cell RNA sequencing data (github.com): https://github.com/Oshlack/splatter
[3] splatPop: simulating single-cell data for populations: http://www.bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splatPop.html#2_Quick_start
[4] gamma distribution: https://en.wikipedia.org/wiki/Gamma_distribution
[5] Poisson distribution: https://en.wikipedia.org/wiki/Poisson_distribution
[6] Splat simulation parameters (bioconductor.org): https://bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splat_params.html#27_Dropout_parameters