BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 151113  Batch 151119  Batch 170203
Condition crowned  7             4             0
Condition worker   5             0             5

Measures of confounding between Batch and Condition

Confounding Coefficients (0=no confounding, 1=complete confounding)
  Standardized Pearson Correlation Coefficient  0.7837
  Cramer’s V                                    0.6657
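
Both coefficients can be reproduced from the sample counts in the Batch-by-Condition table above. The following is a minimal Python sketch, assuming that Cramer's V is the usual sqrt(chi2 / (n * (k - 1))) and that the standardized Pearson coefficient is Pearson's contingency coefficient sqrt(chi2 / (chi2 + n)) divided by its maximum sqrt((k - 1) / k), where k is the smaller table dimension. The function name is illustrative and is not part of BatchQC. Under these assumptions the results agree with the reported 0.7837 and 0.6657.

```python
import math

def confounding_coefficients(counts):
    """Return (standardized Pearson C, Cramer's V) for a table of counts.

    Assumption: these are the textbook definitions; the report does not
    state the exact formulas it used.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    standardized_c = pearson_c / math.sqrt((k - 1) / k)
    return standardized_c, cramers_v

# Sample counts from the report: rows are conditions, columns are batches.
counts = [
    [7, 4, 0],  # crowned
    [5, 0, 5],  # worker
]
std_c, v = confounding_coefficients(counts)
print(f"standardized Pearson C = {std_c:.4f}, Cramer's V = {v:.4f}")
```

The empty cells (no crowned samples in batch 170203, no worker samples in batch 151119) are what push both coefficients well above zero.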

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.     0.035                   0          0
1st Qu.  10.14                   0.9342     5.237
Median   17.69                   3.825      11.92
Mean     20.57                   6.859      15.37
3rd Qu.  28.28                   9.984      22.03
Max.     94.55                   55.18      94.44

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                    Min.       1st Qu.  Median  Mean    3rd Qu.  Max.    Ps<0.05
Batch P-values      1.564e-10  0.13     0.3579  0.4013  0.6466   0.9996  0.1254
Condition P-values  3.254e-05  0.2174   0.4624  0.4757  0.7265   1       0.05915

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

          Condition: worker (logFC)  AveExpr  t       P.Value    adj.P.Val  B
CAPN5     36.97                      56.05    5.59    2.88e-05   0.4163     -3.103
QPCT      154.9                      279.8    4.971   0.0001058  0.4507     -3.258
SERPINH1  274.7                      574.7    4.946   0.0001117  0.4507     -3.265
ASPM      39.83                      62.48    4.895   0.0001247  0.4507     -3.278
HLTF      183                        392.5    4.722   0.0001807  0.5222     -3.327
PAM       511.7                      1233     4.638   0.0002168  0.5223     -3.351
ENDOG     -92.2                      168.6    -4.486  0.0003016  0.5849     -3.396
AMOT      33.83                      84.33    4.454   0.0003237  0.5849     -3.406
CDK14     27.14                      45.48    4.37    0.0003889  0.6246     -3.431
VPS13B    157.5                      546      4.252   0.0005033  0.7275     -3.468
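
None of the adjusted p-values in the LIMMA table falls below 0.05, even though the raw p-values are small. Assuming adj.P.Val was computed with the Benjamini-Hochberg procedure (the default of limma's topTable; the report does not say which method was used), the adjustment can be sketched as follows. The function name and the example p-values are hypothetical and are not taken from this dataset.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw))
```

Because each raw p-value is scaled by the number of tests divided by its rank, a small raw p-value can still yield a large adjusted value when many genes are tested at once.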

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

  Proportion of Variance (%) Cumulative Proportion of Variance (%) Percent Variation Explained by Either Condition or Batch Percent Variation Explained by Condition Condition Significance (p-value) Percent Variation Explained by Batch Batch Significance (p-value)
PC1 27.31 27.31 10.5 8.4 0.6145 9.1 0.8162
PC2 11.87 39.19 66.3 5.9 0.0089 49 0.00016
PC3 7.451 46.64 34.2 0 0.1437 25.1 0.02855
PC4 6.441 53.08 28.3 20.7 0.4496 25.7 0.4278
PC5 5.128 58.21 32.9 18.9 0.5688 31.6 0.2008
PC6 4.704 62.91 17.4 16.9 0.2208 9.5 0.9539
PC7 4.162 67.07 7.9 2.9 0.3864 3.6 0.6425
PC8 3.712 70.78 2.8 0.1 0.7585 2.3 0.79
PC9 3.546 74.33 7 2.2 0.9037 6.9 0.6532
PC10 3.101 77.43 4.4 1 0.6747 3.4 0.7417
PC11 2.87 80.3 7.3 1.3 0.6912 6.4 0.5838
PC12 2.776 83.08 12.3 7.1 0.2857 6.1 0.6075
PC13 2.689 85.77 19.3 0.7 0.2375 12.2 0.1719
PC14 2.524 88.29 9 0.7 0.2843 2.5 0.4763
PC15 2.285 90.58 3.4 0.7 0.614 1.9 0.7868
PC16 2.157 92.73 24.5 9.6 0.03584 1.5 0.2161
PC17 2.032 94.77 3.3 0.7 0.4858 0.4 0.7958
PC18 1.94 96.71 4.5 1 0.5155 2 0.7365
PC19 1.758 98.46 1.6 0.1 0.6896 0.6 0.882
PC20 1.536 100 3.1 1.2 0.4867 0.2 0.8471
PC21 6.374e-29 100 8.6 1.4 0.7884 8.2 0.5237

Shape

This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.

## Note: The sample-wise p-value is calculated for the variation across samples of the measure computed across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure computed across each batch. If the data are quantile normalized, the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.

Combat Plots

This plot shows whether a parametric or a non-parametric prior is appropriate for these data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02601
## p-value = 6.448e-09
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1104
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible and the parametric version is recommended even if the p-value of the KS test above is significant.
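
The statistic D reported by the Kolmogorov-Smirnov tests above is the largest gap between the empirical distribution of the per-gene batch estimates and the fitted parametric distribution. A minimal Python sketch of that computation for the normal case follows. The function names and the sample values are hypothetical, and the p-value calculation that R's ks.test also performs is omitted.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic D."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical batch-mean estimates for a handful of genes.
gamma_hat = [-0.8, -0.1, 0.0, 0.3, 1.2]
d = ks_statistic(gamma_hat, lambda x: normal_cdf(x, 0.0, 1.0))
print(f"D = {d:.4f}")
```

With thousands of genes, even a small D such as 0.02601 can give a very small p-value, which is consistent with the note's advice to judge the shape of the curves rather than the p-value alone.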

SVA

Summary

## Number of Surrogate Variables found in the given data: 0