Tests for checking Batch Effects
| | Batch 151113 | Batch 151119 | Batch 170203 |
|---|---|---|---|
| Condition crowned | 7 | 4 | 0 |
| Condition worker | 5 | 0 | 5 |
| | Standardized Pearson Correlation Coefficient | Cramer’s V |
|---|---|---|
| Confounding Coefficients (0 = no confounding, 1 = complete confounding) | 0.7837 | 0.6657 |
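Both confounding coefficients can be recomputed from the sample counts in the condition-by-batch table. The minimal sketch below uses the textbook definitions: Cramér's V, and Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) divided by its maximum value sqrt((k - 1) / k), where k = min(rows, columns). That the report computes them exactly this way is an assumption, but these formulas reproduce both reported values. The function names are illustrative.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

def standardized_contingency(table):
    """Pearson's contingency coefficient C divided by its maximum, sqrt((k - 1) / k)."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    c = math.sqrt(stat / (stat + n))
    return c / math.sqrt((k - 1) / k)

# Sample counts: rows are conditions, columns are batches 151113, 151119, 170203
counts = [
    [7, 4, 0],  # crowned
    [5, 0, 5],  # worker
]
print(round(standardized_contingency(counts), 4))  # 0.7837
print(round(cramers_v(counts), 4))                 # 0.6657
```

A value near 1 means batch membership largely determines condition, so batch and condition effects are hard to separate; here no worker sample is in batch 151119 and no crowned sample is in batch 170203.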
| | Full (Condition+Batch) | Condition | Batch |
|---|---|---|---|
| Min. | 0.035 | 0 | 0 |
| 1st Qu. | 10.14 | 0.9342 | 5.237 |
| Median | 17.69 | 3.825 | 11.92 |
| Mean | 20.57 | 6.859 | 15.37 |
| 3rd Qu. | 28.28 | 9.984 | 22.03 |
| Max. | 94.55 | 55.18 | 94.44 |
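The three columns can be read as the percentage of each gene's variance explained by a linear model, summarized across genes. A minimal sketch, assuming an ordinary least-squares fit per gene with three designs (condition + batch, condition only, batch only); the report's exact model is not stated in this section. The data are random and hypothetical, laid out like the 21 samples in the counts table, and `design`, `percent_explained` and `explained_variation` are illustrative names.

```python
import numpy as np

def design(*factors):
    """Intercept plus treatment-coded dummy columns for each categorical factor."""
    n = len(factors[0])
    cols = [np.ones(n)]
    for f in factors:
        for level in sorted(set(f))[1:]:  # the first level is the reference
            cols.append(np.array([1.0 if v == level else 0.0 for v in f]))
    return np.column_stack(cols)

def percent_explained(y, X):
    """R^2 (in %) of an ordinary least-squares fit of y on the design X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates rank-deficient designs
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - rss / tss)

def explained_variation(data, condition, batch):
    """data: genes x samples. Returns a genes x 3 array: full, condition-only, batch-only."""
    designs = [design(condition, batch), design(condition), design(batch)]
    return np.array([[percent_explained(g, X) for X in designs] for g in data])

# Hypothetical data with the same 21-sample layout as the counts table
condition = ["crowned"] * 11 + ["worker"] * 10
batch = ["151113"] * 7 + ["151119"] * 4 + ["151113"] * 5 + ["170203"] * 5
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 21))

r2 = explained_variation(data, condition, batch)
for name, col in zip(["Full", "Condition", "Batch"], r2.T):
    q1, med, q3 = np.percentile(col, [25, 50, 75])
    print(f"{name}: min={col.min():.3f} Q1={q1:.3f} median={med:.3f} "
          f"mean={col.mean():.3f} Q3={q3:.3f} max={col.max():.3f}")
```

`np.percentile`'s default linear interpolation matches R's default quantile type 7, which `summary()` uses. Because condition and batch are partly confounded, the full-model percentage is not the sum of the other two; it is only guaranteed to be at least as large as each.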
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Proportion of P-values < 0.05 |
|---|---|---|---|---|---|---|---|
| Batch P-values | 1.564e-10 | 0.13 | 0.3579 | 0.4013 | 0.6466 | 0.9996 | 0.1254 |
| Condition P-values | 3.254e-05 | 0.2174 | 0.4624 | 0.4757 | 0.7265 | 1 | 0.05915 |
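The batch and condition p-values summarized above come from per-gene tests. A common way to obtain them, and the assumption made here, is a nested-model F-test: the batch p-value compares a condition-only model with the condition + batch model, and the condition p-value compares a batch-only model with the same full model. The sketch uses hypothetical data laid out like the 21 samples in the counts table, with a batch effect planted in ten genes; `design`, `nested_f_pvalues` and `summarize` are illustrative names.

```python
import numpy as np
from scipy import stats

def design(*factors):
    """Intercept plus treatment-coded dummy columns for each categorical factor."""
    n = len(factors[0])
    cols = [np.ones(n)]
    for f in factors:
        for level in sorted(set(f))[1:]:  # the first level is the reference
            cols.append(np.array([1.0 if v == level else 0.0 for v in f]))
    return np.column_stack(cols)

def rss_and_rank(data, X):
    """Per-gene residual sum of squares of a least-squares fit on X, and the rank of X."""
    beta, *_ = np.linalg.lstsq(X, data.T, rcond=None)  # fits all genes at once
    resid = data.T - X @ beta
    return np.sum(resid ** 2, axis=0), np.linalg.matrix_rank(X)

def nested_f_pvalues(data, X_full, X_reduced):
    """Per-gene F-test p-values for the terms in X_full that X_reduced lacks."""
    rss1, p1 = rss_and_rank(data, X_full)
    rss0, p0 = rss_and_rank(data, X_reduced)
    df1, df2 = p1 - p0, data.shape[1] - p1
    f = ((rss0 - rss1) / df1) / (rss1 / df2)
    return stats.f.sf(f, df1, df2)

def summarize(p):
    """R-style summary() of p-values plus the proportion of p-values below 0.05."""
    q1, med, q3 = np.percentile(p, [25, 50, 75])
    return {"Min.": p.min(), "1st Qu.": q1, "Median": med, "Mean": p.mean(),
            "3rd Qu.": q3, "Max.": p.max(), "Ps<0.05": np.mean(p < 0.05)}

condition = ["crowned"] * 11 + ["worker"] * 10
batch = ["151113"] * 7 + ["151119"] * 4 + ["151113"] * 5 + ["170203"] * 5
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 21))
in_170203 = np.array([b == "170203" for b in batch])
data[:10, in_170203] += 10.0  # hypothetical batch effect in the first 10 genes

full = design(condition, batch)
batch_p = nested_f_pvalues(data, full, design(condition))
condition_p = nested_f_pvalues(data, full, design(batch))
print(summarize(batch_p))
print(summarize(condition_p))
```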
Boxplots of all values for each sample, colored by batch membership.
| | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| CAPN5 | 36.97 | 56.05 | 5.59 | 2.88e-05 | 0.4163 | -3.103 |
| QPCT | 154.9 | 279.8 | 4.971 | 0.0001058 | 0.4507 | -3.258 |
| SERPINH1 | 274.7 | 574.7 | 4.946 | 0.0001117 | 0.4507 | -3.265 |
| ASPM | 39.83 | 62.48 | 4.895 | 0.0001247 | 0.4507 | -3.278 |
| HLTF | 183 | 392.5 | 4.722 | 0.0001807 | 0.5222 | -3.327 |
| PAM | 511.7 | 1233 | 4.638 | 0.0002168 | 0.5223 | -3.351 |
| ENDOG | -92.2 | 168.6 | -4.486 | 0.0003016 | 0.5849 | -3.396 |
| AMOT | 33.83 | 84.33 | 4.454 | 0.0003237 | 0.5849 | -3.406 |
| CDK14 | 27.14 | 45.48 | 4.37 | 0.0003889 | 0.6246 | -3.431 |
| VPS13B | 157.5 | 546 | 4.252 | 0.0005033 | 0.7275 | -3.468 |
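limma's `topTable`, which produces columns named like those above (logFC, AveExpr, t, P.Value, adj.P.Val, B), adjusts p-values with the Benjamini-Hochberg method by default; that this report kept the default is an assumption. The sketch follows the same algorithm as R's `p.adjust(method = "BH")`, applied to hypothetical p-values; `bh_adjust` is an illustrative name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, following the algorithm of R's p.adjust(method = "BH")."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for k, i in enumerate(order):
        rank = m - k  # ascending rank of pvalues[i]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment scales each p-value by the number of genes tested divided by its rank, even the top gene in the table (P.Value 2.88e-05) ends up with adj.P.Val 0.4163.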
This plot helps identify outlying samples.
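The statistic behind this outlier plot is not named here. One common approach, assumed in this sketch, is to compute each sample's median correlation with all other samples and flag samples that fall far below the rest; the 1.5 x IQR cutoff is the usual boxplot rule, chosen for illustration. The data are hypothetical, and `median_correlations` and `flag_low_outliers` are illustrative names.

```python
import numpy as np

def median_correlations(data):
    """Median Pearson correlation of each sample (column of data) with all other samples."""
    corr = np.corrcoef(data, rowvar=False)  # samples x samples
    np.fill_diagonal(corr, np.nan)          # ignore each sample's correlation with itself
    return np.nanmedian(corr, axis=0)

def flag_low_outliers(values):
    """Indices of values below Q1 - 1.5 * IQR (the usual boxplot whisker rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    return np.flatnonzero(values < q1 - 1.5 * (q3 - q1))

rng = np.random.default_rng(2)
shared = rng.normal(size=(500, 1))                 # signal common to all samples
data = shared + 0.3 * rng.normal(size=(500, 21))  # 21 hypothetical samples (columns)
data[:, 20] = rng.normal(size=500)                 # make sample 20 unrelated to the rest

med = median_correlations(data)
print(np.argmin(med), flag_low_outliers(med))
```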
This is a heatmap of the given data matrix showing batch effects and variation across conditions.
This is a heatmap of the correlation between samples.
This is a circular dendrogram of the given data matrix, colored by batch to show batch effects.
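A sample dendrogram like this comes from hierarchical clustering of the columns of the data matrix. The sketch assumes average linkage with 1 - Pearson correlation as the distance, which may differ from the report's settings. SciPy draws rectangular rather than circular dendrograms, so the sketch stops at the tree, the leaf order and a two-cluster cut. The data are hypothetical, with a gene-specific shift added to batch B; `cluster_samples` is an illustrative name.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

def cluster_samples(data, labels):
    """Average-linkage clustering of samples (columns), distance = 1 - Pearson correlation."""
    Z = linkage(data.T, method="average", metric="correlation")
    leaf_order = dendrogram(Z, labels=labels, no_plot=True)["ivl"]
    return Z, leaf_order

rng = np.random.default_rng(3)
profile = rng.normal(size=300)      # expression pattern shared by all samples
shift = 2.0 * rng.normal(size=300)  # hypothetical gene-specific batch effect
batch = ["A"] * 8 + ["B"] * 8
data = np.column_stack([
    profile + (shift if b == "B" else 0.0) + 0.5 * rng.normal(size=300) for b in batch
])
labels = [f"{b}{i}" for i, b in enumerate(batch)]

Z, order = cluster_samples(data, labels)
clusters = fcluster(Z, t=2, criterion="maxclust")
print(order)
print(clusters)
```

When samples split by batch rather than by condition, as in this toy example, the batch effect dominates the sample-to-sample distances.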
This is a plot of the top two principal components colored by batch to show the batch effects.
| | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
|---|---|---|---|---|---|---|---|
| PC1 | 27.31 | 27.31 | 10.5 | 8.4 | 0.6145 | 9.1 | 0.8162 |
| PC2 | 11.87 | 39.19 | 66.3 | 5.9 | 0.0089 | 49 | 0.00016 |
| PC3 | 7.451 | 46.64 | 34.2 | 0 | 0.1437 | 25.1 | 0.02855 |
| PC4 | 6.441 | 53.08 | 28.3 | 20.7 | 0.4496 | 25.7 | 0.4278 |
| PC5 | 5.128 | 58.21 | 32.9 | 18.9 | 0.5688 | 31.6 | 0.2008 |
| PC6 | 4.704 | 62.91 | 17.4 | 16.9 | 0.2208 | 9.5 | 0.9539 |
| PC7 | 4.162 | 67.07 | 7.9 | 2.9 | 0.3864 | 3.6 | 0.6425 |
| PC8 | 3.712 | 70.78 | 2.8 | 0.1 | 0.7585 | 2.3 | 0.79 |
| PC9 | 3.546 | 74.33 | 7 | 2.2 | 0.9037 | 6.9 | 0.6532 |
| PC10 | 3.101 | 77.43 | 4.4 | 1 | 0.6747 | 3.4 | 0.7417 |
| PC11 | 2.87 | 80.3 | 7.3 | 1.3 | 0.6912 | 6.4 | 0.5838 |
| PC12 | 2.776 | 83.08 | 12.3 | 7.1 | 0.2857 | 6.1 | 0.6075 |
| PC13 | 2.689 | 85.77 | 19.3 | 0.7 | 0.2375 | 12.2 | 0.1719 |
| PC14 | 2.524 | 88.29 | 9 | 0.7 | 0.2843 | 2.5 | 0.4763 |
| PC15 | 2.285 | 90.58 | 3.4 | 0.7 | 0.614 | 1.9 | 0.7868 |
| PC16 | 2.157 | 92.73 | 24.5 | 9.6 | 0.03584 | 1.5 | 0.2161 |
| PC17 | 2.032 | 94.77 | 3.3 | 0.7 | 0.4858 | 0.4 | 0.7958 |
| PC18 | 1.94 | 96.71 | 4.5 | 1 | 0.5155 | 2 | 0.7365 |
| PC19 | 1.758 | 98.46 | 1.6 | 0.1 | 0.6896 | 0.6 | 0.882 |
| PC20 | 1.536 | 100 | 3.1 | 1.2 | 0.4867 | 0.2 | 0.8471 |
| PC21 | 6.374e-29 | 100 | 8.6 | 1.4 | 0.7884 | 8.2 | 0.5237 |
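The variance columns follow from a singular value decomposition of the gene-centered matrix, and the per-PC association columns can be sketched as the R² and one-way ANOVA p-value of each component's scores against condition or batch. That the report uses one-way ANOVA is an assumption, and the "Either Condition or Batch" column, which needs a model with both factors, is not reproduced. With n samples, centering leaves at most n - 1 components with nonzero variance, which is why PC21 in the table explains only 6.374e-29 % (numerically zero) and its condition and batch percentages carry no information. The data are hypothetical; `pca` and `association` are illustrative names.

```python
import numpy as np
from scipy import stats

def pca(data):
    """PCA of samples; data is genes x samples. Returns sample scores and percent variance per PC."""
    X = data.T - data.T.mean(axis=0)  # samples x genes, each gene centered
    u, s, _ = np.linalg.svd(X, full_matrices=False)
    return u * s, 100.0 * s ** 2 / np.sum(s ** 2)

def association(scores, labels):
    """For each PC column: % of score variance explained by a factor, and the one-way ANOVA p-value."""
    labels = np.asarray(labels)
    results = []
    for pc in scores.T:
        groups = [pc[labels == level] for level in np.unique(labels)]
        ss_total = np.sum((pc - pc.mean()) ** 2)
        ss_between = sum(len(g) * (g.mean() - pc.mean()) ** 2 for g in groups)
        results.append((100.0 * ss_between / ss_total, stats.f_oneway(*groups).pvalue))
    return results

condition = ["crowned"] * 11 + ["worker"] * 10
batch = ["151113"] * 7 + ["151119"] * 4 + ["151113"] * 5 + ["170203"] * 5
rng = np.random.default_rng(7)
data = rng.normal(size=(100, 21))
in_170203 = np.array([b == "170203" for b in batch])
data[:, in_170203] += 3.0 * rng.normal(size=(100, 1))  # hypothetical gene-specific batch shift

scores, pct = pca(data)
print(np.round(pct[:3], 2), np.round(np.cumsum(pct)[:3], 2), pct[-1])
for k, ((r2_c, p_c), (r2_b, p_b)) in enumerate(
        zip(association(scores[:, :3], condition), association(scores[:, :3], batch)), start=1):
    print(f"PC{k}: condition {r2_c:.1f}% (p={p_c:.3g}), batch {r2_b:.1f}% (p={p_b:.3g})")
```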
This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, with samples grouped by batch to reveal batch-related variation.
## Note: The sample-wise p-value is calculated for the variation across samples of each measure computed across genes. The gene-wise p-value is calculated for the variation of each gene between batches, using the measure computed within each batch. If the data are quantile normalized, the sample-wise measures across genes are the same for all samples, so the gene-wise p-value is the better measure.
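The note's point about quantile normalization can be checked directly: after quantile normalization every sample holds the same set of values, only in a different gene order, so every per-sample moment is identical. The sketch computes per-sample mean, variance, skewness and (excess) kurtosis with SciPy and uses a simple quantile normalization that does not average ties. Both are illustrative assumptions, and the report's exact moment conventions may differ; the data are hypothetical.

```python
import numpy as np
from scipy import stats

def sample_moments(data):
    """Mean, variance, skewness and excess kurtosis of each sample (column) across genes."""
    return {
        "mean": data.mean(axis=0),
        "variance": data.var(axis=0, ddof=1),
        "skewness": stats.skew(data, axis=0),
        "kurtosis": stats.kurtosis(data, axis=0),  # Fisher (excess) definition by default
    }

def quantile_normalize(data):
    """Give every column the same values: the mean of the sorted columns (ties not averaged)."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0)  # within-column rank of each value
    reference = np.sort(data, axis=0).mean(axis=1)
    return reference[ranks]

rng = np.random.default_rng(4)
scales = np.linspace(0.5, 2.0, 6)
data = rng.gamma(shape=2.0, size=(1000, 6)) * scales  # hypothetical samples with different spreads

raw = sample_moments(data)
normalized = sample_moments(quantile_normalize(data))
print(np.round(raw["variance"], 2))
print(np.round(normalized["variance"], 2))
```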
This is a plot showing whether a parametric or non-parametric prior is appropriate for this data. It also shows Kolmogorov-Smirnov tests comparing each parametric prior distribution with the corresponding empirical distribution.
## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
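ComBat's location/scale (L/S) step standardizes each gene, estimates a per-batch location (gamma_hat) and scale (delta_hat) for every gene, and then fits priors across genes: a normal prior with mean gamma_bar and variance t2 for the locations, and an inverse-gamma prior for the scales. The sketch mirrors those names, which also appear in the warnings that follow, but it is a simplified illustration, not ComBat itself: it omits the covariate adjustment reported in the log, and it fits the inverse-gamma shape a and scale b by the method of moments. The data and `combat_priors` are hypothetical.

```python
import numpy as np

def combat_priors(data, batch):
    """Simplified ComBat L/S step without covariates.

    data: genes x samples; batch: one label per sample.
    Returns per-batch gamma_hat and delta_hat (one value per gene) and prior hyperparameters.
    """
    batch = np.asarray(batch)
    levels = list(np.unique(batch))
    n = data.shape[1]
    sizes = np.array([np.sum(batch == b) for b in levels])
    batch_means = np.column_stack([data[:, batch == b].mean(axis=1) for b in levels])

    # Standardize each gene: subtract the size-weighted grand mean, divide by the pooled SD
    grand_mean = batch_means @ sizes / n
    fitted = batch_means[:, [levels.index(b) for b in batch]]
    var_pooled = np.sum((data - fitted) ** 2, axis=1) / n
    stand = (data - grand_mean[:, None]) / np.sqrt(var_pooled)[:, None]

    gamma_hat, delta_hat, priors = {}, {}, {}
    for b in levels:
        s = stand[:, batch == b]
        g = s.mean(axis=1)         # batch location estimate per gene
        d = s.var(axis=1, ddof=1)  # batch scale estimate per gene
        m, s2 = d.mean(), d.var(ddof=1)
        priors[b] = {
            "gamma_bar": g.mean(),        # normal prior mean for the locations
            "t2": g.var(ddof=1),          # normal prior variance for the locations
            "a": (2 * s2 + m ** 2) / s2,  # inverse-gamma shape (method of moments)
            "b": (m * s2 + m ** 3) / s2,  # inverse-gamma scale (method of moments)
        }
        gamma_hat[b], delta_hat[b] = g, d
    return gamma_hat, delta_hat, priors

rng = np.random.default_rng(5)
batch = ["1"] * 12 + ["2"] * 4 + ["3"] * 5
data = rng.normal(size=(500, 21))
data[:, 16:] = data[:, 16:] * 1.5 + 2.0  # hypothetical location/scale shift in batch "3"
gamma_hat, delta_hat, priors = combat_priors(data, batch)
print(priors["3"])
```

By construction the inverse-gamma prior has mean b / (a - 1) and variance b² / ((a - 1)² (a - 2)), which equal the mean and variance of delta_hat across genes.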
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02601
## p-value = 6.448e-09
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1104
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run. We recommend it only when the shape of the non-parametric curve differs widely from the parametric one, for example a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
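The two KS tests compare one batch's per-gene estimates with the parametric prior fitted to them. A sketch with SciPy, assuming the same normal and method-of-moments inverse-gamma priors as ComBat's parametric mode, on hypothetical data; `prior_ks_tests` is an illustrative name. Because the prior parameters are estimated from the values being tested, and because tied values (see the R warnings above) make the statistic's distribution discrete, the p-values are only approximate. With thousands of genes even a small D, such as the 0.02601 reported for batch 1, can be highly significant, which is why the note judges the shape of the curves rather than the p-value alone.

```python
import numpy as np
from scipy import stats

def prior_ks_tests(gamma_hat, delta_hat):
    """Two-sided KS tests of one batch's per-gene estimates against parametric priors fit to them."""
    # Normal prior for the batch means, with mean gamma_bar and variance t2
    gamma_bar, t2 = gamma_hat.mean(), gamma_hat.var(ddof=1)
    ks_mean = stats.kstest(gamma_hat, stats.norm(loc=gamma_bar, scale=np.sqrt(t2)).cdf)
    # Inverse-gamma prior for the batch variances: method-of-moments shape a and scale b
    m, s2 = delta_hat.mean(), delta_hat.var(ddof=1)
    a, b = (2 * s2 + m ** 2) / s2, (m * s2 + m ** 3) / s2
    ks_var = stats.kstest(delta_hat, stats.invgamma(a, scale=b).cdf)
    return ks_mean, ks_var

rng = np.random.default_rng(6)
gamma_hat = rng.normal(0.2, 0.5, size=2000)  # hypothetical normal-shaped batch means
delta_hat = stats.invgamma(10.0, scale=9.0).rvs(size=2000, random_state=rng)
ks_mean, ks_var = prior_ks_tests(gamma_hat, delta_hat)
print(f"D = {ks_mean.statistic:.4f}, p-value = {ks_mean.pvalue:.4g}")
print(f"D = {ks_var.statistic:.4f}, p-value = {ks_var.pvalue:.4g}")

# A bimodal set of batch means, the kind of shape the note reserves the non-parametric prior for
bimodal = np.concatenate([rng.normal(-3, 0.5, 1000), rng.normal(3, 0.5, 1000)])
print(prior_ks_tests(bimodal, delta_hat)[0].pvalue)
```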
## Number of Surrogate Variables found in the given data: 0