Tests for checking Batch Effects
Condition | Batch 151110 | Batch 170217 |
---|---|---|
crowned | 12 | 0 |
worker | 6 | 6 |
Measure | Standardized Pearson Correlation Coefficient | Cramér's V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.7071 | 0.5774 |
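
Both confounding coefficients can be reproduced from the condition-by-batch counts (12, 0 / 6, 6). The sketch below is a minimal illustration, assuming an uncorrected chi-squared statistic (no continuity correction) and assuming the "standardized Pearson" value is the Pearson contingency coefficient divided by its maximum, sqrt((k-1)/k), where k is the smaller table dimension. The function name `confounding_coefficients` is hypothetical and not part of the tool that produced this report.

```python
import math

def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramér's V)
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Uncorrected chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Assumption: "standardized" means scaled by the maximum attainable value.
    std_pearson = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson, cramers_v

# Counts from the condition-by-batch table (rows: crowned, worker).
counts = [[12, 0], [6, 6]]
std_pearson, cramers_v = confounding_coefficients(counts)
print(round(std_pearson, 4), round(cramers_v, 4))  # 0.7071 0.5774
```

Under these assumptions the chi-squared statistic is 8 with n = 24, which gives the two values in the table.
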
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0 | 0 | 0 |
1st Qu. | 3.291 | 0.362 | 1.009 |
Median | 7.712 | 1.564 | 4.178 |
Mean | 10.17 | 3.284 | 6.956 |
3rd Qu. | 14.62 | 4.41 | 10.36 |
Max. | 79.65 | 46.08 | 79.55 |
Effect | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 4.116e-07 | 0.1262 | 0.3542 | 0.4074 | 0.6641 | 1 | 0.1234 |
Condition P-values | 0.00112 | 0.3249 | 0.5559 | 0.544 | 0.7749 | 1 | 0.0241 |
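
Reading the `Ps<0.05` column as the fraction of per-gene p-values below 0.05 (an assumption about the column label), about 12% of genes show a nominal batch effect and about 2.4% a nominal condition effect, against roughly 5% expected by chance alone. The sketch below shows how such a summary row could be computed. The p-values are made up for illustration, and `summarize_p_values` is a hypothetical helper.

```python
from statistics import mean, median

def summarize_p_values(p_values, alpha=0.05):
    """Summarize a list of p-values, including the fraction below alpha."""
    below = sum(1 for p in p_values if p < alpha)
    return {
        "min": min(p_values),
        "median": median(p_values),
        "mean": mean(p_values),
        "max": max(p_values),
        "frac_below_alpha": below / len(p_values),
    }

# Hypothetical per-gene p-values, NOT taken from this report.
example = [0.001, 0.04, 0.2, 0.5, 0.9, 1.0, 0.03, 0.6]
summary = summarize_p_values(example)
print(summary["frac_below_alpha"])  # 0.375
```
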
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
SLC40A1 | 398.7 | 1190 | 3.819 | 0.0009649 | 1 | -4.593 |
INHBB | -16.42 | 27.96 | -3.712 | 0.001248 | 1 | -4.593 |
COL24A1 | 8.333 | 6.417 | 3.497 | 0.002088 | 1 | -4.593 |
PGR | 158.2 | 195.2 | 3.477 | 0.002187 | 1 | -4.593 |
FIBIN | 63.75 | 90.79 | 3.355 | 0.002922 | 1 | -4.593 |
LUZP6 | 45.33 | 342.4 | 3.339 | 0.003038 | 1 | -4.593 |
TRNAU1AP | 21.25 | 71 | 3.255 | 0.003703 | 1 | -4.594 |
EXPH5 | 20.5 | 15.71 | 3.249 | 0.003752 | 1 | -4.594 |
PAK1 | 22.83 | 37.46 | 3.197 | 0.004243 | 1 | -4.594 |
CCBE1 | 22.75 | 23.83 | 3.189 | 0.004317 | 1 | -4.594 |
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
Principal Component | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 18.29 | 18.29 | 19.7 | 1.3 | 0.4429 | 17.3 | 0.04004 |
PC2 | 14.93 | 33.22 | 0 | 0 | 0.9677 | 0 | 0.9432 |
PC3 | 9.113 | 42.33 | 11.1 | 0.1 | 0.4334 | 8.4 | 0.1223 |
PC4 | 7.607 | 49.94 | 11.3 | 7.4 | 0.6046 | 10.2 | 0.3466 |
PC5 | 4.373 | 54.31 | 10.9 | 3.8 | 0.9863 | 10.9 | 0.2074 |
PC6 | 4.117 | 58.43 | 18.4 | 0.1 | 0.1803 | 10.9 | 0.04181 |
PC7 | 3.886 | 62.31 | 17.4 | 3 | 0.7053 | 16.8 | 0.07013 |
PC8 | 3.551 | 65.86 | 1.5 | 0.4 | 0.9562 | 1.5 | 0.6282 |
PC9 | 3.219 | 69.08 | 13.1 | 13 | 0.1448 | 3.6 | 0.9101 |
PC10 | 2.898 | 71.98 | 16.1 | 12.4 | 0.05961 | 0.2 | 0.349 |
PC11 | 2.804 | 74.79 | 12 | 10.9 | 0.1214 | 1.1 | 0.611 |
PC12 | 2.757 | 77.54 | 12.4 | 11.3 | 0.3031 | 7.7 | 0.6208 |
PC13 | 2.559 | 80.1 | 8.2 | 8.1 | 0.253 | 2.1 | 0.9164 |
PC14 | 2.434 | 82.53 | 2.8 | 1 | 0.9795 | 2.8 | 0.5427 |
PC15 | 2.26 | 84.79 | 4.7 | 4.4 | 0.3528 | 0.6 | 0.7962 |
PC16 | 2.168 | 86.96 | 4.7 | 3.5 | 0.3205 | 0 | 0.6059 |
PC17 | 2.033 | 88.99 | 7.8 | 0.8 | 0.2945 | 2.8 | 0.2182 |
PC18 | 2.02 | 91.01 | 6.9 | 5.8 | 0.2363 | 0.3 | 0.6215 |
PC19 | 1.981 | 93 | 5.2 | 1.1 | 0.3515 | 1.1 | 0.3547 |
PC20 | 1.887 | 94.88 | 3.6 | 3.6 | 0.4844 | 1.3 | 0.9864 |
PC21 | 1.776 | 96.66 | 11.1 | 7.9 | 0.1209 | 0 | 0.394 |
PC22 | 1.734 | 98.39 | 0.8 | 0.1 | 0.7461 | 0.3 | 0.6873 |
PC23 | 1.608 | 100 | 0.4 | 0.1 | 0.8004 | 0.1 | 0.7859 |
PC24 | 1.998e-29 | 100 | 1.2 | 0.1 | 0.8695 | 1 | 0.6381 |
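
One common way to obtain a "percent variation explained" figure for a principal component is the R² of a one-way model of the component scores on a grouping factor, that is, the between-group sum of squares divided by the total sum of squares. The sketch below illustrates that idea only. Whether the report computes its percentages exactly this way is an assumption, the scores are invented, and `percent_explained` is a hypothetical helper.

```python
def percent_explained(scores, groups):
    """Percent of variance in `scores` explained by the grouping in `groups`
    (R^2 of a one-way model, expressed as a percentage)."""
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores belonging to each group label.
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_group.values()
    )
    return 100.0 * ss_between / ss_total

# Hypothetical PC scores for four samples in two batches.
pc_scores = [1.0, 2.0, 3.0, 4.0]
batches = ["A", "A", "B", "B"]
print(percent_explained(pc_scores, batches))  # 80.0
```
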
This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.
## Note: The sample-wise p-value is calculated for the variation across samples on the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches on the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.
This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.
## Found 2 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02067
## p-value = 6.57e-06
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1583
## p-value = 0
Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
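
The statistic D in the Kolmogorov-Smirnov output above is the largest vertical gap between the empirical distribution of the per-gene batch estimates and the fitted reference distribution. The sketch below computes the one-sample, two-sided D against a normal CDF using only the standard library. The sample values are invented, and `normal_cdf` and `ks_statistic` are hypothetical helper names, not functions from the report's software.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample two-sided Kolmogorov-Smirnov statistic D.

    D is the maximum distance between the empirical CDF of `sample`
    and the reference `cdf`, checked just before and at each data point.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical batch-mean estimates, NOT taken from this report.
gamma_hat = [-0.8, -0.1, 0.0, 0.3, 1.2]
d = ks_statistic(gamma_hat, lambda x: normal_cdf(x, mu=0.0, sigma=1.0))
print(round(d, 4))
```

With many thousands of genes, even a small D such as 0.02067 can yield a very small p-value, which is consistent with the note above that a significant KS test alone does not rule out the parametric prior.
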
## Number of Surrogate Variables found in the given data: 4