Tests for checking Batch Effects
Condition | Batch 151110 | Batch 170208 | Batch 170217 |
---|---|---|---|
crowned | 12 | 0 | 0 |
worker | 6 | 2 | 2 |
Measure | Standardized Pearson Correlation Coefficient | Cramér’s V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.6489 | 0.5164 |
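
As a cross-check, both coefficients can be recomputed from the sample counts per condition and batch in the first table. The following is a minimal Python sketch, not the code that produced this report. It assumes the "standardized Pearson" value is Pearson's contingency coefficient divided by its maximum attainable value, sqrt((k-1)/k) with k the smaller table dimension; the function names are illustrative.

```python
import math

# Sample counts per condition (rows) and batch (columns), copied from the
# first table of this report.
counts = [
    [12, 0, 0],  # crowned
    [6, 2, 2],   # worker
]

def chi_square(table):
    """Return the Pearson chi-square statistic and the total count."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramer's V)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Assumption: "standardized" means rescaled by the maximum of C.
    std_pearson_c = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson_c, cramers_v

std_c, v = confounding_coefficients(counts)
print(round(std_c, 4), round(v, 4))  # 0.6489 0.5164
```

Under these assumptions the sketch reproduces the two reported values, which suggests the confounding here is driven by all crowned samples falling in a single batch.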
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0 | 0 | 0 |
1st Qu. | 5.152 | 0.318 | 2.843 |
Median | 10.67 | 1.385 | 6.483 |
Mean | 13.74 | 2.893 | 9.768 |
3rd Qu. | 19.33 | 3.915 | 13.32 |
Max. | 98.86 | 47.04 | 98.85 |
Effect | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 0 | 0.2095 | 0.4751 | 0.4765 | 0.7363 | 1 | 0.07865 |
Condition P-values | 7.369e-05 | 0.3031 | 0.5689 | 0.5443 | 0.7947 | 1 | 0.03453 |
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
PLXNA4 | 178.3 | 181.2 | 5.177 | 5.704e-05 | 0.3429 | -4.538 |
ALDH3B1 | 173.8 | 427.1 | 5.076 | 7.122e-05 | 0.3429 | -4.539 |
SMIM10 | 121.4 | 179.5 | 5.074 | 7.148e-05 | 0.3429 | -4.539 |
ADAMTS14 | 86.42 | 124.1 | 4.84 | 0.00012 | 0.4319 | -4.541 |
HDAC5 | 219.1 | 642 | 4.485 | 0.0002653 | 0.5775 | -4.545 |
KIF5C | 54.92 | 63.68 | 4.46 | 0.0002805 | 0.5775 | -4.546 |
ITPR2 | 1548 | 3551 | 4.459 | 0.0002809 | 0.5775 | -4.546 |
DDX59 | 200.7 | 950.6 | 4.331 | 0.0003749 | 0.6603 | -4.547 |
ZCCHC12 | 286.5 | 290.7 | 4.288 | 0.0004129 | 0.6603 | -4.548 |
TNFRSF18 | 192.7 | 311.4 | 4.156 | 0.0005571 | 0.7365 | -4.549 |
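
The `adj.P.Val` column is a multiple-testing adjustment of `P.Value`. The sketch below assumes the Benjamini-Hochberg (BH) method, which is limma's default in `topTable`; the report itself does not state which method was used. The input p-values are toy numbers, not values from this report, because the adjustment depends on the full set of tested genes, which is not shown here.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted values of all larger p-values (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy p-values (hypothetical, for illustration only).
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adj = benjamini_hochberg(toy_p)
print(toy_adj)
```

The monotonicity step is why neighbouring genes can share an identical adjusted value, as the first three rows of the table do.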
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
Principal Component | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 28.6 | 28.6 | 6.6 | 1.9 | 0.9029 | 6.5 | 0.6412 |
PC2 | 19.04 | 47.64 | 8 | 0.5 | 0.8277 | 7.8 | 0.4908 |
PC3 | 13.05 | 60.68 | 32.6 | 4.3 | 0.04903 | 15.9 | 0.04252 |
PC4 | 6.972 | 67.66 | 14.2 | 0.4 | 0.6041 | 12.9 | 0.2604 |
PC5 | 5.97 | 73.62 | 19.2 | 0 | 0.4756 | 16.8 | 0.1474 |
PC6 | 4.073 | 77.7 | 5.1 | 3 | 0.4284 | 1.7 | 0.8167 |
PC7 | 2.898 | 80.6 | 31.1 | 0 | 0.2343 | 25.3 | 0.0352 |
PC8 | 2.355 | 82.95 | 29.5 | 26.3 | 0.02701 | 6.8 | 0.6713 |
PC9 | 1.921 | 84.87 | 24.4 | 18.9 | 0.2316 | 17.9 | 0.5355 |
PC10 | 1.712 | 86.58 | 20 | 8.2 | 0.2665 | 14.2 | 0.2892 |
PC11 | 1.663 | 88.24 | 0.7 | 0.3 | 0.91 | 0.6 | 0.9595 |
PC12 | 1.482 | 89.73 | 7.8 | 2.5 | 0.3142 | 2.3 | 0.6065 |
PC13 | 1.386 | 91.11 | 11.8 | 10.3 | 0.3462 | 7.2 | 0.8605 |
PC14 | 1.34 | 92.45 | 23.1 | 1.9 | 0.2761 | 17.7 | 0.1119 |
PC15 | 1.261 | 93.71 | 1.1 | 0 | 0.8849 | 0.9 | 0.9093 |
PC16 | 1.19 | 94.9 | 9.8 | 2.9 | 0.5892 | 8.3 | 0.5151 |
PC17 | 1.131 | 96.04 | 9.6 | 8.7 | 0.2428 | 2.3 | 0.9161 |
PC18 | 1.105 | 97.14 | 33.9 | 2.6 | 0.845 | 33.7 | 0.03051 |
PC19 | 1.063 | 98.2 | 4.8 | 3.9 | 0.3604 | 0.1 | 0.9228 |
PC20 | 0.9279 | 99.13 | 4.7 | 3 | 0.3863 | 0.5 | 0.8535 |
PC21 | 0.8687 | 100 | 2.1 | 0.5 | 0.6013 | 0.6 | 0.8639 |
PC22 | 2.461e-29 | 100 | 15.3 | 5.9 | 0.812 | 15 | 0.3892 |
This is a heatmap showing how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.
## Note: The sample-wise p-value tests the variation across samples of a measure computed across genes. The gene-wise p-value tests the variation of each gene between batches of a measure computed within each batch. If the data are quantile normalized, the sample-wise measure across genes is the same for all samples, so the gene-wise p-value is the better measure.
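
To see why sample-wise measures stop being informative after this kind of normalization, here is a minimal Python sketch of quantile normalization on toy data. It is a simplified illustration (ties are broken by position, and the values are hypothetical), not the normalization applied to the data in this report.

```python
from statistics import mean, pvariance

def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of gene values each).

    Ties are broken by position, a simplification compared with
    production implementations.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(s[k] for s in sorted_samples) for k in range(n_genes)]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

# Toy expression values: three samples, four genes (hypothetical).
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
norm = quantile_normalize(raw)
for sample in norm:
    # Every sample now has the same mean and variance across genes.
    print(round(mean(sample), 6), round(pvariance(sample), 6))
```

After normalization every sample holds the same set of values in a different gene order, so any sample-wise summary (mean, variance, skewness, kurtosis) is identical across samples, while gene-wise differences between batches can remain.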
This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also reports the Kolmogorov-Smirnov test comparing the parametric prior distribution with the empirical distribution.
## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.01529
## p-value = 0.0024
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1781
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible and the parametric version is recommended even if the p-value of the KS test above is significant.
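
For readers unfamiliar with the statistic D reported above, the following Python sketch computes the two-sided one-sample Kolmogorov-Smirnov statistic against a normal distribution, mirroring the `ks.test(x, "pnorm", mean, sd)` calls visible in the warnings. The input values are hypothetical, and the sample mean and standard deviation stand in for the prior parameters estimated by ComBat; this is an illustration, not the report's computation.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(values, cdf):
    """Two-sided one-sample KS statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical batch-mean estimates for a handful of genes.
toy = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
mu = sum(toy) / len(toy)
sigma = math.sqrt(sum((x - mu) ** 2 for x in toy) / (len(toy) - 1))
d = ks_statistic(toy, lambda x: normal_cdf(x, mu, sigma))
print(round(d, 4))
```

Because the p-value depends on both D and the number of values tested, a very small D can still be significant when thousands of genes are tested, which is consistent with the recommendation in the note to judge the shape of the curves rather than the p-value alone.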
## Number of Surrogate Variables found in the given data: 0