Tests for Checking Batch Effects
Condition | Batch 150629 | Batch 150722 | Batch 151119 | Batch 170217 |
---|---|---|---|---|
crowned | 4 | 3 | 5 | 0 |
worker | 0 | 3 | 2 | 5 |
Measure | Standardized Pearson Correlation Coefficient | Cramér’s V |
---|---|---|
Confounding coefficient (0 = no confounding, 1 = complete confounding) | 0.7956 | 0.6805 |
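Both coefficients are functions of the chi-squared statistic of the condition-by-batch count table. A minimal plain-Python sketch, assuming the standard definitions (Cramér's V = √(χ²/(N(k−1))); Pearson's contingency coefficient C = √(χ²/(χ²+N)), standardized by its maximum √((k−1)/k), where k is the smaller table dimension), reproduces both reported values from the counts:

```python
import math

def confounding_coefficients(table):
    """Return (chi2, Cramer's V, standardized Pearson C) for a 2-D count table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))               # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))         # Pearson's contingency coefficient
    c_max = math.sqrt((k - 1) / k)                   # largest C attainable for this k
    return chi2, cramers_v, pearson_c / c_max

# Sample counts per condition (rows) and batch (columns), from the table above
counts = [[4, 3, 5, 0],   # crowned
          [0, 3, 2, 5]]   # worker
chi2, v, c_std = confounding_coefficients(counts)
print(f"chi2={chi2:.3f}  V={v:.4f}  standardized C={c_std:.4f}")
# chi2=10.188  V=0.6805  standardized C=0.7956
```

Both measures reach 1 only when each batch contains a single condition. Here, batches 150629 and 170217 each contain only one condition, which pushes the values well above 0.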
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0.39 | 0 | 0 |
1st Qu. | 22.58 | 0.267 | 19.67 |
Median | 34.47 | 1.151 | 32.19 |
Mean | 34.62 | 2.557 | 32.55 |
3rd Qu. | 46.68 | 3.355 | 45.16 |
Max. | 81.98 | 46.33 | 81.32 |
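This table summarizes, over all genes, how much of each gene's variance is explained by condition and batch together, by condition alone, and by batch alone. One standard way to obtain such numbers is to fit each gene by ordinary least squares on treatment-coded factors and report R² × 100. The numpy sketch below follows that approach. That BatchQC computes its columns exactly this way (and in percent) is an assumption. The expression matrix and batch shifts are simulated, and only the sample layout is taken from the condition-by-batch table.

```python
import numpy as np

def design(*factors):
    """Intercept plus treatment-coded dummy columns for each categorical factor."""
    cols = [np.ones(len(factors[0]))]
    for f in factors:
        for level in sorted(set(f))[1:]:            # first level is the reference
            cols.append(np.array([x == level for x in f], dtype=float))
    return np.column_stack(cols)

def r2_per_gene(data, X):
    """R^2 of an OLS fit of every gene (row of `data`) on design matrix X.
    Genes with zero variance must be removed first (their TSS would be 0)."""
    y = data.T                                      # samples x genes
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum(axis=0)
    tss = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return 1 - rss / tss

# Sample layout taken from the condition-by-batch table (22 samples)
batch = ["150629"] * 4 + ["150722"] * 6 + ["151119"] * 7 + ["170217"] * 5
condition = (["crowned"] * 4 + ["crowned"] * 3 + ["worker"] * 3
             + ["crowned"] * 5 + ["worker"] * 2 + ["worker"] * 5)

# Hypothetical data: 500 genes of random noise plus an additive batch shift
rng = np.random.default_rng(0)
shift = {"150629": 0.0, "150722": 0.5, "151119": -0.5, "170217": 1.0}
data = rng.normal(size=(500, 22)) + np.array([shift[b] for b in batch])

ev = 100 * np.column_stack([
    r2_per_gene(data, design(condition, batch)),    # Full (Condition+Batch)
    r2_per_gene(data, design(condition)),           # Condition only
    r2_per_gene(data, design(batch)),               # Batch only
])
# Min, 1st Qu., Median, 3rd Qu., Max per column; numpy's default "linear"
# interpolation matches R's default quantile type 7 used by summary()
print(np.percentile(ev, [0, 25, 50, 75, 100], axis=0).round(2))
print(ev.mean(axis=0).round(2))                    # Mean row
```

Because the condition-only and batch-only models are nested in the full model, the full R² can never be smaller than either of the other two for the same gene.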
Test | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Proportion of p < 0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 1.502e-06 | 0.01339 | 0.07447 | 0.1951 | 0.287 | 1 | 0.4321 |
Condition P-values | 0.001211 | 0.4143 | 0.5539 | 0.564 | 0.7144 | 1 | 0.004216 |
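The p-value rows can be read as per-gene F tests between nested linear models. Under this reading, batch is tested by comparing the Condition+Batch model with the condition-only model, and condition is tested by comparing the same full model with the batch-only model. The last column is then the fraction of genes with p < 0.05. The sketch below implements this reading with numpy and `scipy.stats.f` on simulated data. It is an assumption about how the table was produced, not BatchQC's own code.

```python
import numpy as np
from scipy import stats

def design(*factors):
    """Intercept plus treatment-coded dummy columns for each categorical factor."""
    cols = [np.ones(len(factors[0]))]
    for f in factors:
        for level in sorted(set(f))[1:]:            # first level is the reference
            cols.append(np.array([x == level for x in f], dtype=float))
    return np.column_stack(cols)

def nested_f_pvalues(data, X_full, X_reduced):
    """Per-gene p-values of the F test of X_full against the nested X_reduced."""
    y = data.T                                      # samples x genes
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return ((y - X @ beta) ** 2).sum(axis=0)
    n = y.shape[0]
    df_full = np.linalg.matrix_rank(X_full)
    df_extra = df_full - np.linalg.matrix_rank(X_reduced)
    rss_full, rss_red = rss(X_full), rss(X_reduced)
    f = ((rss_red - rss_full) / df_extra) / (rss_full / (n - df_full))
    return stats.f.sf(f, df_extra, n - df_full)     # upper-tail probability

batch = ["150629"] * 4 + ["150722"] * 6 + ["151119"] * 7 + ["170217"] * 5
condition = (["crowned"] * 4 + ["crowned"] * 3 + ["worker"] * 3
             + ["crowned"] * 5 + ["worker"] * 2 + ["worker"] * 5)

# Hypothetical data with a strong batch shift and no condition effect
rng = np.random.default_rng(1)
shift = {"150629": 0.0, "150722": 1.5, "151119": -1.5, "170217": 3.0}
data = rng.normal(size=(300, 22)) + np.array([shift[b] for b in batch])

X_full = design(condition, batch)
batch_p = nested_f_pvalues(data, X_full, design(condition))   # tests batch
cond_p = nested_f_pvalues(data, X_full, design(batch))        # tests condition
for name, p in [("Batch", batch_p), ("Condition", cond_p)]:
    print(name, np.percentile(p, [0, 25, 50, 75, 100]).round(4),
          "mean", round(float(p.mean()), 4),
          "p<0.05", float((p < 0.05).mean()))
```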
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
LGALS2 | -27.56 | 20.68 | -3.342 | 0.003726 | 0.9837 | -4.595 |
DDO | -10569 | 8228 | -2.834 | 0.01119 | 0.9837 | -4.595 |
PCDHB9 | -40.56 | 54.41 | -2.597 | 0.01846 | 0.9837 | -4.595 |
BIRC3 | 39.68 | 74.41 | 2.309 | 0.0333 | 0.9837 | -4.595 |
NKAIN1 | -30.27 | 51.41 | -2.255 | 0.03713 | 0.9837 | -4.595 |
EGF | 36.34 | 33.59 | 2.206 | 0.04093 | 0.9837 | -4.595 |
CENPP | -10.27 | 23.41 | -2.205 | 0.041 | 0.9837 | -4.595 |
ABI3BP | -22.02 | 50.86 | -2.166 | 0.0443 | 0.9837 | -4.595 |
CH25H | 12.37 | 17.14 | 2.149 | 0.04579 | 0.9837 | -4.595 |
HLA-DOA | 12.24 | 21.05 | 2.136 | 0.047 | 0.9837 | -4.595 |
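The column layout matches limma's `topTable`, whose default `adjust.method` is Benjamini-Hochberg (BH). This sketch assumes the report kept that default. BH scales each p-value by m/rank and then enforces monotonicity. With thousands of genes, a raw p-value of 0.0037 can therefore end up near 1, as in every row above. A plain-Python sketch of the adjustment:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                                # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

small = bh_adjust([0.01, 0.04, 0.03, 0.5])
print([round(x, 4) for x in small])                # [0.04, 0.0533, 0.0533, 0.5]

# Hypothetical: one small p-value among 1,000 tests whose other p-values are large
many = [0.003726] + [0.5 + i / 2000 for i in range(999)]
print(round(bh_adjust(many)[0], 4))                # 0.999
```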
This plot helps identify outlying samples.
This heatmap of the data matrix shows batch effects and variation across conditions.
This is a heatmap of the correlation between samples.
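The correlation heatmap is built from the matrix of pairwise Pearson correlations between samples. A common way to spot outlying samples is to compute each sample's median correlation with all other samples. Whether the outlier plot above uses exactly this statistic is an assumption. The numpy sketch below uses simulated data in which one sample is unrelated to the rest:

```python
import numpy as np

def sample_correlation(data):
    """Pearson correlation between samples (columns) of a genes x samples matrix."""
    return np.corrcoef(data, rowvar=False)

def median_correlation(corr):
    """Median correlation of each sample with every other sample."""
    n = corr.shape[0]
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)   # drop the 1s on the diagonal
    return np.median(off_diag, axis=1)

# Hypothetical data: 6 samples sharing one expression profile; sample 5 is unrelated
rng = np.random.default_rng(2)
profile = rng.normal(size=(1000, 1))
data = profile + 0.3 * rng.normal(size=(1000, 6))
data[:, 5] = rng.normal(size=1000)

corr = sample_correlation(data)
med = median_correlation(corr)
print(med.round(3))          # the unrelated sample stands out with a median near 0
```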
This circular dendrogram of the data matrix is colored by batch to show batch effects.
This plot of the top two principal components is colored by batch to show batch effects.
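Principal components like those in this plot and in the table below can be obtained from a singular value decomposition of the gene-centered data. The percentage of variance for each PC is its squared singular value divided by the total. Whether BatchQC also scales genes before the PCA is not stated here, so the sketch below only centers them. With 22 samples, centering leaves at most 21 non-zero components. This is consistent with PC22 in the table explaining just 1.143e-28 % of the variance. The data are simulated.

```python
import numpy as np

def pca(data):
    """PCA of the samples in a genes x samples matrix.
    Each gene is centered (not scaled); returns sample scores and % variance per PC."""
    x = data.T - data.T.mean(axis=0)                # samples x genes, genes centered
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s                                  # sample coordinates on each PC
    var_pct = 100 * s ** 2 / np.sum(s ** 2)         # singular values come sorted descending
    return scores, var_pct

rng = np.random.default_rng(3)
data = rng.normal(size=(2000, 22))                  # hypothetical: 2000 genes, 22 samples
data[:, :10] += rng.normal(size=(2000, 1))          # gene-wise offset shared by samples 0-9
scores, var_pct = pca(data)
print(var_pct[:2].round(2), np.cumsum(var_pct)[:2].round(2))
print(var_pct[-1])                                  # last PC: numerically zero
```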
PC | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 68.1 | 68.1 | 40.4 | 1.2 | 0.5214 | 38.9 | 0.03136 |
PC2 | 7.381 | 75.48 | 36.8 | 8.1 | 0.7127 | 36.3 | 0.08777 |
PC3 | 3.66 | 79.14 | 10.1 | 3.2 | 0.5891 | 8.5 | 0.7339 |
PC4 | 2.258 | 81.39 | 13.5 | 13.2 | 0.3098 | 7.9 | 0.9964 |
PC5 | 2.026 | 83.42 | 58.8 | 14.3 | 0.8149 | 58.6 | 0.00517 |
PC6 | 1.867 | 85.29 | 34.3 | 0.9 | 0.1656 | 26.2 | 0.06646 |
PC7 | 1.707 | 86.99 | 20.3 | 2.2 | 0.3529 | 16 | 0.3114 |
PC8 | 1.44 | 88.43 | 7.6 | 0.1 | 0.4816 | 4.8 | 0.7107 |
PC9 | 1.349 | 89.78 | 9 | 0.1 | 0.5133 | 6.6 | 0.6557 |
PC10 | 1.294 | 91.08 | 2.4 | 1.7 | 0.8449 | 2.2 | 0.9873 |
PC11 | 1.226 | 92.3 | 3.1 | 0.6 | 0.834 | 2.9 | 0.9292 |
PC12 | 1.04 | 93.34 | 19.8 | 0 | 0.3823 | 16 | 0.2773 |
PC13 | 0.9501 | 94.29 | 15.5 | 4.4 | 0.1239 | 2.5 | 0.54 |
PC14 | 0.9258 | 95.22 | 5.2 | 0.5 | 0.9864 | 5.2 | 0.8394 |
PC15 | 0.8654 | 96.09 | 5.2 | 0.2 | 0.4658 | 2.1 | 0.826 |
PC16 | 0.8473 | 96.93 | 6.6 | 5.5 | 0.6177 | 5.2 | 0.9766 |
PC17 | 0.6755 | 97.61 | 53.2 | 34.5 | 0.00328 | 21 | 0.1186 |
PC18 | 0.6539 | 98.26 | 20 | 1 | 0.8669 | 19.8 | 0.2926 |
PC19 | 0.619 | 98.88 | 8.9 | 3.9 | 0.2887 | 2.4 | 0.8164 |
PC20 | 0.6021 | 99.48 | 20.2 | 0.2 | 0.4108 | 16.8 | 0.272 |
PC21 | 0.5167 | 100 | 9.2 | 4.3 | 0.2078 | 0 | 0.8206 |
PC22 | 1.143e-28 | 100 | 27.5 | 5.7 | 0.812 | 27.3 | 0.2034 |
This heatmap shows how the mean, variance, skewness and kurtosis of gene expression vary between samples grouped by batch, revealing batch-related variation.
## Note: The sample-wise p-value tests whether a measure computed across genes (such as the mean) varies across samples. The gene-wise p-value tests, for each gene, whether the measure computed within each batch varies between batches. If the data are quantile normalized, the sample-wise measures across genes are identical for all samples, so the gene-wise p-value is the more informative measure.
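The note's point about quantile normalization can be checked directly. After quantile normalization, every sample contains exactly the same set of values. Any per-sample summary across genes (mean, variance, skewness, kurtosis) is therefore identical, and a sample-wise test has nothing to detect. The numpy sketch below uses simulated, differently scaled samples. The moment estimators (population moments, excess kurtosis) are an assumption and are not necessarily BatchQC's.

```python
import numpy as np

def quantile_normalize(data):
    """Quantile-normalize the samples (columns) of a genes x samples matrix.
    Ties are broken by position, a simplification."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0)
    reference = np.sort(data, axis=0).mean(axis=1)  # mean of each quantile across samples
    return reference[ranks]

def sample_moments(data):
    """Mean, variance, skewness and excess kurtosis of each sample across genes
    (population-moment estimators); returns a 4 x samples array."""
    mu = data.mean(axis=0)
    z = data - mu
    var = (z ** 2).mean(axis=0)
    skew = (z ** 3).mean(axis=0) / var ** 1.5
    kurt = (z ** 4).mean(axis=0) / var ** 2 - 3
    return np.vstack([mu, var, skew, kurt])

rng = np.random.default_rng(4)
scale = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])   # hypothetical per-sample scale
data = rng.gamma(2.0, size=(1000, 6)) * scale

before = sample_moments(data)
after = sample_moments(quantile_normalize(data))
print(before.round(2))
print(after.round(2))        # every column identical: nothing left for a sample-wise test
```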
This plot shows whether a parametric or non-parametric prior is appropriate for these data. It also reports Kolmogorov-Smirnov tests comparing each parametric prior distribution (normal for the batch means, inverse gamma for the batch variances) with the empirical distribution.
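The statistic D reported below is the largest vertical distance between the empirical CDF of the per-gene batch estimates and the CDF of the fitted parametric prior. The plain-Python sketch below covers the batch-mean case. It assumes that the normal's mean and SD are estimated from the same per-gene values, and its inputs are simulated. Because the parameters are estimated from the data, a classical KS p-value would only be approximate, so the sketch stops at D.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_against_fitted_normal(values):
    """Two-sided one-sample KS statistic D between the empirical distribution of
    `values` and a normal with mean and SD estimated from the same values."""
    xs = sorted(values)
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sd)
        # the empirical CDF jumps from (i-1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

rng = random.Random(5)
# Hypothetical per-gene batch means for one batch
unimodal = [rng.gauss(0.2, 0.5) for _ in range(2000)]
bimodal = [rng.gauss(-2, 0.3) if k % 2 else rng.gauss(2, 0.3) for k in range(2000)]
d_unimodal = ks_against_fitted_normal(unimodal)
d_bimodal = ks_against_fitted_normal(bimodal)
print(round(d_unimodal, 4), round(d_bimodal, 4))
```

The bimodal case yields a much larger D. A clearly non-normal shape like this is the situation in which the non-parametric prior is worth its extra run time.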
## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.06147
## p-value = 0
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1389
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run. We recommend it only when the shape of the empirical (non-parametric) curve differs markedly from the parametric one, for example when it is bimodal or highly skewed. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the KS test p-value above is significant.
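In the `sva` package's ComBat, the parametric priors are fitted by moment matching, based on our reading of that code. The per-gene batch means receive a normal prior, and the per-gene batch variances receive an inverse-gamma prior whose shape a and scale b reproduce the sample mean m and variance s² of the variance estimates. A minimal sketch of that moment matching follows. The input values are invented.

```python
import math
import statistics

def normal_prior(gamma_hat):
    """Normal prior on per-gene batch means: (mean, variance) of the estimates."""
    return statistics.mean(gamma_hat), statistics.variance(gamma_hat)

def inverse_gamma_prior(delta_hat):
    """Moment-matched inverse-gamma hyperparameters (shape a, scale b)
    for per-gene batch variances."""
    m = statistics.mean(delta_hat)
    s2 = statistics.variance(delta_hat)            # n - 1 denominator, like R's var()
    a = (2 * s2 + m ** 2) / s2
    b = (m * s2 + m ** 3) / s2
    return a, b

gamma_bar, t2 = normal_prior([0.1, -0.2, 0.3, 0.0, 0.25])   # hypothetical batch means
delta_hat = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 1.3]             # hypothetical batch variances
a, b = inverse_gamma_prior(delta_hat)

# The fitted inverse gamma reproduces the sample moments:
#   mean = b / (a - 1),  variance = b**2 / ((a - 1)**2 * (a - 2))
ig_mean = b / (a - 1)
ig_var = b ** 2 / ((a - 1) ** 2 * (a - 2))
print(gamma_bar, t2, a, b, ig_mean, ig_var)
assert math.isclose(ig_mean, statistics.mean(delta_hat))
```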
## Number of Surrogate Variables found in the given data: 2