Tests for checking Batch Effects
Condition | Batch 151119 | Batch 160513 | Batch 170203 |
---|---|---|---|
Condition crowned | 6 | 5 | 0 |
Condition worker | 3 | 2 | 5 |
Measure | Standardized Pearson Correlation Coefficient | Cramer’s V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.7166 | 0.5878 |
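Both confounding coefficients can be derived from the condition-by-batch sample counts. The sketch below uses the textbook definitions of Cramér's V and of Pearson's contingency coefficient divided by its maximum attainable value. That these are the exact formulas behind the report is an assumption, and the function name `confounding_coefficients` is hypothetical.

```python
import math

def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramér's V)."""
    n_rows, n_cols = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(n_cols)]
    total = sum(row_sums)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    k = min(n_rows, n_cols)
    cramers_v = math.sqrt(chi2 / (total * (k - 1)))
    # Pearson's contingency coefficient, scaled by its maximum sqrt((k - 1) / k).
    pearson_c = math.sqrt(chi2 / (total + chi2))
    standardized_c = pearson_c / math.sqrt((k - 1) / k)
    return standardized_c, cramers_v

# Sample counts copied from the condition-by-batch table in this report.
counts = [
    [6, 5, 0],  # Condition crowned
    [3, 2, 5],  # Condition worker
]
standardized_c, cramers_v = confounding_coefficients(counts)
print(f"{standardized_c:.4f} {cramers_v:.4f}")
```

With these counts the sketch gives values that round to 0.7166 and 0.5878, in agreement with the table.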
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0.049 | 0 | 0 |
1st Qu. | 8.098 | 0.291 | 4.736 |
Median | 14.87 | 1.347 | 11.03 |
Mean | 17.77 | 3.313 | 14.16 |
3rd Qu. | 24.76 | 4.147 | 20.63 |
Max. | 94.85 | 59.48 | 94.76 |
Test | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 4.125e-10 | 0.1252 | 0.3413 | 0.391 | 0.6262 | 1 | 0.1323 |
Condition P-values | 0.000442 | 0.3179 | 0.5402 | 0.5394 | 0.7711 | 1 | 0.02346 |
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
GNAQ | 25.01 | 37.81 | 3.763 | 0.001473 | 0.9996 | -4.593 |
CX3CR1 | 132.4 | 102.9 | 3.647 | 0.001905 | 0.9996 | -4.593 |
GAS7 | 32.58 | 36 | 3.471 | 0.002804 | 0.9996 | -4.593 |
DZIP3 | 5.458 | 4.571 | 3.463 | 0.002857 | 0.9996 | -4.593 |
CDYL2 | 7.931 | 8.524 | 3.45 | 0.00294 | 0.9996 | -4.593 |
PRDM1 | 15.35 | 16.86 | 3.446 | 0.002962 | 0.9996 | -4.593 |
CPT1C | 6.431 | 6.286 | 3.372 | 0.003483 | 0.9996 | -4.593 |
CCNDBP1 | 4.25 | 2.667 | 3.303 | 0.004051 | 0.9996 | -4.593 |
SULT4A1 | 5.167 | 2.524 | 3.262 | 0.004433 | 0.9996 | -4.593 |
SLC6A1 | 13.03 | 10.67 | 3.206 | 0.00501 | 0.9996 | -4.593 |
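Every gene in the table shares the same adj.P.Val even though the raw P.Value differs. That pattern is typical of the Benjamini-Hochberg adjustment, which forces adjusted values to be monotone in the raw p-values. The sketch below assumes that adj.P.Val is a Benjamini-Hochberg adjustment, which the report does not state. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted value of every larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, not taken from the report.
example_p = [0.01, 0.04, 0.03, 0.005]
example_adj = benjamini_hochberg(example_p)
print(example_adj)
```

Because of the running minimum, several genes can end up with an identical adjusted value. Reproducing the 0.9996 shown above would require the p-values of all tested genes, which the report does not list.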
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
Principal Component | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 41.98 | 41.98 | 13.1 | 0.6 | 0.5413 | 11.1 | 0.3194 |
PC2 | 10.22 | 52.21 | 32.2 | 0.3 | 0.8691 | 32.1 | 0.03758 |
PC3 | 8.156 | 60.36 | 10.9 | 1.9 | 0.3169 | 5.3 | 0.4422 |
PC4 | 6.033 | 66.39 | 51.1 | 8.9 | 0.8939 | 51 | 0.00508 |
PC5 | 4.381 | 70.77 | 26.4 | 23 | 0.03642 | 4.1 | 0.6792 |
PC6 | 3.661 | 74.44 | 24.7 | 6.1 | 0.82 | 24.4 | 0.154 |
PC7 | 3.09 | 77.53 | 11.7 | 0.1 | 0.917 | 11.7 | 0.35 |
PC8 | 2.894 | 80.42 | 14.9 | 13 | 0.3147 | 9.6 | 0.829 |
PC9 | 2.633 | 83.05 | 29.5 | 2.3 | 0.05 | 11 | 0.06275 |
PC10 | 2.261 | 85.31 | 8.6 | 2.4 | 0.9157 | 8.5 | 0.5746 |
PC11 | 2.091 | 87.41 | 13.4 | 4.2 | 0.8512 | 13.2 | 0.4253 |
PC12 | 1.898 | 89.3 | 1.6 | 1 | 0.8644 | 1.5 | 0.9498 |
PC13 | 1.815 | 91.12 | 9.3 | 3.8 | 0.2195 | 0.6 | 0.6073 |
PC14 | 1.67 | 92.79 | 4.3 | 0.8 | 0.5149 | 1.8 | 0.737 |
PC15 | 1.487 | 94.28 | 2.5 | 0 | 0.9969 | 2.5 | 0.8095 |
PC16 | 1.44 | 95.72 | 26.4 | 21.9 | 0.02694 | 1 | 0.6001 |
PC17 | 1.295 | 97.01 | 8.2 | 2.7 | 0.5423 | 6.1 | 0.6107 |
PC18 | 1.163 | 98.17 | 1.7 | 0.1 | 0.7397 | 1 | 0.8769 |
PC19 | 1.056 | 99.23 | 6.8 | 6.1 | 0.3084 | 0.8 | 0.9363 |
PC20 | 0.7687 | 100 | 2.8 | 0.6 | 0.9201 | 2.7 | 0.8283 |
PC21 | 1.377e-28 | 100 | 19.4 | 4.3 | 0.7407 | 18.9 | 0.2319 |
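One common way to obtain a "percent variation explained by batch" for a principal component is the R-squared of a one-way model of the component scores on batch membership. The sketch below illustrates that idea only. Whether the report computes its columns exactly this way is an assumption, and the function name, scores, and batch assignments are hypothetical.

```python
def percent_explained(scores, groups):
    """Percent of the variation in `scores` explained by group membership."""
    n = len(scores)
    grand_mean = sum(scores) / n
    ss_total = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores belonging to each group.
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    # Residual variation left after fitting one mean per group.
    ss_within = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        ss_within += sum((s - group_mean) ** 2 for s in members)

    return 100.0 * (1.0 - ss_within / ss_total)

# Hypothetical scores of four samples on one principal component.
pc_scores = [1.0, 2.0, 3.0, 4.0]
batches = ["151119", "151119", "160513", "160513"]
pct_batch = percent_explained(pc_scores, batches)
print(round(pct_batch, 1))
```

Here the total sum of squares is 5 and the within-batch sum of squares is 1, so batch explains 80% of the variation in this toy component.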
This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.
## Note: The sample-wise p-value is calculated for the variation across samples of the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.
This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.
## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04055
## p-value = 0
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.0921
## p-value = 0
Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
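The statistic D in the Kolmogorov-Smirnov output is the largest vertical gap between the empirical distribution of the per-gene estimates and the fitted parametric curve. The sketch below computes D against a fitted normal distribution. It uses simulated values rather than the report's data, and the names `ks_statistic`, `fitted_normal_d`, and `simulated_means` are hypothetical.

```python
import math
import random

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample, two-sided Kolmogorov-Smirnov statistic D."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def fitted_normal_d(sample):
    """D between a sample and the normal curve fitted to its mean and sd."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return ks_statistic(sample, lambda x: normal_cdf(x, mean, sd))

random.seed(0)
# Simulated per-gene batch means (an assumption, not the report's data).
simulated_means = [random.gauss(0.0, 1.0) for _ in range(2000)]
d_normal = fitted_normal_d(simulated_means)
# A highly skewed alternative, where a normal prior would fit poorly.
d_skewed = fitted_normal_d([math.exp(v) for v in simulated_means])
print(d_normal, d_skewed)
```

With thousands of genes, even a small D can give a very small p-value. This is consistent with the note's advice to judge the shape of the curves rather than rely on significance alone.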
## Number of Surrogate Variables found in the given data: 1