Tests for checking Batch Effects
| | Batch 150629 | Batch 150722 | Batch 151218 | Batch 170208 |
---|---|---|---|---|
Condition crowned | 4 | 2 | 5 | 0 |
Condition worker | 0 | 3 | 2 | 6 |
| | Standardized Pearson Correlation Coefficient | Cramer’s V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.8283 | 0.7225 |
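The report does not state how the two confounding coefficients are computed. As a minimal sketch, assuming the standard definitions (Pearson's contingency coefficient divided by its maximum attainable value, and Cramér's V, both derived from the chi-square statistic of the condition-by-batch table), the counts above give values that agree with the reported 0.8283 and 0.7225 after rounding. The function names are illustrative and are not part of any package.

```python
import math

# Condition-by-batch sample counts copied from the table above.
counts = [
    [4, 2, 5, 0],  # Condition crowned
    [0, 3, 2, 6],  # Condition worker
]

def chi_square(table):
    """Return (Pearson chi-square statistic, total count) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def confounding_coefficients(table):
    """Return (standardized Pearson coefficient, Cramer's V).

    Assumed definitions: C = sqrt(chi2 / (chi2 + n)), standardized by
    C_max = sqrt((k - 1) / k), and V = sqrt(chi2 / (n * (k - 1))),
    where k is the smaller of the number of rows and columns.
    """
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    c_max = math.sqrt((k - 1) / k)
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return pearson_c / c_max, cramers_v

std_pearson, cramers_v = confounding_coefficients(counts)
print(round(std_pearson, 4), round(cramers_v, 4))
```

Both coefficients are 0 when condition and batch are independent and 1 when batch fully determines condition, which is why values this high point to a strongly confounded design.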
| | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0.219 | 0 | 0.067 |
1st Qu. | 21.26 | 1.149 | 17.32 |
Median | 37.07 | 4.938 | 32.74 |
Mean | 38.17 | 8.321 | 34.3 |
3rd Qu. | 54.26 | 12.81 | 49.88 |
Max. | 98.3 | 61.59 | 98.18 |
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 1.354e-14 | 0.009383 | 0.1059 | 0.2378 | 0.4017 | 0.9994 | 0.4033 |
Condition P-values | 6.406e-05 | 0.2082 | 0.4499 | 0.4663 | 0.7136 | 1 | 0.06121 |
Boxplots of all values for each sample, colored by batch membership.
| | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
CTBP2 | 359.8 | 1525 | 5.35 | 4.704e-05 | 0.4455 | -4.539 |
ALDH18A1 | 404.4 | 1443 | 4.989 | 0.000101 | 0.4455 | -4.542 |
ATAD2 | 103 | 299 | 4.984 | 0.0001022 | 0.4455 | -4.542 |
NOL8 | 168.6 | 878.5 | 4.912 | 0.0001192 | 0.4455 | -4.543 |
APLF | 148.1 | 396.7 | 4.627 | 0.0002204 | 0.6012 | -4.546 |
IFITM10 | -719.7 | 450.6 | -4.515 | 0.0002814 | 0.6012 | -4.547 |
TNFSF10 | 146.9 | 362.2 | 4.421 | 0.0003457 | 0.6012 | -4.548 |
MUT | 1119 | 2776 | 4.418 | 0.0003478 | 0.6012 | -4.548 |
EHD3 | 333.9 | 820.2 | 4.4 | 0.000362 | 0.6012 | -4.548 |
SHF | -52.5 | 110 | -4.193 | 0.0005702 | 0.8036 | -4.551 |
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
| | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 28.87 | 28.87 | 68.7 | 3.7 | 0.172 | 64.9 | 0.00021 |
PC2 | 16.67 | 45.53 | 59.5 | 33.2 | 0.2924 | 56.7 | 0.03318 |
PC3 | 8.366 | 53.9 | 17.5 | 0 | 0.6989 | 16.7 | 0.3395 |
PC4 | 4.781 | 58.68 | 14 | 5.8 | 0.3728 | 9.7 | 0.6637 |
PC5 | 4.628 | 63.31 | 6 | 0.2 | 0.5451 | 3.9 | 0.7919 |
PC6 | 4.283 | 67.59 | 37.8 | 7 | 0.5049 | 36.1 | 0.07058 |
PC7 | 3.588 | 71.18 | 8.3 | 6.8 | 0.5163 | 5.9 | 0.9623 |
PC8 | 2.991 | 74.17 | 32 | 3.8 | 0.03247 | 10.3 | 0.1083 |
PC9 | 2.891 | 77.06 | 14 | 4.9 | 0.2757 | 7.6 | 0.6237 |
PC10 | 2.769 | 79.83 | 14.5 | 0 | 0.9324 | 14.5 | 0.4337 |
PC11 | 2.514 | 82.34 | 10.2 | 4.7 | 0.7125 | 9.5 | 0.7898 |
PC12 | 2.346 | 84.69 | 8 | 0.8 | 0.3662 | 3.4 | 0.7225 |
PC13 | 2.17 | 86.86 | 5.4 | 3.2 | 0.4367 | 1.9 | 0.9372 |
PC14 | 1.979 | 88.84 | 19.7 | 2.5 | 0.7216 | 19.1 | 0.3351 |
PC15 | 1.842 | 90.68 | 8.2 | 3.2 | 0.2979 | 2 | 0.817 |
PC16 | 1.768 | 92.45 | 17.9 | 3.2 | 0.3397 | 13.2 | 0.4127 |
PC17 | 1.721 | 94.17 | 23.9 | 7 | 0.04968 | 3.8 | 0.3215 |
PC18 | 1.635 | 95.8 | 10.3 | 1.4 | 0.7361 | 9.7 | 0.6455 |
PC19 | 1.481 | 97.29 | 13.9 | 4 | 0.1923 | 4.6 | 0.5897 |
PC20 | 1.415 | 98.7 | 6 | 3.9 | 0.4572 | 2.8 | 0.9447 |
PC21 | 1.299 | 100 | 4.3 | 0.9 | 0.7632 | 3.7 | 0.8946 |
PC22 | 5.891e-29 | 100 | 43.8 | 17.9 | 0.8862 | 43.7 | 0.08481 |
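The "Percent Variation Explained" columns can be read as R² values, expressed as percentages, from regressing each component's sample scores on condition, batch, or both. The report does not show the computation, so the following is only a sketch under that assumption, restricted to a single grouping factor (one-way R²). The scores and labels are hypothetical and are not taken from this data set.

```python
def percent_explained(scores, groups):
    """Percent of the variance in `scores` explained by group membership.

    Computed as 100 * (1 - within-group SS / total SS), i.e. the R^2 of a
    one-way model with `groups` as the only factor.
    """
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores belonging to each group label.
    by_group = {}
    for score, label in zip(scores, groups):
        by_group.setdefault(label, []).append(score)

    within_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((s - group_mean) ** 2 for s in members)

    return 100.0 * (1.0 - within_ss / total_ss)

# Hypothetical scores of six samples on one principal component.
pc_scores = [-2.1, -1.9, 0.1, -0.1, 2.0, 2.0]
batch = ["A", "A", "B", "B", "C", "C"]
condition = ["x", "y", "x", "y", "x", "y"]

batch_pct = percent_explained(pc_scores, batch)
condition_pct = percent_explained(pc_scores, condition)
print(f"batch: {batch_pct:.1f}%  condition: {condition_pct:.1f}%")
```

In this toy example the component separates the batches almost perfectly and carries no condition signal. In the table above, the condition and batch percentages need not add up to the combined column (for PC2, 33.2 and 56.7 against 59.5), which is what one would expect when condition and batch are confounded and explain overlapping variation.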
This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.
## Note: The sample-wise p-value is calculated for the variation across samples on the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches on the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.
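To see why the sample-wise measure stops being informative after quantile normalization (the normalization this note refers to), consider a small sketch. It assumes the textbook procedure: sort each sample, average the values at each rank across samples, and assign those averages back by rank. The function name and data are hypothetical, and ties are not handled.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (simplified, no tie handling)."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_genes)
    ]

    normalized = []
    for sample in samples:
        # Gene indices ordered from the smallest to the largest value.
        order = sorted(range(n_genes), key=lambda idx: sample[idx])
        out = [0.0] * n_genes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical expression values: three samples, four genes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]

normalized = quantile_normalize(samples)
sample_means = [sum(s) / len(s) for s in normalized]
print(sample_means)
```

After normalization every sample contains the same set of values in a different order, so the per-sample mean, variance, skewness, and kurtosis coincide. Differences between batches can then only show up gene by gene.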
This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.
## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04297
## p-value = 0
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.196
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely, as with a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
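The statistic D reported above is the largest gap between the empirical distribution of the per-gene batch estimates and the fitted parametric distribution. The warnings show the batch-mean test being run against a normal distribution (`"pnorm"`) with an estimated mean and standard deviation. A minimal sketch of that one-sample statistic follows; the input values and parameter values are hypothetical placeholders, not the estimates from this run.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample, two-sided Kolmogorov-Smirnov statistic D.

    D is the largest vertical distance between the empirical CDF of
    `sample` and the reference `cdf`, checked just before and just
    after each step of the empirical CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical per-gene batch mean estimates for one batch.
gamma_hat = [-0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.5]
mean_estimate = 0.2  # placeholder for the fitted mean
sd_estimate = 0.7    # placeholder for the fitted standard deviation

d = ks_statistic(gamma_hat, lambda x: normal_cdf(x, mean_estimate, sd_estimate))
print(f"D = {d:.4f}")
```

Because the p-value depends on both D and the number of genes, a small D computed over many genes can still come out as significant, which fits the note's advice to judge the shape of the curve and not the p-value alone.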
## Number of Surrogate Variables found in the given data: 1