Tests for checking Batch Effects
Condition | Batch 150722 | Batch 151218 | Batch 170217 |
---|---|---|---|
Condition crowned | 7 | 5 | 0 |
Condition worker | 3 | 2 | 5 |
Measure | Standardized Pearson Correlation Coefficient | Cramér’s V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.7224 | 0.5942 |
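Both confounding coefficients can be recomputed from the condition-by-batch counts above. The following is a minimal sketch, assuming the standardized Pearson coefficient is Pearson's contingency coefficient divided by its maximum attainable value and that Cramér’s V uses the usual chi-square form. The report does not state these formulas; they are assumptions that happen to reproduce the reported values. The function names are illustrative only.

```python
import math

# Condition-by-batch sample counts copied from the table above.
counts = [
    [7, 5, 0],  # crowned
    [3, 2, 5],  # worker
]


def chi_square(table):
    """Return the Pearson chi-square statistic and the total sample count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def confounding_coefficients(table):
    """Return (standardized Pearson coefficient, Cramer's V)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Assumed standardization: divide by the maximum of C, sqrt((k - 1) / k).
    std_pearson = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson, cramers_v


std_pearson, cramers_v = confounding_coefficients(counts)
print(f"Standardized Pearson: {std_pearson:.4f}, Cramer's V: {cramers_v:.4f}")
# Standardized Pearson: 0.7224, Cramer's V: 0.5942
```

The empty cell (no crowned samples in batch 170217) is the main driver of both coefficients: it contributes the two largest terms to the chi-square sum.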
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0.013 | 0 | 0.01 |
1st Qu. | 22.46 | 1.53 | 19.16 |
Median | 36 | 5.144 | 33.12 |
Mean | 34.97 | 6.8 | 32.19 |
3rd Qu. | 47.5 | 10.36 | 45.06 |
Max. | 79.05 | 52.94 | 78.66 |
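To make the columns above concrete, here is a hedged sketch of how a per-gene "percent variation explained" figure could be obtained for a single factor. It assumes the Condition and Batch columns are R² values (times 100) from models containing only that factor; the report does not say how the percentages are computed, so treat this as an illustration only. The expression values and labels are hypothetical, not taken from the data set.

```python
def percent_explained(values, labels):
    """Percent of total variance explained by one categorical factor.

    This is 100 * R^2 of a one-way model: 1 - within-group SS / total SS.
    """
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variation to explain

    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 100.0 * (1.0 - within_ss / total_ss)


# Hypothetical expression values for one gene across six samples.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batch = ["A", "A", "B", "B", "C", "C"]
condition = ["crowned", "worker", "crowned", "worker", "crowned", "worker"]

batch_pct = percent_explained(expression, batch)
condition_pct = percent_explained(expression, condition)
print(f"Batch: {batch_pct:.2f}%, Condition: {condition_pct:.2f}%")
```

Repeating this for every gene and summarizing the results would give a table of the same shape as the one above. In this toy gene the batch labels account for most of the variance, which is the pattern the Batch column suggests for the real data.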
Factor | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 1.537e-06 | 0.006849 | 0.03758 | 0.1453 | 0.1813 | 0.9997 | 0.5455 |
Condition P-values | 3.447e-05 | 0.3157 | 0.5097 | 0.521 | 0.7331 | 1 | 0.01712 |
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
P4HA3 | 8.972 | 8.318 | 4.671 | 0.0001762 | 0.9997 | -4.594 |
THBS1 | 43.54 | 70.18 | 4.002 | 0.0007939 | 0.9997 | -4.594 |
LRTOMT | 7.87 | 12.82 | 3.473 | 0.002618 | 0.9997 | -4.594 |
CIDEA | 28.91 | 51.95 | 3.47 | 0.002637 | 0.9997 | -4.594 |
PDE7A | 91.3 | 175.1 | 3.243 | 0.00438 | 0.9997 | -4.594 |
IL17B | 30.19 | 28.45 | 3.238 | 0.004428 | 0.9997 | -4.594 |
PTHLH | 6.611 | 8.409 | 3.193 | 0.004895 | 0.9997 | -4.594 |
UNC80 | 13.83 | 21.14 | 3.175 | 0.005098 | 0.9997 | -4.594 |
FAM26E | 44.36 | 79.36 | 3.14 | 0.005503 | 0.9997 | -4.594 |
SNX7 | 59.51 | 202.4 | 3.07 | 0.006426 | 0.9997 | -4.594 |
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
Principal Component | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 52.56 | 52.56 | 50.7 | 8.1 | 0.3948 | 48.6 | 0.00371 |
PC2 | 6.273 | 58.83 | 45.3 | 9.5 | 0.9784 | 45.3 | 0.01069 |
PC3 | 5.744 | 64.58 | 32.8 | 5.5 | 0.6045 | 31.8 | 0.04647 |
PC4 | 4.515 | 69.09 | 13.7 | 6.9 | 0.7725 | 13.3 | 0.506 |
PC5 | 3.356 | 72.45 | 1.8 | 1.7 | 0.6999 | 0.9 | 0.9922 |
PC6 | 3.068 | 75.52 | 15.8 | 0.1 | 0.674 | 14.9 | 0.2151 |
PC7 | 2.815 | 78.33 | 15.3 | 14.7 | 0.1264 | 3.3 | 0.934 |
PC8 | 2.442 | 80.77 | 15.5 | 0 | 0.725 | 14.9 | 0.2195 |
PC9 | 2.283 | 83.06 | 3.4 | 3.1 | 0.4655 | 0.4 | 0.972 |
PC10 | 2.011 | 85.07 | 3.1 | 0 | 0.7212 | 2.4 | 0.7565 |
PC11 | 1.956 | 87.02 | 12.6 | 12.6 | 0.2179 | 4.7 | 0.9986 |
PC12 | 1.717 | 88.74 | 1.9 | 1.4 | 0.6287 | 0.6 | 0.9522 |
PC13 | 1.694 | 90.43 | 10.7 | 0.3 | 0.4051 | 7 | 0.3733 |
PC14 | 1.609 | 92.04 | 15.5 | 9.4 | 0.08594 | 0 | 0.5315 |
PC15 | 1.476 | 93.52 | 8.1 | 0.1 | 0.7604 | 7.6 | 0.4756 |
PC16 | 1.296 | 94.82 | 27.5 | 15.1 | 0.01831 | 0.4 | 0.2428 |
PC17 | 1.194 | 96.01 | 18.7 | 9 | 0.06479 | 1.2 | 0.3608 |
PC18 | 1.145 | 97.15 | 3.2 | 0.3 | 0.5426 | 1.1 | 0.7673 |
PC19 | 1.041 | 98.2 | 1.1 | 0.9 | 0.8447 | 0.9 | 0.979 |
PC20 | 1.005 | 99.2 | 3 | 1.3 | 0.5004 | 0.4 | 0.8558 |
PC21 | 0.7997 | 100 | 0.4 | 0 | 0.8724 | 0.2 | 0.9666 |
PC22 | 7.629e-29 | 100 | 47 | 10.9 | 0.1551 | 40.5 | 0.00938 |
This is a heatmap showing how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch to reveal batch effects.
## Note: The sample-wise p-value is calculated for the variation across samples of the measure taken across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure taken within each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.
This is a plot showing whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.
## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02095
## p-value = 5.146e-06
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.08435
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
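The statistic D reported above is the largest vertical gap between the empirical distribution of the batch estimates and the fitted reference distribution. The sketch below computes a one-sample, two-sided Kolmogorov-Smirnov statistic against a normal distribution. It is a plain illustration of the definition, not the code that produced this report; the sample values are hypothetical and the helper names are assumptions.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Two-sided one-sample KS statistic: sup |F_empirical - F_reference|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical per-gene batch mean estimates (not from this data set).
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d = ks_statistic(sample, lambda x: normal_cdf(x, 0.0, 1.0))
print(f"D = {d:.4f}")
```

Because the p-value shrinks as the number of genes grows, even a small D such as 0.02095 can be highly significant when thousands of genes are tested, which is consistent with the advice to judge the shape of the curve and not the p-value alone.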
## Number of Surrogate Variables found in the given data: 2