Tests for checking Batch Effects
Condition | Batch 150629 | Batch 150722 | Batch 151119 | Batch 170217 |
---|---|---|---|---|
Condition crowned | 4 | 2 | 5 | 0 |
Condition worker | 0 | 3 | 2 | 6 |
Measure | Standardized Pearson Correlation Coefficient | Cramer’s V |
---|---|---|
Confounding Coefficients (0=no confounding, 1=complete confounding) | 0.8283 | 0.7225 |
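
The two confounding coefficients can be recomputed from the condition-by-batch counts above. The sketch below assumes the usual chi-square-based definitions: Cramer's V, and Pearson's contingency coefficient divided by its maximum attainable value. The report does not state its formulas, and the function name `confounding_coefficients` is illustrative only.

```python
import math


def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramer's V).

    `table` is a list of rows of counts (conditions x batches).
    Assumes every row and column total is non-zero.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)

    # Pearson chi-square statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))  # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Divide by the maximum of C for a table with k as its smaller dimension.
    std_pearson_c = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson_c, cramers_v


# Counts from the condition-by-batch table (rows: crowned, worker).
design = [[4, 2, 5, 0], [0, 3, 2, 6]]
std_c, v = confounding_coefficients(design)
print(round(std_c, 4), round(v, 4))
```

Under these assumed definitions, the counts reproduce the reported coefficients to within rounding, which suggests the report uses the same formulas.
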
Statistic | Full (Condition+Batch) | Condition | Batch |
---|---|---|---|
Min. | 0.33 | 0 | 0.068 |
1st Qu. | 22.28 | 1.469 | 19.13 |
Median | 38.26 | 5.703 | 35.73 |
Mean | 40.3 | 8.407 | 38.1 |
3rd Qu. | 58.23 | 13.14 | 56.72 |
Max. | 91.39 | 59.47 | 91.38 |
P-values | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Proportion with p < 0.05 |
---|---|---|---|---|---|---|---|
Batch P-values | 6.946e-08 | 0.004556 | 0.07067 | 0.2021 | 0.3284 | 0.9994 | 0.4576 |
Condition P-values | 7.839e-05 | 0.3952 | 0.6302 | 0.5966 | 0.8207 | 1 | 0.01456 |
Boxplots of all values for each sample, colored by batch membership.
Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B |
---|---|---|---|---|---|---|
SPON1 | -909 | 1103 | -5.264 | 5.563e-05 | 0.8446 | -4.595 |
BCAN | -680.2 | 423.3 | -4.26 | 0.0004881 | 1 | -4.595 |
AFF2 | -952.2 | 949.3 | -4.003 | 0.0008589 | 1 | -4.595 |
MANEA | -120.3 | 499.3 | -3.928 | 0.001014 | 1 | -4.595 |
GSTM1 | -462.2 | 2043 | -3.811 | 0.001313 | 1 | -4.595 |
BIRC3 | 45.55 | 120.8 | 3.554 | 0.002318 | 1 | -4.595 |
DDC | -845.8 | 1656 | -3.466 | 0.002812 | 1 | -4.595 |
GPNMB | -64.82 | 110.4 | -3.454 | 0.002892 | 1 | -4.595 |
GPR115 | -56.76 | 59.45 | -3.442 | 0.002969 | 1 | -4.595 |
THG1L | -34.1 | 175 | -3.278 | 0.004254 | 1 | -4.595 |
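
The column names in this table match those produced by limma's `topTable`, whose default multiple-testing adjustment is Benjamini-Hochberg. Whether this report used that default is an assumption. Under that assumption, the `adj.P.Val` column could be derived from `P.Value` roughly as follows. The function name and the toy p-values are illustrative and are not taken from the report.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (hypothetical, not from the report).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because each raw p-value is scaled by the total number of tests divided by its rank, a small raw p-value can still yield an adjusted value near 1 when many genes are tested, as in the table above.
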
This plot helps identify outlying samples.
This is a heatmap of the given data matrix showing the batch effects and variations with different conditions.
This is a heatmap of the correlation between samples.
This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.
This is a plot of the top two principal components colored by batch to show the batch effects.
Component | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value) |
---|---|---|---|---|---|---|---|
PC1 | 34 | 34 | 64.2 | 15.7 | 0.8884 | 64.1 | 0.00188 |
PC2 | 20.96 | 54.96 | 50.8 | 2.2 | 0.9594 | 50.8 | 0.00747 |
PC3 | 5.283 | 60.24 | 44.9 | 20.7 | 0.3514 | 41.9 | 0.09618 |
PC4 | 4.125 | 64.37 | 19 | 0.4 | 0.6552 | 18 | 0.306 |
PC5 | 3.983 | 68.35 | 6.1 | 1.6 | 0.5732 | 4.3 | 0.8428 |
PC6 | 3.49 | 71.84 | 35.4 | 23.3 | 0.04156 | 16.9 | 0.3932 |
PC7 | 3.197 | 75.04 | 3.3 | 0.5 | 0.8607 | 3.1 | 0.9198 |
PC8 | 2.917 | 77.95 | 7.7 | 0 | 0.4882 | 5 | 0.7039 |
PC9 | 2.631 | 80.58 | 32.5 | 0.5 | 0.2233 | 26.1 | 0.07942 |
PC10 | 2.575 | 83.16 | 0.7 | 0.1 | 0.8889 | 0.6 | 0.9895 |
PC11 | 2.198 | 85.36 | 10.3 | 5.2 | 0.1975 | 0.8 | 0.8083 |
PC12 | 1.959 | 87.31 | 6.2 | 5.5 | 0.5494 | 4.1 | 0.9882 |
PC13 | 1.793 | 89.11 | 24.1 | 6.9 | 0.1521 | 14.1 | 0.3127 |
PC14 | 1.653 | 90.76 | 1 | 0 | 0.9944 | 1 | 0.9812 |
PC15 | 1.625 | 92.39 | 7.4 | 2.1 | 0.9126 | 7.4 | 0.8042 |
PC16 | 1.479 | 93.87 | 6.1 | 2.5 | 0.4524 | 2.9 | 0.8818 |
PC17 | 1.367 | 95.23 | 17.4 | 5.9 | 0.08284 | 0.9 | 0.5172 |
PC18 | 1.27 | 96.5 | 20.1 | 2.5 | 0.09075 | 5 | 0.3242 |
PC19 | 1.205 | 97.71 | 4.2 | 1.6 | 0.4446 | 0.7 | 0.9272 |
PC20 | 1.183 | 98.89 | 33.6 | 1 | 0.3613 | 30.2 | 0.07235 |
PC21 | 1.11 | 100 | 5 | 1.8 | 0.4819 | 2.1 | 0.8989 |
PC22 | 4.684e-29 | 100 | 25.6 | 2.8 | 0.5894 | 24.3 | 0.1969 |
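
A "percent variation explained" figure for a single factor can be read as 100 times the R² of a one-way model of the component scores on that factor. That this is how the report computes it is an assumption. The sketch below is a minimal version of that calculation; the function name and the toy scores are hypothetical.

```python
def percent_explained(scores, labels):
    """Percent of the variance in `scores` explained by the grouping `labels`.

    Computed as 100 * (between-group sum of squares / total sum of squares),
    i.e. 100 * R^2 of a one-way model. Assumes `scores` is not constant.
    """
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores belonging to each group.
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)

    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return 100.0 * between_ss / total_ss


# Toy PC scores and batch labels (hypothetical, not from the report).
pc_scores = [1.0, 1.2, 3.0, 3.1, -0.5, -0.4]
batches = ["A", "A", "B", "B", "C", "C"]
print(round(percent_explained(pc_scores, batches), 1))
```

Applied to PC1 and PC2, a calculation of this kind would attribute most of the variation to batch rather than condition, which is the pattern the table reports.
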
This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.
## Note: The sample-wise p-value is calculated for the variation across samples of the measure taken across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure taken within each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is the better measure.
This is a plot showing whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.
## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04356
## p-value = 0
##
##
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.152
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, for example when it is bimodal or highly skewed. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.

## Number of Surrogate Variables found in the given data: 0