BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 150722  Batch 151218  Batch 170217
Condition crowned  7             5             0
Condition worker   3             2             5

Measures of confounding between Batch and Condition

                                                                     Standardized Pearson Correlation Coefficient  Cramer’s V
Confounding Coefficients (0=no confounding, 1=complete confounding)  0.7224                                        0.5942
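
The two coefficients can be recomputed from the table of sample counts above. The following is a minimal sketch, assuming the textbook definitions: Cramér's V as sqrt(chi2 / (n * (k - 1))) and the standardized Pearson coefficient as the contingency coefficient sqrt(chi2 / (n + chi2)) divided by its maximum sqrt((k - 1) / k), where k is the smaller table dimension. The report does not state its formulas; these definitions are an assumption that happens to reproduce the reported values. The function name `confounding_coefficients` is illustrative.

```python
import math

def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramer's V)
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))  # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (n + chi2))
    std_pearson = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson, cramers_v

# Rows: crowned, worker. Columns: batches 150722, 151218, 170217.
counts = [[7, 5, 0], [3, 2, 5]]
std_pearson, cramers_v = confounding_coefficients(counts)
print(round(std_pearson, 4), round(cramers_v, 4))  # 0.7224 0.5942
```

Most of the chi-squared statistic comes from batch 170217, which contains only worker samples.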

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.     0.013                   0          0.01
1st Qu.  22.46                   1.53       19.16
Median   36                      5.144      33.12
Mean     34.97                   6.8        32.19
3rd Qu.  47.5                    10.36      45.06
Max.     79.05                   52.94      78.66

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                    Min.       1st Qu.   Median   Mean    3rd Qu.  Max.    Ps<0.05
Batch P-values      1.537e-06  0.006849  0.03758  0.1453  0.1813   0.9997  0.5455
Condition P-values  3.447e-05  0.3157    0.5097   0.521   0.7331   1       0.01712

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

        Condition: worker (logFC)  AveExpr  t      P.Value    adj.P.Val  B
P4HA3   8.972                      8.318    4.671  0.0001762  0.9997     -4.594
THBS1   43.54                      70.18    4.002  0.0007939  0.9997     -4.594
LRTOMT  7.87                       12.82    3.473  0.002618   0.9997     -4.594
CIDEA   28.91                      51.95    3.47   0.002637   0.9997     -4.594
PDE7A   91.3                       175.1    3.243  0.00438    0.9997     -4.594
IL17B   30.19                      28.45    3.238  0.004428   0.9997     -4.594
PTHLH   6.611                      8.409    3.193  0.004895   0.9997     -4.594
UNC80   13.83                      21.14    3.175  0.005098   0.9997     -4.594
FAM26E  44.36                      79.36    3.14   0.005503   0.9997     -4.594
SNX7    59.51                      202.4    3.07   0.006426   0.9997     -4.594
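
Every gene in the table shares the same adj.P.Val even though the raw P.Value differs. This is consistent with a Benjamini-Hochberg adjustment, the default in limma's topTable, although the report does not say which method was used. The sketch below is a plain-Python illustration of that procedure, not the limma code. The function name `benjamini_hochberg` and the p-values in `toy_p` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and keeping
    # a running minimum so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values: none is small relative to its rank, so the
# running minimum pins every adjusted value to the same number.
toy_p = [0.3, 0.5, 0.9, 0.95]
toy_adj = benjamini_hochberg(toy_p)
print(toy_adj)  # [0.95, 0.95, 0.95, 0.95]
```

When no raw p-value is small enough relative to its rank, the running minimum gives all genes the same adjusted value, as in the adj.P.Val column above.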

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the given data matrix showing batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

      Proportion   Cumulative     Percent Variation    Percent Variation  Condition     Percent Variation  Batch
      of Variance  Proportion of  Explained by Either  Explained by       Significance  Explained by       Significance
      (%)          Variance (%)   Condition or Batch   Condition          (p-value)     Batch              (p-value)
PC1   52.56        52.56          50.7                 8.1                0.3948        48.6               0.00371
PC2   6.273        58.83          45.3                 9.5                0.9784        45.3               0.01069
PC3   5.744        64.58          32.8                 5.5                0.6045        31.8               0.04647
PC4   4.515        69.09          13.7                 6.9                0.7725        13.3               0.506
PC5   3.356        72.45          1.8                  1.7                0.6999        0.9                0.9922
PC6   3.068        75.52          15.8                 0.1                0.674         14.9               0.2151
PC7   2.815        78.33          15.3                 14.7               0.1264        3.3                0.934
PC8   2.442        80.77          15.5                 0                  0.725         14.9               0.2195
PC9   2.283        83.06          3.4                  3.1                0.4655        0.4                0.972
PC10  2.011        85.07          3.1                  0                  0.7212        2.4                0.7565
PC11  1.956        87.02          12.6                 12.6               0.2179        4.7                0.9986
PC12  1.717        88.74          1.9                  1.4                0.6287        0.6                0.9522
PC13  1.694        90.43          10.7                 0.3                0.4051        7                  0.3733
PC14  1.609        92.04          15.5                 9.4                0.08594       0                  0.5315
PC15  1.476        93.52          8.1                  0.1                0.7604        7.6                0.4756
PC16  1.296        94.82          27.5                 15.1               0.01831       0.4                0.2428
PC17  1.194        96.01          18.7                 9                  0.06479       1.2                0.3608
PC18  1.145        97.15          3.2                  0.3                0.5426        1.1                0.7673
PC19  1.041        98.2           1.1                  0.9                0.8447        0.9                0.979
PC20  1.005        99.2           3                    1.3                0.5004        0.4                0.8558
PC21  0.7997       100            0.4                  0                  0.8724        0.2                0.9666
PC22  7.629e-29    100            47                   10.9               0.1551        40.5               0.00938
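
The report does not say how the "Percent Variation Explained" columns are computed. One common reading, used here only as an assumption, is the R-squared from regressing a component's sample scores on a single factor such as batch. For one categorical factor this equals the between-group sum of squares divided by the total sum of squares. The sketch below illustrates that calculation. The function name `percent_explained` and the scores and labels are hypothetical and are not taken from this data set.

```python
def percent_explained(scores, groups):
    """Percent of the variance in `scores` explained by the grouping factor
    `groups` (one-way R-squared: between-group SS / total SS, times 100)."""
    n = len(scores)
    grand_mean = sum(scores) / n
    ss_total = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores belonging to each group label.
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    # Between-group sum of squares, weighted by group size.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_group.values()
    )
    return 100.0 * ss_between / ss_total

# Hypothetical PC scores for six samples in two batches.
toy_scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
toy_batches = ["a", "a", "a", "b", "b", "b"]
toy_percent = percent_explained(toy_scores, toy_batches)
print(round(toy_percent, 1))  # 93.1
```

Under this reading, a component such as PC1, where batch explains far more than condition, would separate samples mainly by batch.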

Shape

This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch to show the batch effects.

## Note: Sample-wise p-value is calculated for the variation across samples on the measure across genes. Gene-wise p-value is calculated for the variation of each gene between batches on the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples and the gene-wise p-value is a good measure.

Combat Plots

This is a plot showing whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02095
## p-value = 5.146e-06
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.08435
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.

SVA

Summary

## Number of Surrogate Variables found in the given data: 2