The aim of the experiment is to determine the density of Sla1-EGFP patches in yeast mutants where the central region of the Ede1 protein (amino acids 366-900, the PQ-rich and coiled-coil domains) was replaced by one of the following heterologous protein domains:
I acquired all data on an Olympus IX81 equipped with a 100x/1.49 objective, illuminating with an X-Cite 120PC lamp at 50% intensity, with 400 ms exposure. Light was filtered through a U-MGFPHQ filter cube. I acquired stacks of 26 planes with a step size of 0.2 microns.
Individual non-budding cells were cropped from fields of view. Patch numbers were extracted using the Python function count_patches from my personal package mkimage, which contains a set of wrappers for scikit-image functions. Briefly, the images were median-filtered with a 5 px disk structuring element, and the filtered images were subtracted from the originals to remove local background. The background-subtracted images were thresholded using the Yen method. The thresholded images were eroded using the number of non-zero neighbouring pixels in 3D as the erosion criterion. The spots were counted using the skimage.measure.label() function with 2-connectivity.
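The mkimage internals are not shown here, but the overall shape of the pipeline can be sketched with stand-ins: scipy.ndimage replaces the scikit-image wrappers, a mean + k·SD cutoff replaces Yen thresholding, the erosion step is omitted, and a single 2D plane stands in for the 3D stack. All of these substitutions are mine and purely illustrative.

```python
import numpy as np
from scipy import ndimage

def count_spots(image, median_size=5, n_sigma=3):
    """Count bright spots after local-background subtraction.

    Illustrative stand-in for the mkimage pipeline: scipy.ndimage
    replaces the skimage wrappers, a mean + n_sigma * SD cutoff
    replaces Yen thresholding, and the erosion step is omitted.
    """
    # local background estimate = median-filtered image
    background = ndimage.median_filter(image, size=median_size)
    signal = image.astype(float) - background
    # threshold the background-subtracted image
    mask = signal > signal.mean() + n_sigma * signal.std()
    # full 3x3 structuring element = 2-connectivity in 2D
    labels, n_spots = ndimage.label(mask, structure=np.ones((3, 3)))
    return n_spots

# two bright 3x3 patches on a flat background
img = np.zeros((40, 40))
img[5:8, 5:8] = 100
img[20:23, 30:33] = 100
n = count_spots(img)  # → 2
```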
Cross-section area was obtained by median-filtering the stack with a 10 px disk structuring element, calculating the maximum projection image, thresholding using Otsu's algorithm, and using skimage.measure.regionprops() to measure area. Note that these are pixel counts of cross-section area. To determine the total surface area, I assumed that an unbudded cell is spherical (surface area is four times the cross-section area).
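The spherical assumption reduces to a constant factor, since a sphere's surface area (\(4 \pi r^2\)) is exactly four times its equatorial cross-section (\(\pi r^2\)). A quick check, with an arbitrary illustrative radius:

```python
import math

r = 1.7  # illustrative cell radius
cross_section = math.pi * r ** 2       # equatorial slice area
sphere_surface = 4 * math.pi * r ** 2  # surface area of the sphere
assert math.isclose(sphere_surface / cross_section, 4)

# so a measured cross-section pixel count converts directly:
cross_section_px = 52                  # illustrative measurement
total_surface_px = 4 * cross_section_px
```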
This call to site_counter was used to process all datasets:
process_folder(path, median_radius = 5, erosion_n = 1, con = 2,
               method = Yen, mask = False, loop = False, save_images = True)
Another notebook was used to gather all output into tidy data frames available here, with no further modifications.
| strain | ede1 |
|---|---|
| MKY0140 | EDE1 |
| MKY3782 | ∆PQCC |
| MKY0654 | ede1∆ |
| MKY4617 | mCherry |
| MKY4576 | dTom |
| MKY4578 | Khc |
| MKY4580 | Eg5 |
| MKY4291 | Snf5 |
| MKY4282 | Sup35 |
| ede1 | dataset | n | patches_mean | patches_sd | patches_se | area_mean | area_sd | area_se |
|---|---|---|---|---|---|---|---|---|
| EDE1 | 1 | 63 | 27.73016 | 6.715909 | 0.8461250 | 51.94079 | 10.645161 | 1.3411642 |
| EDE1 | 2 | 58 | 29.12069 | 5.846209 | 0.7676449 | 56.44771 | 11.552946 | 1.5169761 |
| EDE1 | 3 | 64 | 29.32812 | 6.519944 | 0.8149930 | 49.56184 | 7.991786 | 0.9989733 |
| ∆PQCC | 1 | 73 | 16.41096 | 4.954896 | 0.5799267 | 52.33093 | 11.200530 | 1.3109229 |
| ∆PQCC | 2 | 74 | 18.44595 | 5.698214 | 0.6624039 | 54.32342 | 10.600985 | 1.2323395 |
| ∆PQCC | 3 | 58 | 18.29310 | 6.833890 | 0.8973338 | 54.40918 | 11.287895 | 1.4821732 |
| ede1∆ | 1 | 61 | 14.98361 | 4.801013 | 0.6147067 | 52.08469 | 7.516644 | 0.9624076 |
| ede1∆ | 2 | 65 | 15.98462 | 5.704966 | 0.7076139 | 56.48160 | 10.966704 | 1.3602522 |
| ede1∆ | 3 | 60 | 14.61667 | 5.249993 | 0.6777712 | 48.98833 | 8.084606 | 1.0437182 |
| mCherry | 4 | 54 | 17.70370 | 5.638913 | 0.7673589 | 59.99512 | 15.108871 | 2.0560569 |
| mCherry | 5 | 70 | 18.48571 | 4.760169 | 0.5689491 | 50.13624 | 7.121412 | 0.8511715 |
| mCherry | 6 | 78 | 14.21795 | 4.874123 | 0.5518858 | 49.94668 | 9.651742 | 1.0928446 |
| dTom | 4 | 68 | 15.82353 | 5.268723 | 0.6389265 | 48.63283 | 7.470632 | 0.9059472 |
| dTom | 5 | 59 | 15.71186 | 5.378921 | 0.7002758 | 53.89653 | 9.640562 | 1.2550942 |
| dTom | 6 | 42 | 18.92857 | 5.654236 | 0.8724675 | 55.65860 | 10.453794 | 1.6130555 |
| Khc | 4 | 61 | 13.62295 | 5.612973 | 0.7186675 | 47.46450 | 6.633740 | 0.8493634 |
| Khc | 5 | 64 | 14.31250 | 4.189462 | 0.5236827 | 49.50749 | 8.223165 | 1.0278956 |
| Khc | 6 | 54 | 16.62963 | 5.003004 | 0.6808226 | 53.91222 | 8.693893 | 1.1830890 |
| Eg5 | 4 | 60 | 18.81667 | 5.776902 | 0.7457949 | 50.86488 | 6.825151 | 0.8811232 |
| Eg5 | 5 | 78 | 16.06410 | 5.112602 | 0.5788881 | 50.41711 | 10.283088 | 1.1643305 |
| Eg5 | 6 | 60 | 16.26667 | 5.868233 | 0.7575856 | 51.62094 | 11.500195 | 1.4846688 |
| Snf5 | 1 | 90 | 19.34444 | 6.182907 | 0.6517356 | 45.21896 | 7.520545 | 0.7927350 |
| Snf5 | 2 | 79 | 20.75949 | 5.137133 | 0.5779726 | 56.24489 | 10.959846 | 1.2330790 |
| Snf5 | 3 | 62 | 18.59677 | 8.177257 | 1.0385127 | 48.89502 | 11.491233 | 1.4593880 |
| Sup35 | 1 | 77 | 22.02597 | 5.155411 | 0.5875136 | 51.53718 | 8.410510 | 0.9584666 |
| Sup35 | 2 | 61 | 21.21311 | 7.071338 | 0.9053921 | 55.34524 | 9.881855 | 1.2652419 |
| Sup35 | 3 | 54 | 22.11111 | 5.967417 | 0.8120626 | 52.71252 | 10.581480 | 1.4399571 |
We can combine the patch number and area into \(density = \frac{patches}{area}\), calculated individually for each cell. We can summarise the data for each Ede1 mutant in each dataset:
| ede1 | dataset | n | density_mean | density_sd | density_se | density_median | density_mad |
|---|---|---|---|---|---|---|---|
| EDE1 | 1 | 63 | 0.5481701 | 0.1446301 | 0.0182217 | 0.5645806 | 0.1440384 |
| EDE1 | 2 | 58 | 0.5277718 | 0.1198626 | 0.0157387 | 0.5120066 | 0.1172980 |
| EDE1 | 3 | 64 | 0.5948730 | 0.1110449 | 0.0138806 | 0.5869729 | 0.1016109 |
| ∆PQCC | 1 | 73 | 0.3178281 | 0.0936706 | 0.0109633 | 0.3144940 | 0.0989302 |
| ∆PQCC | 2 | 74 | 0.3463387 | 0.1057420 | 0.0122923 | 0.3237286 | 0.0907352 |
| ∆PQCC | 3 | 58 | 0.3343710 | 0.1086607 | 0.0142679 | 0.3384717 | 0.0852853 |
| ede1∆ | 1 | 61 | 0.2905893 | 0.0926398 | 0.0118613 | 0.2893412 | 0.0784925 |
| ede1∆ | 2 | 65 | 0.2812039 | 0.0815060 | 0.0101096 | 0.2864278 | 0.0766339 |
| ede1∆ | 3 | 60 | 0.2988538 | 0.1021170 | 0.0131832 | 0.3085457 | 0.1075222 |
| mCherry | 4 | 54 | 0.2960346 | 0.0649451 | 0.0088379 | 0.2913582 | 0.0658805 |
| mCherry | 5 | 70 | 0.3720760 | 0.0922860 | 0.0110303 | 0.3616950 | 0.0877807 |
| mCherry | 6 | 78 | 0.2871355 | 0.0912629 | 0.0103335 | 0.3002458 | 0.0946702 |
| dTom | 4 | 68 | 0.3240302 | 0.0977476 | 0.0118536 | 0.3172533 | 0.0954928 |
| dTom | 5 | 59 | 0.2908285 | 0.0862263 | 0.0112257 | 0.2827884 | 0.0858295 |
| dTom | 6 | 42 | 0.3464929 | 0.1032743 | 0.0159356 | 0.3458182 | 0.0946929 |
| Khc | 4 | 61 | 0.2850212 | 0.1032656 | 0.0132218 | 0.2863498 | 0.1103514 |
| Khc | 5 | 64 | 0.2929173 | 0.0839598 | 0.0104950 | 0.2984537 | 0.0814191 |
| Khc | 6 | 54 | 0.3084453 | 0.0835851 | 0.0113745 | 0.2990882 | 0.0817657 |
| Eg5 | 4 | 60 | 0.3698412 | 0.1012370 | 0.0130696 | 0.3727275 | 0.1171072 |
| Eg5 | 5 | 78 | 0.3236706 | 0.0976318 | 0.0110546 | 0.3266622 | 0.0931995 |
| Eg5 | 6 | 60 | 0.3182508 | 0.1076020 | 0.0138914 | 0.3163651 | 0.1029646 |
| Snf5 | 1 | 90 | 0.4276246 | 0.1221145 | 0.0128720 | 0.4228668 | 0.1211030 |
| Snf5 | 2 | 79 | 0.3760389 | 0.0943727 | 0.0106178 | 0.3684693 | 0.1011870 |
| Snf5 | 3 | 62 | 0.3725138 | 0.1277070 | 0.0162188 | 0.3669131 | 0.1374652 |
| Sup35 | 1 | 77 | 0.4362761 | 0.1133273 | 0.0129148 | 0.4255176 | 0.1124559 |
| Sup35 | 2 | 61 | 0.3870802 | 0.1225512 | 0.0156911 | 0.3899448 | 0.1170396 |
| Sup35 | 3 | 54 | 0.4281938 | 0.1241646 | 0.0168967 | 0.4192432 | 0.1205993 |
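The per-cell density calculation and the summary columns above (mean, SD, SE, median, MAD) can be sketched with stdlib Python; the patch counts and areas below are made-up sample values, not data from any table here:

```python
import statistics as st

# per-cell density for one hypothetical dataset (values made up)
patches = [28, 31, 25, 30]
areas = [52.1, 55.9, 49.3, 51.0]  # cross-section areas in pixels
density = [p / a for p, a in zip(patches, areas)]

n = len(density)
mean = st.mean(density)
sd = st.stdev(density)            # sample SD (n - 1 denominator)
se = sd / n ** 0.5                # standard error of the mean
median = st.median(density)
# median absolute deviation with the 1.4826 normal-consistency
# factor that R's mad() applies by default
mad = 1.4826 * st.median(abs(d - median) for d in density)
```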
I have chosen to show this data using the SuperPlot style. Each point shows density of Sla1-EGFP patches in an individual cell.
Big colour points show the mean measurements from three independent repeats.
The range is mean +/- SD, calculated from the three independent repeat means.
There are 9 levels of Ede1, resulting in 36 possible pairwise comparisons. This is too unwieldy for any plot, at least a plot intended to also show the data.
As such, I propose to not even attempt this kind of illustration. An alternative is a compact letter display. In this view, groups sharing at least one letter are not significantly different at a chosen \(\alpha\) (here, 5%).
Pros:
Cons:
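For illustration, the letter assignment can be sketched by brute force (my own naive version, not the algorithm multcompView uses): each letter marks a maximal set of groups containing no significantly different pair, so two groups share a letter exactly when they are not significantly different.

```python
from itertools import combinations
from string import ascii_lowercase

def letter_display(groups, significant_pairs):
    """Naive compact letter display. Each letter marks a maximal
    subset of groups with no significant pair among them; brute
    force over subsets, fine for a handful of groups."""
    sig = {frozenset(p) for p in significant_pairs}
    def compatible(sub):
        return all(frozenset(p) not in sig for p in combinations(sub, 2))
    cliques = []  # maximal compatible subsets, found largest-first
    for r in range(len(groups), 0, -1):
        for sub in combinations(groups, r):
            if compatible(sub) and not any(set(sub) <= c for c in cliques):
                cliques.append(set(sub))
    letters = {g: "" for g in groups}
    for letter, clique in zip(ascii_lowercase, cliques):
        for g in clique:
            letters[g] += letter
    return letters

# toy example: only A vs C is significant, so B bridges both groups
cld = letter_display(["A", "B", "C"], [("A", "C")])
# cld == {"A": "a", "B": "ab", "C": "b"}
```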
As this is also complicated, I propose a final solution: state in the text that all groups are significantly different from wild-type, mark groups with a significant improvement over ede1∆PQCC with a star, and provide a table of p-values for every pairwise comparison in the supplement.
This might be useful in situations where this much data is too much data.
We want to consider:
The design is not ideal here:
Bad design cannot be fixed by statistical analysis, but on the bright side, the data is clean and there is no reason to expect a systematic error associated with any particular dataset. Since we are measuring an absolute property (number of sites), and not something directly proportional to exposure settings like intensity, the influence of lamp power etc. should be limited.
If the experiment were not disconnected, we could try to model the effects of date / dataset directly.
Because it is disconnected, it's a bit trickier. For example, we cannot really get useful information from modelling date as a fixed effect after ede1. What we can do for starters is build a model that disregards the dataset information, and check whether the residuals cluster by date or dataset, telling us whether this actually matters.
pooled_lm <- lm(density ~ ede1, data = sla1_density)
plot(as.factor(sla1_density$date), resid(pooled_lm),
xlab = '', ylab = 'Residuals', las = 2,
main = 'Univariate model residuals by date')
These seem to be very well distributed around zero. The exception is the single mCherry acquisition on 10/02/2022, which, to be honest, does seem like an outlier.
So what to do? The decision seems fairly arbitrary. I do not necessarily think that the univariate model on all cells would be wrong, but it does seem like a controversial issue. A model of replicate means (means-pooled analysis) will be conservative and lose power, but it seems appropriate for once, considering the segregated design.
We will test the null hypothesis that mean Sla1 density is the same across different Ede1 strains. We will use repeat-level data for the tests to account for experimental variability.
We can use ANOVA and a post-hoc test to find out how likely data at least this extreme would be under the null hypothesis.
More importantly, we will try to find the effect sizes of different contrasts.
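The ANOVA step on replicate means can be sketched in a few lines with scipy; the three groups below are the EDE1, ∆PQCC and ede1∆ dataset-level density means from the summary table above:

```python
from scipy import stats

# replicate-level density means from the summary table
ede1_wt = [0.548, 0.528, 0.595]   # EDE1, datasets 1-3
pqcc = [0.318, 0.346, 0.334]      # dPQCC, datasets 1-3
knockout = [0.291, 0.281, 0.299]  # ede1-null, datasets 1-3

# one-way ANOVA: is at least one group mean different?
f_stat, p_value = stats.f_oneway(ede1_wt, pqcc, knockout)
```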
We start by making a model of density_mean ~ ede1:
##
## Call:
## lm(formula = density_mean ~ ede1, data = sla1_density_stats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.031280 -0.017512 -0.002544 0.013238 0.053661
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.55694 0.01613 34.531 < 2e-16 ***
## ede1∆PQCC -0.22409 0.02281 -9.825 1.17e-08 ***
## ede1ede1∆ -0.26672 0.02281 -11.694 7.64e-10 ***
## ede1mCherry -0.23852 0.02281 -10.457 4.47e-09 ***
## ede1dTom -0.23649 0.02281 -10.368 5.11e-09 ***
## ede1Khc -0.26148 0.02281 -11.464 1.05e-09 ***
## ede1Eg5 -0.21968 0.02281 -9.631 1.59e-08 ***
## ede1Snf5 -0.16488 0.02281 -7.229 1.01e-06 ***
## ede1Sup35 -0.13975 0.02281 -6.127 8.70e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02794 on 18 degrees of freedom
## Multiple R-squared: 0.9236, Adjusted R-squared: 0.8897
## F-statistic: 27.21 on 8 and 18 DF, p-value: 1.573e-08
## Analysis of Variance Table
##
## Response: density_mean
## Df Sum Sq Mean Sq F value Pr(>F)
## ede1 8 0.169852 0.0212316 27.207 1.573e-08 ***
## Residuals 18 0.014047 0.0007804
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One-way ANOVA on means-pooled data rejects the null with \(p = 1.6 \times 10^{-8}\).
This is one problem with the small N of replicate means: there are definite differences in variance across levels, but we cannot tell whether they reflect a difference in populations or just noise. It is probably the latter; if we go back to the cell-level observations modeled by pooled_lm, we can see that the residuals look fine there, too:
The following are the model estimates of mean densities associated with each Ede1 level:
## ede1 emmean SE df lower.CL upper.CL
## EDE1 0.557 0.0161 18 0.523 0.591
## ∆PQCC 0.333 0.0161 18 0.299 0.367
## ede1∆ 0.290 0.0161 18 0.256 0.324
## mCherry 0.318 0.0161 18 0.285 0.352
## dTom 0.320 0.0161 18 0.287 0.354
## Khc 0.295 0.0161 18 0.262 0.329
## Eg5 0.337 0.0161 18 0.303 0.371
## Snf5 0.392 0.0161 18 0.358 0.426
## Sup35 0.417 0.0161 18 0.383 0.451
##
## Confidence level used: 0.95
We are not necessarily interested in every possible comparison, but we do want to obtain all the p-values at least against ∆PQCC (is there a rescue?) and wild-type (is it complete?). Tukey’s test for contrasts of all 9 estimates:
## contrast estimate SE df t.ratio p.value
## EDE1 - ∆PQCC 0.22409 0.0228 18 9.825 <.0001
## EDE1 - ede1∆ 0.26672 0.0228 18 11.694 <.0001
## EDE1 - mCherry 0.23852 0.0228 18 10.457 <.0001
## EDE1 - dTom 0.23649 0.0228 18 10.368 <.0001
## EDE1 - Khc 0.26148 0.0228 18 11.464 <.0001
## EDE1 - Eg5 0.21968 0.0228 18 9.631 <.0001
## EDE1 - Snf5 0.16488 0.0228 18 7.229 <.0001
## EDE1 - Sup35 0.13975 0.0228 18 6.127 0.0002
## ∆PQCC - ede1∆ 0.04263 0.0228 18 1.869 0.6407
## ∆PQCC - mCherry 0.01443 0.0228 18 0.633 0.9991
## ∆PQCC - dTom 0.01240 0.0228 18 0.543 0.9997
## ∆PQCC - Khc 0.03738 0.0228 18 1.639 0.7727
## ∆PQCC - Eg5 -0.00441 0.0228 18 -0.193 1.0000
## ∆PQCC - Snf5 -0.05921 0.0228 18 -2.596 0.2537
## ∆PQCC - Sup35 -0.08434 0.0228 18 -3.698 0.0341
## ede1∆ - mCherry -0.02820 0.0228 18 -1.236 0.9371
## ede1∆ - dTom -0.03023 0.0228 18 -1.326 0.9105
## ede1∆ - Khc -0.00525 0.0228 18 -0.230 1.0000
## ede1∆ - Eg5 -0.04704 0.0228 18 -2.062 0.5245
## ede1∆ - Snf5 -0.10184 0.0228 18 -4.465 0.0071
## ede1∆ - Sup35 -0.12697 0.0228 18 -5.567 0.0007
## mCherry - dTom -0.00204 0.0228 18 -0.089 1.0000
## mCherry - Khc 0.02295 0.0228 18 1.006 0.9803
## mCherry - Eg5 -0.01884 0.0228 18 -0.826 0.9943
## mCherry - Snf5 -0.07364 0.0228 18 -3.229 0.0847
## mCherry - Sup35 -0.09877 0.0228 18 -4.330 0.0094
## dTom - Khc 0.02499 0.0228 18 1.096 0.9675
## dTom - Eg5 -0.01680 0.0228 18 -0.737 0.9974
## dTom - Snf5 -0.07161 0.0228 18 -3.139 0.1000
## dTom - Sup35 -0.09673 0.0228 18 -4.241 0.0113
## Khc - Eg5 -0.04179 0.0228 18 -1.832 0.6627
## Khc - Snf5 -0.09660 0.0228 18 -4.235 0.0114
## Khc - Sup35 -0.12172 0.0228 18 -5.337 0.0012
## Eg5 - Snf5 -0.05480 0.0228 18 -2.403 0.3387
## Eg5 - Sup35 -0.07993 0.0228 18 -3.504 0.0500
## Snf5 - Sup35 -0.02512 0.0228 18 -1.102 0.9665
##
## P value adjustment: tukey method for comparing a family of 9 estimates
This is a lot; perhaps we can put it in a graphical matrix.
Effect sizes are perhaps more important than the p-values. Here I will calculate the classical Cohen's d, defined as the mean difference divided by the population standard deviation. Despite using the means-pooled model to estimate contrast p-values and population means, I am confident that it is more appropriate to take the pooled observation-level SD for calculating the effect size.
Therefore \(\sigma\) will be a mean of the SDs for each Ede1 level, weighted by \(n-1\) (departing from the emmeans default of residual SD). I am not quite sure what the appropriate degrees of freedom are; I used the sample size minus the number of Ede1 levels. In any case, this affects the 95% CIs of the effect-size estimate, but not the estimate itself.
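A sketch of this \(\sigma\) and the resulting Cohen's d: the cell counts below match the tables above, but the group-level SDs are assumed for illustration, so the numbers are not the actual analysis output.

```python
# group-level summaries; the ns are cell counts from the tables
# above, the SDs are assumed for illustration
groups = {
    "EDE1": {"n": 185, "mean": 0.557, "sd": 0.118},
    "dPQCC": {"n": 205, "mean": 0.333, "sd": 0.096},
}

# sigma = mean of per-group SDs, weighted by (n - 1)
weights = [g["n"] - 1 for g in groups.values()]
sds = [g["sd"] for g in groups.values()]
sigma = sum(w * s for w, s in zip(weights, sds)) / sum(weights)

# Cohen's d for the EDE1 - dPQCC contrast
d = (groups["EDE1"]["mean"] - groups["dPQCC"]["mean"]) / sigma
```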
Below are effect sizes calculated for comparisons against wild-type and against ∆PQCC.
## contrast effect.size SE df lower.CL upper.CL
## ∆PQCC - EDE1 -2.11 0.217 18 -2.56 -1.651
## ede1∆ - EDE1 -2.51 0.219 18 -2.97 -2.049
## mCherry - EDE1 -2.24 0.218 18 -2.70 -1.786
## dTom - EDE1 -2.22 0.218 18 -2.68 -1.767
## Khc - EDE1 -2.46 0.219 18 -2.92 -2.000
## Eg5 - EDE1 -2.07 0.217 18 -2.52 -1.609
## Snf5 - EDE1 -1.55 0.216 18 -2.00 -1.097
## Sup35 - EDE1 -1.31 0.216 18 -1.77 -0.861
##
## sigma used for effect sizes: 0.1063
## Confidence level used: 0.95
## contrast effect.size SE df lower.CL upper.CL
## EDE1 - ∆PQCC 2.1076 0.217 18 1.651 2.5645
## ede1∆ - ∆PQCC -0.4009 0.215 18 -0.852 0.0500
## mCherry - ∆PQCC -0.1357 0.215 18 -0.586 0.3150
## dTom - ∆PQCC -0.1166 0.215 18 -0.567 0.3341
## Khc - ∆PQCC -0.3516 0.215 18 -0.802 0.0993
## Eg5 - ∆PQCC 0.0415 0.215 18 -0.409 0.4921
## Snf5 - ∆PQCC 0.5569 0.215 18 0.106 1.0080
## Sup35 - ∆PQCC 0.7932 0.215 18 0.342 1.2448
##
## sigma used for effect sizes: 0.1063
## Confidence level used: 0.95
The p-values for different contrasts above are a result of a quite conservative model, which itself is a result of suboptimal design. In a pooled observation model, for example, we would find a couple more significant effects due to larger sample size:
## EDE1 ∆PQCC ede1∆ mCherry dTom Khc Eg5 Snf5 Sup35
## EDE1 [0.558] <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
## ∆PQCC [0.333] 0.0024 0.9274 0.9199 0.0149 1.0000 <.0001 <.0001
## ede1∆ [0.290] 0.1551 0.2413 1.0000 0.0008 <.0001 <.0001
## mCherry [0.319] 1.0000 0.4044 0.8021 <.0001 <.0001
## dTom [0.318] 0.5249 0.7960 <.0001 <.0001
## Khc [0.295] 0.0057 <.0001 <.0001
## Eg5 [0.336] <.0001 <.0001
## Snf5 [0.395] 0.3849
## Sup35 [0.418]
##
## Row and column labels: ede1
## Upper triangle: P values adjust = "tukey"
## Diagonal: [Estimates] (emmean)
Here we get low p-values for the differences between Snf5 and everything except Sup35, as well as between ∆PQCC and the less dense ede1∆ and Khc.
This model would generate group means and effect sizes pretty much the same as the means-pooled model, but drastically different p-values. This underscores that p-values should be treated with healthy skepticism and play a minor role in the interpretation of the results.
Some would say that this model is an example of pseudoreplication, but this is a bit of a grey area. Yeast cells from different datasets grow in the same medium, incubator and so on, but they are individual organisms. Even if a ‘date effect’ exists beyond a simple sampling variation, there is no reason to think that the effect is consistent across mutants: after all, each mutant grows in a separate tube. If we try to look at an interaction plot, we cannot see a consistent date effect:
Of course this is undermined by the disconnect in date / mutant combinations.
Can we generate a linear mixed model for this, using date as a random effect? Because there does not seem to be any consistent effect of date in itself, let’s add an interaction term.
date_lmm1 <- lmer(density ~ ede1 + (1|date), data = sla1_density)
date_lmm2 <- lmer(density ~ ede1 + (1|ede1:date), data = sla1_density)
date_lmm3 <- lmer(density ~ ede1 + (1|date) + (1|ede1:date), data = sla1_density)
## Data: sla1_density
## Models:
## date_lmm1: density ~ ede1 + (1 | date)
## date_lmm2: density ~ ede1 + (1 | ede1:date)
## date_lmm3: density ~ ede1 + (1 | date) + (1 | ede1:date)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## date_lmm1 11 -2879.1 -2818.9 1450.5 -2901.1
## date_lmm2 11 -2887.4 -2827.2 1454.7 -2909.4 8.2988 0
## date_lmm3 12 -2885.4 -2819.8 1454.7 -2909.4 0.0017 1 0.967
Using the anova table for mixed models from lmerTest, we find that the interaction-only model date_lmm2 actually has the lowest AIC.
Single-term deletions of effects from the main + interaction model date_lmm3 suggest that the interaction term is significant and the main date effect is not.
## ANOVA-like table for random-effects: Single term deletions
##
## Model:
## density ~ ede1 + (1 | date) + (1 | ede1:date)
## npar logLik AIC LRT Df Pr(>Chisq)
## <none> 12 1424.9 -2825.8
## (1 | date) 11 1424.9 -2827.8 0.0007 1 0.9796497
## (1 | ede1:date) 11 1417.8 -2813.6 14.1967 1 0.0001647 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What does this mean? I am getting out of my depth here, but I think that we simply cannot assign a meaningful change of intercept based on date, because the effect is different for every Ede1 level (hence the interaction). Again, this makes sense: while theoretically some issue could affect all strains on a given day (say, an incubator failure), this simply did not happen, so nothing is reflected in the data beyond simple sampling error, which is unique to each sample.
Note that this is very unlike the dataset in Figure 3, where any change in the microscope will affect intensity readings.
Let’s look at what the interaction-only model says about ede1 effects on density. First, how do model residuals look? Pretty good, but it’s not like the pooled observation-only model had any problems.
Even if we accept the interaction-only model, the random factor does not explain much of the variance anyway:
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: density ~ ede1 + (1 | ede1:date)
## Data: sla1_density
##
## REML criterion at convergence: -2849.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6089 -0.6576 -0.0241 0.6473 3.8921
##
## Random effects:
## Groups Name Variance Std.Dev.
## ede1:date (Intercept) 0.0006184 0.02487
## Residual 0.0108813 0.10431
## Number of obs: 1747, groups: ede1:date, 27
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.55716 0.01628 18.50247 34.224 < 2e-16 ***
## ede1∆PQCC -0.22433 0.02291 18.12580 -9.793 1.15e-08 ***
## ede1ede1∆ -0.26700 0.02302 18.47895 -11.601 6.37e-10 ***
## ede1mCherry -0.23854 0.02294 18.20417 -10.401 4.33e-09 ***
## ede1dTom -0.23743 0.02318 18.98038 -10.241 3.62e-09 ***
## ede1Khc -0.26183 0.02307 18.64776 -11.350 8.20e-10 ***
## ede1Eg5 -0.22015 0.02295 18.26477 -9.592 1.48e-08 ***
## ede1Snf5 -0.16452 0.02279 17.73636 -7.220 1.12e-06 ***
## ede1Sup35 -0.13977 0.02300 18.39854 -6.078 8.74e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) e1∆PQC ed1d1∆ ed1mCh ed1dTm ed1Khc ed1Eg5 ed1Sn5
## ede1∆PQCC -0.711
## ede1ede1∆ -0.707 0.503
## ede1mCherry -0.710 0.504 0.502
## ede1dTom -0.702 0.499 0.497 0.498
## ede1Khc -0.706 0.502 0.499 0.501 0.496
## ede1Eg5 -0.709 0.504 0.502 0.503 0.498 0.501
## ede1Snf5 -0.714 0.508 0.505 0.507 0.502 0.504 0.507
## ede1Sup35 -0.708 0.503 0.501 0.503 0.497 0.500 0.502 0.506
About 5% of the variance is explained by the 1|ede1:date term in the model.
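That figure comes directly from the variance components reported above:

```python
# variance components from the mixed-model summary above
var_interaction = 0.0006184  # (1 | ede1:date) intercept variance
var_residual = 0.0108813     # residual variance

# proportion of total variance carried by the random term
icc = var_interaction / (var_interaction + var_residual)  # → ~0.054
```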
Let’s quickly generate group means (diagonal) and p-values for contrasts, with Tukey adjustment:
## EDE1 ∆PQCC ede1∆ mCherry dTom Khc Eg5 Snf5 Sup35
## EDE1 [0.557] <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0.0003
## ∆PQCC [0.333] 0.6443 0.9992 0.9996 0.7755 1.0000 0.2402 0.0347
## ede1∆ [0.290] 0.9359 0.9265 1.0000 0.5368 0.0070 0.0008
## mCherry [0.319] 1.0000 0.9795 0.9952 0.0820 0.0099
## dTom [0.320] 0.9748 0.9971 0.0935 0.0113
## Khc [0.295] 0.6746 0.0112 0.0012
## Eg5 [0.337] 0.3189 0.0500
## Snf5 [0.393] 0.9687
## Sup35 [0.417]
##
## Row and column labels: ede1
## Upper triangle: P values adjust = "tukey"
## Diagonal: [Estimates] (emmean)
The values are strikingly similar to what we saw with the means-pooled model. Why? I suspect these are effectively equivalent: the mixed model we ended up with creates 27 unique groups (ede1 / date combinations) with no shared information, just as we had 27 unique ede1 / dataset group means in the replicate_lm model.
Final estimates with lower / upper 95% confidence intervals, and a common-sense comparison to wild type (in %). half_ci is half of the confidence interval width, for writing CI ranges in the format mean +/- error.
| ede1 | mean | lower | upper | proc_wt | half_ci |
|---|---|---|---|---|---|
| EDE1 | 0.557 | 0.471 | 0.642 | 100 | 0.085 |
| ∆PQCC | 0.333 | 0.297 | 0.368 | 60 | 0.036 |
| ede1∆ | 0.290 | 0.268 | 0.312 | 52 | 0.022 |
| mCherry | 0.318 | 0.202 | 0.434 | 57 | 0.116 |
| dTom | 0.320 | 0.251 | 0.390 | 58 | 0.070 |
| Khc | 0.295 | 0.266 | 0.325 | 53 | 0.030 |
| Eg5 | 0.337 | 0.267 | 0.408 | 61 | 0.070 |
| Snf5 | 0.392 | 0.315 | 0.469 | 70 | 0.077 |
| Sup35 | 0.417 | 0.352 | 0.483 | 75 | 0.066 |
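The derived columns can be reproduced directly from the estimates; here for the ∆PQCC row of the table above:

```python
# reproduce the derived columns for the dPQCC row of the table above
wt_mean = 0.557                        # EDE1 estimate
mean, lower, upper = 0.333, 0.297, 0.368

proc_wt = round(100 * mean / wt_mean)  # percent of wild type -> 60
half_ci = (upper - lower) / 2          # for "mean +/- error" -> ~0.036
```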
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] emmeans_1.7.2 lmerTest_3.1-3 lme4_1.1-27.1 Matrix_1.3-3
## [5] multcompView_0.1-8 ggsignif_0.6.2 ggbeeswarm_0.6.0 knitr_1.36
## [9] broom_0.7.9 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [13] purrr_0.3.4 readr_2.0.1 tidyr_1.1.3 tibble_3.1.3
## [17] ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] TH.data_1.1-0 minqa_1.2.4 colorspace_2.0-2
## [4] ellipsis_0.3.2 estimability_1.3 htmlTable_2.2.1
## [7] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.13
## [10] farver_2.1.0 fansi_0.5.0 mvtnorm_1.1-3
## [13] lubridate_1.7.10 xml2_1.3.2 codetools_0.2-18
## [16] splines_4.1.0 Formula_1.2-4 jsonlite_1.7.2
## [19] nloptr_1.2.2.2 pbkrtest_0.5.1 cluster_2.1.2
## [22] dbplyr_2.1.1 png_0.1-7 compiler_4.1.0
## [25] httr_1.4.2 backports_1.2.1 assertthat_0.2.1
## [28] cli_3.1.0 htmltools_0.5.1.1 tools_4.1.0
## [31] gtable_0.3.0 glue_1.4.2 Rcpp_1.0.7
## [34] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.3.8
## [37] nlme_3.1-152 xfun_0.25 rvest_1.0.1
## [40] mime_0.11 lifecycle_1.0.0 MASS_7.3-54
## [43] zoo_1.8-9 scales_1.1.1 hms_1.1.0
## [46] parallel_4.1.0 sandwich_3.0-1 RColorBrewer_1.1-2
## [49] yaml_2.2.1 gridExtra_2.3 rpart_4.1-15
## [52] latticeExtra_0.6-29 stringi_1.7.3 highr_0.9
## [55] checkmate_2.0.0 boot_1.3-28 rlang_0.4.11
## [58] pkgconfig_2.0.3 evaluate_0.14 lattice_0.20-44
## [61] htmlwidgets_1.5.3 labeling_0.4.2 tidyselect_1.1.1
## [64] magrittr_2.0.1 R6_2.5.1 generics_0.1.0
## [67] Hmisc_4.5-0 multcomp_1.4-18 DBI_1.1.1
## [70] pillar_1.6.2 haven_2.4.3 foreign_0.8-81
## [73] withr_2.4.2 survival_3.2-11 nnet_7.3-16
## [76] modelr_0.1.8 crayon_1.4.1 utf8_1.2.2
## [79] tzdb_0.1.2 rmarkdown_2.11 jpeg_0.1-9
## [82] grid_4.1.0 readxl_1.3.1 data.table_1.14.0
## [85] reprex_2.0.1 digest_0.6.27 xtable_1.8-4
## [88] numDeriv_2016.8-1.1 munsell_0.5.0 beeswarm_0.4.0
## [91] vipor_0.4.5