logistic regression example - cont’d

Author

Stephen Skalicky

Published

April 22, 2026

Reminder of the data:

This notebook uses data from Kim et al. (2019)

Kim, Y., Jung, Y., & Skalicky, S. (2019). Linguistic alignment, learner characteristics, and the production of stranded prepositions in relative clauses. Studies in Second Language Acquisition, 41(5), 937–969.

This research compared the degree of primed production of English stranded prepositions among Korean learners of English. Learners completed an alignment session, where half of their input trials included a stranded preposition, and half did not. After each trial, we assessed whether the participants produced (or did not produce) a stranded preposition. If a participant produced a stranded preposition after an input trial containing a stranded preposition, this was taken as evidence of alignment. We also measured the degree of learning of stranded prepositions from the alignment sessions using a pre/immediate/delayed posttest design. These two questions are further nested within a comparison of modality: half of the participants completed the alignment session in a face-to-face (FTF) context, whereas the other half completed the session in a synchronous computer-mediated context (SCMC). A separate control condition only completed the pre/immediate/delayed posttests and did not participate in the alignment sessions.

As such there are two main analyses:

1. What is the degree of linguistic alignment, and are there differences between FTF/SCMC modality?
2. Does the alignment session lead to learning of stranded prepositions, and are there differences between FTF/SCMC modality?

We answered both questions using logistic regression, which models the probability of a binary outcome (yes/no, accurate/inaccurate, etc.) as a function of independent variables (modality, pre/post, etc.).
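Before fitting anything, it helps to see the link function itself. A minimal base-R sketch (the numbers here are illustrative, not from the study):

```r
# logistic regression models the log odds of the outcome:
#   logit(p) = log(p / (1 - p)) = b0 + b1*x1 + ...
# qlogis() maps a probability to log odds; plogis() maps back
p <- 0.25
log_odds <- qlogis(p)   # log(0.25 / 0.75) = -1.0986
plogis(log_odds)        # recovers 0.25
```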

Primed production data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(emmeans) # model posthoc
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
library(performance) # check models
dat <- read_csv('sp-priming.csv')
Rows: 4608 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): verb, modality, test, session, trial_type
dbl (7): subject, score, trial_order, cloze, wmc, prod_pre, rec_pre

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(dat)
Rows: 4,608
Columns: 12
$ subject     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ verb        <chr> "pet", "organize", "burn", "stack", "toast", "need", "cut"…
$ score       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ modality    <chr> "FTF", "FTF", "FTF", "FTF", "FTF", "FTF", "FTF", "FTF", "F…
$ test        <chr> "priming1", "priming1", "priming1", "priming1", "priming1"…
$ trial_order <dbl> 12, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,…
$ session     <chr> "session1", "session1", "session1", "session1", "session1"…
$ trial_type  <chr> "non-prime", "prime", "non-prime", "prime", "prime", "non-…
$ cloze       <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40…
$ wmc         <dbl> 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74…
$ prod_pre    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ rec_pre     <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…

Previous glm model with one predictor:

m1 <- glm(score ~ trial_type, data = dat, family = binomial(link = 'logit'))

summary(m1)

Call:
glm(formula = score ~ trial_type, family = binomial(link = "logit"), 
    data = dat)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -2.82497    0.09062  -31.18   <2e-16 ***
trial_typeprime  2.34918    0.10024   23.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4851.5  on 4607  degrees of freedom
Residual deviance: 4061.6  on 4606  degrees of freedom
AIC: 4065.6

Number of Fisher Scoring iterations: 5
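To make these coefficients concrete, we can hand-convert them with plogis() and exp() (values copied from the summary above; the rounding in the comments is mine):

```r
# intercept: log odds of producing a stranded preposition on non-prime trials
plogis(-2.82497)            # probability ~ 0.056
# intercept + slope: log odds on prime trials
plogis(-2.82497 + 2.34918)  # probability ~ 0.383
# the slope is a log odds ratio; exponentiate it for the odds ratio
exp(2.34918)                # odds of production ~ 10.5 times higher after a prime
```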

adding another predictor - modality

We can see what looks like a group difference in the descriptives:

# wrapped factor() around score so ggplot does not apply a continuous fill.
ggplot(dat, aes(fill = factor(score), x = trial_type)) + 
  facet_wrap(. ~ modality) + 
  geom_bar(position = 'dodge') + 
  # aesthetics and styling
  theme_bw() + 
  scale_fill_grey(start = .4, end = .75) +
  labs(fill = 'production:\n 0 = no, 1 = yes',
       title = 'frequency of producing a stranded preposition',
       subtitle = 'prime = previous sentence had a stranded preposition',
       x = "Trial Type",
       caption = 'FTF = face to face; SCMC = synchronous computer mediated chat')

This was part of our hypothesis: face-to-face interaction is spoken and relies more on memory, whereas SCMC leaves the text on the screen, which could potentially prime more and/or reduce cognitive burdens during communication.

First let’s fit and observe a main effects only model:

m2 <- glm(score ~ trial_type + modality, data = dat, family = binomial(link = 'logit'))
summary(m2)

Call:
glm(formula = score ~ trial_type + modality, family = binomial(link = "logit"), 
    data = dat)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -3.26718    0.10385  -31.46   <2e-16 ***
trial_typeprime  2.40373    0.10150   23.68   <2e-16 ***
modalitySCMC     0.81463    0.07929   10.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4851.5  on 4607  degrees of freedom
Residual deviance: 3953.6  on 4605  degrees of freedom
AIC: 3959.6

Number of Fisher Scoring iterations: 5
model_performance(m2)
# Indices of model performance

AIC    |   AICc |    BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log
---------------------------------------------------------------------------
3959.6 | 3959.6 | 3978.9 |     0.187 | 0.373 |     1 |    0.429 |  -278.677

AIC    | Score_spherical |   PCP
--------------------------------
3959.6 |           0.003 | 0.721

We can obtain the model predictions for each modality / trial type combination.

How do you interpret this output?

# logit scale
emmeans(m2,   ~ modality|trial_type)
trial_type = non-prime:
 modality  emmean     SE  df asymp.LCL asymp.UCL
 FTF      -3.2672 0.1040 Inf    -3.471    -3.064
 SCMC     -2.4525 0.0956 Inf    -2.640    -2.265

trial_type = prime:
 modality  emmean     SE  df asymp.LCL asymp.UCL
 FTF      -0.8635 0.0589 Inf    -0.979    -0.748
 SCMC     -0.0488 0.0591 Inf    -0.165     0.067

Results are given on the logit (not the response) scale. 
Confidence level used: 0.95 
# response scale
emmeans(m2,   ~ modality|trial_type, type = 'response' )
trial_type = non-prime:
 modality   prob      SE  df asymp.LCL asymp.UCL
 FTF      0.0367 0.00367 Inf    0.0302    0.0446
 SCMC     0.0793 0.00698 Inf    0.0666    0.0940

trial_type = prime:
 modality   prob      SE  df asymp.LCL asymp.UCL
 FTF      0.2966 0.01230 Inf    0.2731    0.3213
 SCMC     0.4878 0.01480 Inf    0.4589    0.5167

Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 
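The back-transformation is nothing more than plogis() applied to the logit-scale emmeans (values copied from the tables above):

```r
# logit-scale emmeans pushed back through the inverse link
plogis(-3.2672)   # FTF, non-prime: ~ 0.0367
plogis(-0.0488)   # SCMC, prime:    ~ 0.4878
```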

pairwise - fail

If we ask for pairwise contrasts between the groups at each trial type, we get the same numbers/results. Why?

emmeans(m2, pairwise ~ modality|trial_type)$contrast
trial_type = non-prime:
 contrast   estimate     SE  df z.ratio p.value
 FTF - SCMC   -0.815 0.0793 Inf -10.274  <.0001

trial_type = prime:
 contrast   estimate     SE  df z.ratio p.value
 FTF - SCMC   -0.815 0.0793 Inf -10.274  <.0001

Results are given on the log odds ratio (not the response) scale. 

The problem is that the two terms were not fit as an interaction in the model, yet emmeans has no real problem showing us what we ask for. It will not post-hoc refit the model with an interaction. This is an example of code doing exactly what you ask for: you have to know what it is you want and why it does or does not work.
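A quick way to see why the contrast cannot change: under the main-effects model, modality shifts the logit by the same amount at every trial type, so the FTF − SCMC difference is always the negative of the modalitySCMC coefficient. A sketch with the (rounded) coefficients from m2:

```r
b0 <- -3.26718; b_prime <- 2.40373; b_scmc <- 0.81463
# FTF - SCMC at non-prime trials
b0 - (b0 + b_scmc)                        # = -0.81463
# FTF - SCMC at prime trials: b_prime cancels out
(b0 + b_prime) - (b0 + b_prime + b_scmc)  # = -0.81463 again
```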

fit the interaction

Can you interpret this output?

  • intercept: log odds for non-prime, FTF
  • trial_typeprime: difference in log odds between prime:FTF and non-prime:FTF
  • modalitySCMC: difference in log odds between non-prime:SCMC and non-prime:FTF
  • trial_typeprime:modalitySCMC: difference of differences - how much larger the effect of prime is for SCMC compared to FTF
m3 <- glm(score ~ trial_type * modality, data = dat, family = binomial(link = 'logit'))
summary(m3)

Call:
glm(formula = score ~ trial_type * modality, family = binomial(link = "logit"), 
    data = dat)

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   -2.9613     0.1302 -22.738  < 2e-16 ***
trial_typeprime                2.0372     0.1444  14.104  < 2e-16 ***
modalitySCMC                   0.2815     0.1814   1.552  0.12076    
trial_typeprime:modalitySCMC   0.6541     0.2016   3.244  0.00118 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4851.5  on 4607  degrees of freedom
Residual deviance: 3943.2  on 4604  degrees of freedom
AIC: 3951.2

Number of Fisher Scoring iterations: 5

We can use a table to help us wrap our heads around the interaction term again:

                                FTF                          SCMC                                           Difference (SCMC − FTF)
Non-prime                       −2.9613                      −2.9613 + 0.2815 = −2.6798                     +0.2815
Prime                           −2.9613 + 2.0372 = −0.9241   −2.9613 + 2.0372 + 0.2815 + 0.6541 = +0.0115   +0.2815 + 0.6541 = +0.9356
Difference (prime − non-prime)  +2.0372                      +2.0372 + 0.6541 = +2.6913                     +0.6541
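The same arithmetic in R, using the (rounded) coefficients from the m3 summary; the interaction term falls out as the difference of differences:

```r
b <- c(int = -2.9613, prime = 2.0372, scmc = 0.2815, prime_scmc = 0.6541)
# cell log odds implied by the coefficients
ftf_non  <- b[["int"]]
scmc_non <- b[["int"]] + b[["scmc"]]
ftf_pri  <- b[["int"]] + b[["prime"]]
scmc_pri <- b[["int"]] + b[["prime"]] + b[["scmc"]] + b[["prime_scmc"]]
# difference of differences recovers the interaction coefficient
(scmc_pri - scmc_non) - (ftf_pri - ftf_non)   # = 0.6541
```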

model comparisons

anova(m2, m3)
Analysis of Deviance Table

Model 1: score ~ trial_type + modality
Model 2: score ~ trial_type * modality
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
1      4605     3953.6                        
2      4604     3943.2  1     10.4 0.001261 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_performance(m2)
# Indices of model performance

AIC    |   AICc |    BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log
---------------------------------------------------------------------------
3959.6 | 3959.6 | 3978.9 |     0.187 | 0.373 |     1 |    0.429 |  -278.677

AIC    | Score_spherical |   PCP
--------------------------------
3959.6 |           0.003 | 0.721
model_performance(m3)
# Indices of model performance

AIC    |   AICc |    BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log
---------------------------------------------------------------------------
3951.2 | 3951.2 | 3976.9 |     0.191 | 0.372 |     1 |    0.428 |  -280.541

AIC    | Score_spherical |   PCP
--------------------------------
3951.2 |           0.002 | 0.723

pairwise - no fail

Can you interpret this interaction?

# logit scale
emmeans(m3, revpairwise ~ modality|trial_type)$contrast
trial_type = non-prime:
 contrast   estimate     SE  df z.ratio p.value
 SCMC - FTF    0.281 0.1810 Inf   1.552  0.1208

trial_type = prime:
 contrast   estimate     SE  df z.ratio p.value
 SCMC - FTF    0.936 0.0879 Inf  10.639  <.0001

Results are given on the log odds ratio (not the response) scale. 
# response scale (odds ratio and not probability - why?)
emmeans(m3, revpairwise ~ modality|trial_type, type = 'response')$contrast
trial_type = non-prime:
 contrast   odds.ratio    SE  df null z.ratio p.value
 SCMC / FTF       1.33 0.240 Inf    1   1.552  0.1208

trial_type = prime:
 contrast   odds.ratio    SE  df null z.ratio p.value
 SCMC / FTF       2.55 0.224 Inf    1  10.639  <.0001

Tests are performed on the log odds ratio scale 
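The odds ratios are just the exponentiated logit-scale contrasts from the previous call:

```r
exp(0.281)  # SCMC / FTF at non-prime trials, ~ 1.33
exp(0.936)  # SCMC / FTF at prime trials,     ~ 2.55
```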

plot the significant interaction

# save the emmeans result to a dataframe

interaction_dat <- as.data.frame(emmeans(m3, ~ modality|trial_type, type = 'response'))

It’s the emmeans represented as a data frame with confidence limits, etc.

glimpse(interaction_dat)
Rows: 4
Columns: 7
$ modality   <fct> FTF, SCMC, FTF, SCMC
$ trial_type <fct> non-prime, non-prime, prime, prime
$ prob       <dbl> 0.04920635, 0.06417625, 0.28412698, 0.50287356
$ SE         <dbl> 0.006093165, 0.007584584, 0.012705418, 0.015474356
$ df         <dbl> Inf, Inf, Inf, Inf
$ asymp.LCL  <dbl> 0.0385482, 0.0508197, 0.2598955, 0.4725710
$ asymp.UCL  <dbl> 0.06261942, 0.08074457, 0.30967239, 0.53315505
ggplot(interaction_dat, aes( y = prob, x = trial_type, shape = modality, lty = modality, group = modality)) + 
  geom_point() +
  theme_bw() + 
  geom_line() + 
  geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = .1) + 
  labs(title = 'Probability of primed production', y = 'Probability', x = '') + 
  theme(legend.title = element_blank())

check_model(m3)

individual variation

What do we think about individual variation?

dat %>%
  group_by(subject, modality) %>%
  summarise(accuracy = mean(score, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(factor(subject), accuracy), y = accuracy)) +
  facet_wrap(. ~ modality, scales = 'free_x') + 
  geom_col() +
  geom_hline(yintercept = mean(dat$score, na.rm = TRUE), linetype = "dashed", color = "red") +
  labs(x = "Subject", y = "Production of Stranded Preposition", title = "Production Score by Subject") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
`summarise()` has grouped output by 'subject'. You can override using the
`.groups` argument.