Evidence vs p-values: the importance of Bayes Factors.

Applied to a case in which I analyzed radiofrequency electromagnetic fields measured in the human brain.

Dr. Marc Jacobs
22 min read · Feb 12, 2023


Those of you who have read some of my posts know that this is not the first post on Bayesian analysis, nor the first showing how p-values do not offer much information in terms of decision making and modelling. Despite being trained in 'classical' statistics, I now actively avoid p-values as much as possible, sticking instead with Likelihood Ratios (LRs) or Bayes Factors (BFs). For those of you who have no idea what an LR or a BF is, let me try and show you using a real example: estimating the effects of an intervention on radiofrequency electromagnetic fields measured in the human brain. The dataset carries information for 32 patients and 6 brain locations, resulting in 192 data points. These data points are the differences between the ON and OFF parts of each intervention, of which we have two. So, we have two groups, six locations, and 32 patients, leading to 192 difference scores which can then be compared between groups and/or between brain locations.
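Since the raw data cannot be shared, here is a minimal, hypothetical sketch of mock data with the same structure, in case you want to follow along (the 16/16 group split and the values themselves are invented):

#### MOCK DATA WITH THE SAME STRUCTURE (hypothetical values) ####
set.seed(42)
mock <- expand.grid(
  Patient  = 1:32,
  location = c("Central", "Frontal", "Occipital", "Parietal",
               "Temporalleft", "Temporalright"))
mock$Group <- ifelse(mock$Patient <= 16, "A", "B")  # assumed split
mock$value <- rnorm(nrow(mock))                     # invented ON-OFF differences
str(mock)  # 192 rows: 32 patients x 6 locations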

#### CLEAR ENVIRONMENT ####
rm(list = ls())

#### LOAD LIBRARIES ####
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(brms)
library(tidybayes)
library(emmeans)
library(bayesplot)
library(bayesrules)
library(rethinking)
library(forcats)
library(lme4)
library(bayestestR)
library(modelr)

#### LOAD & WRANGLE DATA ####
df <- read_excel("xxxx.xlsx", sheet = "xxx.xlsx (2)")
# Wide to long: one row per patient x location x time
df_long <- df %>%
  tidyr::pivot_longer(!c(Patient, Group)) %>%
  tidyr::separate(col = name, into = c("location", "time"), sep = " ")
df2 <- read_excel("xxxx.xlsx")
df2 <- df2 %>% dplyr::mutate(bin = ifelse(Difference > 0, 1, 0))
df2_1 <- df2 %>% dplyr::filter(Session == 1)
df2_2 <- df2 %>% dplyr::filter(Session == 2)
# Keep the overall ON-OFF difference per location and flag its sign
df_bayes <- df_long %>%
  dplyr::filter(time %in% c("overall")) %>%
  dplyr::mutate(binair = if_else(value < 0, -1, 1))
write.csv(df_bayes, "xxxxx.csv")

df_bayes%>%print(n=100)

# A tibble: 192 x 6
Patient Group location time value binair
<dbl> <chr> <chr> <chr> <dbl> <dbl>
1 1 A Central overall -2.23 -1
2 1 A Frontal overall -3.14 -1
3 1 A Occipital overall -3.39 -1
4 1 A Parietal overall -3.11 -1
5 1 A Temporalleft overall -2.95 -1
6 1 A Temporalright overall -2.96 -1
7 2 A Central overall 0.452 1
8 2 A Frontal overall 0.936 1
9 2 A Occipital overall 0.463 1
10 2 A Parietal overall 0.308 1
11 2 A Temporalleft overall 0.669 1
12 2 A Temporalright overall 0.443 1
13 3 A Central overall -0.0113 -1
14 3 A Frontal overall -0.103 -1
15 3 A Occipital overall 0.574 1
16 3 A Parietal overall -0.00773 -1
17 3 A Temporalleft overall 0.0191 1
18 3 A Temporalright overall 0.297 1
19 4 A Central overall -0.229 -1
20 4 A Frontal overall 0.0976 1
21 4 A Occipital overall -0.296 -1
22 4 A Parietal overall -0.323 -1
23 4 A Temporalleft overall -0.210 -1
24 4 A Temporalright overall -0.257 -1
25 5 A Central overall 0.246 1
26 5 A Frontal overall 0.926 1
27 5 A Occipital overall 0.314 1
28 5 A Parietal overall 0.478 1
29 5 A Temporalleft overall 0.337 1
30 5 A Temporalright overall 0.389 1
31 6 A Central overall 0.395 1
32 6 A Frontal overall 0.445 1
33 6 A Occipital overall 0.746 1
34 6 A Parietal overall 0.809 1
35 6 A Temporalleft overall 0.576 1
36 6 A Temporalright overall 0.469 1
37 7 A Central overall -0.186 -1
38 7 A Frontal overall -0.599 -1
39 7 A Occipital overall -0.700 -1
40 7 A Parietal overall 0.0267 1
df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(., aes(x=time,
y=value,
col=factor(Group)))+
geom_boxplot()+
theme_bw()

df_long%>%
dplyr::filter(!time %in% c('overall'))%>%
ggplot(., aes(x=value,
fill=factor(Group)))+
geom_density(alpha=0.5)+
theme_bw()+
facet_wrap(~location)

df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(., aes(x=location,
y=value,
col=factor(Group)))+
geom_boxplot()+
theme_bw()
Values by group, and by group and location.

Based on the data, there are multiple models that can be made, and for each of them we will see to what extent the model is supported by the data. In other words: how much evidence there is in favor of the model compared to another model given the data. The following models can be made, although not all of them really make sense. But in model land, you can pretty much make what you like:

  1. Estimating overall sigma — mu is set to zero
  2. Estimating grand mu and overall sigma
  3. Estimating mu per group and overall sigma
  4. Estimating mu per location and overall sigma
  5. Estimating mu per group and mu per location and overall sigma
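In R/brms formula notation (used for the actual fits later in this post), these candidates look roughly as follows; a sketch, with the difference score as the outcome:

# The five candidate models, sketched as brms/R formulas:
# 1. value ~ 0                  (mu fixed at zero; only sigma estimated)
# 2. value ~ 1                  (grand mu + sigma)
# 3. value ~ Group              (mu per group + sigma)
# 4. value ~ location           (mu per location + sigma)
# 5. value ~ Group + location   (both, as main effects)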

The latter model can be specified with main effects only, or with an interaction (value ~ Group * location). Below you will see a graphical depiction of what the models look for in each of the situations I put them in, which is basically splitting the data more and more and checking whether that makes sense.

g1<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(., aes(
y=value,
x=Patient))+
geom_point()+
theme_bw()+
labs(title="Raw data")

g2<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(., aes(
y=value,
x=Patient))+
geom_point()+
geom_hline(yintercept = 0, col="red", lty=2)+
theme_bw()+
labs(title="Estimating sigma")

g3<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(., aes(
y=value,
x=Patient))+
geom_point()+
geom_line(aes(y=mean(value)), col="red", lty=2)+
theme_bw()+
labs(title="Estimating Location (grand mean) + sigma")

group_mean<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
group_by(Group)%>%
summarise(mean=mean(value))
g4<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(.)+
geom_point(aes(y=value,x=Patient, col=Group))+
geom_hline(data=group_mean, aes(yintercept=mean,
group=Group), col="black", lty=2)+
facet_grid(~Group, scales="free")+
theme_bw()+theme(legend.position = "none")+
labs(title="Estimating group + sigma")

location_mean<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
group_by(location)%>%
summarise(mean=mean(value))
g5<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(.)+
geom_point(aes(y=value,x=Patient, col=location))+
geom_hline(data=location_mean, aes(yintercept=mean,
group=location), col="black", lty=2)+
facet_grid(~location, scales="free")+
theme_bw()+theme(legend.position = "none")+
labs(title="Estimating location + sigma")

location_group_mean<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
group_by(location, Group)%>%
summarise(mean=mean(value))
g6<-df_long%>%
dplyr::filter(time %in% c('overall'))%>%
ggplot(.)+
geom_point(aes(y=value,x=Patient,col=location))+
geom_hline(data=location_group_mean, aes(yintercept=mean,
group=interaction(location, Group)),
col="black", lty=2)+
facet_grid(Group~location, scales="free")+
theme_bw()+theme(legend.position = "none")+
labs(title="Estimating group + location + sigma")

gridExtra::grid.arrange(g1,g2,g3,g4,g5,g6,ncol=2)
The raw data and the five models depicted — each model splits the data more and more.

We can now actually start modelling, and this is where the Bayesian part comes in. I will not talk about Bayes' theorem or conditional probabilities, nor about priors, but what I will do is show how models are assessed and compared (LRs), and how posterior density estimates are used for BFs.

The first three models you see below are the overall sigma model, the grand mu (mean) + overall sigma model, and the mu per group + overall sigma model. I have given all the models the same priors, conditional on whether the parameter is a location parameter or a sigma. Hence, each mu gets a normal(0, 0.5) prior. Each sigma also gets a normal prior, but with different characteristics: normal(1, 1). We could easily fill a complete book on the adequacy of these priors, but let's not. This example is just to highlight how we can use Bayesian analysis to compare models to assess evidence, and then use the best-supported model to get estimates. Do take note that the results are posterior density estimates conditional on the prior(s) and the data. This is different from classical statistics, in which models are compared based on the data alone.
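To get a feel for what these priors imply before fitting anything, you can simply plot them; a quick sketch (note that brms truncates the sigma prior at zero automatically):

# The priors used in this first set of models:
# normal(0, 0.5) for each mu, normal(1, 1) for sigma.
par(mfrow = c(1, 2))
curve(dnorm(x, 0, 0.5), from = -3, to = 3, main = "Prior on mu: normal(0, 0.5)")
curve(dnorm(x, 1, 1), from = 0, to = 4,
      main = "Prior on sigma: normal(1, 1), truncated at 0")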

Now, the last model is actually a bit of a strange model: it's the model without an intercept. Normally this would be essentially the same as the model before it (model 2), but because the intercept prior is dropped it will not be. Hence, we actually have four models. For this exercise it does not matter exactly which models we made; they only serve as examples.

model0<-
brms::brm(Difference ~ 0,
data=df2,
prior=c(
prior(normal(1,1), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: Difference ~ 0
Data: df2 (Number of observations: 32)
Draws: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
total post-warmup draws = 28000

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.92 0.12 0.72 1.19 1.00 9819 13301

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
model1<-
brms::brm(Difference ~ 1,
data=df2,
prior=c(
prior(normal(0, 0.5), class = Intercept),
prior(normal(1,1), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 40000,
warmup = 3000)

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.42 0.14 0.14 0.69 1.00 108272 93022

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.81 0.11 0.63 1.05 1.00 108530 91236

model2<-
brms::brm(Difference ~ Session,
data=df2,
prior=c(
prior(normal(0, 0.5), class = Intercept),
prior(normal(0, 0.5), class = b, coef=Session),
prior(normal(1,1), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 40000,
warmup = 3000)

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.13 0.17 -0.20 0.46 1.00 126456 101150
Session 0.59 0.23 0.13 1.03 1.00 128125 101105

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.72 0.10 0.56 0.94 1.00 117116 100458

model3<-
brms::brm(Difference ~ 0 + Session,
data=df2,
prior=c(
prior(normal(0, 0.5), class = b, coef=Session),
prior(normal(1,1), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Session 0.73 0.17 0.38 1.05 1.00 18100 14890

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.71 0.10 0.55 0.93 1.00 17467 16282

So, we have four models which we can now compare. Do take note that this form of comparison requires a LOT of samples; more than I allowed for some of the models here (around 40k posterior samples minimum is recommended, instead of the 10k I used for models 0 and 3). But in the end we do get the Bayes Factor, which tells us that, compared to model 0, the marginal likelihood is strongest for the model in which I dropped the intercept.

comparison <- bayesfactor_models(model2,model3, model1, denominator = model0)

Warning: Bayes factors might not be precise.
For precise Bayes factors, sampling at least 40,000 posterior samples is recommended.
Computation of Bayes factors: estimating marginal likelihood, please wait...
Warning message:
effective sample size cannot be calculated, has been replaced by number of samples.

> comparison
Bayes Factors for Model Comparison

Model BF
[1] Session 208.53
[2] 0 + Session 613.86
[3] (Intercept only) 19.50

* Against Denominator: [4] 0
* Bayes Factor Type: marginal likelihoods (bridgesampling)
as.matrix(comparison)
# Bayes Factors for Model Comparison

Numerator
Denominator | [1] | [2] | [3] | [4]
------------------------------------------------------
[1] Session | 1 | 2.94 | 0.094 | 0.005
[2] 0 + Session | 0.340 | 1 | 0.032 | 0.002
[3] (Intercept only) | 10.69 | 31.48 | 1 | 0.051
[4] 0 | 208.53 | 613.86 | 19.50 | 1

You can see the matrix above. Now, the Bayes Factor resembles the Likelihood Ratio, except that it is a ratio of marginal likelihoods: likelihoods averaged over each model's prior, given the data. It shows that the '0 + Session' model has the highest support. But what is a Bayes Factor, exactly? Essentially it works like a Likelihood Ratio Test, but it is computationally much heavier, because for each of the two competing models the likelihood is integrated over that model's prior before the ratio is taken. It gets the most out of your data, since you are comparing two competing models, with their respective priors, on the same dataset. It is truly a different way of looking at inference, from both a philosophical and a statistical point of view.

The Bayes Factor formula, as depicted on Wikipedia, is the ratio of marginal likelihoods, where each marginal likelihood integrates the likelihood over that model's prior:

$$BF_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)} = \frac{\int p(\theta_1 \mid M_1)\, p(D \mid \theta_1, M_1)\, d\theta_1}{\int p(\theta_2 \mid M_2)\, p(D \mid \theta_2, M_2)\, d\theta_2}$$
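To make the formula concrete, here is a toy sketch with invented data (not the trial data): it computes both marginal likelihoods by brute-force numerical integration, with sigma fixed at 1 for simplicity, and takes their ratio.

# Toy Bayes Factor: M1 estimates mu under a normal(0, 0.5) prior,
# M0 fixes mu at zero. The marginal likelihood of M1 integrates the
# likelihood over the prior on mu; M0 has no free parameter.
y <- c(0.4, 0.9, 0.5, 0.3, 0.7)   # invented observations
lik <- function(mu) prod(dnorm(y, mean = mu, sd = 1))
marg_M1 <- integrate(function(mu) sapply(mu, lik) * dnorm(mu, 0, 0.5),
                     lower = -Inf, upper = Inf)$value
marg_M0 <- lik(0)
marg_M1 / marg_M0                 # the Bayes Factor of M1 over M0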

So, from the above, we know which of the four models is supported most by the data. From its posterior density we can actually do more: we can do hypothesis testing, which resembles classical statistical tests.

BF <- bayesfactor_parameters(model3,null = 0)
plot(BF)
Here you can see, for model 3, the prior and the posterior density. They differ substantially. From that difference we can get evidence for point values (zero) and for ranges.
> BF
Sampling priors, please wait...
Warning message:
Bayes factors might not be precise.
For precise Bayes factors, sampling at least 40,000 posterior samples is recommended.

Bayes Factor (Savage-Dickey density ratio)

Parameter | BF
------------------
Session | 311.82

* Evidence Against The Null: 0

From the above you can see that the evidence against the null hypothesis of zero is large, which makes sense. The posterior density has zero in its tail, while the bulk of its mass lies elsewhere. For the prior, the mode actually is zero. Hence the large value for the evidence against zero. We can estimate this evidence in multiple ways.

> B2<-hypothesis(model3, hypothesis = 'Session = 0')
> B2
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Session) = 0 0.73 0.17 0.38 1.05 0 0 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

> 1/B2$hypothesis$Evid.Ratio
[1] 358.0798

We get almost exactly the same hefty Bayes Factor (358 versus 312). Do take note that this exercise DOES NOT compare a null hypothesis against an alternative hypothesis. What it does is:

  1. Build a model.
  2. Get the posterior density from that model, using the prior and the data.
  3. Evaluate the density at the point of interest (zero) under the prior distribution and under the posterior distribution.
  4. Take the ratio of those two densities to obtain the Bayes Factor.

Hence, the Bayes Factor here is the evidence against the zero (or any other value) under the prior and under the posterior density, within the same model.
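For the curious, this Savage-Dickey ratio can be reproduced (approximately) by hand from the stored draws. A sketch, assuming a recent brms version that provides as_draws_df() and assuming the prior draws for the coefficient are stored as prior_b_Session (which is what sample_prior = TRUE does for a coefficient-specific prior); like bayestestR, it uses a logspline density fit:

# Savage-Dickey by hand: density at zero under the prior vs the posterior.
library(logspline)
draws <- as_draws_df(model3)                 # posterior + sampled prior draws
d_prior_at_0 <- dlogspline(0, logspline(draws$prior_b_Session))
d_post_at_0  <- dlogspline(0, logspline(draws$b_Session))
d_prior_at_0 / d_post_at_0                   # roughly the 311.82 reported above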

The above should highlight the basics, which enables us to move further. Let's do that, because, as always, there are different ways of analyzing and modelling data in R, and it is not a given that the methods will provide directly comparable results (although they should).

So, below we have model 2 from the previous set, but with different priors. I will give each parameter a uniform(0, 1) prior. And to make sure I have enough samples, I specified 40k iterations.

fit<-df2%>%
brms::brm(Difference ~ Session,
data=.,
prior = c(
prior(uniform(0,1), class = Intercept),
prior(uniform(0,1), class = sigma),
prior(uniform(0,1), class = b, coef=Session)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 40000,
warmup = 3000)

> fit
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Difference ~ Session
Data: . (Number of observations: 32)
Draws: 4 chains, each with iter = 40000; warmup = 3000; thin = 1;
total post-warmup draws = 148000

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.11 0.16 -0.19 0.44 1.00 66161 69189
Session 0.67 0.20 0.23 0.98 1.00 51647 62557

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.71 0.09 0.55 0.91 1.00 74519 73493

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Let's check the evidence against the null. Do take note that here we are not comparing models; we are looking at the evidence for a point value by comparing its density under the prior and under the posterior. And when we do that, given the data and the prior, we find that the evidence against zero is largest for the Session (group) parameter.

bfp_model2.2<-bayesfactor_parameters(fit,null = 0)
Bayes Factor (Savage-Dickey density ratio)

Parameter | BF
-------------------
(Intercept) | 0.465
Session | 16.93

* Evidence Against The Null: 0


plot(bfp_model2.2)
One can also easily see that the sampled prior does not trace a perfectly uniform density, and that the method relies HEAVILY on computation.

Now, for those who missed it the first time: the evidence against zero is calculated given the data and the prior, because the distributions under the prior and under the posterior ARE compared. If you look closely you can see two little dots, a blue one and a red one. Those dots are the densities at zero on the prior and on the posterior, and those two values are compared. That comparison gives a ratio, and that ratio is the Bayes Factor.

For those who want to take a closer look at the posterior densities of the model, I would advise sampling from the posterior. You will see that the mode falls exactly on the estimates coming from the model. Although point estimates are nice, posterior density plots are far more informative and should always be drawn. In the end, we are seldom interested in the mode alone, but in the complete density.



g1<-posterior_samples(fit)%>%
dplyr::mutate(a = b_Intercept,
s = sigma)%>%
dplyr::select(a,s)%>%
ggplot(., aes(x=a))+
geom_density()+
theme_bw()+
xlim(0,1)+
labs(x="Session 1 - a")
g2<-posterior_samples(fit)%>%
dplyr::mutate(a = b_Intercept + b_Session,
s = sigma)%>%
dplyr::select(a,s)%>%
ggplot(., aes(x=a))+
geom_density()+
theme_bw()+
xlim(0,1)+
labs(x="Session 2 - a")
g3<-posterior_samples(fit)%>%
dplyr::mutate(a = b_Intercept + b_Session,
s = sigma)%>%
dplyr::select(a,s)%>%
ggplot(., aes(x=s))+
geom_density()+
theme_bw()+
xlim(0,1)+
labs(x="Session 1 & 2 - s")
gridExtra::grid.arrange(g1,g2,g3)
The posterior densities from the model. The posterior density in the Bayes Factor plot is actually the second plot here.
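A side note: posterior_samples() has since been deprecated in brms in favor of the posterior package's accessors. A minimal equivalent sketch for the first panel above:

# Equivalent to g1, using the non-deprecated as_draws_df() accessor.
as_draws_df(fit) %>%
  ggplot(aes(x = b_Intercept)) +
  geom_density() +
  theme_bw() +
  xlim(0, 1) +
  labs(x = "Session 1 - a")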

When I received the data, there was actually another level included, which I have kept out of the analysis until now: for each of the previous samples we have five replications. That means that for each subject we have five measurements per location, and so the sample size, though not the effective sample size, is five times larger. I will use THAT data now to compare model selection with Linear Mixed Models against Bayesian Mixed Models.

Here, you can see the make-up of the data.

df_bayes<-df_long%>%
dplyr::filter(!time %in% c('overall'))
str(df_bayes)

tibble [960 x 5] (S3: tbl_df/tbl/data.frame)
$ Patient : num [1:960] 1 1 1 1 1 1 1 1 1 1 ...
$ Group : chr [1:960] "A" "A" "A" "A" ...
$ location: chr [1:960] "Central" "Central" "Central" "Central" ...
$ time : chr [1:960] "1" "2" "3" "4" ...
$ value : num [1:960] -4.62 -0.72 2.66 -3.31 -4.34 ...

# We have data, measured at five times, per location, per patient
table(df_bayes$Patient, df_bayes$location)
Central Frontal Occipital Parietal Temporalleft Temporalright
1 5 5 5 5 5 5
2 5 5 5 5 5 5
3 5 5 5 5 5 5
4 5 5 5 5 5 5
5 5 5 5 5 5 5
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 5 5 5 5 5 5
9 5 5 5 5 5 5
10 5 5 5 5 5 5
11 5 5 5 5 5 5
12 5 5 5 5 5 5
13 5 5 5 5 5 5
14 5 5 5 5 5 5
15 5 5 5 5 5 5
16 5 5 5 5 5 5
17 5 5 5 5 5 5
18 5 5 5 5 5 5
19 5 5 5 5 5 5
20 5 5 5 5 5 5
21 5 5 5 5 5 5
22 5 5 5 5 5 5
23 5 5 5 5 5 5
24 5 5 5 5 5 5
25 5 5 5 5 5 5
26 5 5 5 5 5 5
27 5 5 5 5 5 5
28 5 5 5 5 5 5
29 5 5 5 5 5 5
30 5 5 5 5 5 5
31 5 5 5 5 5 5
32 5 5 5 5 5 5

Let's start building some models. What is nice about the lme4 package is that it also does not provide p-values. I will first run two models, and then compare them using a Chi² test.

lme4_fit0<-lme4::lmer(value ~ Group + location + (1 | Patient), 
data=df_bayes)
summary(lme4_fit0)
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ Group + location + (1 | Patient)
Data: df_bayes

REML criterion at convergence: 2991

Scaled residuals:
Min 1Q Median 3Q Max
-4.8253 -0.4621 -0.0009 0.4365 6.0430

Random effects:
Groups Name Variance Std.Dev.
Patient (Intercept) 0.510 0.7141
Residual 1.198 1.0944
Number of obs: 960, groups: Patient, 32

Fixed effects:
Estimate Std. Error t value
(Intercept) -0.016346 0.201510 -0.081
GroupB 0.828458 0.262177 3.160
locationFrontal 0.037093 0.122354 0.303
locationOccipital -0.144730 0.122354 -1.183
locationParietal 0.052821 0.122354 0.432
locationTemporalleft 0.042578 0.122354 0.348
locationTemporalright 0.002071 0.122354 0.017

Correlation of Fixed Effects:
(Intr) GroupB lctnFr lctnOc lctnPr lctnTmprll
GroupB -0.651
locatnFrntl -0.304 0.000
loctnOccptl -0.304 0.000 0.500
locatinPrtl -0.304 0.000 0.500 0.500
lctnTmprllf -0.304 0.000 0.500 0.500 0.500
lctnTmprlrg -0.304 0.000 0.500 0.500 0.500 0.500
lme4_fit1<-lme4::lmer(value ~ Group + (1 | Patient), 
data=df_bayes)
summary(lme4_fit1)
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ Group + (1 | Patient)
Data: df_bayes

REML criterion at convergence: 2981.1

Scaled residuals:
Min 1Q Median 3Q Max
-4.7933 -0.4701 0.0066 0.4324 6.0883

Random effects:
Groups Name Variance Std.Dev.
Patient (Intercept) 0.510 0.7142
Residual 1.196 1.0935
Number of obs: 960, groups: Patient, 32

Fixed effects:
Estimate Std. Error t value
(Intercept) -0.01804 0.18539 -0.097
GroupB 0.82846 0.26218 3.160

Correlation of Fixed Effects:
(Intr)
GroupB -0.707
anova(lme4_fit1, lme4_fit0)
refitting model(s) with ML (instead of REML)
Data: df_bayes
Models:
lme4_fit1: value ~ Group + (1 | Patient)
lme4_fit0: value ~ Group + location + (1 | Patient)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
lme4_fit1 4 2986.0 3005.4 -1489.0 2978.0
lme4_fit0 9 2992.3 3036.2 -1487.2 2974.3 3.6079 5 0.6071

You can see that lme4_fit1 is best supported by the data (no priors here): it has the lowest AIC and BIC, and the log-likelihood barely improves when the location parameters are added. The Chi² test makes the same point.
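As an aside, the Chi² line in the anova() table is nothing more than the drop in deviance (twice the gain in log-likelihood) tested against the difference in parameter count. A quick check by hand, using the rounded deviances from the output above:

# Reproducing the anova() test by hand from the (rounded) deviances:
chisq <- 2978.0 - 2974.3                       # ~3.61, as in the table
pchisq(chisq, df = 9 - 4, lower.tail = FALSE)  # ~0.6, matching Pr(>Chisq)

Let's see what happens if we add another, far more complex model.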

lme4_fit2<-lme4::lmer(value ~ Group*location + 
(1|Patient),
data=df_bayes)
summary(lme4_fit2)
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ Group * location + (1 | Patient)
Data: df_bayes

REML criterion at convergence: 2994.8

Scaled residuals:
Min 1Q Median 3Q Max
-4.7773 -0.4597 -0.0016 0.4354 6.0323

Random effects:
Groups Name Variance Std.Dev.
Patient (Intercept) 0.5099 0.7141
Residual 1.2005 1.0957
Number of obs: 960, groups: Patient, 32

Fixed effects:
Estimate Std. Error t value
(Intercept) -0.028749 0.216504 -0.133
GroupB 0.853263 0.306183 2.787
locationFrontal 0.002779 0.173242 0.016
locationOccipital -0.016827 0.173242 -0.097
locationParietal -0.006247 0.173242 -0.036
locationTemporalleft 0.058280 0.173242 0.336
locationTemporalright 0.026264 0.173242 0.152
GroupB:locationFrontal 0.068628 0.245001 0.280
GroupB:locationOccipital -0.255806 0.245001 -1.044
GroupB:locationParietal 0.118135 0.245001 0.482
GroupB:locationTemporalleft -0.031404 0.245001 -0.128
GroupB:locationTemporalright -0.048385 0.245001 -0.197

Correlation of Fixed Effects:
(Intr) GroupB lctnFr lctnOc lctnPr lctnTmprll lctnTmprlr GrpB:F GrpB:O GrpB:P GrpB:lctnTmprll
GroupB -0.707
locatnFrntl -0.400 0.283
loctnOccptl -0.400 0.283 0.500
locatinPrtl -0.400 0.283 0.500 0.500
lctnTmprllf -0.400 0.283 0.500 0.500 0.500
lctnTmprlrg -0.400 0.283 0.500 0.500 0.500 0.500
GrpB:lctnFr 0.283 -0.400 -0.707 -0.354 -0.354 -0.354 -0.354
GrpB:lctnOc 0.283 -0.400 -0.354 -0.707 -0.354 -0.354 -0.354 0.500
GrpB:lctnPr 0.283 -0.400 -0.354 -0.354 -0.707 -0.354 -0.354 0.500 0.500
GrpB:lctnTmprll 0.283 -0.400 -0.354 -0.354 -0.354 -0.707 -0.354 0.500 0.500 0.500
GrpB:lctnTmprlr 0.283 -0.400 -0.354 -0.354 -0.354 -0.354 -0.707 0.500 0.500 0.500 0.500
anova(lme4_fit2, lme4_fit1, lme4_fit0)
refitting model(s) with ML (instead of REML)
Data: df_bayes
Models:
lme4_fit1: value ~ Group + (1 | Patient)
lme4_fit0: value ~ Group + location + (1 | Patient)
lme4_fit2: value ~ Group * location + (1 | Patient)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
lme4_fit1 4 2986.0 3005.4 -1489.0 2978.0
lme4_fit0 9 2992.3 3036.2 -1487.2 2974.3 3.6079 5 0.6071
lme4_fit2 14 2999.5 3067.7 -1485.8 2971.5 2.8162 5 0.7283

The additional model does not impress; in fact, it performs worse. Using the ANOVA method, I am applying the classical/frequentist way of comparing (nested) models, which weighs the change in log-likelihood against the cost of the additional parameters.

Last but not least, I will run all five models discussed earlier, plus two extra, using the same priors for the mu parameters and the variance parameters. Below you can see some examples of Uniform, Cauchy, Normal, and Gamma distributions with different characteristics.


par(mfrow = c(2, 3))
plot(density(runif(1000, -8, 8)))
plot(density(rcauchy(1000, 0, 1)))
plot(density(rnorm(1000, 0, 1)))
plot(density(runif(1000, 0, 5)))
plot(density(rgamma(1000, 2, 2)))
plot(density(rnorm(1000, 0, 2)))

Let's run the different models. I only used 10k iterations, which makes the Bayes Factors less accurate than desired, but for now it is really about showing what can be done using Bayesian analysis. You will learn soon enough that requesting a high number of iterations can extend the procedure time significantly.

bayes_fit0.0<-
brms::brm(value ~ 0,
data=df_bayes,
prior = c(prior(gamma(2,2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit0.1<-
brms::brm(value ~ 0 + (1|Patient),
data=df_bayes,
prior = c(
prior(gamma(2, 2), class = sd, coef=Intercept, group=Patient),
prior(gamma(2,2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit0.2<-
brms::brm(value ~ 1 + (1|Patient),
data=df_bayes,
prior = c(
prior(cauchy(0, 1), class = Intercept),
prior(gamma(2, 2), class = sd, coef=Intercept, group=Patient),
prior(gamma(2,2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit1<-
brms::brm(value ~ location + (1|Patient),
data=df_bayes,
prior = c(
prior(cauchy(0, 1), class = Intercept),
prior(cauchy(0, 1), class = b, coef=locationFrontal),
prior(cauchy(0, 1), class = b, coef=locationOccipital),
prior(cauchy(0, 1), class = b, coef=locationParietal),
prior(cauchy(0, 1), class = b, coef=locationTemporalleft),
prior(cauchy(0, 1), class = b, coef=locationTemporalright),
prior(gamma(2, 2), class = sd, coef=Intercept, group=Patient),
prior(gamma(2, 2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit2<-
brms::brm(value ~ Group + location + (1|Patient),
data=df_bayes,
prior = c(
prior(cauchy(0, 1), class = Intercept),
prior(cauchy(0, 1), class = b, coef=GroupB),
prior(cauchy(0, 1), class = b, coef=locationFrontal),
prior(cauchy(0, 1), class = b, coef=locationOccipital),
prior(cauchy(0, 1), class = b, coef=locationParietal),
prior(cauchy(0, 1), class = b, coef=locationTemporalleft),
prior(cauchy(0, 1), class = b, coef=locationTemporalright),
prior(gamma(2, 2), class = sd, coef=Intercept, group=Patient),
prior(gamma(2, 2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit3<-
brms::brm(value ~ Group*location + (1|Patient),
data=df_bayes,
prior = c(
prior(cauchy(0, 1), class = Intercept),
prior(cauchy(0, 1), class = b, coef=GroupB),
prior(cauchy(0, 1), class = b, coef=locationFrontal),
prior(cauchy(0, 1), class = b, coef=locationOccipital),
prior(cauchy(0, 1), class = b, coef=locationParietal),
prior(cauchy(0, 1), class = b, coef=locationTemporalleft),
prior(cauchy(0, 1), class = b, coef=locationTemporalright),
prior(cauchy(0, 1), class = b, coef=GroupB:locationFrontal),
prior(cauchy(0, 1), class = b, coef=GroupB:locationOccipital),
prior(cauchy(0, 1), class = b, coef=GroupB:locationParietal),
prior(cauchy(0, 1), class = b, coef=GroupB:locationTemporalleft),
prior(cauchy(0, 1), class = b, coef=GroupB:locationTemporalright),
prior(gamma(2, 2), class = sd, coef=Intercept, group=Patient),
prior(gamma(2, 2), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)
bayes_fit4<-
brms::brm(value ~ Group + (1|Patient),
data=df_bayes,
prior = c(
prior(normal(0, 0.5), class = Intercept),
prior(normal(0, 0.5), class = b, coef=GroupB),
prior(normal(1, 1), class = sd, coef=Intercept, group=Patient),
prior(normal(1,1), class = sigma)),
sample_prior = TRUE,
save_pars = save_pars(all = TRUE),
chains=4,
cores=6,
iter = 10000,
warmup = 3000)

After running all the models (or any one of them separately), there is a whole set of checks you can perform. Perhaps one of the nicest is to look at the NUTS (No-U-Turn Sampler) diagnostics and check for divergent transitions. Here, I have colored them green, so no green is actually good. Below you will find three scatterplots connecting, for a single model, the samples of different parameter pairs.

color_scheme_set("darkgray")
get_variables(bayes_fit3)
g1<-mcmc_scatter(
as.matrix(bayes_fit3),
pars = c("b_Intercept", "b_GroupB"),
np = nuts_params(bayes_fit3),
np_style = scatter_style_np(div_color = "green", div_alpha = 0.8))
g2<-mcmc_scatter(
as.matrix(bayes_fit3),
pars = c("sd_Patient__Intercept",
"b_Intercept"),
np = nuts_params(bayes_fit3),
np_style = scatter_style_np(div_color = "green", div_alpha = 0.8))
g3<-mcmc_scatter(
as.matrix(bayes_fit3),
pars = c("b_GroupB", "sigma"),
np = nuts_params(bayes_fit3),
np_style = scatter_style_np(div_color = "green", div_alpha = 0.8))
gridExtra::grid.arrange(g1,g2,g3)
As you can see, no green. Also, the intercept and the group estimate are correlated; the other pairs are not.

Since we have the models, and thus their posterior densities, we can now compare them. Although I can specify a particular denominator, I can also ask for a matrix in which every model is compared against every other model. What I am looking for is not just the highest number, but how a particular model does against all other models, since the highest number could come from a comparison between a sensible and a non-sensible model, which would make that comparison itself non-sensible.

So, the model I am most interested in is the one including just the group: bayes_fit4, which is model [1] in the matrix. Hence, the first column and the first row have my attention first. And what I see is that this model performs better than all the other models: every Bayes Factor in its column is above 1, some of them extremely so!

In the first column all values are above 1; in the first row they are all below 1 or very close to zero, which makes sense, since the matrix mirrors itself as reciprocals: the entry below the diagonal is one over the corresponding entry above it. Also, the first column contains its highest value in the sixth row. Hence, this is the model I want and need, and from that model I can make further estimates of evidence, as I have shown above.

comparison <- bayesfactor_models(bayes_fit4,
bayes_fit3,
bayes_fit2,
bayes_fit1,
bayes_fit0.1,
bayes_fit0.0,
denominator = bayes_fit0.2)
as.matrix(comparison)

# Bayes Factors for Model Comparison

Numerator
Denominator | [1] | [2] | [3] | [4] | [5] | [6] | [7]
-----------------------------------------------------------------------------------------------------------------
[1] Group + (1 | Patient) | 1 | 4.18e-09 | 1.23e-05 | 1.14e-06 | 0.020 | 9.81e-88 | 0.055
[2] Group * location + (1 | Patient) | 2.39e+08 | 1 | 2.94e+03 | 273.49 | 4.71e+06 | 2.35e-79 | 1.31e+07
[3] Group + location + (1 | Patient) | 8.14e+04 | 3.41e-04 | 1 | 0.093 | 1.60e+03 | 7.99e-83 | 4.45e+03
[4] location + (1 | Patient) | 8.75e+05 | 0.004 | 10.74 | 1 | 1.72e+04 | 8.58e-82 | 4.78e+04
[5] 0 + (1 | Patient) | 50.77 | 2.12e-07 | 6.23e-04 | 5.80e-05 | 1 | 4.98e-86 | 2.78
[6] 0 | 1.02e+87 | 4.26e+78 | 1.25e+82 | 1.17e+81 | 2.01e+85 | 1 | 5.58e+85
[7] 1 + (1 | Patient) | 18.29 | 7.65e-08 | 2.25e-04 | 2.09e-05 | 0.360 | 1.79e-86 | 1

Now, something I have not shown is how each part of the model adds evidence. Clearly, the random effect of Patient is the most informative, and then we have Group. The importance of (1|Patient) makes a lot of sense, considering the nested structure of the data. In fact, including it is more of a hygiene factor than of true importance: only if all five replicates per location per patient had been EXACTLY the same would including (1|Patient) make no sense. But if you have nested data, and considering that replicates will almost always vary, such a specification must be included.

> bayesfactor_inclusion(comparison)
Inclusion Bayes Factors (Model Averaged)

P(prior) P(posterior) Inclusion BF
1:Patient 0.86 1.00 1.83e+86
Group 0.43 0.93 17.93
location 0.43 1.25e-05 1.67e-05
Group:location 0.14 3.89e-09 2.33e-08

* Compared among: all models
* Priors odds: uniform-equal
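The inclusion BF is simply the shift from prior inclusion odds to posterior inclusion odds. With the rounded probabilities printed above, you can roughly reproduce the Group line by hand:

# Inclusion BF for Group from the (rounded) printed probabilities:
p_prior <- 0.43
p_post  <- 0.93
(p_post / (1 - p_post)) / (p_prior / (1 - p_prior))  # ~17.6 vs the reported 17.93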

If we only look at matched models, we see the same hierarchy, but slightly different numbers. That is fine. The difference between 'all models' and 'matched models only' is that for matched models the comparison is restricted to models that do not include any interactions with the term of interest; for interaction terms, averaging is done only across models that contain the main effects from which the interaction is composed.

bayesfactor_inclusion(comparison, match_models = TRUE)
Inclusion Bayes Factors (Model Averaged)

P(prior) P(posterior) Inclusion BF
1:Patient 0.86 1.00 1.83e+86
Group 0.29 0.93 26.89
location 0.29 1.25e-05 2.50e-05
Group:location 0.14 3.89e-09 3.41e-04

* Compared among: matched models only
* Priors odds: uniform-equal

From here, we could do further analysis looking at the prior and posterior densities and then seeing what the evidence is for a point estimate or a range of estimates. But I will leave that up to the reader to apply this on their own set of data.

I hope you enjoyed this blogpost. Please do let me know if something is amiss or plain wrong, or if you have a question!



Dr. Marc Jacobs

Scientist. Builder of models, and enthusiast of statistics, research, epidemiology, probability, and simulations for 10+ years.