fddm
Function ddm()
fits the 5-parameter Ratcliff diffusion
decision model (DDM) via maximum likelihood estimation. The model for
each DDM parameter can be specified symbolically using R’s formula
interface. With the exception of the drift rate (which is always
estimated) all parameters can be either fixed or estimated. As maximum
likelihood estimation is based on evaluating the probability density
function (PDF), other vignettes in the fddm
package address
the properties of the function dfddm()
that are used to
evaluate the PDF. An overview of the mathematical details of the
different PDF approximations is provided in the Math
Vignette. An empirical validation of the implemented PDF methods is
provided in the Validity Vignette. Timing
benchmarks for the present PDF methods and comparison with existing
methods are provided in the Benchmark
Vignette.
Our implementation of the DDM has the following parameters: a ∈ (0, ∞) (threshold separation),
v ∈ (−∞, ∞) (drift rate),
t0 ∈ [0, ∞)
(non-decision time/response time constant), w ∈ (0, 1) (relative starting
point), sv ∈ (0, ∞)
(inter-trial-variability of drift), and σ ∈ (0, ∞) (diffusion coefficient of
the underlying Wiener Process).
This vignette contains examples of two different ways to use
fddm
in fitting the DDM to user-supplied real-world data.
First, we will demonstrate the use of the ddm()
function;
we suggest this as the preferred method for most use cases. The
ddm()
function allows the user to specify which model
parameters they want to be estimated and which model parameters they
want to remain fixed. Should the user desire more minute control over
the fitting procedure, we will show a second method of fitting the DDM
that utilizes the exposed likelihood function, dfddm()
.
This method uses the dfddm()
function to construct a
log-likelihood function that will be supplied to an optimization routine
to estimate each model parameter. The examples that we provide are meant
for illustrative purposes, and as such, we will provide a sample
analysis for each example.
We will load the fddm::med_dec
dataset that is included
in the fddm
package, and we will use this dataset to fit
the Ratcliff DDM in both fitting procedures. This dataset contains the
accuracy condition reported in Trueblood et al.
(2018), which investigates medical decision making among medical
professionals (pathologists) and novices (i.e., undergraduate students).
The task of participants was to judge whether pictures of blood cells
show cancerous cells (i.e., blast cells) or non-cancerous cells (i.e.,
non-blast cells). The dataset contains 200 decisions per participant,
based on pictures of 100 true cancerous cells and pictures of 100 true
non-cancerous cells.
Before doing any fitting, we must first load the fddm
package, read the data, and clean the data by removing any invalid
responses from the data (i.e., have negative or non-numeric response
times).
As we will be demonstrating simple fitting procedures involving only
one participant from the fddm::med_dec
dataset, we subset
the data to select the individual whose data will be used for these
fitting procedures.
onep <- med_dec[med_dec[["id"]] == "2" & med_dec[["group"]] == "experienced", ]
str(onep)
#> 'data.frame': 200 obs. of 9 variables:
#> $ id : Factor w/ 37 levels "1","2","3","4",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ group : Factor w/ 3 levels "experienced",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ block : int 3 3 3 3 3 3 3 3 3 3 ...
#> $ trial : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ classification: Factor w/ 2 levels "blast","non-blast": 1 2 2 2 1 1 1 1 2 1 ...
#> $ difficulty : Factor w/ 2 levels "easy","hard": 1 1 2 2 1 1 2 2 1 2 ...
#> $ response : Factor w/ 2 levels "blast","non-blast": 1 2 1 2 1 1 1 1 2 1 ...
#> $ rt : num 0.853 0.575 1.136 0.875 0.748 ...
#> $ stimulus : Factor w/ 312 levels "blastEasy/AuerRod.jpg",..: 7 167 246 273 46 31 132 98 217 85 ...
We can see the structure of the data for this participant contains some identifying information (e.g., the ID number and experience level), the enumerated and paired responses and response times, and some information about the stimuli shown to the participant (e.g., the correct classification of the stimulus and the difficulty of correctly identifying the stimulus). We will leverage this extra information about the stimuli in our example fitting procedures.
ddm()
The ddm()
function fits the 5-parameter DDM to the
user-supplied data via maximum likelihood estimation (MLE). Using R’s
formula interface, the user can specify which model parameters should be
estimated and which should remain fixed; however, the drift rate is
always estimated. For the model parameters that the user wishes to be
estimated, R’s formula interface allows each parameter to be fit with a
single coefficient (i.e., ~ 1
) or with multiple
coefficients (e.g., ~ condition1*condition2
). Should the
user desire a model parameter to be fixed, this can be done by writing
the model parameter equal to a scalar (e.g., ndt = 0.39
).
See the documentation for the ddm()
function for more
details regarding the formula notation. Since this example is simple, we
only use formula notation for the drift rate and leave the other
parameters to their default settings.
First we will show a quick fitting using only one participant from
the fddm::med_dec
dataset. Because we use an ANOVA approach
in the analysis of this example, we set orthogonal sum-to-zero
contrasts. This means that the “Intercept” coefficient will correspond
to the grand mean, and the other coefficients will correspond to the
differences from the grand mean.
Now we can use the ddm()
function to fit the DDM to the
data. The first argument of the ddm()
function is the
formula indicating how the drift rate should be modeled. As the dataset
contains both the correct identification of the cell (“classification”
column of the dataset, with each row containing the text either “blast”
or “non-blast”) and the difficulty of the identification (“difficulty”
column of the dataset, with each row containing the text either “easy”
or “hard”), we will incorporate this information into how we will model
the drift rate. In this case, we will model the drift rate using the
grand mean (labeled “Intercept”) with the additional coefficients
corresponding to the differences from the grand mean, according to their
classification (“blast” or “non-blast”), difficulty (“easy” or “hard”),
and the interaction between classification and difficulty. The remaining
arguments of the ddm()
function contain the formulas (or
scalars) for how the other model parameters should be modeled; by
default, the boundary separation and non-decision time are estimated by
a single coefficient each, and the initial bias and inter-trial
variability in the drift rate are held fixed at 0.5 and 0, respectively.
Following the model parameters, the remaining arguments are various
optimization settings that we will leave as their default values for
this example. Note that since we are using the default formulas and
scalars for the model paramaters except the drift rate we are not
required to explicitly write these formulas and scalars, but we do so in
this example for illustrative purposes. We do, however, always need to
include the data
argument.
fit0 <- ddm(drift = rt + response ~ classification*difficulty,
boundary = ~ 1,
ndt = ~ 1,
bias = 0.5,
sv = 0,
data = onep)
#> [1] "number of times we redid the initial parameters:"
#> [1] 0
#> t0[0] = 1.00169 >= 0.853
summary(fit0)
#>
#> Call:
#> ddm(drift = rt + response ~ classification * difficulty, boundary = ~1, ndt = ~1, bias = 0.5,
#> sv = 0, data = onep)
#>
#> DDM fit with 3 estimated and 2 fixed distributional parameters.
#> Fixed: bias = 0.5, sv = 0
#> Error in x$link$drift: $ operator is invalid for atomic vectors
This output first shows the input to the function call and which
parameters are held fixed; this information is useful to verify that the
formula inputs to the ddm()
function call were interpreted
as intended by the user. The next line confirms that our model held two
parameters fixed (the inital bias and the inter-trial variability in the
drift rate) and estimated the other three model parameters. For the
model of the drift rate that we input, we can see the estimates and
summary statistics for the grand mean (called “Intercept” here) and each
difference coefficient (one for the “classification” condition, one for
the “difficulty” condition, and one for the interaction between the
“classification” and “difficulty” conditions). Below this, we can see
the single-coefficient estimates for the boundary separation and
non-decision time (default behavior).
For each estimated coefficient, we get the estimate itself and the standard error of the estimate. For the boundary and non-decision time, we can see that the standard errors are pretty small relative to the estimates, so the estimates should be pretty accurate. For the drift rate coefficients, we get additional statistics about the significance of the coefficients- in this case, the grand mean and the differences from the grand mean. From this data, we can see that the effects of “classification”, “difficulty”, and their interaction all meet the common significance requirement P = .05.
We can reset the contrasts after fitting.
For a quick analysis on the drift rate fits, we will do a two-way ANOVA because we are comparing the effect of two categorical variables (classification and density) on a quantitative continuous variable (drift rate). The two-way ANOVA will identify if the drift rate is significantly different among the various classifications and densities; this is a common practice for comparing the effects of categorical variables when two or more groups exist in the data. To be rigorous, we should check that: the data are independent both within groups and across groups; the data are approximately normally distributed, or there are enough observations per group so that we do not need to show normality; the variances across groups are equal; there are no significant outliers; and the data are evenly split among the groups. We will forgo formally verifying these checks for the sake of brevity. Note that the interaction between classification and difficulty is significant, so this term is included in the model.
We will use the emmeans
package to do some simple post
hoc comparisons across the groups that we modeled; this package has some
useful functions to do ANOVA and related stuff. First, we’ll produce an
ANOVA-like table that displays the degrees of freedom, F-value, and
P-value for each main effect (categorical variable) and their
interaction. If we want to show just the conditional main effects, then
we can run the second line of code to produce two separate tables with
the same summary statistics.
emmeans::joint_tests(fit0)
#> model term df1 df2 F.ratio p.value
#> classification 1 194 512.900 <.0001
#> difficulty 1 194 6.100 0.0142
#> classification:difficulty 1 194 164.700 <.0001
emmeans::joint_tests(fit0, by = "classification")
#> classification = blast:
#> model term df1 df2 F.ratio p.value
#> difficulty 1 194 46.660 <.0001
#>
#> classification = non-blast:
#> model term df1 df2 F.ratio p.value
#> difficulty 1 194 137.870 <.0001
From these results, we can see that both of the categorical variables (classification and difficulty) and their interaction term have a significant effect on the drift rate, at the P = 0.05 level.
If we want to see the mean drift rate for each combination of
classification and difficulty, the titular emmeans
function
call will display a table for each classification that contains the
standard summary statistics for each difficulty. The following code
shows the mean drift rate for each condition in addition to the standard
errors, degrees of freedom, and the 95% confidence intervals.
em1 <- emmeans::emmeans(fit0, "difficulty", by = "classification")
em1
#> classification = blast:
#> difficulty emmean SE df lower.CL upper.CL
#> easy -4.447 0.294 194 -5.026 -3.868
#> hard -2.027 0.198 194 -2.418 -1.636
#>
#> classification = non-blast:
#> difficulty emmean SE df lower.CL upper.CL
#> easy 3.840 0.273 194 3.302 4.378
#> hard 0.265 0.135 194 -0.002 0.531
#>
#> Confidence level used: 0.95
These means seem to be different; however, we need to formally compare them to be sure. To compare the means across the two difficulty groups (“easy” and “hard”), we can find the pairwise differences between the means (one difference for each classification). The first and second lines of code yield the same information, but the second line condenses the first line from two tables to one table. These tables include the differences between the means as well as the standard summary statistics: standard errors, degrees of freedom, T-statistic, and P-value.
pairs(em1)
#> classification = blast:
#> contrast estimate SE df t.ratio p.value
#> easy - hard -2.42 0.354 194 -6.831 <.0001
#>
#> classification = non-blast:
#> contrast estimate SE df t.ratio p.value
#> easy - hard 3.58 0.304 194 11.742 <.0001
update(pairs(em1), by = NULL, adjust = "holm")
#> contrast classification estimate SE df t.ratio p.value
#> easy - hard blast -2.42 0.354 194 -6.831 <.0001
#> easy - hard non-blast 3.58 0.304 194 11.742 <.0001
#>
#> P value adjustment: holm method for 2 tests
As we suspected, the differences between the means across the two difficulty groups (“easy” and “hard”) are indeed significantly different (at the P = 0.05 level).
dfddm()
Although we strongly recommend using the ddm()
function
for fitting the DDM to data, we will also show an example of how to use
the PDF to manually perform the optimization. Note that this method will
be slower and less convenient than using the ddm()
function, but it is possible if the user wants a particular likelihood
function or optimization routine.
Our approach will be a straightforward maximum likelihood estimation (MLE). We are going to be fitting the parameters v, a, t0, w, and sv; however, we want to fit two distinct drift rates, one for the upper boundary (vu) and one for the lower boundary (vℓ). In order to make this distinction, we require the input of the truthful classification of each decision (i.e. what the correct response is for each entry).
Since we will be using the optimization function
stats::nlminb()
, we must write an objective function for it
to optimize; this objective function will be the log-likelihood function
that we discuss in the next section.
By default stats::nlminb()
finds the minimum of the
objective function instead of the maximum, so we will simply negate our
likelihood function. In addition, we will employ the common practice of
using the log-likelihood as this tends to be more stable while still
maintaining the same minima (negated maxima) as the regular likelihood
function. Note that our log-likelihood function depends on the number of
response times, the number of responses, and the number of truthful
classifications all being equal.
As we are using the optimization function
stats::nlminb()
, the first argument to our log-likelihood
function needs to be a vector of the initial values of the six
parameters that are being optimized: vu, vℓ, a, t0, w, and sv. The rest of the
arguments will be the other necessary inputs to dfddm()
that are not optimized: the vector of response times, the vector of
responses, the vector of the truthful classifications, and the allowable
error tolerance for the density function (optional). Details on all of
these inputs can be found in the dfddm()
documentation.
Upon being called, the log-likelihood function first separates the input response times and responses by their truthful classification to yield two new response time vectors and two new response vectors. The response times and responses are then input into separate density functions using a separate v parameter, vu or vℓ. These separate densities are then combined, and the log-likelihood function heavily penalizes any combination of parameters that returns a log-density of −∞ (equivalent to a regular density of 0). Lastly, the actual log-likelihood is returned as the negative of the sum of all of the log-densities.
ll_fun <- function(pars, rt, resp, truth, err_tol) {
v <- numeric(length(rt))
# the truth is "upper" so use vu
v[truth == "upper"] <- pars[[1]]
# the truth is "lower" so use vl
v[truth == "lower"] <- pars[[2]]
dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v, t0 = pars[[4]],
w = pars[[5]], sv = pars[[6]], err_tol = 1e-6, log = TRUE)
return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}
First we will fit the DDM to only one participant from the
fddm::med_dec
dataset. To the single participant’s data we
add a column each for converting the responses and their respective
classifications to either “upper” or “lower”.
onep[["resp"]] <- ifelse(onep[["response"]] == "blast", "upper", "lower")
onep[["truth"]] <- ifelse(onep[["classification"]] == "blast", "upper", "lower")
str(onep)
#> 'data.frame': 200 obs. of 11 variables:
#> $ id : Factor w/ 37 levels "1","2","3","4",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ group : Factor w/ 3 levels "experienced",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ block : int 3 3 3 3 3 3 3 3 3 3 ...
#> $ trial : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ classification: Factor w/ 2 levels "blast","non-blast": 1 2 2 2 1 1 1 1 2 1 ...
#> $ difficulty : Factor w/ 2 levels "easy","hard": 1 1 2 2 1 1 2 2 1 2 ...
#> $ response : Factor w/ 2 levels "blast","non-blast": 1 2 1 2 1 1 1 1 2 1 ...
#> $ rt : num 0.853 0.575 1.136 0.875 0.748 ...
#> $ stimulus : Factor w/ 312 levels "blastEasy/AuerRod.jpg",..: 7 167 246 273 46 31 132 98 217 85 ...
#> $ resp : chr "upper" "lower" "upper" "lower" ...
#> $ truth : chr "upper" "lower" "lower" "lower" ...
We pass the data and log-likelihood function with the necessary
additional arguments to an optimization function. As we are using the
optimization function stats::nlminb()
for this example, we
must input as the first argument the initial values of our DDM
parameters that we want optimized. These are input in the order: vu, vℓ, a, t0, w, and sv; we also need to define
upper and lower bounds for each parameters.
fit <- nlminb(c(0, 0, 1, 0, 0.5, 0), objective = ll_fun,
rt = onep[["rt"]], resp = onep[["resp"]], truth = onep[["truth"]],
# limits: vu, vl, a, t0, w, sv
lower = c(-Inf, -Inf, .01, 0, 0, 0),
upper = c( Inf, Inf, Inf, Inf, 1, Inf))
fit
#> $par
#> [1] 5.6813 -2.1887 2.7909 0.3764 0.4010 2.2813
#>
#> $objective
#> [1] 42.47
#>
#> $convergence
#> [1] 0
#>
#> $iterations
#> [1] 129
#>
#> $evaluations
#> function gradient
#> 154 905
#>
#> $message
#> [1] "relative convergence (4)"
Here we will run a more rigorous fitting on the entire
fddm::med_dec
dataset to obtain parameter estimates for
each participant in the study. To do this, we define a function to run
the data fitting for us; we want it to output a dataframe containing the
parameter estimates for each individual in the data. The inputs will be
the dataset, the allowable error tolerance for the density function, how
the “upper” response is presented in the dataset, and indices of the
columns in the dataset containing: identification of the individuals in
the dataset, the response times, the responses, and the truthful
classifications.
After some data checking, the fitting function will extract the unique individuals from the dataset and run the parameter optimization for the responses and response times for each individual. The optimizations themselves are initialized with random initial parameter values to aid in the avoidance of local minima in favor of global minima. Moreover, the optimization will run 5 times for each individual, with 5 different sets of random initial parameter values. The value of the minimized log-likelihood function will be compared across all 5 runs, and the smallest such value will indicate the best fit. The parameter estimates, convergence code, and minimized value of the log-likelihood function produced by this best fit will be saved for that individual.
rt_fit <- function(data, id_idx = NULL, rt_idx = NULL, response_idx = NULL,
truth_idx = NULL, response_upper = NULL) {
# Format data for fitting
if (all(is.null(id_idx), is.null(rt_idx), is.null(response_idx),
is.null(truth_idx), is.null(response_upper))) {
df <- data # assume input data is already formatted
} else {
if(any(data[,rt_idx] < 0)) {
stop("Input data contains negative response times; fit will not be run.")
}
if(any(is.na(data[,response_idx]))) {
stop("Input data contains invalid responses (NA); fit will not be run.")
}
nr <- nrow(data)
df <- data.frame(id = character(nr),
rt = double(nr),
response = character(nr),
truth = character(nr),
stringsAsFactors = FALSE)
if (!is.null(id_idx)) { # relabel identification tags
for (i in seq_along(id_idx)) {
idi <- unique(data[,id_idx[i]])
for (j in seq_along(idi)) {
df[["id"]][data[,id_idx[i]] == idi[j]] <- paste(
df[["id"]][data[,id_idx[i]] == idi[j]], idi[j], sep = " ")
}
}
df[["id"]] <- trimws(df[["id"]], which = "left")
}
df[["rt"]] <- as.double(data[,rt_idx])
df[["response"]] <- "lower"
df[["response"]][data[,response_idx] == response_upper] <- "upper"
df[["truth"]] <- "lower"
df[["truth"]][data[,truth_idx] == response_upper] <- "upper"
}
# Preliminaries
ids <- unique(df[["id"]])
nids <- max(length(ids), 1) # if inds is null, there is only one individual
ninit_vals <- 5
# Initilize the output dataframe
cnames <- c("ID", "Convergence", "Objective",
"vu_fit", "vl_fit", "a_fit", "t0_fit", "w_fit", "sv_fit")
out <- data.frame(matrix(ncol = length(cnames), nrow = nids))
colnames(out) <- cnames
temp <- data.frame(matrix(ncol = length(cnames)-1, nrow = ninit_vals))
colnames(temp) <- cnames[-1]
# Loop through each individual and starting values
for (i in 1:nids) {
out[["ID"]][i] <- ids[i]
# extract data for id i
dfi <- df[df[["id"]] == ids[i],]
rti <- dfi[["rt"]]
respi <- dfi[["response"]]
truthi <- dfi[["truh"]]
# starting value for t0 must be smaller than the smallest rt
min_rti <- min(rti)
# create initial values for this individual
init_vals <- data.frame(vu = rnorm(n = ninit_vals, mean = 4, sd = 2),
vl = rnorm(n = ninit_vals, mean = -4, sd = 2),
a = runif(n = ninit_vals, min = 0.5, max = 5),
t0 = runif(n = ninit_vals, min = 0, max = min_rti),
w = runif(n = ninit_vals, min = 0, max = 1),
sv = runif(n = ninit_vals, min = 0, max = 5))
# loop through all of the starting values
for (j in 1:ninit_vals) {
mres <- nlminb(init_vals[j,], ll_fun,
rt = rti, resp = respi, truth = truthi,
# limits: vu, vl, a, t0, w, sv
lower = c(-Inf, -Inf, .01, 0, 0, 0),
upper = c( Inf, Inf, Inf, Inf, 1, Inf))
temp[["Convergence"]][j] <- mres[["convergence"]]
temp[["Objective"]][j] <- mres[["objective"]]
temp[j, -c(1, 2)] <- mres[["par"]]
}
# determine best fit for the individual
min_idx <- which.min(temp[["Objective"]])
out[i, -1] <- temp[min_idx,]
}
return(out)
}
We run the fitting, and the dataframe of the fitting results is output below.
fit <- rt_fit(med_dec, id_idx = c(2,1), rt_idx = 8, response_idx = 7,
truth_idx = 5, response_upper = "blast")
fit
#> ID Convergence Objective vu_fit vl_fit a_fit t0_fit w_fit sv_fit
#> 1 experienced 2 0 154.157 4.7824 -5.3206 2.6092 0.4151 0.4993 5.805e+00
#> 2 experienced 6 0 119.675 7.1413 -5.5424 2.3638 0.3739 0.5930 5.380e+00
#> 3 experienced 7 0 74.397 3.5273 -2.2192 1.3876 0.4535 0.4499 1.945e+00
#> 4 experienced 9 0 162.764 -0.2429 -0.8518 2.5723 0.4724 0.4730 5.316e+00
#> 5 experienced 12 0 122.762 -0.2232 -4.3231 2.1233 0.3871 0.5431 4.559e+00
#> 6 experienced 14 0 107.097 5.3401 -4.0695 1.5003 0.4060 0.5481 1.785e+00
#> 7 experienced 16 0 316.742 4.6100 -5.7504 2.0504 0.5071 0.4397 2.766e-01
#> 8 experienced 17 0 238.977 3.7166 -3.6443 2.2852 0.4859 0.4117 2.462e+00
#> 9 inexperienced 3 0 90.848 3.3008 -3.9810 1.2960 0.4461 0.6715 4.400e-08
#> 10 inexperienced 4 0 186.478 7.2200 -1.4689 1.5964 0.4145 0.4630 7.141e-01
#> 11 inexperienced 5 0 182.976 4.5027 -6.5950 2.8352 0.4174 0.3807 5.718e+00
#> 12 inexperienced 8 0 219.081 4.5769 -5.6095 2.0509 0.1464 0.6034 7.728e-01
#> 13 inexperienced 10 0 70.452 2.5643 -6.6794 1.4830 0.4132 0.4507 3.113e+00
#> 14 inexperienced 11 0 268.962 2.1552 -3.4421 2.4425 0.1317 0.5135 1.300e+00
#> 15 inexperienced 13 0 129.581 6.0410 -7.9211 2.0976 0.3946 0.6284 4.207e+00
#> 16 inexperienced 15 0 83.415 5.5422 -5.5610 1.2326 0.3726 0.5849 0.000e+00
#> 17 inexperienced 18 0 119.633 0.2478 -6.6351 1.4366 0.5222 0.4980 1.559e+00
#> 18 inexperienced 19 0 233.401 2.4364 -3.5876 2.6755 0.3876 0.4712 3.926e+00
#> 19 novice 1 0 169.743 3.9138 -5.2579 1.6938 0.3987 0.4472 1.132e+00
#> 20 novice 2 0 234.286 7.5677 -0.3879 1.7176 0.5541 0.3470 0.000e+00
#> 21 novice 3 0 157.031 2.5595 -1.5601 1.3439 0.4973 0.5263 4.879e-07
#> 22 novice 4 0 89.457 4.8517 -1.7942 1.3713 0.3775 0.4943 1.274e+00
#> 23 novice 5 0 173.613 5.1285 -3.0520 1.4204 0.5130 0.4725 1.312e-07
#> 24 novice 6 0 142.353 6.9721 -4.7218 1.5024 0.4409 0.5888 8.112e-01
#> 25 novice 7 0 372.724 0.7643 -4.0841 2.4687 0.4413 0.3790 3.957e-07
#> 26 novice 8 0 77.527 1.5504 -0.9834 1.4522 0.1090 0.6044 8.038e-01
#> 27 novice 9 0 62.579 -0.8371 -6.7819 1.2418 0.3777 0.4889 1.193e+00
#> 28 novice 10 0 100.287 3.0842 -6.3314 1.2963 0.4663 0.4521 1.270e+00
#> 29 novice 11 0 77.262 3.7821 -7.2619 1.4512 0.3841 0.6106 2.072e+00
#> 30 novice 12 0 85.624 6.3412 -4.2730 1.4676 0.3523 0.3428 1.772e-07
#> 31 novice 13 0 84.946 4.1232 -3.5744 1.4569 0.3979 0.6826 1.265e+00
#> 32 novice 14 0 72.945 1.1076 -4.5517 1.6538 0.3755 0.4775 4.044e+00
#> 33 novice 15 0 149.422 2.7804 -6.9324 1.9703 0.4065 0.4891 3.531e+00
#> 34 novice 16 0 150.412 1.5875 -2.1442 1.7086 0.3357 0.6148 1.435e+00
#> 35 novice 17 0 264.390 3.7406 -3.9580 2.3208 0.1914 0.4617 1.132e+00
#> 36 novice 18 0 104.866 3.9550 -9.0636 1.3891 0.4650 0.4954 1.399e+00
#> 37 novice 19 0 209.592 2.8599 -1.4803 2.1568 0.4349 0.5002 2.472e+00
#> 38 novice 20 0 234.731 2.7647 -6.2902 2.0437 0.0000 0.5734 6.434e-01
#> 39 novice 21 0 8.786 4.1200 -1.4348 1.0759 0.4179 0.5128 1.384e+00
#> 40 novice 22 0 57.884 3.2060 -7.0517 1.3349 0.3313 0.5455 2.219e+00
#> 41 novice 23 0 112.185 3.3166 -4.6924 1.3942 0.4165 0.5527 1.358e+00
#> 42 novice 24 0 98.604 3.7895 -4.7004 1.6113 0.3361 0.6214 1.385e+00
#> 43 novice 25 0 101.399 3.4714 -5.2470 1.5317 0.4189 0.4984 2.184e+00
#> 44 novice 26 0 187.189 2.1372 -5.1359 1.6987 0.4637 0.3486 6.712e-01
#> 45 novice 27 0 64.849 1.8941 -3.4199 1.2238 0.5150 0.4185 1.101e+00
#> 46 novice 28 0 143.266 3.2389 -4.5177 1.6261 0.1316 0.4763 9.851e-01
#> 47 novice 29 0 -8.025 4.5492 -4.2795 0.9979 0.3695 0.5096 1.222e+00
#> 48 novice 30 0 131.602 1.1259 -2.5363 1.5847 0.3331 0.5513 1.135e+00
#> 49 novice 31 0 232.037 6.7351 -5.1275 1.8749 0.4700 0.4490 1.079e+00
#> 50 novice 32 0 101.678 3.7374 -6.5701 1.3497 0.5102 0.4355 1.523e+00
#> 51 novice 33 0 111.875 0.3616 -3.1717 1.2291 0.3395 0.5337 5.007e-07
#> 52 novice 34 0 99.261 4.9051 -4.1352 1.2717 0.4964 0.5752 5.818e-01
#> 53 novice 35 0 176.715 4.5738 -3.8058 2.1075 0.5099 0.4705 3.545e+00
#> 54 novice 36 0 92.863 5.0963 -4.3012 1.3239 0.5260 0.4143 1.049e+00
#> 55 novice 37 0 101.640 3.3130 -1.9483 1.4562 0.3345 0.6031 9.149e-01
To show some basic results of our fitting, we will plot the fitted values of vu and vℓ grouped by the experience level of the participant to demonstrate how these parameters differ among novices, inexperienced professionals, and experienced professionals.
library("reshape2")
library("ggplot2")
fitp <- data.frame(fit[, c(1, 4, 5)]) # make a copy to manipulate for plotting
colnames(fitp)[-1] <- c("vu", "vl")
for (i in seq_along(unique(fitp[["ID"]]))) {
first <- substr(fitp[["ID"]][i], 1, 1)
if (first == "n") {
fitp[["ID"]][i] <- "novice"
} else if (first == "i") {
fitp[["ID"]][i] <- "inexperienced"
} else {
fitp[["ID"]][i] <- "experienced"
}
}
fitp <- melt(fitp, id.vars = "ID", measure.vars = c("vu", "vl"),
variable.name = "vuvl", value.name = "estimate")
ggplot(fitp, aes(x = factor(ID, levels = c("novice", "inexperienced", "experienced")),
y = estimate,
color = factor(vuvl, levels = c("vu", "vl")))) +
geom_point(alpha = 0.4, size = 4) +
labs(title = "Parameter Estimates for vu and vl",
x = "Experience Level", y = "Parameter Estimate",
color = "Drift Rate") +
theme_bw() +
theme(panel.border = element_blank(),
plot.title = element_text(size = 23),
plot.subtitle = element_text(size = 16),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16),
axis.title.x = element_text(size = 20,
margin = margin(10, 5, 5, 5, "pt")),
axis.title.y = element_text(size = 20),
legend.title = element_text(size = 20),
legend.text = element_text(size = 16))
Before we begin analysis of this plot, note that the drift rate corresponding to the upper threshold should always be positive, and the drift rate corresponding to the lower threshold should always be negative. Since there are a few fitted values that switch this convention, the novice participants show evidence of consistently responding incorrectly to the stimulus. In contrast, both the inexperienced and experienced participants show a clean division of drift rates around zero.
In addition, we notice that the more experienced participants tend to have higher fitted drift rates in absolute value. A more extreme drift rate means that the participant receives and processes information more efficiently than a more mild drift rate. The overall pattern is that the novices are on average the worst at receiving information, the experienced professionals are the best, and the inexperienced professionals are somewhere in the middle. This pattern indicates that experienced professionals are indeed better at their job than untrained undergraduate students!
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
#> [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggforce_0.4.2 ggplot2_3.5.1 reshape2_1.4.4 microbenchmark_1.5.0
#> [5] RWiener_1.3-3 rtdists_0.11-5 fddm_1.0-2 rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 stringi_1.8.4 lattice_0.22-6
#> [6] digest_0.6.37 magrittr_2.0.3 estimability_1.5.1 evaluate_1.0.1 grid_4.4.2
#> [11] mvtnorm_1.3-2 fastmap_1.2.0 plyr_1.8.9 jsonlite_1.8.9 Matrix_1.7-1
#> [16] ggnewscale_0.5.0 Formula_1.2-5 survival_3.7-0 fansi_1.0.6 scales_1.3.0
#> [21] tweenr_2.0.3 codetools_0.2-20 jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
#> [26] expm_1.0-0 polyclip_1.10-7 gsl_2.1-8 munsell_0.5.1 splines_4.4.2
#> [31] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.4.2 colorspace_2.1-1
#> [36] buildtools_1.0.0 vctrs_0.6.5 R6_2.5.1 emmeans_1.10.5 lifecycle_1.0.4
#> [41] stringr_1.5.1 MASS_7.3-61 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.8.0
#> [46] gtable_0.3.6 glue_1.8.0 Rcpp_1.0.13-1 tidyselect_1.2.1 xfun_0.49
#> [51] tibble_3.2.1 sys_3.4.3 knitr_1.49 msm_1.8.2 farver_2.1.2
#> [56] htmltools_0.5.8.1 labeling_0.4.3 maketools_1.3.1 compiler_4.4.2 evd_2.3-7.1