| Title: | Estimation and Diagnostics for Many-Facet Measurement Models |
|---|---|
| Description: | Native R implementation of many-facet measurement models with arbitrary facet counts, rating-scale, partial-credit, and bounded generalized partial-credit parameterizations, and both marginal and joint maximum likelihood estimation. The package provides a fit / diagnose / report pipeline covering anchoring, linking, bias and DFF screening, and publication-oriented APA summaries, with reproducibility manifests for replay. See 'Andrich' (1978) <doi:10.1007/BF02293814>, 'Masters' (1982) <doi:10.1007/BF02296272>, and 'Muraki' (1992) <doi:10.1177/014662169201600206> for the underlying rating-scale, partial-credit, and generalized partial-credit models. |
| Authors: | Ryuya Komuro [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-9205-0926>) |
| Maintainer: | Ryuya Komuro <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-05-17 08:23:06 UTC |
| Source: | https://github.com/ryuya-dot-com/mfrmr |
mfrmr provides estimation, diagnostics, and reporting utilities for
many-facet Rasch-family models and bounded generalized partial-credit
model workflows using a native R implementation.
If you are new to the package, read the next four steps first and ignore the
longer GPCM, simulation, and planning notes until the basic route works:
1. Fit with fit_mfrm() using method = "MML".
2. For RSM / PCM, run diagnose_mfrm() with diagnostic_mode = "both".
3. Read summary(fit) and summary(diag) before branching.
4. Use plot_qc_dashboard() and reporting_checklist() as the first visual
   and reporting screens.
Recommended workflow:
1. Fit model with fit_mfrm().
2. For RSM / PCM, compute diagnostics with diagnose_mfrm() and prefer
   diagnostic_mode = "both" when you want legacy residual continuity plus
   the newer strict marginal-fit screen.
3. For RSM / PCM, run residual PCA with analyze_residual_pca() and
   parallel-analysis checks with check_residual_dimensionality() if needed.
4. For RSM / PCM, estimate interactions with estimate_bias().
5. For RSM / PCM, choose a downstream branch:
   reporting_checklist() for manuscript/report preparation, or
   build_misfit_casebook() / build_linking_review() for operational
   misfit or anchor/drift review. After build_misfit_casebook(), inspect
   casebook$group_view_index before moving to source-specific plots.
6. For RSM / PCM, build narrative/report outputs with
   build_apa_outputs() and build_visual_summaries().
7. Treat GPCM, prediction, and planning helpers as advanced scope after
   the basic RSM / PCM route is working cleanly.
Companion vignettes:
vignette("mfrmr-workflow", package = "mfrmr")
vignette("mfrmr-mml-and-marginal-fit", package = "mfrmr")
vignette("mfrmr-visual-diagnostics", package = "mfrmr")
vignette("mfrmr-reporting-and-apa", package = "mfrmr")
vignette("mfrmr-linking-and-dff", package = "mfrmr")
A two-page landscape cheatsheet of the public API ships at
system.file("cheatsheet", "mfrmr-cheatsheet.pdf", package = "mfrmr")
(pre-rendered) and system.file("cheatsheet", "mfrmr-cheatsheet.Rmd", package = "mfrmr") (source). Open the PDF directly for a printable
reference card, or knit the source with rmarkdown::render() when
you want a customised version.
Use this order before exploring the broader feature surface:
1. fit_mfrm() with method = "MML"
2. diagnose_mfrm() with diagnostic_mode = "both" for RSM / PCM
3. summary(fit) and summary(diag)
4. plot_qc_dashboard() for first-pass triage
5. Choose the next branch:
   reporting_checklist() for reporting,
   build_weighting_audit() for Rasch-versus-GPCM weighting review,
   build_misfit_casebook() for operational case review, or
   build_linking_review() for operational linking review
After the basic route above:
- the package now includes a first-version latent-regression MML branch
  for ordered-response RSM / PCM models with a one-dimensional
  conditional-normal population model and explicit one-row-per-person
  covariates expanded through stats::model.matrix()
- bounded GPCM support is summarized by gpcm_capability_matrix()
- bounded GPCM supports the core fit/summary/scoring/information
  path, direct Wright/pathway/CCC plots, residual-PCA follow-up, and the
  residual-based diagnostics tables/plots as exploratory tools
- posterior-predictive computation, MCMC engines, and Docker-based
  advanced runtimes are future extensions rather than requirements for the
  current bounded GPCM route
- direct GPCM data generation through build_mfrm_sim_spec(),
  extract_mfrm_sim_spec(), and simulate_mfrm_data() is available when
  the specification carries both thresholds and slopes
- fair-average, APA writer, package-native export/replay, and role-based
  planning/forecasting routes are available for bounded GPCM with
  explicit screening-tier caveats
- predict_mfrm_population() remains a scenario-level forecast helper and
  should not be described as the latent-regression estimator itself
- the role-based simulation/planning layer remains the PCM/GPCM route for
  two non-person facets, while build_mfrm_arbitrary_sim_spec(),
  extract_mfrm_arbitrary_sim_spec(), simulate_mfrm_arbitrary_data(),
  summarize_mfrm_sim_design(), plot_mfrm_sim_design(),
  summarize_mfrm_sim_grid(), plot_mfrm_sim_grid(),
  list_mfrm_sim_metrics(), plot_mfrm_sim_dashboard(), and
  evaluate_mfrm_bias_detection() provide a first RSM-based
  arbitrary-facet design, multi-metric dashboard, and bias-sensitivity
  branch
- latent-class mixture models and response-time / careless-rating
  adjustment are not estimated by mfrmr; use residual, person-fit,
  local-dependence, and rater-drift diagnostics as screening layers rather
  than as mixture-model substitutes
The package's operational reference route is still the Rasch-family
RSM / PCM branch. That route enforces fixed discrimination and therefore
preserves an equal-weighting scoring interpretation across observed ratings.
Bounded GPCM is supported because some users want a slope-aware
model-comparison or sensitivity layer inside the same many-facet workflow. However,
the package does not treat bounded GPCM as a universal replacement for the
Rasch-family route. A better fit under GPCM should be read as evidence
about discrimination-based reweighting, not as an automatic reason to
discard the equal-weighting model.
Observation weights are a different concept again. Optional Weight
columns change how observed rating events enter estimation and summaries, but
they do not create a free-form facet-weighting scheme and do not alter the
fixed-discrimination meaning of RSM / PCM.
Function families:
Model fitting: fit_mfrm(), summary.mfrm_fit(), plot.mfrm_fit()
Legacy-compatible workflow wrapper: run_mfrm_facets(), mfrmRFacets()
Diagnostics: diagnose_mfrm(), summary(diag),
analyze_residual_pca(), check_residual_dimensionality(),
plot_residual_pca(), plot_residual_dimensionality()
Bias and interaction: estimate_bias(), estimate_all_bias(),
summary(bias), bias_interaction_report(), plot_bias_interaction()
Differential functioning: analyze_dff(), analyze_dif(),
dif_interaction_table(), plot_dif_heatmap(), dif_report()
Design simulation: build_mfrm_sim_spec(), extract_mfrm_sim_spec(),
simulate_mfrm_data(), evaluate_mfrm_design(),
evaluate_mfrm_signal_detection(), build_mfrm_arbitrary_sim_spec(),
extract_mfrm_arbitrary_sim_spec(), simulate_mfrm_arbitrary_data(),
summarize_mfrm_sim_design(), plot_mfrm_sim_design(),
summarize_mfrm_sim_grid(), plot_mfrm_sim_grid(),
list_mfrm_sim_metrics(), plot_mfrm_sim_dashboard(),
evaluate_mfrm_bias_detection(),
predict_mfrm_population(),
predict_mfrm_units(), sample_mfrm_plausible_values() (including
fit-derived empirical / resampled / skeleton-based simulation
specifications; fixed-calibration unit scoring supports MML fits
directly, latent-regression MML fits through the fitted population
model when scored units also provide one-row-per-person background data,
and JML fits through a post hoc reference-prior EAP layer;
fit-derived simulation specifications, planning/forecasting helpers,
curve reports, and graph-only exports are also available for bounded
GPCM with the caveats documented in gpcm_capability_matrix())
Reporting: build_apa_outputs(), build_visual_summaries(),
reporting_checklist(), apa_table() for the full RSM / PCM route;
bounded GPCM currently stays on the checklist / visual-summary / QC /
direct-table / direct-plot side instead of the narrative layer
Weighting review: compare_mfrm(), build_weighting_audit(),
compute_information(), plot_information()
Case review: build_misfit_casebook(), plot_unexpected(),
plot_displacement(), plot_marginal_fit(), plot_marginal_pairwise()
Linking and scale maintenance: audit_mfrm_anchors(),
detect_anchor_drift(), build_equating_chain(),
build_linking_review(), plot_anchor_drift()
Dashboards and fit-direction rates: facet_quality_dashboard(),
plot_facet_quality_dashboard(), fit_direction_summary(),
plot_fit_direction_summary(), summarize_simulation_misfit(),
plot_simulation_misfit_rates()
Export / reproducibility: build_mfrm_manifest(), build_mfrm_replay_script(),
build_conquest_overlap_bundle(), normalize_conquest_overlap_files(),
normalize_conquest_overlap_tables(),
audit_conquest_overlap(),
export_mfrm_bundle() for the diagnostics-compatible Rasch-family route;
bounded GPCM remains outside the current manifest/replay/bundle layer
Equivalence: analyze_facet_equivalence(), plot_facet_equivalence()
Data and anchors: describe_mfrm_data(), audit_mfrm_anchors(),
make_anchor_table(), load_mfrmr_data()
Data interface:
Input analysis data is long format (one row per observed rating).
Required columns are one person column, one ordered score column, and one
or more non-person facet columns named in facets = c(...).
Score values should be ordered integer categories. Binary 0/1 or 1/2
input is supported as the two-category Rasch-family special case; by
contrast, fractional score values should be recoded before fitting rather
than relying on automatic coercion.
If keep_original = FALSE, unused intermediate categories are collapsed
to a contiguous internal scale and the mapping is stored in
fit$prep$score_map.
If the intended scale has unused boundary categories, such as a 1-5 scale
with only 2-5 observed, set rating_min = 1, rating_max = 5 so the
zero-count boundary category remains in the fitted support. If unused
intermediate categories should also remain in the original scale, set
keep_original = TRUE.
summary(describe_mfrm_data(...)) reports retained zero-count categories
in Notes, printed Caveats, and $caveats; summary(fit) carries full
structured rows into printed Caveats and $caveats, with Key warnings
as a short triage subset. Summary-table exports route those rows through
score_category_caveats or analysis_caveats. Treat adjacent thresholds
as weakly identified when an intermediate category is unobserved.
Optional columns such as Subset, Weight, and Group support linking,
weighted analysis, and fairness-focused follow-up workflows.
Packaged simulation data is available via load_mfrmr_data() or data().
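To make the long-format layout and the category-collapsing rule concrete, here is a minimal base-R sketch. It does not call mfrmr; the toy data frame, the `score_map` layout, and the choice of a 0-based internal scale are illustrative assumptions, and the structure actually stored in fit$prep$score_map may differ.

```r
# Toy long-format data: one row per observed rating (hypothetical values).
ratings <- data.frame(
  Person = c("P1", "P1", "P2", "P2"),
  Rater  = c("R1", "R2", "R1", "R2"),
  Score  = c(1, 3, 5, 3)
)

# Categories 2 and 4 are unobserved. Collapsing maps the observed
# categories onto a contiguous internal scale (assumed here to start at 0).
observed_cats <- sort(unique(ratings$Score))
score_map <- data.frame(
  original = observed_cats,
  internal = seq_along(observed_cats) - 1L
)

# Recode each rating through the map.
ratings$ScoreInternal <- score_map$internal[match(ratings$Score, score_map$original)]

print(score_map)
print(ratings)
```

Keeping the map next to the recoded data is what allows results on the internal scale to be translated back to the original rubric.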
Core object classes are:
mfrm_fit: fitted model parameters and metadata.
mfrm_diagnostics: fit, facet-level reliability, and flag diagnostics,
plus inter-rater agreement when one facet is treated as a rater facet.
mfrm_bias: interaction bias estimates.
mfrm_dff / mfrm_dif: differential-functioning contrasts and screening summaries.
mfrm_population_prediction: scenario-level forecast summaries for one
future design.
mfrm_unit_prediction: posterior summaries for future or partially
observed persons under the fitted scoring basis.
mfrm_plausible_values: posterior draws for future or partially observed
persons under the fitted scoring basis.
mfrm_bundle families: summary/report bundles and plotting payloads.
Prepare long-format data.
Fit with fit_mfrm().
For RSM / PCM, diagnose with diagnose_mfrm() and prefer
diagnostic_mode = "both" for final MML runs.
For RSM / PCM, run analyze_dff() or estimate_bias() when
fairness or interaction questions matter.
For RSM / PCM, report with build_apa_outputs() and
build_visual_summaries().
For design planning, move to build_mfrm_sim_spec(),
evaluate_mfrm_design(), and predict_mfrm_population(). Bounded
GPCM also supports direct simulation via
extract_mfrm_sim_spec() / simulate_mfrm_data(), but not the broader
planning helpers. Those helpers still assume two non-person facet roles
even though the estimation core supports arbitrary facet counts. Treat
evaluate_mfrm_design() as Monte Carlo design evaluation rather than
a closed-form generalizability-theory D-study planner.
predict_mfrm_population() remains the scenario-level forecast helper,
not the latent-regression estimator.
For future-unit scoring, retain an MML calibration when you want the
fitted marginal model directly, use an active latent-regression MML
fit when scored units also provide one-row-per-person background data, or
use a JML calibration when a post hoc fixed-calibration EAP layer is
acceptable; then score with
predict_mfrm_units() or sample_mfrm_plausible_values().
For bounded GPCM, use summary.mfrm_fit(),
diagnose_mfrm(), analyze_residual_pca(),
check_residual_dimensionality(),
predict_mfrm_units(), sample_mfrm_plausible_values(),
compute_information(), plot_qc_dashboard(), plot.mfrm_fit(),
category_structure_report(), category_curves_report(),
fair_average_table(), estimate_bias(), build_visual_summaries(),
run_qc_pipeline(),
graph-only facets_output_file_bundle(), direct simulation-spec
generation/data generation, caveated APA/export/replay bundles,
caveated role-based planning/forecasting, and the residual-based table
helpers while FACETS compatibility score exports remain blocked. Use
gpcm_capability_matrix() as the formal boundary statement.
The many-facet Rasch model (MFRM; Linacre, 1989) extends the basic Rasch model by incorporating multiple measurement facets into a single linear model on the log-odds scale.
General MFRM equation
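For an observation where person $n$ with ability $\theta_n$ is
rated by rater $j$ with severity $\delta_j$ on criterion $i$
with difficulty $\beta_i$, the probability of observing category $k$
(out of $K + 1$ ordered categories $0, \dots, K$) is:

$$P(X_{nij} = k) = \frac{\exp\left[\sum_{s=1}^{k} (\theta_n - \delta_j - \beta_i - \tau_s)\right]}{\sum_{c=0}^{K} \exp\left[\sum_{s=1}^{c} (\theta_n - \delta_j - \beta_i - \tau_s)\right]}$$

where $\tau_s$ are the Rasch-Andrich threshold (step) parameters and
$\sum_{s=1}^{0} (\cdot) \equiv 0$ by convention. Additional facets
enter as additive terms in the linear predictor
$\eta = \theta_n - \delta_j - \beta_i - \dots$.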
This formulation generalises to any number of facets; the
facets argument to fit_mfrm() accepts an arbitrary-length
character vector.
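As a rough illustration of the adjacent-category formulation described above, the base-R sketch below evaluates rating-scale category probabilities for one person and one combination of facet elements. The helper name `mfrm_category_probs()` and all numeric values are hypothetical and not part of the mfrmr API; categories are assumed to run from 0 to K, and every non-person facet is assumed to enter with a negative sign.

```r
# Hypothetical helper (not an mfrmr function): category probabilities for a
# rating-scale many-facet Rasch model with categories 0, ..., K.
mfrm_category_probs <- function(theta, facet_terms, tau) {
  # Linear predictor: ability minus the sum of all facet terms
  # (for example rater severity and criterion difficulty).
  eta <- theta - sum(facet_terms)
  # Cumulative sums of (eta - tau_s); category 0 has an empty sum equal to 0.
  log_num <- c(0, cumsum(eta - tau))
  # Subtract the maximum before exponentiating for numerical stability.
  num <- exp(log_num - max(log_num))
  probs <- num / sum(num)
  names(probs) <- seq_along(probs) - 1L
  probs
}

probs <- mfrm_category_probs(
  theta = 0.5,
  facet_terms = c(rater = 0.2, criterion = -0.1),
  tau = c(-1, 0, 1)
)
print(round(probs, 3))
```

Because facet terms are simply summed into the linear predictor, adding a third or fourth facet only lengthens `facet_terms`; the category structure is untouched.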
Rating Scale Model (RSM)
Under the RSM (Andrich, 1978), all levels of the step facet share a
single set of threshold parameters $\tau_1, \dots, \tau_K$.
Partial Credit Model (PCM)
Under the PCM (Masters, 1982), each level of the designated step_facet
has its own threshold vector on the package's common observed score scale.
In the current implementation, threshold locations may vary by step-facet
level, but the fitted score range is still defined by one global category
set taken from the observed data.
Ordered-response scope
The current public response-model scope is ordered categorical only.
Binary responses are the two-category special case of the same formulation,
so they are handled through the ordinary ordered-score interface. This means
mfrmr supports ordered binary and ordered polytomous data under RSM and
PCM, plus a narrow bounded GPCM branch with one designated
slope_facet that currently must equal step_facet. Unordered
nominal/multinomial response models are not yet implemented.
Marginal Maximum Likelihood (MML)
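MML integrates over the person ability distribution using Gauss-Hermite quadrature, in the broader marginal-likelihood framework introduced by Bock & Aitkin (1981) for IRT:

$$L(\xi) = \prod_{n} \int \prod_{(i,j) \in \mathcal{O}_n} P(X_{nij} = x_{nij} \mid \theta, \xi)\, \phi(\theta)\, d\theta \approx \prod_{n} \sum_{q=1}^{Q} w_q \prod_{(i,j) \in \mathcal{O}_n} P(X_{nij} = x_{nij} \mid \theta_q, \xi)$$

where $\xi$ collects the non-person parameters, $\mathcal{O}_n$ is the set of
observations for person $n$, $\phi(\theta)$ is the assumed normal prior, and
$(\theta_q, w_q)$ are quadrature nodes and weights. Person
estimates are obtained post-hoc via Expected A Posteriori (EAP):

$$\hat{\theta}_n^{\mathrm{EAP}} = \frac{\sum_{q=1}^{Q} \theta_q\, w_q\, L_n(\theta_q)}{\sum_{q=1}^{Q} w_q\, L_n(\theta_q)}$$

where $L_n(\theta_q)$ is the likelihood of person $n$'s observed ratings evaluated at node $\theta_q$.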
MML avoids the incidental-parameter problem and is generally preferred for smaller samples.
Note: Bock & Aitkin (1981) is the canonical citation for the
Gauss-Hermite-quadrature MML framework. The default mfrmr engine
(mml_engine = "direct") optimises this marginal log-likelihood by
direct gradient methods (BFGS / L-BFGS-B), not by Bock & Aitkin's
signature EM algorithm. The "em" and "hybrid" engines do follow
the EM template but use a BFGS M-step rather than B&A's probit IRLS,
because the target is the polytomous Rasch family rather than B&A's
2PL probit model.
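The quadrature and EAP steps described above can be sketched in base R. This is an illustration under simplifying assumptions, not the mfrmr engine: it uses dichotomous Rasch responses with known difficulties, a standard normal prior, and Gauss-Hermite nodes obtained by the Golub-Welsch eigenvalue method. The helper names `gauss_hermite_normal()` and `eap_estimate()` are hypothetical.

```r
# Gauss-Hermite nodes and weights for a standard normal prior, computed with
# the Golub-Welsch eigenvalue method (requires n >= 2).
gauss_hermite_normal <- function(n) {
  off <- sqrt(seq_len(n - 1))
  J <- matrix(0, n, n)
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  eig <- eigen(J, symmetric = TRUE)
  ord <- order(eig$values)
  # Squared first components of the eigenvectors are the weights; they sum to 1.
  list(nodes = eig$values[ord], weights = eig$vectors[1, ord]^2)
}

# EAP for one person with 0/1 responses x and known element difficulties.
eap_estimate <- function(x, difficulty, n_quad = 15) {
  gh <- gauss_hermite_normal(n_quad)
  # Likelihood of the response pattern at each quadrature node.
  lik <- vapply(gh$nodes, function(th) {
    p <- plogis(th - difficulty)
    prod(p^x * (1 - p)^(1 - x))
  }, numeric(1))
  # Posterior weights: prior weight times likelihood, normalised.
  post <- gh$weights * lik
  post <- post / sum(post)
  eap <- sum(gh$nodes * post)
  c(eap = eap, sd = sqrt(sum((gh$nodes - eap)^2 * post)))
}

gh <- gauss_hermite_normal(7)
difficulty <- c(-1, 0, 0.5, 1)
est_high <- eap_estimate(c(1, 1, 1, 0), difficulty)
est_low  <- eap_estimate(c(0, 0, 0, 1), difficulty)
print(rbind(est_high, est_low))
```

The EAP is pulled toward the prior mean of zero, which is why short response strings yield estimates that are less extreme than their maximum-likelihood counterparts.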
Joint Maximum Likelihood (JML)
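JML estimates all person and facet parameters simultaneously as fixed
effects by maximising the joint log-likelihood

$$\ell(\theta, \xi) = \sum_{n} \sum_{(i,j) \in \mathcal{O}_n} \log P(X_{nij} = x_{nij} \mid \theta_n, \xi)$$

directly, where $\mathcal{O}_n$ is the set of observations for person $n$ and
$\xi$ collects the non-person parameters. It does not assume a parametric
person distribution, which can be advantageous when the population shape is
strongly non-normal, but parameter estimates are known to be biased when each
person contributes only a few observations, because the number of person
parameters grows with the sample (the incidental-parameter problem;
Neyman & Scott, 1948).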
The package still accepts "JMLE" as a backward-compatible alias, but
user-facing summaries and documentation use "JML" as the public label.
See fit_mfrm() for practical guidance on choosing between the two.
For RSM / PCM, diagnose_mfrm(..., diagnostic_mode = "both")
returns two complementary targets: the legacy residual / EAP
diagnostics and a marginal_fit layer whose expected counts and
pairwise summaries are integrated over the posterior quadrature
bundle rather than plugged in at the EAP point. The screen is
structured as limited-information evidence (Orlando & Thissen,
2000; Haberman & Sinharay, 2013; Sinharay & Monroe, 2025), not as
an omnibus accept / reject test, and it complements rather than
replaces separation / reliability and inter-rater agreement
summaries. The full derivation, with notation and pairwise
local-dependence events, lives in
vignette("mfrmr-mml-and-marginal-fit", package = "mfrmr").
Key statistics reported throughout the package:
Infit (Information-Weighted Mean Square)
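Weighted average of squared standardized residuals, where weights are the model-based variance of each observation:

$$\mathrm{Infit} = \frac{\sum_i W_i Z_i^2}{\sum_i W_i} = \frac{\sum_i (X_i - E_i)^2}{\sum_i W_i}$$

where $X_i$ is the observed score, $E_i$ its model expectation, $W_i$ its model variance, and $Z_i = (X_i - E_i) / \sqrt{W_i}$ the standardized residual.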
Expected value is 1.0 under model fit. Values below 0.5 suggest overfit (Mead-style responses); values above 1.5 suggest underfit (noise or misfit). Infit is most sensitive to unexpected patterns among on-target observations (Wright & Masters, 1982).
Note: The 0.5–1.5 range is the general "productive for measurement" band given by Linacre (2002, RMT 16(2), 878). Context-specific bands come from Wright & Linacre (1994, RMT 8(3), 370): 0.8–1.2 for high-stakes MCQ, 0.7–1.3 for run-of-the-mill MCQ, 0.6–1.4 for rating-scale surveys, 0.5–1.7 for clinical observation, and 0.4–1.2 for judged performance. See also Bond & Fox (2015) for textbook summaries of these conventions.
Outfit (Unweighted Mean Square)
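Simple average of squared standardized residuals:

$$\mathrm{Outfit} = \frac{1}{N} \sum_{i=1}^{N} Z_i^2$$

where $Z_i = (X_i - E_i) / \sqrt{W_i}$ is the standardized residual of observation $i$ and $N$ is the number of observations.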
Same expected value and flagging thresholds as Infit, but more sensitive to extreme off-target outliers (e.g., a high-ability person scoring the lowest category).
ZSTD (Standardized Fit Statistic)
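Wilson-Hilferty (1931) cube-root transformation that converts the mean-square chi-square ratio to an approximate standard normal deviate:

$$\mathrm{ZSTD} = \frac{\mathrm{MnSq}^{1/3} - \left(1 - \frac{2}{9\,df}\right)}{\sqrt{2 / (9\,df)}}$$

where $\mathrm{MnSq}$ is the Infit or Outfit mean square and $df$ is its effective degrees of freedom.
Values near 0 indicate expected fit; for an approximately standard normal
deviate, $|\mathrm{ZSTD}| > 1.96$ (often rounded to 2) flags potential misfit
at the 5% level and $|\mathrm{ZSTD}| > 2.58$ at the 1% level (two-sided).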
ZSTD is reported alongside every Infit and Outfit value.
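The three fit statistics described above can be reproduced by hand from observation-level residuals. The base-R sketch below is illustrative only: the helper names are hypothetical, the four observations are made up, and the degrees of freedom passed to the Wilson-Hilferty step are an assumed input rather than the value mfrmr would derive internally.

```r
# Infit and Outfit mean squares from observed scores, model expectations,
# and model variances (hypothetical helper, not an mfrmr function).
fit_mean_squares <- function(observed, expected, variance) {
  resid_sq <- (observed - expected)^2
  c(
    infit  = sum(resid_sq) / sum(variance),  # information-weighted
    outfit = mean(resid_sq / variance)       # unweighted mean of Z^2
  )
}

# Wilson-Hilferty cube-root transformation of a mean square with df
# effective degrees of freedom.
zstd_wilson_hilferty <- function(mnsq, df) {
  (mnsq^(1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
}

# Four dichotomous observations with made-up expectations.
observed <- c(1, 0, 1, 1)
expected <- c(0.5, 0.5, 0.8, 0.2)
variance <- expected * (1 - expected)

ms <- fit_mean_squares(observed, expected, variance)
print(ms)
print(zstd_wilson_hilferty(ms, df = 4))  # df = 4 is assumed for illustration
```

The last observation (a success with expected value 0.2) is the off-target surprise: it contributes a squared standardized residual of 4, which moves Outfit more than Infit.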
PTMEA (Point-Measure Correlation)
Pearson correlation between observed scores and estimated person measures within each facet level. Positive values indicate that scoring aligns with the latent trait dimension; negative values suggest reversed orientation or scoring errors.
Separation
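Package-reported separation is the ratio of adjusted true standard deviation to root-mean-square measurement error:

$$G = \frac{\mathrm{SD}_{adj}}{\mathrm{RMSE}}$$

where $\mathrm{SD}_{adj} = \sqrt{\mathrm{SD}_{obs}^2 - \mathrm{RMSE}^2}$. Higher values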
indicate the facet discriminates more statistically distinct levels along the
measured variable. In mfrmr, Separation is the model-based value and
RealSeparation provides a more conservative companion based on RealSE.
Reliability
Analogous to Cronbach's alpha or KR-20 for the reproducibility of element
ordering. In mfrmr, Reliability is the model-based value and
RealReliability gives the conservative companion based on RealSE. For
MML, these are anchored to observed-information ModelSE
estimates for non-person facets; JML keeps them as exploratory summaries.
This is a Rasch/FACETS-style separation reliability on the fitted logit
scale, not an intra-class correlation. Use compute_facet_icc() only when
you want the complementary random-effects variance-share view on the
observed-score scale; for non-person facets, large ICC values indicate
systematic facet variance rather than desirable measurement reliability.
Strata
Number of statistically distinguishable groups of elements:

$$H = \frac{4G + 1}{3}$$

where $G$ is the separation index.
Three or more strata are commonly used as a practical target (Wright & Masters, 1982), but in this package the estimate inherits the same approximation limits as the separation index.
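Separation, reliability, and strata are linked by simple arithmetic, which the base-R sketch below works through for three facet elements. It is a hand calculation under stated assumptions, not the package's code: `separation_summary()` is a hypothetical helper, the observed variance uses the sample (n - 1) convention, reliability uses the standard identity G^2 / (1 + G^2), and the measures and standard errors are made up.

```r
# Separation (G), separation reliability, and strata (H) from element
# measures and their standard errors (hypothetical helper).
separation_summary <- function(measures, se) {
  rmse <- sqrt(mean(se^2))                  # root-mean-square error
  obs_var <- stats::var(measures)           # observed variance (n - 1 form)
  adj_var <- max(obs_var - rmse^2, 0)       # error-adjusted "true" variance
  g <- sqrt(adj_var) / rmse
  c(
    separation  = g,
    reliability = g^2 / (1 + g^2),          # equals adj_var / obs_var
    strata      = (4 * g + 1) / 3
  )
}

res <- separation_summary(measures = c(-1, 0, 1), se = rep(0.5, 3))
print(round(res, 3))
```

Swapping model-based standard errors for more conservative ones raises the RMSE and lowers all three indices at once, which is the relationship between the model-based and Real* companions described above.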
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model (3rd ed.). Routledge.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and
multimodel inference: A practical information-theoretic approach
(2nd ed.). Springer. (AIC / BIC weights and Delta-IC bands used
by compare_mfrm().)
Drasgow, F., Levine, M. V., & Williams, E. A. (1985).
Appropriateness measurement with polychotomous item response
models and standardized indices. British Journal of Mathematical
and Statistical Psychology, 38(1), 67–86. (Source for the lz
person-fit statistic implemented in compute_person_fit_indices().)
Haberman, S. J., & Sinharay, S. (2013). Generalized residuals for general models for contingency tables with application to item response theory. Journal of the American Statistical Association, 108, 1435–1444.
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2, 197–221.
Muraki, E. (1992). A generalized partial credit model: Application of
an EM algorithm. Applied Psychological Measurement, 16(2),
159–176. (Source for the bounded GPCM extension used in
fit_mfrm(model = "GPCM"), fair_average_table(), and
estimate_bias().)
Muraki, E. (1993). Information functions of the generalized partial
credit model. Applied Psychological Measurement, 17(4),
351–363. (Companion paper to Muraki 1992 that derives the GPCM
item information identity via Samejima's (1974) polytomous
information formula. This is the canonical reference for
compute_information() under bounded GPCM.)
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111–121. (General polytomous information formula that Muraki 1993 specializes to the GPCM.)
Snijders, T. A. B. (2001). Asymptotic null distribution of person
fit statistics with estimated person parameter. Psychometrika,
66(3), 331–342. (compute_person_fit_indices() reports a
Snijders-style score-projection lz_star for JML fits, conditional
on the fitted non-person parameters. For MML/EAP fits lz_star is
unavailable because EAP posterior means do not satisfy the ML
person-score estimating equation; the old finite-N screen is
reported separately as lz_finite_n.)
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Linacre, J. M. (2002). What do Infit and Outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16(2), 878.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27, 289–298.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
Sinharay, S., & Monroe, S. (2025). Assessment of fit of item response theory models: A critical review of the status quo and some future directions. British Journal of Mathematical and Statistical Psychology, 78, 711–733.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. MESA Press.
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Wilson, E. B., & Hilferty, M. M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America, 17(12), 684-688.
RSM vs PCM
The Rating Scale Model (RSM; Andrich, 1978) assumes all levels of the
step facet share identical threshold parameters. The Partial Credit
Model (PCM; Masters, 1982) allows each level of the step_facet to have
its own set of thresholds on the package's shared observed score scale.
Use RSM when the rating rubric is identical across all items/criteria;
use PCM when category boundaries are expected to vary by item or criterion.
In the current implementation, PCM still assumes one common observed score
support across the fitted data, so it should not be described as a fully
mixed-category model with arbitrary item-specific category counts.
MML vs JML
Marginal Maximum Likelihood (MML) integrates over the person ability
distribution using Gauss-Hermite quadrature and does not directly estimate
person parameters; person estimates are computed post-hoc via Expected A
Posteriori (EAP). Joint Maximum Likelihood (JML) estimates all person
and facet parameters simultaneously as fixed effects; "JMLE" remains a
backward-compatible alias.
MML is generally preferred for smaller samples because it avoids the incidental-parameter problem of JML. JML does not assume a normal person distribution and can be lighter computationally in some settings, which may be an advantage when the population shape is strongly non-normal.
See fit_mfrm() for usage.
Fixed-calibration scoring after fitting
predict_mfrm_units() and sample_mfrm_plausible_values() score future or
partially observed persons on a quadrature grid under the fitted scoring
basis. For ordinary MML fits, these summaries inherit the fitted marginal
calibration directly. For latent-regression MML fits, they use the fitted
one-dimensional conditional normal population model and therefore require
one-row-per-person background data for the scored units when the fitted
population model includes covariates. Intercept-only latent-regression fits
(population_formula = ~ 1) can reconstruct that minimal person table from
the scored person IDs. For JML fits, mfrmr uses the fitted facet and
step parameters together with a standard normal reference prior introduced
only for the post hoc scoring layer. This is useful for practical
fixed-scale scoring, but it should still be described as a limited
approximation rather than as full ConQuest-style population modeling.
Current ConQuest overlap
The package now includes a first-version latent-regression MML branch, but
the overlap with ConQuest should still be described conservatively. The
documented overlap is:
ordered-response RSM / PCM, one latent dimension, a conditional-normal
person population model, and person covariates supplied through an explicit
one-row-per-person table and expanded through the package-built model
matrix. Categorical person covariates carry fitted levels and contrasts into
scoring. This is a scoped overlap, not a claim of broad ConQuest numerical
equivalence for arbitrary imported design matrices, multidimensional models,
imported design specifications, or the full plausible-values workflow.
    mfrm_threshold_profiles()
    list_mfrmr_data()

    toy <- load_mfrmr_data("example_core")
    fit <- fit_mfrm(
      toy,
      person = "Person",
      facets = c("Rater", "Criterion"),
      score = "Score",
      method = "MML",
      model = "RSM",
      quad_points = 7
    )
    diag <- diagnose_mfrm(fit, diagnostic_mode = "both", residual_pca = "none")
    summary(diag)
Tests whether the difficulty of facet levels differs across a grouping variable (e.g., whether rater severity differs for male vs. female examinees, or whether item difficulty differs across rater subgroups).
analyze_dif() is retained for compatibility with earlier package versions.
In many-facet workflows, prefer analyze_dff() as the primary entry point.
    analyze_dff(
      fit,
      diagnostics,
      facet,
      group,
      data = NULL,
      focal = NULL,
      method = c("residual", "refit"),
      min_obs = 10,
      p_adjust = "holm"
    )

    analyze_dif(...)
fit |
Output from |
diagnostics |
Output from |
facet |
Character scalar naming the facet whose elements are tested
for differential functioning (for example, |
group |
Character scalar naming the column in the data that
defines the grouping variable (e.g., |
data |
Optional data frame containing at least the group column
and the same person/facet/score columns used to fit the model. If
|
focal |
Optional character vector of group levels to treat as focal.
If |
method |
Analysis method: |
min_obs |
Minimum number of observations per cell (facet-level x
group). Cells below this threshold are flagged as sparse and their
statistics set to |
p_adjust |
Method for multiple-comparison adjustment, passed to
|
... |
Passed directly to |
Differential facet functioning (DFF) occurs when the difficulty or severity of a facet element differs across subgroups of the population, after controlling for overall ability. In an MFRM context this generalises classical DIF (which applies to items) to any facet: raters, criteria, tasks, etc.
Differential functioning is a threat to measurement fairness: if Criterion 1 is harder for Group A than Group B at the same ability level, the measurement scale is no longer group-invariant.
Two methods are available:
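Residual method (method = "residual"): Uses the existing fitted
model's observation-level residuals. For each facet-level ×
group cell, the observed and expected score sums are aggregated and
a standardized residual is computed as:

$$z = \frac{\sum (X_{obs} - E_{exp})}{\sqrt{\sum \mathrm{Var}}}$$

with all sums taken over the observations in the cell.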
Pairwise contrasts between groups compare the mean observed-minus-expected difference for each facet level, with uncertainty summarized by a Welch/Satterthwaite approximation. This method is fast, stable with small subsets, and does not require re-estimation. Because the resulting contrast is not a logit-scale parameter difference, the residual method is treated as a screening procedure rather than an ETS-style classifier.
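Refit method (method = "refit"): Subsets the data by group, refits
the MFRM model within each subset, anchors all non-target facets back to
the baseline calibration when possible, and compares the resulting
facet-level estimates using a Welch t-statistic:

$$t = \frac{\hat{\delta}_1 - \hat{\delta}_2}{\sqrt{SE_1^2 + SE_2^2}}$$

where $\hat{\delta}_g$ and $SE_g$ are the facet-level estimate and its standard error in group $g$.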
This provides group-specific parameter estimates on a common scale when linking anchors are available, but is slower and may encounter convergence issues with small subsets. ETS categories are reported only for contrasts whose subgroup calibrations retained enough linking anchors to support a common-scale interpretation and whose subgroup precision remained on the package's model-based MML path.
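The cell-level aggregation behind the residual method can be sketched in base R as follows. This is not the package's implementation: the observation-level table, its column names, and the constant model variance are made-up assumptions, and only the aggregation logic (sum observed, expected, and variance within each facet-level by group cell, then standardize) follows the description above.

```r
# Made-up observation-level table: observed score, model-expected score,
# and model variance for each rating.
obs <- data.frame(
  Criterion = c("C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2"),
  Group     = c("A", "A", "B", "B", "A", "A", "B", "B"),
  Observed  = c(3, 2, 1, 2, 2, 3, 3, 2),
  Expected  = c(2.5, 2.0, 2.0, 2.5, 2.5, 2.5, 2.5, 2.5),
  Variance  = rep(0.5, 8)
)

# Sum the three quantities within each Criterion x Group cell.
cells <- aggregate(
  cbind(Observed, Expected, Variance) ~ Criterion + Group,
  data = obs,
  FUN = sum
)

# Standardized residual per cell: summed residual over root summed variance.
cells$StdResidual <- (cells$Observed - cells$Expected) / sqrt(cells$Variance)
print(cells)
```

In this toy table, Criterion C1 shows opposite-signed residuals for groups A and B, which is the pattern a pairwise residual contrast would pick up, while C2 shows none.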
When facet refers to an item-like facet (for example Criterion), this
recovers the familiar DIF case. When facet refers to raters or
prompts/tasks, the same machinery supports DRF/DPF-style analyses.
For the refit method only, effect size is classified following the ETS (Educational Testing Service) DIF guidelines when subgroup calibrations are both linked and eligible for model-based inference:
A (Negligible): $|\Delta| < 0.43$ logits
B (Moderate): $0.43 \le |\Delta| < 0.64$ logits
C (Large): $|\Delta| \ge 0.64$ logits
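A size-only version of this classification, together with the Welch-style contrast used by the refit method, can be sketched in base R. The helpers `classify_ets()` and `welch_contrast()` are hypothetical, the boundary handling (0.43 falls in B, 0.64 in C) is an assumption, and the sketch ignores the linking and inference eligibility checks that the package applies before reporting ETS categories.

```r
# Size-only ETS-style label from a logit contrast (assumed boundaries:
# |delta| < 0.43 -> A, 0.43 <= |delta| < 0.64 -> B, otherwise C).
classify_ets <- function(delta) {
  size <- abs(delta)
  ifelse(size < 0.43, "A", ifelse(size < 0.64, "B", "C"))
}

# Contrast between two group-specific estimates with a Welch-style
# standard error (square root of the summed squared SEs).
welch_contrast <- function(est1, se1, est2, se2) {
  diff <- est1 - est2
  se <- sqrt(se1^2 + se2^2)
  data.frame(
    contrast = diff,
    se = se,
    t = diff / se,
    ets = classify_ets(diff),
    stringsAsFactors = FALSE
  )
}

# Three made-up facet levels estimated separately in two groups.
out <- welch_contrast(
  est1 = c(0.10, 0.90, -0.20), se1 = c(0.15, 0.20, 0.18),
  est2 = c(0.00, 0.20, 0.35),  se2 = c(0.15, 0.20, 0.18)
)
print(out)
```

The label depends only on the absolute contrast, so a large contrast with a wide standard error can still land in category C; reading the t column alongside the label guards against over-interpreting imprecise subgroups.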
Multiple comparisons are adjusted using Holm's step-down procedure by
default, which controls the family-wise error rate without assuming
independence. Alternative methods (e.g., "BH" for false discovery
rate) can be specified via p_adjust.
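The effect of the adjustment choice can be previewed with `stats::p.adjust()` on a small set of made-up raw p values. This sketch only illustrates the two adjustments named above; it does not show how the package calls them internally.

```r
# Hypothetical raw p values from four pairwise contrasts.
p_raw <- c(0.001, 0.012, 0.030, 0.200)

# Holm controls the family-wise error rate; BH controls the false discovery rate.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

data.frame(p_raw, p_holm, p_bh)
```

Holm-adjusted values are never smaller than the BH-adjusted ones, which is why BH flags at least as many contrasts at a given cutoff.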
An object of class mfrm_dff (with compatibility class mfrm_dif) with:
dif_table: data.frame of differential-functioning contrasts.
cell_table: (residual method) per-cell detail table.
summary: counts by screening or ETS classification.
group_fits: (refit method) per-group facet estimates.
config: list with facet, group, method, min_obs, p_adjust settings.
In most first-pass DFF screening, start with method = "residual". It is
faster, reuses the fitted model, and is less fragile in smaller subsets.
Use method = "refit" when you specifically want group-specific parameter
estimates and can tolerate extra computation. Both methods should yield
similar conclusions when sample sizes are adequate ( per
group is a useful guideline for stable differential-functioning detection).
$dif_table: one row per facet-level x group-pair with contrast,
SE, t-statistic, p-value, adjusted p-value, effect metric, and
method-appropriate classification. Includes Method, N_Group1,
N_Group2, EffectMetric, ClassificationSystem, ContrastBasis,
SEBasis, StatisticLabel, ProbabilityMetric, DFBasis,
ReportingUse, PrimaryReportingEligible, and sparse columns.
$cell_table: (residual method only) per-cell detail with N,
ObsScore, ExpScore, ObsExpAvg, StdResidual.
$summary: counts by screening result (method = "residual") or ETS
category plus linked-screening and insufficient-linking rows
(method = "refit").
$group_fits: (refit method only) list of per-group facet estimates and
subgroup linking diagnostics.
Fit a model with fit_mfrm(). For RSM / PCM fairness review, prefer
method = "MML".
Run diagnose_mfrm() and, for RSM / PCM, prefer
diagnostic_mode = "both" so legacy and strict marginal screens remain
visible together.
Run analyze_dff(fit, diagnostics, facet = "Criterion", group = "Gender", data = my_data).
Inspect $dif_table for flagged levels and $summary for counts.
Use dif_interaction_table() when you need cell-level diagnostics.
Use plot_dif_heatmap() or dif_report() for communication.
fit_mfrm(), estimate_bias(), compare_mfrm(),
dif_interaction_table(), plot_dif_heatmap(), dif_report(),
subset_connectivity_report(), mfrmr_linking_and_dff
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "MML", model = "RSM", maxit = 200)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
dff <- analyze_dff(fit, diag, facet = "Rater", group = "Group", data = toy)
dff$summary
# Look for: a small `FlaggedPairs` count relative to `Pairs`. Under
# method = "residual", `ClassificationSystem` is "screening", not
# ETS. "Screen positive" rows are prompts for substantive review.
head(dff$dif_table[, c("Level", "Group1", "Group2", "Contrast",
                       "Classification", "ClassificationSystem")])
# The residual contrast is an observed-minus-expected average contrast
# between groups. It is useful for screening, but it is not an ETS
# A/B/C logit-delta classification.
dff_refit <- analyze_dff(fit, diag, facet = "Rater", group = "Group",
                         data = toy, method = "refit")
unique(dff_refit$dif_table$ClassificationSystem)
# Look for: "ETS" only when subgroup calibration, linking, and precision
# checks all support a common-scale model-based contrast.
sc <- subset_connectivity_report(fit, diagnostics = diag)
plot(sc, type = "design_matrix", draw = FALSE)
if ("ScaleLinkStatus" %in% names(dff_refit$dif_table)) {
  unique(dff_refit$dif_table$ScaleLinkStatus)
}
# Look for: "linked" in `ScaleLinkStatus` confirms the focal and
# reference groups share enough common elements for a comparable
# contrast; "demoted_*" rows lose linking under the refit branch
# and should be read as exploratory.
Runs limited classical DIF screens on a long-format score table. The
Mantel-Haenszel route uses a generalized Cochran-Mantel-Haenszel test over
ordered score categories and total-score strata. The logistic route is a
binary logistic-regression screen; for polytomous data it runs only when
logistic_threshold is supplied, making the dichotomization explicit.
analyze_dif_classical(
  x,
  facet,
  group,
  data = NULL,
  score = NULL,
  person = NULL,
  methods = c("mantel_haenszel", "logistic"),
  focal = NULL,
  min_obs = 10L,
  match_bins = 5L,
  p_adjust = "holm",
  logistic_threshold = NULL
)
| Argument | Description |
|---|---|
| x | An |
| facet | Facet/item column to screen level by level. |
| group | Grouping column. |
| data | Optional original data when |
| score | Score column. Inferred from |
| person | Person/respondent column. Inferred from |
| methods | One or more of |
| focal | Optional focal group level(s). If omitted, all group pairs are compared. |
| min_obs | Minimum person-level observations per group and facet level. |
| match_bins | Number of total-score strata used by the generalized Mantel-Haenszel route when the matching score has many distinct values. |
| p_adjust | Adjustment method passed to |
| logistic_threshold | Numeric threshold for binary logistic DIF on polytomous scores. Scores greater than or equal to this value are coded 1. |
Rows are first collapsed to one mean score per person x group x
facet level. The matching variable for a target level is the person's
total observed score across the screened facet minus the target-level
score, so the target response is not used to condition on itself. When the
matching score has more distinct values than match_bins, quantile bins are
used as score strata.
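The matching-score construction described above can be sketched in base R. The data, the column names, and the `target` and `match_bins` values here are hypothetical, and each person has exactly one row per level, so the collapse-to-mean step is a no-op in this toy case.

```r
# Hypothetical long-format scores: 6 persons x 3 criteria.
long <- data.frame(
  Person    = rep(paste0("P", 1:6), each = 3),
  Criterion = rep(c("C1", "C2", "C3"), times = 6),
  Score     = c(2, 3, 1,  1, 1, 0,  3, 3, 2,  0, 1, 1,  2, 2, 2,  3, 1, 0)
)
target     <- "C1"   # level being screened
match_bins <- 2      # illustrative number of score strata

# Total score per person across the screened facet.
total <- tapply(long$Score, long$Person, sum)

# Score on the target level, then the rest score (total minus target).
target_rows  <- long[long$Criterion == target, ]
target_score <- setNames(target_rows$Score, target_rows$Person)
rest         <- total[names(target_score)] - target_score

# Quantile bins serve as strata when there are more distinct values than bins.
breaks <- unique(quantile(rest, probs = seq(0, 1, length.out = match_bins + 1)))
strata <- cut(rest, breaks = breaks, include.lowest = TRUE)

data.frame(Person = names(target_score), Rest = as.numeric(rest), Stratum = strata)
```

Subtracting the target-level score is what keeps the target response from conditioning on itself.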
The Mantel-Haenszel option forms a group x score-category x stratum
table and calls
stats::mantelhaen.test() without continuity correction. This is a
generalized Cochran-Mantel-Haenszel screening p value for ordered score
categories. The reported Contrast remains the simple Group2-minus-Group1
mean score difference for direction; it is not a Mantel-Haenszel common odds
ratio.
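As a rough illustration of the call pattern, the sketch below builds a three-way table from simulated data and passes it to `stats::mantelhaen.test()`, which treats the third dimension as strata. The data and column names are made up, and the table layout (group x score category x stratum) is an assumption about how the package arranges its input.

```r
set.seed(1)
n <- 300

# Simulated person-level records with no built-in group effect.
d <- data.frame(
  group   = factor(sample(c("A", "B"), n, replace = TRUE)),
  score   = factor(sample(0:2, n, replace = TRUE)),
  stratum = factor(sample(1:3, n, replace = TRUE))
)

# Three-way table: the last dimension holds the matching-score strata.
tab <- table(d$group, d$score, d$stratum)

# Generalized Cochran-Mantel-Haenszel test, no continuity correction.
res <- mantelhaen.test(tab, correct = FALSE)
c(statistic = unname(res$statistic), df = unname(res$parameter), p = res$p.value)
```

For a 2 x 3 table per stratum the test has 2 degrees of freedom, and the result carries no direction, which is why a separate mean-difference contrast is reported alongside it.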
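The logistic option fits three nested binary models, where $Y$ is the binary response, $S$ the matching score, and $G$ the group indicator:
$$\mathrm{logit}\,P(Y = 1) = \beta_0 + \beta_1 S$$
$$\mathrm{logit}\,P(Y = 1) = \beta_0 + \beta_1 S + \beta_2 G$$
$$\mathrm{logit}\,P(Y = 1) = \beta_0 + \beta_1 S + \beta_2 G + \beta_3 (S \times G)$$
The logistic_uniform row is the likelihood-ratio comparison of the first
two models. The logistic_nonuniform row is the likelihood-ratio comparison
of the second and third models. For polytomous scores, Y is defined as
$Y = 1(\mathrm{Score} \ge \mathrm{logistic\_threshold})$; no implicit dichotomization is used.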
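The nested-model comparison can be sketched with `glm()` and `anova()` on simulated data. Here `S` is a matching score, `G` a 0/1 group indicator, and `Y` the binary response; all values are simulated, and the sketch follows the standard logistic-regression DIF setup rather than the package's internal code.

```r
set.seed(42)
n <- 400
S <- rnorm(n)                                        # matching score
G <- rbinom(n, 1, 0.5)                               # 0 = reference, 1 = focal
Y <- rbinom(n, 1, plogis(0.2 + 1.1 * S - 0.6 * G))   # uniform DIF built in

m0 <- glm(Y ~ S,     family = binomial())  # matching score only
m1 <- glm(Y ~ S + G, family = binomial())  # adds group main effect
m2 <- glm(Y ~ S * G, family = binomial())  # adds score x group interaction

# Sequential likelihood-ratio tests: m0 vs m1, then m1 vs m2.
lr <- anova(m0, m1, m2, test = "Chisq")
data.frame(
  Test = c("logistic_uniform", "logistic_nonuniform"),
  LR   = lr$Deviance[2:3],
  df   = lr$Df[2:3],
  p    = lr[["Pr(>Chi)"]][2:3]
)
```

For polytomous scores the response would first be dichotomized explicitly, for example `as.integer(score >= threshold)` with a user-chosen threshold.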
An object of class mfrm_dff / mfrm_dif with dif_table,
cell_table, summary, and config fields.
This is a classical screening helper, not a replacement for
analyze_dff() or SIBTEST. It does not estimate MFRM subgroup parameters,
does not use anchors, and does not claim ETS A/B/C classifications.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
toy <- load_mfrmr_data("example_bias")
cls <- analyze_dif_classical(
  toy,
  facet = "Criterion",
  group = "Group",
  person = "Person",
  score = "Score",
  methods = "mantel_haenszel"
)
cls$summary
Analyze practical equivalence within a facet
analyze_facet_equivalence(
  fit,
  diagnostics = NULL,
  facet = NULL,
  equivalence_bound = 0.5,
  ci_level = 0.95,
  conf_level = NULL
)
| Argument | Description |
|---|---|
| fit | Output from |
| diagnostics | Optional output from |
| facet | Character scalar naming the non-person facet to evaluate. If |
| equivalence_bound | Practical-equivalence bound in logits. Default |
| ci_level | Confidence level used for the forest-style interval view. Default |
| conf_level | Deprecated alias for |
This function tests whether facet elements (e.g., raters) are similar enough to be treated as practically interchangeable, rather than merely testing whether they differ significantly. This is the key distinction from a standard chi-square heterogeneity test: absence of evidence for difference is not evidence of equivalence.
The function uses existing facet estimates and their standard errors
from diagnostics$measures; no re-estimation is performed.
The bundle combines four complementary views:
Fixed chi-square test: tests $H_0$: all element measures
are equal. A non-significant result is necessary but not
sufficient for interchangeability. It is reported as context, not
as direct evidence of equivalence.
Pairwise TOST (Two One-Sided Tests): for each pair of
elements, tests whether the difference falls within
equivalence_bound. The TOST procedure (Schuirmann,
1987) rejects the null hypothesis of non-equivalence when both
one-sided tests are significant at level $\alpha$. A pair is
declared "Equivalent" when the TOST p-value < 0.05.
BIC-based Bayes-factor heuristic: an approximate screening
tool (not full Bayesian inference) that compares the evidence for
a common-facet model (all elements equal) against a heterogeneity
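model (elements differ) via
$BF_{01} \approx \exp\left((BIC_{H_1} - BIC_{H_0}) / 2\right)$ (Kass & Raftery, 1995), where $H_0$ is the common-facet model and $H_1$ the heterogeneity model. Values > 3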
favour the common-facet model; < 1/3 favour heterogeneity.
ROPE-style grand-mean proximity: the proportion of each
element's normal-approximation confidence distribution that falls
within equivalence_bound of the weighted grand mean.
This is a descriptive proximity summary, not a Bayesian ROPE
decision rule around a prespecified null value.
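The pairwise TOST logic can be sketched for a single pair in base R. The measures and standard errors are made up, the two estimates are treated as independent, and a normal reference distribution is used; the package may differ on any of these points.

```r
# Hypothetical rater measures (logits) and standard errors.
est   <- c(R1 = 0.10, R2 = 0.25)
se    <- c(R1 = 0.08, R2 = 0.09)
bound <- 0.5   # equivalence bound in logits

diff    <- est[["R1"]] - est[["R2"]]
se_diff <- sqrt(se[["R1"]]^2 + se[["R2"]]^2)

# Two one-sided tests against the lower and upper equivalence bounds.
p_lower <- pnorm((diff + bound) / se_diff, lower.tail = FALSE)  # H0: diff <= -bound
p_upper <- pnorm((diff - bound) / se_diff, lower.tail = TRUE)   # H0: diff >= +bound

# The TOST p value is the larger of the two one-sided p values.
p_tost     <- max(p_lower, p_upper)
equivalent <- p_tost < 0.05

data.frame(Diff = diff, SE = se_diff, p_TOST = p_tost, Equivalent = equivalent)
```

Tightening `bound` pulls both test statistics toward zero, so fewer pairs clear the 0.05 cutoff.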
Choosing equivalence_bound: the default of 0.5 logits is a
moderate criterion. For high-stakes certification, 0.3 logits may
be appropriate; for exploratory or low-stakes contexts, 1.0 logits
may suffice. The bound should reflect the smallest difference that
would be practically meaningful in your application.
A named list with class mfrm_facet_equivalence.
analyze_facet_equivalence() is a practical-interchangeability screen. It
asks whether facet levels are close enough, under a user-defined logit
bound, to be treated as practically similar for the current use case.
A non-significant chi-square result is not evidence of equivalence.
Forest/ROPE displays are descriptive and do not replace the pairwise TOST decision rule.
The BIC-based Bayes-factor summary is a heuristic screen, not a full Bayesian equivalence analysis.
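To show what a BIC-based Bayes-factor heuristic looks like in general, the sketch below compares a common-mean model against a group-means model with `lm()` and `BIC()`. This is a generic illustration of the Kass and Raftery approximation on simulated data; the models mfrmr compares internally are not shown in this document, so the setup here is an assumption.

```r
set.seed(7)
g <- gl(4, 25)        # four hypothetical facet elements, 25 observations each
y <- rnorm(100)       # simulated outcome with no true element differences

m_common <- lm(y ~ 1) # all elements share one mean
m_hetero <- lm(y ~ g) # each element has its own mean

# BIC approximation to the Bayes factor in favour of the common model.
bf01 <- exp((BIC(m_hetero) - BIC(m_common)) / 2)
bf01
```

Values above 3 would be read as favouring the common model and values below 1/3 as favouring heterogeneity, with everything in between treated as inconclusive.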
Start with summary$Decision, which is a conservative summary of the
pairwise TOST results. Then use the remaining tables as context:
chi_square: is there broad heterogeneity in the facet?
pairwise: which specific pairs meet the practical-equivalence bound?
rope / forest: how close is each level to the facet grand mean?
Smaller equivalence_bound values make the criterion stricter. If the
decision is "partial_pairwise_equivalence", that means some pairwise
contrasts satisfy the practical-equivalence bound but not all of them do.
The final Decision is a pairwise TOST summary rather than a global
equivalence proof. If all pairwise contrasts satisfy the practical-
equivalence bound, the facet is labeled "all_pairs_equivalent". If at
least one, but not all, pairwise contrasts are equivalent, the facet is
labeled "partial_pairwise_equivalence". If no pairwise contrasts meet the
practical-equivalence bound, the facet is labeled
"no_pairwise_equivalence_established". The chi-square, Bayes-factor, and
grand-mean proximity summaries are reported as descriptive context.
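The labeling rule above amounts to counting how many pairs passed TOST. `summarise_tost()` below is a hypothetical helper written for illustration; only the three label strings come from the text.

```r
# Hypothetical helper: collapse pair-level TOST outcomes into one label.
summarise_tost <- function(equivalent) {
  stopifnot(is.logical(equivalent), length(equivalent) > 0)
  n_eq <- sum(equivalent)
  if (n_eq == length(equivalent)) {
    "all_pairs_equivalent"
  } else if (n_eq > 0) {
    "partial_pairwise_equivalence"
  } else {
    "no_pairwise_equivalence_established"
  }
}

summarise_tost(c(TRUE, TRUE, TRUE))
summarise_tost(c(TRUE, FALSE, TRUE))
summarise_tost(c(FALSE, FALSE, FALSE))
```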
summary: one-row pairwise-TOST decision summary and aggregate context.
pairwise: pair-level TOST detail; use this for the primary inferential
read.
chi_square: broad heterogeneity screen.
rope / forest: level-wise proximity to the weighted grand mean.
If the result is borderline or high-stakes, re-run the analysis with a
tighter or looser equivalence_bound, then inspect pairwise and
plot_facet_equivalence() before deciding how strongly to claim
interchangeability.
Fit a model with fit_mfrm().
Run analyze_facet_equivalence() for the facet you want to screen.
Read summary and chi_square first.
Use plot_facet_equivalence() to inspect which levels drive the result.
The returned bundle has class mfrm_facet_equivalence and includes:
summary: one-row overview with convergent decision
chi_square: fixed chi-square / separation summary
pairwise: pairwise TOST detail table
rope: element-wise ROPE probabilities around the weighted grand mean
forest: element-wise estimate, confidence interval, and ROPE status
settings: applied facet and threshold settings
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15(6), 657-680.
facets_chisq_table(), fair_average_table(), plot_facet_equivalence()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
eq <- analyze_facet_equivalence(fit, facet = "Rater")
eq$summary[, c("Facet", "Elements", "Decision", "MeanROPE")]
head(eq$pairwise[, c("ElementA", "ElementB", "Equivalent")])
One-stop audit that combines the nesting, cross-tabulation, ICC, and
design-effect reports into a single object. Designed to be reused by
the publication-workflow surface: its summary feeds into
reporting_checklist(), and its tables are picked up by
build_mfrm_manifest() for reproducibility bundles.
analyze_hierarchical_structure(
  data,
  facets = NULL,
  person = "Person",
  score = "Score",
  compute_icc = TRUE,
  ci_method = c("none", "profile", "boot"),
  ci_level = 0.95,
  ci_boot_reps = 1000L,
  ci_boot_seed = NULL,
  igraph_layout = TRUE,
  icc_ci_method = NULL,
  icc_ci_level = NULL,
  icc_ci_boot_reps = NULL,
  icc_ci_boot_seed = NULL
)
| Argument | Description |
|---|---|
| data | Data frame in long format, or an |
| facets | Character vector of facet column names. When |
| person | Person column name. Defaults to |
| score | Score column name. Defaults to |
| compute_icc | Logical; if |
| ci_method | ICC confidence-interval method passed through to |
| ci_level | Confidence level when |
| ci_boot_reps | Number of bootstrap replicates when |
| ci_boot_seed | Optional RNG seed for reproducible bootstrap CIs. Deprecated alias: |
| igraph_layout | Logical; if |
| icc_ci_method, icc_ci_level, icc_ci_boot_reps, icc_ci_boot_seed | Deprecated spellings of the |
A list of class mfrm_hierarchical_structure with:
nesting: output of detect_facet_nesting().
crosstabs: list of pairwise observation-count data.frames (long
format, suitable for heatmap plotting).
icc: output of compute_facet_icc() when requested.
design_effect: output of compute_facet_design_effect() when
requested.
connectivity: named list with bipartite-graph component summary
when igraph is available.
summary: one-row summary used by downstream reporting helpers.
facets: character vector of facet names that were audited
(echoed for downstream reporting helpers that need to label rows
by audit scope).
nesting: a
detect_facet_nesting() object with every facet pair classified
as Crossed / Partially / Near-perfectly / Fully nested.
crosstabs: list of (LevelA, LevelB, N) long-format tables,
one per facet pair. Plot via plot(x, type = "crosstab", pair = "FacetA__FacetB").
icc: per-facet variance shares. See
compute_facet_icc() for the two-scale interpretation.
design_effect: Kish (1965) Deff and EffectiveN.
connectivity: number of bipartite components linking
Person x facet levels. A single component is required for a
common measurement scale; multiple components indicate a
disconnected design.
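For orientation, the textbook Kish design effect for clustered observations can be computed as below. `kish_deff()` is a hypothetical helper, and using the average cluster size with a single ICC is an assumption; the exact inputs used by compute_facet_design_effect() are not described here.

```r
# Hypothetical helper: Kish-style design effect from an ICC and cluster sizes.
kish_deff <- function(n_total, n_clusters, icc) {
  m    <- n_total / n_clusters      # average observations per cluster
  deff <- 1 + (m - 1) * icc         # variance inflation due to clustering
  data.frame(AvgClusterSize = m, Deff = deff, EffectiveN = n_total / deff)
}

# Example: 600 ratings nested in 12 raters with a rater ICC of 0.10.
out <- kish_deff(n_total = 600, n_clusters = 12, icc = 0.10)
out
```

Even a modest ICC shrinks the effective sample size sharply when clusters are large, which is the reason the audit reports Deff next to the raw counts.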
Optional: fit the MFRM with fit_mfrm().
Call analyze_hierarchical_structure(fit) (or on the raw data).
Read summary(x) for the condensed view.
Feed the object to reporting_checklist() and
build_mfrm_manifest() to record the audit in publication
bundles. build_apa_outputs() uses the fit-level
FacetSampleSizeFlag to add a Methods sentence automatically.
McEwen, M. R. (2018). The effects of incomplete rating designs on results from many-facets-Rasch model analyses (Doctoral thesis, Brigham Young University). https://scholarsarchive.byu.edu/etd/6689/
Linacre, J. M. (2026). A User's Guide to FACETS, Version 4.5.0. Winsteps.com. https://www.winsteps.com/facets.htm
Kish, L. (1965). Survey Sampling. New York: Wiley.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155-163.
detect_facet_nesting(), facet_small_sample_audit(),
compute_facet_icc(), compute_facet_design_effect(),
reporting_checklist(), build_mfrm_manifest(), fit_mfrm().
toy <- load_mfrmr_data("example_core")
hs <- analyze_hierarchical_structure(toy, facets = c("Rater", "Criterion"),
                                     compute_icc = FALSE, igraph_layout = FALSE)
summary(hs)
# Full audit when lme4 and igraph are available.
if (requireNamespace("lme4", quietly = TRUE) &&
    requireNamespace("igraph", quietly = TRUE)) {
  hs_full <- analyze_hierarchical_structure(toy, facets = c("Rater", "Criterion"))
  summary(hs_full)
  plot(hs_full, type = "icc")
}
Legacy-compatible residual diagnostics can be inspected in two ways:
overall residual PCA on the person x combined-facet matrix
facet-specific residual PCA on person x facet-level matrices
analyze_residual_pca(
  diagnostics,
  mode = c("overall", "facet", "both"),
  facets = NULL,
  pca_max_factors = 10L
)
| Argument | Description |
|---|---|
| diagnostics | Output from |
| mode | |
| facets | Optional subset of facets for facet-specific PCA. |
| pca_max_factors | Maximum number of retained components. |
The function works on standardized residual structures derived from
diagnose_mfrm(). When a fitted object from fit_mfrm() is supplied,
diagnostics are computed internally.
Conceptually, this follows the Rasch residual-PCA tradition of examining
structure in model residuals after the primary Rasch dimension has been
extracted. In mfrmr, however, the implementation is an exploratory
many-facet adaptation: it works on standardized residual matrices built as
person x combined-facet or person x facet-level layouts, rather than
reproducing FACETS/Winsteps residual-contrast tables one-to-one.
Output tables use:
Component: principal-component index (1, 2, ...)
Eigenvalue: eigenvalue for each component
Proportion: component variance proportion
Cumulative: cumulative variance proportion
For mode = "facet" or "both", by_facet_table additionally includes
a Facet column.
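The four variance-table columns can be reproduced on any standardized residual matrix with base R. The sketch below uses a simulated matrix and `eigen()` on its correlation matrix; the package itself relies on its own PCA routine, so this only illustrates how the columns relate to each other.

```r
set.seed(123)

# Hypothetical standardized residual matrix: 200 persons x 6 columns.
z <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

# Eigenvalues of the residual correlation matrix, largest first.
eig <- eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values

variance_table <- data.frame(
  Component  = seq_along(eig),
  Eigenvalue = eig,
  Proportion = eig / sum(eig),
  Cumulative = cumsum(eig) / sum(eig)
)
variance_table
```

With pure noise the eigenvalues hover around 1, which is the baseline a parallel-analysis check formalizes.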
summary(pca) is supported through summary().
plot(pca) is dispatched through plot() for class
mfrm_residual_pca. Available types include "overall_scree",
"facet_scree", "overall_loadings", and "facet_loadings".
A named list with:
mode: resolved mode used for computation
facet_names: facets analyzed
overall: overall PCA bundle (or NULL)
by_facet: named list of facet PCA bundles
overall_table: variance table for overall PCA
by_facet_table: stacked variance table across facets
errors: named list of any per-facet PCA errors that were
caught and turned into NA_real_ rows in the variance tables
(e.g., psych::principal() failure on a near-singular residual
matrix). The list is empty when every facet PCA succeeded.
Use overall_table first:
early components with noticeably larger eigenvalues or proportions suggest stronger residual structure that may deserve follow-up.
Then inspect by_facet_table:
helps localize which facet contributes most to residual structure.
Finally, inspect loadings via plot_residual_pca() to identify which
variables/elements drive each component.
For a simulation-calibrated null threshold, use
check_residual_dimensionality() and plot_residual_dimensionality().
The residual-PCA idea follows the Rasch residual-structure literature,
especially Linacre's discussions of principal components of Rasch residuals.
The current mfrmr implementation should be interpreted as an exploratory
extension for many-facet workflows rather than as a direct reproduction of a
single FACETS/Winsteps output table.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology. Educational and Psychological Measurement, 55, 377-393.
Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis (PCA)? Rasch Measurement Transactions, 12(2), 636.
Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.
Chou, Y.-T., & Wang, W.-C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.
Fit model and run diagnose_mfrm() with residual_pca = "none" or "both".
Call analyze_residual_pca(..., mode = "both").
Review summary(pca), then plot scree/loadings.
Optionally call check_residual_dimensionality() for a parallel-analysis
null threshold.
Cross-check with fit/misfit diagnostics before conclusions.
diagnose_mfrm(), plot_residual_pca(),
check_residual_dimensionality(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "both")
pca <- analyze_residual_pca(diag, mode = "both")
pca2 <- analyze_residual_pca(fit, mode = "both")
summary(pca)
p <- plot_residual_pca(pca, mode = "overall", plot_type = "scree", draw = FALSE)
p$data$plot
head(p$data)
head(pca$overall_table)
Re-estimates a many-facet Rasch model on new data while holding selected facet parameters fixed at the values from a previous (baseline) calibration. This is the standard workflow for placing new data onto an existing scale, linking test forms, or carrying a baseline calibration across administration windows.
anchor_to_baseline(
  new_data,
  baseline_fit,
  person,
  facets,
  score,
  anchor_facets = NULL,
  include_person = FALSE,
  weight = NULL,
  model = NULL,
  method = NULL,
  anchor_policy = "warn",
  ...
)

## S3 method for class 'mfrm_anchored_fit'
print(x, ...)

## S3 method for class 'mfrm_anchored_fit'
summary(object, ...)

## S3 method for class 'summary.mfrm_anchored_fit'
print(x, ...)
| Argument | Description |
|---|---|
| new_data | Data frame in long format (one row per rating). |
| baseline_fit | An |
| person | Character column name for person/examinee. |
| facets | Character vector of facet column names. |
| score | Character column name for the rating score. |
| anchor_facets | Character vector of facets to anchor (default: all non-Person facets). |
| include_person | If |
| weight | Optional character column name for observation weights. |
| model | Scale model override; defaults to baseline model. |
| method | Estimation method override; defaults to baseline method. |
| anchor_policy | How to handle anchor issues: |
| ... | Ignored. |
| x | An |
| object | An |
This function automates the baseline-anchored calibration workflow:
Extracts anchor values from the baseline fit using make_anchor_table().
Re-estimates the model on new_data with those anchors fixed via
fit_mfrm(..., anchors = anchor_table).
Runs diagnose_mfrm() on the anchored fit.
Computes element-level differences (new estimate minus baseline estimate) for every common element.
The model and method arguments default to the baseline fit's settings
so the calibration framework remains consistent. Elements present in the
anchor table but absent from the new data are handled according to
anchor_policy: "warn" (default) emits a message, "error" stops
execution, and "silent" proceeds without any message.
The returned drift table is best interpreted as an anchored consistency
check. When a facet is fixed through anchor_facets, those anchored levels
are constrained in the new run, so their reported differences are not an
independent drift analysis. For genuine cross-wave drift monitoring, fit the
waves separately and use detect_anchor_drift() on the resulting fits.
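Element-level differences are calculated for every element that appears in both the baseline and the new calibration:
$$\Delta_e = \hat{\delta}_{e,\mathrm{new}} - \hat{\delta}_{e,\mathrm{baseline}}$$
An element is flagged when $|\Delta_e|$ exceeds the drift threshold in logits or
when $|\Delta_e / SE_{\Delta_e}|$ exceeds the ratio threshold, where
$$SE_{\Delta_e} = \sqrt{SE_{e,\mathrm{baseline}}^2 + SE_{e,\mathrm{new}}^2}$$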
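A minimal sketch of how such a table could be assembled in base R is shown below. The estimates are made up, the standard error of the difference assumes independent calibrations, and the two cutoffs (`abs_cut`, `ratio_cut`) are hypothetical placeholders rather than the package's actual thresholds.

```r
# Hypothetical baseline and new estimates (logits) for three common elements.
baseline <- data.frame(Level = c("C1", "C2", "C3"),
                       Est = c(-0.40, 0.10, 0.30), SE = c(0.10, 0.11, 0.12))
new      <- data.frame(Level = c("C1", "C2", "C3"),
                       Est = c(-0.35, 0.75, 0.28), SE = c(0.12, 0.13, 0.12))

# Keep only elements present in both calibrations.
drift <- merge(baseline, new, by = "Level", suffixes = c("_Baseline", "_New"))

# New estimate minus baseline estimate, with an independence-based SE.
drift$Drift          <- drift$Est_New - drift$Est_Baseline
drift$SE_Diff        <- sqrt(drift$SE_Baseline^2 + drift$SE_New^2)
drift$Drift_SE_Ratio <- drift$Drift / drift$SE_Diff

# Hypothetical cutoffs, for illustration only.
abs_cut   <- 0.5
ratio_cut <- 2
drift$Flag <- abs(drift$Drift) > abs_cut | abs(drift$Drift_SE_Ratio) > ratio_cut

drift[, c("Level", "Drift", "SE_Diff", "Drift_SE_Ratio", "Flag")]
```

Elements that appear in only one calibration drop out at the merge step, mirroring the restriction to common elements.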
Object of class mfrm_anchored_fit with components:
fit: The anchored mfrm_fit object.
diagnostics: Output of diagnose_mfrm() on the anchored fit.
baseline_anchors: Anchor table extracted from the baseline.
drift: Tibble of element-level drift statistics.
Use anchor_to_baseline() when you have one new dataset and want to place
it directly on a baseline scale.
Use detect_anchor_drift() when you already have multiple fitted waves
and want to compare their stability.
Use build_equating_chain() when you need cumulative offsets across an
ordered series of waves.
$drift: one row per common element with columns Facet, Level,
Baseline, New, Drift, SE_Baseline, SE_New, SE_Diff,
Drift_SE_Ratio, and Flag.
Read this as an anchored consistency table. Small absolute differences
indicate that the anchored re-fit stayed close to the baseline scale.
Flagged rows warrant review, but they are not a substitute for a separate
drift study on unanchored common elements.
$fit: the full anchored mfrm_fit object, usable with
diagnose_mfrm(), measurable_summary_table(), etc.
$diagnostics: pre-computed diagnostics for the anchored calibration.
$baseline_anchors: the anchor table fed to fit_mfrm(), useful for
auditing which elements were constrained.
Fit the baseline model: fit1 <- fit_mfrm(...).
Collect new data (e.g., a later administration).
Call res <- anchor_to_baseline(new_data, fit1, ...).
Inspect summary(res) to confirm the anchored run remains close to the
baseline scale.
For multi-wave drift monitoring, fit waves separately and pass the fits to
detect_anchor_drift() or build_equating_chain().
fit_mfrm(), make_anchor_table(), detect_anchor_drift(),
diagnose_mfrm(), build_equating_chain(), mfrmr_linking_and_dff
d1 <- load_mfrmr_data("study1")
keep1 <- unique(d1$Person)[1:15]
d1 <- d1[d1$Person %in% keep1, , drop = FALSE]
fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 15)
d2 <- load_mfrmr_data("study2")
keep2 <- unique(d2$Person)[1:15]
d2 <- d2[d2$Person %in% keep2, , drop = FALSE]
res <- anchor_to_baseline(d2, fit1, "Person", c("Rater", "Criterion"), "Score",
                          anchor_facets = "Criterion")
summary(res)
head(res$drift[, c("Facet", "Level", "Drift", "Flag")])
res$baseline_anchors[1:3, ]
Build APA-style table output using base R structures
apa_table(
  x,
  which = NULL,
  diagnostics = NULL,
  digits = 2,
  caption = NULL,
  note = NULL,
  bias_results = NULL,
  context = list(),
  whexact = FALSE,
  branch = c("apa", "facets")
)
| Argument | Description |
|---|---|
| x | A data.frame, |
| which | Optional table selector when |
| diagnostics | Optional diagnostics from |
| digits | Number of rounding digits for numeric columns. |
| caption | Optional caption text. |
| note | Optional note text. |
| bias_results | Optional output from |
| context | Optional context list forwarded when auto-generating APA metadata for fit-based tables. |
| whexact | Logical forwarded to APA metadata helpers. |
| branch | Output branch: |
This helper avoids styling dependencies and returns a reproducible base
data.frame plus metadata.
Supported which values:
For mfrm_fit: "summary", "person", "facets", "steps"
For summary() outputs or mfrm_summary_table_bundle:
names listed in build_summary_table_bundle(x)$table_index
For diagnostics list: "overall_fit", "measures", "fit",
"reliability", "facets_chisq", "bias", "interactions",
"interrater_summary", "interrater_pairs", "obs"
For bias-result list: "table", "summary", "chi_sq"
A list of class apa_table with fields:
table (data.frame)
which
caption
note
digits
branch, style
table: plain data.frame ready for export or further formatting.
which: source component that produced the table.
caption/note: manuscript-oriented metadata stored with the table.
For fit-based which = "summary", the automatic caption describes the
model overview table; use which = "facets" or which = "person" for
Table 1-style measurement summaries.
Build table object with apa_table(...).
Inspect quickly with summary(tbl).
Render with as_kable(tbl) for R Markdown / Quarto or
as_flextable(tbl) for Word when those packages are installed.
Render base preview via plot(tbl, ...) or export tbl$table.
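The workflow steps above can be sketched end to end as follows. This is a minimal sketch, not a canonical recipe: it uses only calls documented in this manual, is guarded so that it does nothing when mfrmr is not installed, and the output file name facet_measures.csv is an arbitrary choice made for illustration.

```r
# Sketch of the build / inspect / render / export route for an apa_table.
# Guarded so the block is a no-op when mfrmr is unavailable.
if (requireNamespace("mfrmr", quietly = TRUE)) {
  library(mfrmr)

  toy <- load_mfrmr_data("example_core")
  fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                  method = "JML", maxit = 25)

  # Step 1: build the table object ("facets" is a documented selector for fits).
  tbl <- apa_table(fit, which = "facets", caption = "Facet measures")

  # Step 2: quick inspection.
  print(summary(tbl))

  # Step 3: render for R Markdown / Quarto only when knitr is installed.
  if (requireNamespace("knitr", quietly = TRUE)) {
    print(as_kable(tbl))
  }

  # Step 4: export the plain data.frame payload (file name is illustrative).
  csv_path <- file.path(tempdir(), "facet_measures.csv")
  utils::write.csv(tbl$table, csv_path, row.names = FALSE)
}
```

Exporting tbl$table rather than the rendered object keeps the CSV free of styling, which is usually what downstream scripts expect.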
fit_mfrm(), diagnose_mfrm(), build_apa_outputs(),
reporting_checklist(), mfrmr_reporting_and_apa
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
tbl <- apa_table(fit, which = "summary", caption = "Model summary", note = "Toy example")
tbl_facets <- apa_table(fit, which = "summary", branch = "facets")
fit_bundle <- build_summary_table_bundle(summary(fit))
tbl_from_summary <- apa_table(fit_bundle, which = "facet_overview")
summary(tbl)
p <- plot(tbl, draw = FALSE)
p_facets <- plot(tbl_facets, type = "numeric_profile", draw = FALSE)
p$data$plot
p_facets$data$plot
if (interactive()) {
  plot(
    tbl,
    type = "numeric_profile",
    main = "APA Table Numeric Profile (Customized)",
    palette = c(numeric_profile = "#2b8cbe", grid = "#d9d9d9"),
    label_angle = 45
  )
}
tbl$note
Post-hoc shrinkage helper that augments an mfrm_fit with James-Stein
/ empirical-Bayes shrunk estimates for each non-person facet. The
shrinkage variance $\hat{\tau}^2$ is estimated by method of
moments from the facet-level point estimates $\hat{\delta}_j$ and their standard
errors $SE_j$ ($j = 1, \dots, K$):
$$\hat{\tau}^2 = \max\left(0,\ \frac{1}{K}\sum_{j=1}^{K}\hat{\delta}_j^2 - \frac{1}{K}\sum_{j=1}^{K} SE_j^2\right)$$
where the first term is the population variance of the facet point
estimates around their known mean of zero (the mfrmr sum-to-zero
identification pins the facet mean exactly at 0, so no degree of
freedom is consumed by mean estimation). The shrinkage factor is
$B_j = SE_j^2 / (SE_j^2 + \hat{\tau}^2)$, and
the shrunk point / standard error are
$\tilde{\delta}_j = (1 - B_j)\,\hat{\delta}_j$ and
$\widetilde{SE}_j = \sqrt{1 - B_j}\; SE_j$.
The posterior SE form treats $\hat{\tau}^2$ as known; it omits
the Morris (1983, eqs. 4.1-4.2, p. 51) confidence-interval correction
with
$2 / (K - r - 2)$, where $r$ is the number of
regression coefficients used to model the prior mean (under mfrmr's
sum-to-zero pinning, $r = 0$, so the divisor is $K - 2$).
This correction adds variance proportional to the squared deviation
$(\hat{\delta}_j - 0)^2 = \hat{\delta}_j^2$, accounting for uncertainty in
$\hat{\tau}^2$. Under the equal-variance assumption
$SE_j = SE$ for all $j$, the omitted variance is
on the order of $2 / (K - 2)$ times the reported posterior
variance $(1 - B)\,SE^2$, so the true SE is approximately
$\sqrt{1 + 2 / (K - 2)}$ times the reported ShrunkSE. Magnitudes:
the SE is understated by about 73% at $K = 3$ (factor $\sqrt{3} \approx 1.73$)
and by about 7% once $K$ is around 16. Treat
ShrunkSE as a lower bound rather than a calibrated posterior SE.
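The size of the omitted correction can be tabulated directly. The sketch below assumes the inflation factor takes the form sqrt(1 + 2 / (K - 2)), with K the number of levels in the facet and no regression coefficients in the prior mean; this assumed form reproduces the roughly 73% understatement quoted above when K = 3. The helper name se_inflation() is hypothetical and is not part of mfrmr.

```r
# Hypothetical helper: approximate ratio of the corrected SE to the reported
# ShrunkSE under equal standard errors and r = 0 prior-mean coefficients.
se_inflation <- function(K) {
  stopifnot(is.numeric(K), all(K > 2))  # the divisor K - 2 must be positive
  sqrt(1 + 2 / (K - 2))
}

K <- c(3, 5, 10, 16, 30)
inflation <- data.frame(
  K = K,
  factor = round(se_inflation(K), 3),
  understated_pct = round(100 * (se_inflation(K) - 1))
)
print(inflation)
# factor column: 1.732 1.291 1.118 1.069 1.035
```

The factor decays slowly, so facets with fewer than about ten levels are where reading ShrunkSE as a lower bound matters most.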
apply_empirical_bayes_shrinkage(
  fit,
  facet_prior_sd = NULL,
  shrink_person = FALSE
)
fit |
An |
facet_prior_sd |
Optional numeric scalar. When supplied, the
shrinkage variance is fixed at |
shrink_person |
Logical. When |
fit$facets$others gains ShrunkEstimate, ShrunkSE, and
ShrinkageFactor columns, and fit$shrinkage_report records the
per-facet $\hat{\tau}^2$, mean shrinkage, and effective degrees
of freedom ($\sum_j (1 - B_j)$, the sum over levels of one minus the
per-level shrinkage factor $B_j$, which
matches the "effective number of parameters" defined by
Efron & Morris, 1973). The original Estimate / SE columns are
preserved.
The same mfrm_fit, with augmented columns and a new
shrinkage_report list entry, and with
fit$config$facet_shrinkage set to "empirical_bayes".
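To make the augmented columns concrete, the following base-R sketch computes shrunk estimates by hand for one facet. It assumes the textbook method-of-moments form (prior variance equal to the mean squared estimate minus the mean squared SE, truncated at zero) and is not taken from the package source; the estimates and standard errors are invented, and the package's handling of edge cases such as a zero prior variance is not modelled here.

```r
# Hypothetical facet with four levels; estimates sum to zero by construction.
est <- c(-0.8, -0.2, 0.1, 0.9)
se  <- c(0.30, 0.25, 0.25, 0.35)

# Method-of-moments prior variance around the known mean of zero (assumed form).
tau2 <- max(0, mean(est^2) - mean(se^2))

# Shrinkage factor: share of each estimate's variance attributed to noise.
shrinkage_factor <- se^2 / (se^2 + tau2)

# Shrunk point estimate and posterior SE, treating tau2 as known.
shrunk_est <- (1 - shrinkage_factor) * est
shrunk_se  <- sqrt(1 - shrinkage_factor) * se

# Effective degrees of freedom: sum of the retained share across levels.
effective_df <- sum(1 - shrinkage_factor)

print(data.frame(
  Estimate = est, SE = se,
  ShrinkageFactor = round(shrinkage_factor, 3),
  ShrunkEstimate = round(shrunk_est, 3),
  ShrunkSE = round(shrunk_se, 3)
))
```

Levels with larger standard errors receive larger shrinkage factors, so their estimates move proportionally further toward zero.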
Fit the model as usual with fit_mfrm().
Call apply_empirical_bayes_shrinkage(fit) when small-N facets
are present (see facet_small_sample_audit()).
Report both the original and shrunk estimates in the manuscript,
citing Efron & Morris (1973). build_apa_outputs() will add the
sentence automatically when fit$config$facet_shrinkage is set.
Efron, B., & Morris, C. (1973). Combining possibly related estimation problems. Journal of the Royal Statistical Society: Series B, 35(3), 379-402.
Efron, B. (2021). Empirical Bayes: Concepts and methods (Technical report). Department of Statistics, Stanford University. https://efron.ckirby.su.domains/papers/2021EB-concepts-methods.pdf
Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications. Journal of the American Statistical Association, 78(381), 47-55.
fit_mfrm() (which accepts facet_shrinkage directly),
facet_small_sample_audit(), compute_facet_icc().
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
fit_eb <- apply_empirical_bayes_shrinkage(fit)
fit_eb$shrinkage_report
# Look for:
# - `Tau2` is the estimated between-level prior variance per facet.
#   `Tau2 = 0` means the data did not justify any pooling and the
#   shrunken estimates equal the raw estimates (`MeanShrinkage = 0`).
# - `MeanShrinkage` near 0 = little movement, near 1 = heavy pooling
#   toward 0. Small-N facets typically pull values further than
#   well-identified ones.
# - `EffectiveDF` is the implied "effective number of parameters"
#   (Efron & Morris 1973); EffectiveDF much smaller than the row
#   count of the facet means most levels were pooled together.
head(fit_eb$facets$others[, c("Facet", "Level", "Estimate",
                              "ShrunkEstimate", "ShrinkageFactor")])
# Look for: rows where `ShrinkageFactor` is large (close to 1) had
# their estimates pulled most strongly toward the facet mean (0).
Generic for converting objects to a flextable
as_flextable(x, ...)
x |
Object to convert. |
... |
Passed to methods. |
A flextable object (concrete return type from the
underlying method, e.g. as_flextable.apa_table() returns a
flextable ready for flextable::save_as_docx()).
as_flextable.apa_table() for the apa_table method;
as_kable() for a knitr::kable-targeted alternative;
apa_table() for constructing an apa_table in the first place.
apa_table to a flextable
Produces a Word / PowerPoint-friendly flextable with the
caption and note wired in. Requires flextable (in Suggests).
## S3 method for class 'apa_table'
as_flextable(x, ...)
x |
An |
... |
Additional arguments reserved for future use. |
A flextable object, or a message when flextable is
unavailable.
as_kable.apa_table(), apa_table().
Generic for converting objects to a knitr::kable
as_kable(x, ...)
x |
Object to convert. |
... |
Passed to methods. |
A knitr::kable object (concrete return type from the
underlying method, e.g. as_kable.apa_table() returns a
kableExtra object when the package is installed).
as_kable.apa_table() for the apa_table method;
as_flextable() for a flextable-targeted alternative;
apa_table() for constructing an apa_table in the first place.
apa_table to a knitr::kable() object
Renders the table payload for direct inclusion in RMarkdown,
Quarto, or HTML reports, wiring the caption and note slots
into the standard APA placement (caption above, note below).
When kableExtra is installed the note is attached as a footer;
otherwise the note is appended as a knitr::asis_output() block.
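The caption-above / note-below layout can be illustrated with knitr::kable() alone. This is a sketch of the layout only, not the package's as_kable() method; the table values, caption, and note text are invented, and the block is guarded so it does nothing when knitr is not installed.

```r
# Layout sketch: caption rendered with the table, note printed underneath.
if (requireNamespace("knitr", quietly = TRUE)) {
  tab <- data.frame(
    Facet = c("Rater", "Criterion"),
    Levels = c(4L, 3L),
    Separation = c(2.314, 1.872)   # invented values for illustration
  )

  k <- knitr::kable(tab, format = "pipe", digits = 3,
                    caption = "Table 1. Facet overview")
  print(k)

  # Plain-text note placed below the table, mirroring APA placement.
  cat("\nNote. Hypothetical values for illustration only.\n")
}
```

In a knitted report the note would normally be emitted as document text rather than console output; cat() is used here only so the sketch runs in a plain R session.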
## S3 method for class 'apa_table'
as_kable(x, format = c("pipe", "html", "latex"), digits = 3L, ...)
x |
An |
format |
One of |
digits |
Numeric; passed to |
... |
Additional arguments forwarded to |
A knitr_kable object ready to be printed inline in a
report, or a message when knitr is unavailable.
as_flextable.apa_table(), apa_table().
Convert simulation evaluation objects to data frames.
## S3 method for class 'mfrm_design_evaluation'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("results", "rep_overview", "design_summary", "overview"),
  ...
)

## S3 method for class 'summary.mfrm_design_evaluation'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("design_summary", "overview"),
  ...
)

## S3 method for class 'mfrm_signal_detection'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("results", "rep_overview", "detection_summary", "overview"),
  ...
)

## S3 method for class 'summary.mfrm_signal_detection'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("detection_summary", "overview"),
  ...
)

## S3 method for class 'mfrm_bias_detection'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("results", "estimates", "reliability", "fit_statistics",
    "fit_summary", "pair_results", "rep_overview", "target_summary",
    "pair_summary", "design_grid"),
  ...
)

## S3 method for class 'summary.mfrm_bias_detection'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("target_summary", "pair_summary", "fit_summary"),
  ...
)
x |
A simulation evaluation object returned by |
row.names |
Ignored; included for compatibility with |
optional |
Ignored; included for compatibility with |
component |
Table component to extract. The default is |
... |
Reserved for future extensions. |
The simulation evaluators already store their core outputs as ordinary data frames. These methods make that contract explicit and provide a stable route for write.csv(as.data.frame(x, component = "results"), ...) and custom graphics workflows.
For evaluate_mfrm_design(), the "results" component includes facet-level separation, strata, reliability, fit summaries, and recovery metrics for each design and replication. For evaluate_mfrm_bias_detection(), use "estimates" for fitted measure estimates and "reliability" or "fit_summary" for simulation-derived reliability coefficients.
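The component-selection contract described above follows a common S3 pattern, sketched below with a stand-in class. The class name toy_eval and its contents are hypothetical and unrelated to the package internals; the point is only how match.arg() makes the first component the default and rejects unknown names.

```r
# Hypothetical container standing in for a simulation evaluation object.
ev <- structure(
  list(
    results = data.frame(design = c("A", "A", "B"),
                         rep = c(1L, 2L, 1L),
                         reliability = c(0.81, 0.79, 0.88)),
    overview = data.frame(n_designs = 2L, n_rows = 3L)
  ),
  class = "toy_eval"
)

# S3 method: the first entry of `component` is the default; row.names and
# optional are accepted for generic compatibility and otherwise ignored.
as.data.frame.toy_eval <- function(x, row.names = NULL, optional = FALSE,
                                   component = c("results", "overview"), ...) {
  component <- match.arg(component)
  x[[component]]
}

res <- as.data.frame(ev)                          # default component
ovw <- as.data.frame(ev, component = "overview")  # explicit component

# Export route analogous to the write.csv() call mentioned in the text.
out_file <- file.path(tempdir(), "toy_results.csv")
write.csv(res, out_file, row.names = FALSE)
```

An unsupported name such as component = "nope" stops with an error from match.arg(), which is usually preferable to silently returning NULL.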
A base data.frame.
evaluate_mfrm_design(),
evaluate_mfrm_signal_detection(),
evaluate_mfrm_bias_detection()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 10,
  facets = c(Rater = 3, Criteria = 2, Task = 2),
  facets_per_person = c(Rater = 2),
  score_levels = 4
)
targets <- data.frame(Rater = "Rater03", Task = "Task02", Effect = -0.5)
eval <- suppressWarnings(evaluate_mfrm_bias_detection(
  spec,
  bias_targets = targets,
  reps = 1,
  fit_method = "JML",
  maxit = 20,
  bias_max_iter = 1,
  seed = 1
))
head(as.data.frame(eval, component = "estimates"))
head(as.data.frame(eval, component = "reliability"))
Returns all facet-level estimates (person and others) in a single
tidy data.frame. Useful for quick interactive export:
write.csv(as.data.frame(fit), "results.csv").
## S3 method for class 'mfrm_fit'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An |
row.names |
Ignored (included for S3 generic compatibility). |
optional |
Ignored (included for S3 generic compatibility). |
... |
Additional arguments (ignored). |
This method returns four columns (Facet, Level,
Estimate, Extreme) so that the result is easy to
inspect, join, or write to disk.
A data.frame with columns Facet, Level,
Estimate, and Extreme. The Extreme column
is populated for person rows from the extreme-score flag added
in 0.1.6 ("Min" / "Max" / NA); non-person
facet rows carry NA in that column by design.
Person estimates are returned with Facet = "Person".
All non-person facets are stacked underneath in the same schema.
Fit a model with fit_mfrm().
Convert with as.data.frame(fit) for a compact long-format export.
Join additional diagnostics later if you need SE or fit statistics.
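The last step, joining further statistics onto the long-format export, can be sketched in base R. Both tables below are invented stand-ins: est mimics the four-column schema described above, and se_tbl is a hypothetical table of standard errors keyed by Facet and Level.

```r
# Stand-in for as.data.frame(fit): Facet, Level, Estimate, Extreme.
est <- data.frame(
  Facet = c("Person", "Person", "Rater", "Rater"),
  Level = c("P01", "P02", "R1", "R2"),
  Estimate = c(1.20, -0.40, 0.30, -0.30),
  Extreme = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical diagnostics table covering only the rater facet.
se_tbl <- data.frame(
  Facet = c("Rater", "Rater"),
  Level = c("R1", "R2"),
  SE = c(0.21, 0.24),
  stringsAsFactors = FALSE
)

# Left join on the composite key; rows without a match keep SE = NA.
joined <- merge(est, se_tbl, by = c("Facet", "Level"), all.x = TRUE)
print(joined)
```

Joining on both Facet and Level avoids collisions when two facets happen to share a level label.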
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
head(as.data.frame(fit))
Coerce residual dimensionality output to a data frame
## S3 method for class 'mfrm_residual_dimensionality'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  component = c("comparison", "observed", "null_distribution"),
  ...
)
x |
Output from |
row.names |
Ignored. |
optional |
Ignored. |
component |
Component to return: |
... |
Additional arguments ignored. |
A data frame.
Audit an exact-overlap ConQuest comparison against an mfrmr overlap bundle
audit_conquest_overlap(
  bundle,
  conquest_population = NULL,
  conquest_item_estimates = NULL,
  conquest_case_eap = NULL,
  conquest_population_term = "auto",
  conquest_population_estimate = "auto",
  conquest_item_id = "auto",
  conquest_item_estimate = "auto",
  item_id_source = c("auto", "response_var", "level"),
  conquest_case_person = "auto",
  conquest_case_estimate = "auto"
)
bundle |
Output from |
conquest_population |
Normalized ConQuest population-parameter table as a
data.frame, or output from |
conquest_item_estimates |
Normalized ConQuest item-estimate table as a
data.frame. Leave |
conquest_case_eap |
Normalized ConQuest case-level EAP table as a
data.frame. Leave |
conquest_population_term |
Column in |
conquest_population_estimate |
Column in |
conquest_item_id |
Column in |
conquest_item_estimate |
Column in |
item_id_source |
How |
conquest_case_person |
Column in |
conquest_case_estimate |
Column in |
This helper compares normalized ConQuest output tables against the
exact-overlap bundle produced by build_conquest_overlap_bundle(). It is
intentionally conservative:
it does not parse raw ConQuest text output automatically;
it expects already normalized data frames or output from
normalize_conquest_overlap_tables();
and it reports numerical differences and missing elements without claiming that any fixed tolerance implies software equivalence.
This is the package's external-table audit path. It is distinct from
reference_case_benchmark(cases = "synthetic_conquest_overlap_dry_run"),
which only round-trips package-native tables through the same normalization
and audit contract without executing ConQuest.
The intended workflow is:
export an exact-overlap bundle with build_conquest_overlap_bundle();
run the narrow matching case in ConQuest;
normalize the resulting ConQuest outputs into data frames;
pass those tables here to inspect direct differences, centered item agreement, and case-level EAP agreement.
A named list with class mfrm_conquest_overlap_audit.
The returned object has class mfrm_conquest_overlap_audit and includes:
overall: one-row comparison summary with missing/duplicate/non-numeric
attention-item counts and worst-row labels
population_comparison: parameter-by-parameter comparison table
item_comparison: centered item-estimate comparison table
case_comparison: case-level EAP comparison table
attention_items: missing, malformed, or unmatched elements
settings: audit settings
notes: interpretation notes
Read summary(audit)$audit_scope first to confirm that the result is a
supplied-table audit, not raw ConQuest text parsing or a
software-equivalence claim.
Population slopes and sigma2 are intended for direct comparison.
Item estimates should be interpreted after centering.
Case estimates should be interpreted as posterior EAP summaries under the fitted population model.
The overall table reports both mean and maximum absolute differences for
compared population, centered item, and case rows. The
PopulationMaxAbsParameter, ItemCenteredMaxAbsItem, and
CaseMaxAbsPerson columns identify the row where each maximum absolute
difference occurs.
Missing or non-numeric rows in attention_items indicate that the external
tables do not yet align cleanly with the exported overlap bundle.
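Why item estimates are compared after centering can be seen in a small base-R sketch. The two vectors are invented: the second is the first shifted by a constant origin difference plus small discrepancies. The output column names are illustrative and are not the names used by the audit object.

```r
# Invented item estimates on two scales that differ mainly in origin.
mfrmr_est    <- c(I1 = -0.60, I2 = -0.10, I3 = 0.20, I4 = 0.50)
conquest_est <- c(I1 = -0.08, I2 = 0.39, I3 = 0.71, I4 = 0.98)
stopifnot(identical(names(mfrmr_est), names(conquest_est)))

# Direct differences are dominated by the origin shift.
raw_diff <- conquest_est - mfrmr_est

# Centering each set removes the shift and leaves item-level disagreement.
centered_diff <- (conquest_est - mean(conquest_est)) -
  (mfrmr_est - mean(mfrmr_est))

audit_row <- data.frame(
  MeanAbsRaw = mean(abs(raw_diff)),
  MeanAbsCentered = mean(abs(centered_diff)),
  MaxAbsCentered = max(abs(centered_diff)),
  MaxAbsItem = names(centered_diff)[which.max(abs(centered_diff))]
)
print(audit_row)
```

Reporting the item label alongside the maximum mirrors the idea of the worst-row columns described above: the largest gap is only actionable when it can be traced to a specific row.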
build_conquest_overlap_bundle(),
normalize_conquest_overlap_files(), normalize_conquest_overlap_tables(),
reference_case_benchmark()
bundle <- build_conquest_overlap_bundle()
raw_pop <- data.frame(
  Term = bundle$mfrmr_population$Parameter,
  Est = bundle$mfrmr_population$Estimate
)
raw_item <- data.frame(
  Item = bundle$mfrmr_item_estimates$ResponseVar,
  Est = bundle$mfrmr_item_estimates$Estimate
)
raw_case <- data.frame(
  PID = bundle$mfrmr_case_eap$Person,
  EAP = bundle$mfrmr_case_eap$Estimate
)
normalized <- normalize_conquest_overlap_tables(
  conquest_population = raw_pop,
  conquest_item_estimates = raw_item,
  conquest_case_eap = raw_case,
  conquest_population_term = "Term",
  conquest_population_estimate = "Est",
  conquest_item_id = "Item",
  conquest_item_estimate = "Est",
  conquest_case_person = "PID",
  conquest_case_estimate = "EAP"
)
audit <- audit_conquest_overlap(bundle, normalized)
summary(audit)$summary
Audit and normalize anchor/group-anchor tables
audit_mfrm_anchors(
  data,
  person,
  facets,
  score,
  anchors = NULL,
  group_anchors = NULL,
  weight = NULL,
  rating_min = NULL,
  rating_max = NULL,
  keep_original = FALSE,
  missing_codes = NULL,
  min_common_anchors = 5L,
  min_obs_per_element = 30,
  min_obs_per_category = 10,
  noncenter_facet = "Person",
  dummy_facets = NULL
)
data |
A data.frame in long format (one row per rating event). |
person |
Column name for person IDs. |
facets |
Character vector of facet column names. |
score |
Column name for observed score. |
anchors |
Optional anchor table (Facet, Level, Anchor). |
group_anchors |
Optional group-anchor table (Facet, Level, Group, GroupValue). |
weight |
Optional weight/frequency column name. |
rating_min |
Optional minimum category value. |
rating_max |
Optional maximum category value. |
keep_original |
Keep original category values. |
missing_codes |
Optional. |
min_common_anchors |
Minimum anchored levels per linking facet used in
recommendations (default |
min_obs_per_element |
Minimum weighted observations per facet level used
in recommendations (default |
min_obs_per_category |
Minimum weighted observations per score category
used in recommendations (default |
noncenter_facet |
One facet to leave non-centered. |
dummy_facets |
Facets to fix at zero. |
Anchoring (also called "fixing" or scale linking) constrains selected parameter estimates to pre-specified values, placing the current analysis on a previously established scale. This is essential when comparing results across administrations, linking test forms, or monitoring rater drift over time.
This function applies the same preprocessing and key-resolution rules
as fit_mfrm(), but returns an audit object so constraints can be
checked before estimation. Running the audit first helps avoid
estimation failures caused by misspecified or data-incompatible
anchors.
Anchor types:
Direct anchors fix individual element measures to specific logit values (e.g., Rater R1 anchored at 0.35 logits).
Group anchors constrain the mean of a set of elements to a target value, allowing individual elements to vary freely around that mean.
When both types overlap for the same element, the direct anchor takes precedence.
Design checks verify that each anchored element has at least
min_obs_per_element weighted observations (default 30) and each
score category has at least min_obs_per_category (default 10).
These thresholds follow standard Rasch sample-size recommendations
(Linacre, 1994).
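Two of the checks described above, conflicting direct anchors and thin observation support, are simple enough to sketch in base R. This is an illustration of the idea under invented data, not the audit's actual implementation; the key format Facet::Level and the variable names are assumptions made for the example.

```r
# Invented anchor table with one element anchored twice at different values.
anchors <- data.frame(
  Facet = c("Rater", "Rater", "Criterion"),
  Level = c("R1", "R1", "C1"),
  Anchor = c(0, 0.1, -0.2),
  stringsAsFactors = FALSE
)

# Duplicate detection on the composite Facet/Level key.
key <- paste(anchors$Facet, anchors$Level, sep = "::")
dup_keys <- unique(key[duplicated(key)])
conflicts <- anchors[key %in% dup_keys, ]

# Invented rating events: R1 is well observed, R2 is not.
ratings <- data.frame(Rater = rep(c("R1", "R2"), times = c(40, 12)))
min_obs_per_element <- 30   # default threshold quoted in the text

obs_counts <- table(ratings$Rater)
low_support <- names(obs_counts)[obs_counts < min_obs_per_element]

print(conflicts)
print(low_support)
```

Running checks of this kind before estimation surfaces specification problems as readable tables instead of as estimation failures.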
A list of class mfrm_anchor_audit with:
anchors: cleaned anchor table used by estimation
group_anchors: cleaned group-anchor table used by estimation
facet_summary: counts of levels, constrained levels, and free levels
design_checks: observation-count checks by level/category
thresholds: active threshold settings used for recommendations
issue_counts: issue-type counts
issues: list of issue tables
recommendations: package-native anchor guidance strings
issue_counts/issues: concrete data or specification problems.
facet_summary: constraint coverage by facet.
design_checks: whether anchor targets have enough observations.
recommendations: action items before estimation.
Build candidate anchors (e.g., with make_anchor_table()).
Run audit_mfrm_anchors(...).
Resolve issues, then fit with fit_mfrm().
fit_mfrm(), describe_mfrm_data(), make_anchor_table()
toy <- load_mfrmr_data("example_core")
anchors <- data.frame(
  Facet = c("Rater", "Rater"),
  Level = c("R1", "R1"),
  Anchor = c(0, 0.1),
  stringsAsFactors = FALSE
)
aud <- audit_mfrm_anchors(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  anchors = anchors
)
aud$issue_counts
summary(aud)
p_aud <- plot(aud, draw = FALSE)
p_aud$data$plot
Build a bias-cell count report
bias_count_table(
  bias_results,
  min_count_warn = 10,
  branch = c("original", "facets"),
  fit = NULL
)
bias_results |
Output from |
min_count_warn |
Minimum count threshold for flagging sparse bias cells. |
branch |
Output branch:
|
fit |
Optional |
This helper summarizes how many observations contribute to each bias-cell estimate and flags sparse cells.
Branch behavior:
"facets": keeps legacy manual-aligned column labels (Sq,
Observd Count, Obs-Exp Average, Model S.E.) for side-by-side
comparison with external workflows.
"original": keeps compact field names (Count, BiasSize, SE) for
custom QC workflows and scripting.
A named list with:
table: cell-level counts with low-count flags
by_facet: named list of counts aggregated by each interaction facet
by_facet_a, by_facet_b: first two facet summaries (legacy compatibility)
summary: one-row summary
thresholds: applied thresholds
branch, style: output branch metadata
fit_overview: optional one-row fit metadata when fit is supplied
table: cell-level contribution counts and low-count flags.
by_facet: sparse-cell structure by each interaction facet.
summary: overall low-count prevalence.
fit_overview: optional run context (when fit is supplied).
Low-count cells should be interpreted cautiously because bias-size estimates can become unstable with sparse support.
Estimate bias with estimate_bias().
Build bias_count_table(...) in desired branch.
Review low-count flags before interpreting bias magnitudes.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
The table data.frame contains, in the legacy-compatible branch:
Interaction facet level identifiers; placeholder names for the two interaction facets.
Sequential row number.
Number of observations for this cell.
Observed minus expected average for this cell.
Standard error of the bias estimate.
Fit statistics for this cell.
Logical; TRUE when count < min_count_warn.
The summary data.frame contains:
Names of the interaction facets.
Number of cells and total observations.
Number and share of low-count cells.
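The low-count rule (flag a cell when its count falls below min_count_warn) and the summary quantities listed above can be reproduced on a toy table. The cell counts are invented and the column names in summary_row are illustrative rather than the package's own.

```r
# Invented bias cells for a 2 x 2 Rater-by-Criterion interaction.
cells <- data.frame(
  Rater = rep(c("R1", "R2"), each = 2),
  Criterion = rep(c("C1", "C2"), times = 2),
  Count = c(24, 8, 15, 3)
)
min_count_warn <- 10   # default threshold from the usage section

# Cell-level flag: TRUE when the cell has fewer observations than the cutoff.
cells$LowCount <- cells$Count < min_count_warn

# One-row summary: number of cells, total observations, low-count prevalence.
summary_row <- data.frame(
  Cells = nrow(cells),
  TotalCount = sum(cells$Count),
  LowCountCells = sum(cells$LowCount),
  LowCountShare = mean(cells$LowCount)
)

# Low-count cells aggregated by one interaction facet.
by_rater <- aggregate(LowCount ~ Rater, data = cells, FUN = sum)

print(cells)
print(summary_row)
print(by_rater)
```

Here half of the cells fall below the cutoff, which would argue for caution before reading any bias magnitude from the sparse cells.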
estimate_bias(), unexpected_after_bias_table(), build_fixed_reports(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion", max_iter = 2)
t11 <- bias_count_table(bias)
t11_facets <- bias_count_table(bias, branch = "facets", fit = fit)
summary(t11)
p <- plot(t11, draw = FALSE)
p2 <- plot(t11, type = "lowcount_by_facet", draw = FALSE)
if (interactive()) {
  plot(
    t11,
    type = "cell_counts",
    draw = TRUE,
    main = "Bias Cell Counts (Customized)",
    palette = c(count = "#2b8cbe", low = "#cb181d"),
    label_angle = 45
  )
}
Bundles the ranked flagged-cells view of a bias-interaction run for downstream printing and plotting. The three sibling reports in this family are intentionally distinct:
bias_interaction_report() (this one) = FACETS Table 13: a ranked
list of interaction cells with t, bias size, and screening tail
area – use when reviewing which (facet_a, facet_b) cells deserve
follow-up.
bias_iteration_report() = iteration history / convergence trace
for the bias recalibration (FACETS Table 9 territory) – use when
diagnosing whether the bias run itself stabilised.
bias_pairwise_report() = pairwise contrast table for a target
facet (FACETS Table 14 territory) – use when comparing levels
within a facet while controlling for the other.
bias_interaction_report(
  x,
  diagnostics = NULL,
  facet_a = NULL,
  facet_b = NULL,
  interaction_facets = NULL,
  max_abs = 10,
  omit_extreme = TRUE,
  max_iter = 4,
  tol = 0.001,
  top_n = 50,
  abs_t_warn = 2,
  abs_bias_warn = 0.5,
  p_max = 0.05,
  sort_by = c("abs_t", "abs_bias", "prob")
)
x |
Output from |
diagnostics |
Optional output from |
facet_a |
First facet name (required when |
facet_b |
Second facet name (required when |
interaction_facets |
Character vector of two or more facets. |
max_abs |
Bound for absolute bias size when estimating from fit. |
omit_extreme |
Omit extreme-only elements when estimating from fit. |
max_iter |
Iteration cap for bias estimation when |
tol |
Convergence tolerance for bias estimation when |
top_n |
Maximum number of ranked rows to keep. |
abs_t_warn |
Warning cutoff for absolute t statistics. |
abs_bias_warn |
Warning cutoff for absolute bias size. |
p_max |
Warning cutoff for p-values. |
sort_by |
Ranking key: |
Preferred bundle API for interaction-bias diagnostics. The function can:
use a precomputed bias object from estimate_bias(), or
estimate internally from mfrm_fit + facet specification.
A named list with bias-interaction plotting/report components. Class:
mfrm_bias_interaction.
Focus on ranked rows where multiple screening criteria converge:
large absolute t statistic
large absolute bias size
small screening tail area
The bundle is optimized for downstream summary() and
plot_bias_interaction() views.
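The idea of looking for rows where several screening criteria converge can be sketched in base R. The cells below are invented, the cutoffs are the defaults shown in the usage section, and the use of inclusive comparisons (>= and <=) is an assumption; the package may treat boundary values differently.

```r
# Invented interaction cells with t statistic, bias size, and tail area.
cells <- data.frame(
  Rater = c("R1", "R1", "R2", "R3"),
  Criterion = c("C1", "C2", "C1", "C2"),
  t = c(2.8, -0.9, -2.1, 1.4),
  bias = c(0.72, -0.15, -0.41, 0.55),
  p = c(0.006, 0.370, 0.038, 0.160)
)

# Default screening cutoffs from the usage section.
abs_t_warn <- 2
abs_bias_warn <- 0.5
p_max <- 0.05

# One flag per criterion (inclusive boundaries assumed for this sketch).
cells$flag_t <- abs(cells$t) >= abs_t_warn
cells$flag_bias <- abs(cells$bias) >= abs_bias_warn
cells$flag_p <- cells$p <= p_max
cells$n_flags <- cells$flag_t + cells$flag_bias + cells$flag_p

# Rank by absolute t, analogous to sort_by = "abs_t", and keep the top rows.
ranked <- cells[order(-abs(cells$t)), ]
top_n <- 3
print(head(ranked, top_n))
```

A cell flagged on all three criteria is a stronger follow-up candidate than one flagged on bias size alone, where a large estimate may simply reflect sparse support.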
Run estimate_bias() (or provide mfrm_fit here).
Build bias_interaction_report(...).
Review summary(out) and visualize with plot_bias_interaction().
estimate_bias(), build_fixed_reports(), plot_bias_interaction()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion", max_iter = 2)
out <- bias_interaction_report(bias, top_n = 10)
summary(out)
p_bi <- plot(out, draw = FALSE)
p_bi$data$plot
This report is NOT an alias of bias_interaction_report() despite the
similar name. It focuses on the recalibration path of a bias run:
iteration table, convergence summary, and orientation audit. Use this
to confirm that the bias recalibration itself converged; use
bias_interaction_report() to review the ranked flagged cells from
the converged run.
bias_iteration_report(
  x,
  diagnostics = NULL,
  facet_a = NULL,
  facet_b = NULL,
  interaction_facets = NULL,
  max_abs = 10,
  omit_extreme = TRUE,
  max_iter = 4,
  tol = 0.001,
  top_n = 10
)
x |
Output from |
diagnostics |
Optional output from |
facet_a |
First facet name (required when |
facet_b |
Second facet name (required when |
interaction_facets |
Character vector of two or more facets. |
max_abs |
Bound for absolute bias size when estimating from fit. |
omit_extreme |
Omit extreme-only elements when estimating from fit. |
max_iter |
Iteration cap for bias estimation when |
tol |
Convergence tolerance for bias estimation when |
top_n |
Maximum number of iteration rows to keep in preview-oriented summaries. The full iteration table is always returned. |
This report focuses on the recalibration path used by estimate_bias().
It provides a package-native counterpart to legacy iteration printouts by
exposing the iteration table, convergence summary, and orientation audit in
one bundle.
A named list with:
table: iteration history
summary: one-row convergence summary
orientation_audit: interaction-facet sign audit
settings: resolved reporting options
direction_note: one-line interpretive note describing which
direction the iteration moved (carried from the bias estimator;
empty string when the underlying estimator does not emit one)
recommended_action: one-line recommended action label
(e.g. "converged", "increase max_iter"); empty string when
the underlying estimator does not emit one
estimate_bias(), bias_interaction_report(), build_fixed_reports()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
out <- bias_iteration_report(fit, diagnostics = diag, facet_a = "Rater", facet_b = "Criterion")
summary(out)
Build a pairwise contrast table that, for a chosen target facet
(e.g. raters), compares each pair of target-facet levels while
holding a context facet (e.g. items / criteria) constant. This is
the FACETS Table 14 view: it answers "is rater A consistently
more severe than rater B on the same items?" rather than "which
(rater, item) cell has the largest local bias?" – the latter is
covered by bias_interaction_report().
bias_pairwise_report(
  x,
  diagnostics = NULL,
  facet_a = NULL,
  facet_b = NULL,
  interaction_facets = NULL,
  max_abs = 10,
  omit_extreme = TRUE,
  max_iter = 4,
  tol = 0.001,
  target_facet = NULL,
  context_facet = NULL,
  top_n = 50,
  p_max = 0.05,
  sort_by = c("abs_t", "abs_contrast", "prob")
)
x |
Output from |
diagnostics |
Optional output from |
facet_a |
First facet name (required when |
facet_b |
Second facet name (required when |
interaction_facets |
Character vector of two or more facets. |
max_abs |
Bound for absolute bias size when estimating from fit. |
omit_extreme |
Omit extreme-only elements when estimating from fit. |
max_iter |
Iteration cap for bias estimation when |
tol |
Convergence tolerance for bias estimation when |
target_facet |
Facet whose local contrasts should be compared across the paired context facet. Defaults to the first interaction facet. |
context_facet |
Optional facet to condition on. Defaults to the other facet in a 2-way interaction. |
top_n |
Maximum number of ranked rows to keep. |
p_max |
Flagging cutoff for pairwise p-values. |
sort_by |
Ranking key: |
This helper exposes the pairwise contrast table that was previously only reachable through fixed-width output generation. It is available only for 2-way interactions. The pairwise contrast statistic uses a Welch/Satterthwaite approximation and is labeled as a Rasch-Welch comparison in the output metadata.
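For readers unfamiliar with the Welch/Satterthwaite approximation, the sketch below works through the standard formula for a contrast between two estimates. The helper name `welch_contrast` and the per-estimate degrees of freedom are assumptions made for illustration. How the package derives its own degrees of freedom is not stated here, so this is not a reproduction of its internals.

```r
# Illustrative helper (not exported by the package): Welch-type contrast
# between two estimates with their own standard errors and degrees of freedom.
welch_contrast <- function(b1, se1, df1, b2, se2, df2) {
  diff    <- b1 - b2
  se_diff <- sqrt(se1^2 + se2^2)
  # Satterthwaite approximation to the degrees of freedom of the contrast.
  df_diff <- (se1^2 + se2^2)^2 / (se1^4 / df1 + se2^4 / df2)
  t_diff  <- diff / se_diff
  p_diff  <- 2 * stats::pt(-abs(t_diff), df = df_diff)
  c(diff = diff, se = se_diff, t = t_diff, df = df_diff, p = p_diff)
}

# Made-up inputs for two target-facet levels in the same context.
res <- welch_contrast(b1 = 0.60, se1 = 0.20, df1 = 30,
                      b2 = -0.20, se2 = 0.25, df2 = 24)
round(res, 3)
```

The approximate degrees of freedom always fall between the smaller of the two input values and their sum, which is a quick sanity check on any reported df_diff.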
A named list with:
table: pairwise contrast rows
summary: one-row contrast summary
orientation_audit: interaction-facet sign audit
settings: resolved reporting options
direction_note: one-line interpretive note describing the
dominant pairwise-contrast direction (carried from the
underlying bias estimator; empty string when not applicable)
recommended_action: one-line recommended-action label
(e.g. routing the user to follow-up review of the largest
flagged pairs); empty string when the underlying estimator
does not emit one
table: one row per ordered (target_level_1, target_level_2)
pair, with Bias_diff, SE_diff, t_diff, df_diff,
p_diff, and the underlying per-level bias rows. Rows are
sorted so that the largest-magnitude |t_diff| rises to the
top.
summary: one-row screening summary with MaxAbsBiasDiff,
MaxAbsT, Significant (count of flagged pairs at p_max),
BonferroniSignificant, and HolmSignificant.
orientation_audit carries the same facet-orientation sign
audit as the parent estimate_bias() run.
The SE caveat below applies: read Significant /
BonferroniSignificant as a screening triage, not as formal
inferential tests.
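To see how the raw, Bonferroni, and Holm counts can differ for the same set of contrasts, the sketch below applies base R's `p.adjust()` to a made-up vector of pairwise p-values. Whether the package computes its corrected counts this way is an assumption. The example only shows the general relationship between the three counts.

```r
# Hypothetical pairwise p-values (made-up numbers).
p_diff <- c(0.001, 0.012, 0.030, 0.200, 0.640)
p_max  <- 0.05

# Count contrasts at or below the cutoff before and after correction.
n_raw  <- sum(p_diff <= p_max)
n_bonf <- sum(stats::p.adjust(p_diff, method = "bonferroni") <= p_max)
n_holm <- sum(stats::p.adjust(p_diff, method = "holm") <= p_max)

c(raw = n_raw, bonferroni = n_bonf, holm = n_holm)
```

Holm never flags fewer contrasts than Bonferroni, so a contrast that survives Bonferroni also survives Holm.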
Fit and diagnose the model.
Run estimate_bias() to get the underlying interaction effects.
Pass that result to bias_pairwise_report() for the rater-pair
contrast table.
Use summary(out)$MaxAbsT and the top rows of out$table to
flag rater-pair systematic differences for follow-up review.
For the ranked flagged-cells view (which (rater, item) pairs
have the largest local bias), use bias_interaction_report()
on the same estimate_bias() output.
The contrast standard error is computed as
SE(b_i - b_j) = sqrt(SE_i^2 + SE_j^2) – the independence
approximation. For same-facet bias values that share a sum-to-zero
identification, Cov(b_i, b_j) < 0, so the true contrast variance
is SE_i^2 + SE_j^2 - 2 * Cov(b_i, b_j), which is larger
than the reported value. The reported t-statistics and p-values
are therefore anti-conservative for same-facet contrasts (the true
significance is weaker than reported). For across-facet contrasts
the covariance term is approximately zero and the approximation
is appropriate. Use the report as a screening / triage table; for
inferential claims that hinge on a marginally significant
same-facet contrast, follow up with a contrast that uses the full
parameter covariance.
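The arithmetic of the covariance term can be checked with a few made-up numbers. In the sketch below the standard errors, the covariance, and the bias difference are invented for illustration. With a negative covariance, subtracting 2 * Cov adds to the variance, so the full-covariance standard error is larger than the independence value and the corresponding t statistic is smaller.

```r
# Made-up standard errors and a negative covariance between two bias terms.
se_i   <- 0.20
se_j   <- 0.25
cov_ij <- -0.01

se_indep <- sqrt(se_i^2 + se_j^2)                # independence approximation
se_full  <- sqrt(se_i^2 + se_j^2 - 2 * cov_ij)   # includes the covariance term

# The same hypothetical contrast evaluated under both standard errors.
b_diff <- 0.70
round(c(se_indep = se_indep, se_full = se_full,
        t_indep = b_diff / se_indep, t_full = b_diff / se_full), 3)
```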
Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA Press.
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
estimate_bias(), bias_interaction_report(), build_fixed_reports()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
out <- bias_pairwise_report(fit, diagnostics = diag, facet_a = "Rater", facet_b = "Criterion")
s <- summary(out)
s$summary
# Look for: `MaxAbsBiasDiff` < ~0.5 logits and `Significant = 0` mean
# no rater pair contrasts above the screen. The `BonferroniSignificant`
# / `HolmSignificant` columns count pairs that survive
# multiple-testing correction; both being 0 is a stronger "no rater-pair
# inconsistency" signal than the raw screen-positive count alone.
head(out$table)
# Look for: top rows with `|t_diff|` > 2 and |Bias_diff| > 0.5 logits
# warrant content-review of the two raters' scoring conventions on
# the conditioning context facet (e.g. compare their item-level
# marks for systematic strictness/leniency patterns).
Build an APA reporting bundle from model results
build_apa_outputs(
  fit,
  diagnostics,
  bias_results = NULL,
  context = list(),
  whexact = FALSE
)
fit |
Output from |
diagnostics |
Output from |
bias_results |
Optional output from |
context |
Optional named list for report context. |
whexact |
Logical. If |
context is an optional named list for narrative customization.
Frequently used fields include:
assessment, setting, scale_desc
rater_training, raters_per_response
rater_facet (used for targeted reliability note text)
line_width (optional text wrapping width for report_text; default = 92)
Output text includes residual-PCA screening commentary if PCA diagnostics are
available in diagnostics.
build_apa_outputs() is the single front-door helper for concise
manuscript reporting output. It intentionally reuses the same facts exposed
by summary(fit), summary(diagnostics), and companion reporting helpers,
but the object returned here is the paper-facing bundle: printing it shows
the compact Method / Results narrative, whereas summary(apa) is a QA
checklist for completeness, convergence/precision readiness, and wording
alignment.
For bounded GPCM, this helper returns a caveated APA scaffold. It uses
the GPCM-specific model wording and carries a support_status table plus a
caveat field. Keep fair-average and bias language at the screening tier,
and do not describe conditional fair-average SEs as full joint-uncertainty
intervals. Use gpcm_capability_matrix() as the formal boundary statement
for that branch.
By default, report_text includes:
model/data design summary (N, facet counts, scale range)
optimization/convergence metrics (Converged, Iterations, LogLik, AIC, BIC)
anchor/constraint summary (noncenter_facet, anchored levels, group anchors, dummy facets)
latent-regression population-model wording when fit has an active
population_formula
category/threshold diagnostics (including disordered-step details when present)
overall fit, misfit count, and top misfit levels
facet reliability/separation, residual PCA summary, and bias-screen counts
An object of class mfrm_apa_outputs with:
report_text: APA-style Method/Results draft prose
table_figure_notes: consolidated draft notes for tables/visuals
table_figure_captions: draft caption candidates without figure numbering
section_map: package-native section table for manuscript assembly
contract: structured APA reporting contract used for downstream checks
report_text: manuscript-draft narrative covering Method (model
specification, estimation, convergence) and Results (global fit,
facet separation/reliability, misfit triage, category diagnostics,
residual-PCA screening, bias screening). Written in third-person past tense
following APA 7th edition conventions, but still intended for human review.
table_figure_notes: reusable draft note blocks for table/figure appendices.
table_figure_captions: draft caption candidates aligned to generated outputs.
Active latent-regression fits add a population-model section and Table 5 notes/captions that distinguish conditional-normal coefficient reporting from post hoc regression on EAP/MLE scores.
When bias results or PCA diagnostics are not supplied, those sections are omitted from the narrative rather than producing placeholder text.
Build diagnostics (and optional bias results). For RSM / PCM
reporting runs, prefer an MML fit and
diagnose_mfrm(..., diagnostic_mode = "both").
Run build_apa_outputs(...).
Check summary(apa) for completeness and analysis-readiness flags.
Print the returned object to view the concise manuscript narrative
(cat(apa$report_text) is equivalent for scripted output).
Insert apa$report_text, apa$section_map, and note/caption fields
into manuscript drafts after checking the listed cautions.
A minimal context list can include fields such as:
assessment: name of the assessment task
setting: administration context
scale_desc: short description of the score scale
rater_facet: rater facet label used in narrative reliability text
fit must be an mfrm_fit object from fit_mfrm().
diagnostics must be an mfrm_diagnostics object from diagnose_mfrm().
context must be a list (use NULL or list() for no extra context).
If supplied, bias_results must come from estimate_bias() or another
package-native bias helper that provides a table component.
build_visual_summaries(), estimate_bias(),
reporting_checklist(), mfrmr_reporting_and_apa
# Fast smoke run: a JML fit and a legacy diagnostic let us build the
# APA bundle and confirm `report_text` is non-empty in well under
# a second.
toy <- load_mfrmr_data("example_core")
fit_quick <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 15)
diag_quick <- diagnose_mfrm(fit_quick, residual_pca = "none", diagnostic_mode = "legacy")
apa_quick <- build_apa_outputs(fit_quick, diag_quick)
nchar(apa_quick$report_text) > 0

fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "MML", maxit = 200)
diag <- diagnose_mfrm(fit, residual_pca = "both", diagnostic_mode = "both")
apa <- build_apa_outputs(
  fit, diag,
  context = list(
    assessment = "Toy writing task",
    setting = "Demonstration dataset",
    scale_desc = "0-2 rating scale",
    rater_facet = "Rater"
  )
)
s_apa <- summary(apa)
s_apa$overview[, c("DraftContractPass", "AnalysisReady")]
apa$section_map[, c("SectionId", "Available", "SentenceCount")]
# Look for: `DraftContractPass = TRUE` before using the generated prose.
# `AnalysisReady = FALSE` means convergence or formal precision still
# needs review before submission.
chk <- reporting_checklist(fit, diagnostics = diag)
head(chk$checklist[, c("Section", "Item", "DraftReady", "NextAction")])
# Look for: rows with `DraftReady = "yes"` are ready to draft from
# under the documented caveats. `"no"` rows tell you which helper / setting
# needs to run before that paragraph can be drafted, via
# `NextAction`. Aim for every Visual Displays / Reliability /
# Diagnostics row to be `"yes"` before submitting.
apa
apa$section_map[, c("SectionId", "Available")]
Build a scoped ConQuest-overlap bundle
build_conquest_overlap_bundle(
  fit = NULL,
  case = c("synthetic_latent_regression"),
  output_dir = NULL,
  prefix = "conquest_overlap",
  overwrite = FALSE,
  quad_points = 7L,
  maxit = 40L,
  reltol = 1e-06
)
fit |
Optional output from |
case |
Overlap case used when |
output_dir |
Optional directory where the bundle files should be
written. When |
prefix |
File-name prefix used when writing the bundle to disk. |
overwrite |
If |
quad_points |
Quadrature points used when |
maxit |
Maximum optimizer iterations used when |
reltol |
Relative convergence tolerance used when |
This helper prepares a narrow ConQuest comparison bundle for an RSM / PCM
latent-regression MML fit and records the mfrmr-side tables to compare
after an external ConQuest run. The supported overlap is intentionally
narrow:
ordered-response RSM / PCM only;
binary responses only;
exactly one non-person facet, treated as the item facet;
active latent-regression MML;
exactly one numeric person covariate beyond the intercept;
complete person-by-item rectangular data.
The returned bundle standardizes the responses to {0, 1}, pivots them to a
one-row-per-person wide CSV, stores the corresponding person covariates, and
records the mfrmr estimates that should be compared externally.
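The standardize-and-pivot step described above can be pictured with base R alone. The sketch below is an illustration under assumptions, not the bundle's actual code. The toy data, the 1/2 input coding, and the `Score01` column name are hypothetical.

```r
# Hypothetical long-format binary responses coded 1/2 instead of 0/1.
long <- data.frame(
  Person = rep(c("P1", "P2", "P3"), each = 2),
  Item   = rep(c("I1", "I2"), times = 3),
  Score  = c(1, 2, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Map the two observed categories onto {0, 1}, preserving their order.
long$Score01 <- match(long$Score, sort(unique(long$Score))) - 1L

# Pivot to one row per person and one column per item.
wide <- reshape(long[, c("Person", "Item", "Score01")],
                idvar = "Person", timevar = "Item",
                direction = "wide")
names(wide) <- sub("^Score01\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

A complete person-by-item rectangle, as the scope list requires, means the wide table has no missing cells after the pivot.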
The conquest_command component is a conservative starting template, not a
guaranteed version-invariant automation. The conquest_output_contract
component records which requested external output should feed each
normalized audit table.
Use normalize_conquest_overlap_files() or
normalize_conquest_overlap_tables() and then audit_conquest_overlap() only
after the matching ConQuest run has been executed externally and the relevant
output tables have been extracted. The bundle and command template alone are
not external validation evidence.
A named list with class mfrm_conquest_overlap_bundle.
regression slope: compare directly;
residual variance sigma2: compare directly;
item estimates: compare after centering because the Rasch location origin remains constraint-dependent;
case EAP estimates: compare as posterior summaries under the fitted population model.
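The reason item estimates are compared only after centering can be shown with two made-up sets of estimates that differ by a constant shift in origin. The item names and values below are invented, and mean-centering is used as one simple way to remove the origin difference.

```r
# Made-up item estimates from two programs with different location origins.
items_a <- c(I1 = -0.90, I2 = 0.10, I3 = 0.80)   # centered at zero
items_b <- c(I1 = -0.40, I2 = 0.60, I3 = 1.30)   # same items, shifted by 0.5

# Raw differences reflect the origin shift rather than real disagreement.
raw_diff <- items_b - items_a

# Center each set on its own mean before comparing.
center <- function(x) x - mean(x)
centered_diff <- center(items_b) - center(items_a)

round(cbind(raw_diff, centered_diff), 3)
```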
The returned object has class mfrm_conquest_overlap_bundle and includes:
summary: one-row scope summary with posterior-basis and
population-model audit fields
comparison_targets: comparison rules for the exported tables
conquest_output_contract: requested ConQuest outputs and audit handoff
response_long: long-format binary response data used by the bundle
response_wide: wide CSV-ready response matrix for the ConQuest template
person_data: one-row-per-person covariate table
item_map: mapping from exported response columns to original item levels
mfrmr_population: fitted population-model coefficients plus sigma2
mfrmr_item_estimates: fitted item estimates with centered values
mfrmr_case_eap: posterior EAP summaries for the fitted persons
conquest_command: conservative ConQuest command template
written_files: file inventory when output_dir is supplied
settings: bundle settings
notes: interpretation notes
normalize_conquest_overlap_files(),
normalize_conquest_overlap_tables(), audit_conquest_overlap(),
reference_case_benchmark(), build_mfrm_replay_script(),
export_mfrm_bundle()
bundle <- build_conquest_overlap_bundle()
bundle$summary[, c("Case", "Facet", "Covariate", "Persons", "Items")]
summary(bundle)$conquest_command_scope
summary(bundle)$conquest_output_contract
cat(substr(bundle$conquest_command, 1, 120))
Links a series of calibration waves by computing mean offsets between adjacent pairs of fits. Common linking elements (e.g., raters or items that appear in consecutive administrations) are used to estimate the scale shift. Cumulative offsets place all waves on a common metric anchored to the first wave. The procedure is intended as a practical screened linking aid, not as a full general-purpose equating framework.
build_equating_chain(
  fits,
  anchor_facets = NULL,
  include_person = FALSE,
  drift_threshold = 0.5
)

## S3 method for class 'mfrm_equating_chain'
print(x, ...)

## S3 method for class 'mfrm_equating_chain'
plot(
  x,
  y = NULL,
  type = c("common_anchors", "graph", "chain"),
  preset = c("standard", "publication", "compact"),
  draw = TRUE,
  ...
)

## S3 method for class 'mfrm_equating_chain'
summary(object, ...)

## S3 method for class 'summary.mfrm_equating_chain'
print(x, ...)
fits |
Named list of |
anchor_facets |
Character vector of facets to use as linking elements. |
include_person |
Include person estimates in linking. |
drift_threshold |
Threshold for flagging large residuals in links. |
x |
An |
... |
Ignored. |
y |
Unused (S3 plot signature requirement). |
type |
One of |
preset |
Visual preset. |
draw |
If |
object |
An |
The chain uses a screened link-offset method. For each pair of
adjacent waves, the function:
Identifies common linking elements (facet levels present in both fits).
Computes per-element differences between the two waves' estimates.
Computes a preliminary link offset using the inverse-variance weighted mean of these differences when standard errors are available (otherwise an unweighted mean).
Screens out elements whose residual from that preliminary offset exceeds
drift_threshold, then recomputes the final offset on the retained set.
Records Offset_SD (standard deviation of retained residuals) and
Max_Residual (maximum absolute deviation from the mean) as
indicators of link quality.
Flags links with fewer than 5 retained common elements in any linking facet as having thin support.
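The steps above can be sketched for a single link in base R. This is a simplified illustration under stated assumptions rather than the package's implementation. The helper name `screened_offset` and its return fields are hypothetical, the differences `d` are taken as given without fixing a sign convention, and all elements are treated as one linking facet, so the thin-support check is not applied per facet.

```r
# Illustrative screened link offset for one pair of adjacent waves.
# d: per-element differences; se: their standard errors (or NULL).
screened_offset <- function(d, se = NULL, drift_threshold = 0.5) {
  # Inverse-variance weighted mean when SEs exist, else a plain mean.
  wmean <- function(x, s) {
    if (is.null(s)) mean(x) else sum(x / s^2) / sum(1 / s^2)
  }
  prelim <- wmean(d, se)
  # Screen out elements far from the preliminary offset, then recompute.
  keep  <- abs(d - prelim) <= drift_threshold
  final <- wmean(d[keep], if (is.null(se)) NULL else se[keep])
  resid <- d[keep] - final
  list(
    offset_prelim = prelim,
    offset        = final,
    n_retained    = sum(keep),
    offset_sd     = stats::sd(resid),
    max_residual  = max(abs(resid)),
    thin_support  = sum(keep) < 5
  )
}

# Made-up differences: five consistent elements and one drifting element.
d   <- c(0.30, 0.25, 0.35, 0.28, 0.32, 1.40)
se  <- rep(0.10, length(d))
res <- screened_offset(d, se)
str(res)
```

In this toy case the drifting element pulls the preliminary offset upward, is screened out, and the final offset settles on the mean of the five consistent elements.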
Cumulative offsets are computed by chaining link offsets from Wave 1 forward, placing all waves onto the metric of the first wave.
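Chaining amounts to a running sum of the adjacent link offsets, with Wave 1 fixed at zero. The sketch below uses made-up offsets and hypothetical column names. The sign convention for applying an offset to a wave's estimates is an assumption that should be checked against the returned object.

```r
# Made-up adjacent link offsets: Wave1 -> Wave2, Wave2 -> Wave3, Wave3 -> Wave4.
link_offsets <- c(0.20, -0.05, 0.10)
waves <- c("Wave1", "Wave2", "Wave3", "Wave4")

# Wave 1 defines the metric (offset 0); later waves accumulate link offsets.
cumulative <- data.frame(
  Wave = waves,
  Cumulative_Offset = c(0, cumsum(link_offsets)),
  stringsAsFactors = FALSE
)
cumulative
```

Because each cumulative value inherits every earlier link, an unstable early link propagates to all later waves.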
Elements whose per-link residual exceeds drift_threshold are flagged
in $element_detail$Flag. A high Offset_SD, many flagged elements, or a
thin retained anchor set signals an unstable link that may compromise the
resulting scale placement.
Object of class mfrm_equating_chain with components:
Tibble of link-level statistics (offset, SD, etc.).
Tibble of cumulative offsets per wave.
Tibble of element-level linking details.
Tibble of retained common-element counts by facet.
List of analysis configuration.
Use anchor_to_baseline() for a single new wave anchored to a known
baseline.
Use detect_anchor_drift() when you want direct comparison against one
reference wave.
Use build_equating_chain() when no single wave should dominate and you
want ordered, adjacent links across the series.
$links: one row per adjacent pair with From, To, N_Common,
N_Retained, Offset_Prelim, Offset, Offset_SD, and
Max_Residual. Small Offset_SD
relative to the offset indicates a consistent shift across elements.
LinkSupportAdequate = FALSE means at least one linking facet retained
fewer than 5 common elements after screening.
$cumulative: one row per wave with its cumulative offset from Wave 1.
Wave 1 always has offset 0.
$element_detail: per-element linking statistics (estimate in each
wave, difference, residual from mean offset, and flag status).
Flagged elements may indicate DIF or rater re-training effects.
$common_by_facet: retained common-element counts by linking facet for
each adjacent link.
$config: records wave names and analysis parameters.
Read links before cumulative: weak adjacent links can make later
cumulative offsets less trustworthy.
Fit each administration wave separately: fit_a <- fit_mfrm(...).
Combine into an ordered named list:
fits <- list(Spring23 = fit_s, Fall23 = fit_f, Spring24 = fit_s2).
Call chain <- build_equating_chain(fits).
Review summary(chain) for link quality.
Visualize with plot_anchor_drift(chain, type = "chain").
For problematic links, investigate flagged elements in
chain$element_detail and consider removing them from the anchor set.
detect_anchor_drift(), anchor_to_baseline(),
make_anchor_table(), plot_anchor_drift()
toy <- load_mfrmr_data("example_core")
people <- unique(toy$Person)
d1 <- toy[toy$Person %in% people[1:12], , drop = FALSE]
d2 <- toy[toy$Person %in% people[13:24], , drop = FALSE]
fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 10)
fit2 <- fit_mfrm(d2, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 10)
chain <- build_equating_chain(list(Form1 = fit1, Form2 = fit2))
summary(chain)
chain$cumulative
Build legacy-compatible fixed-width text reports
build_fixed_reports(
  bias_results,
  target_facet = NULL,
  branch = c("facets", "original")
)
bias_results |
Output from |
target_facet |
Optional target facet for pairwise contrast table. |
branch |
Output branch:
|
This function generates plain-text, fixed-width output intended to be read in console/log environments or exported into text reports.
The pairwise section (Table 14 style) is only generated for 2-way bias runs.
For higher-order interactions (interaction_facets length >= 3), the function
returns the bias table text and a note explaining why pairwise contrasts were
skipped.
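For readers who have not worked with fixed-width output, the sketch below shows the general idea using base R's `sprintf()`. The table, its column names, and the column widths are made up for illustration and do not reproduce the layout of either branch.

```r
# Made-up interaction rows; column names are illustrative only.
tab <- data.frame(
  Rater     = c("R1", "R2"),
  Criterion = c("Content", "Language"),
  Bias      = c(0.42, -1.05),
  SE        = c(0.18, 0.21),
  stringsAsFactors = FALSE
)

# Left-justify labels and right-justify numbers to fixed column widths.
header <- sprintf("%-8s %-12s %8s %8s", "Rater", "Criterion", "Bias", "SE")
body   <- sprintf("%-8s %-12s %8.2f %8.2f",
                  tab$Rater, tab$Criterion, tab$Bias, tab$SE)
lines  <- c(header, strrep("-", nchar(header)), body)
cat(lines, sep = "\n")
```

Every line has the same character width, which is what keeps columns aligned in a console or log file.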
A named list with class mfrm_fixed_reports (and a branch-specific
subclass mfrm_fixed_reports_<branch>):
bias_fixed: fixed-width interaction table text
pairwise_fixed: fixed-width pairwise contrast text
pairwise_table: underlying pairwise data.frame
branch: character scalar "original" or "facets" echoing
which fixed-width style was rendered
style: character scalar carrying the resolved style preset
used when building the text artifact
interaction_label: human-readable label for the interaction
that drove the bias run ("Rater x Criterion"-style); NA
when no bias rows are available
target_facet: character scalar identifying which facet was
used as the target facet for pairwise contrasts; NA when no
pairwise contrasts were requested or available
bias_fixed: fixed-width table of interaction effects.
pairwise_fixed: pairwise contrast text (2-way only).
pairwise_table: machine-readable contrast table.
interaction_label: facets used for the bias run.
Run estimate_bias().
Build text bundle with build_fixed_reports(...).
Use summary()/plot() for quick checks, then export text blocks.
For new reporting workflows, prefer bias_interaction_report() and
build_apa_outputs(). Use build_fixed_reports() when a fixed-width text
artifact is specifically required for a compatibility handoff.
estimate_bias(), build_apa_outputs(), bias_interaction_report(),
mfrmr_reports_and_tables, mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion", max_iter = 2)
fixed <- build_fixed_reports(bias)
fixed_original <- build_fixed_reports(bias, branch = "original")
summary(fixed)
p <- plot(fixed, draw = FALSE)
p2 <- plot(fixed, type = "pvalue", draw = FALSE)
if (interactive()) {
  plot(
    fixed,
    type = "contrast",
    draw = TRUE,
    main = "Pairwise Contrasts (Customized)",
    palette = c(pos = "#1b9e77", neg = "#d95f02"),
    label_angle = 45
  )
}
Build a linking-review synthesis object
build_linking_review(
  anchor_audit = NULL,
  drift = NULL,
  chain = NULL,
  top_n = 10
)
anchor_audit |
Optional output from |
drift |
Optional output from |
chain |
Optional output from |
top_n |
Maximum number of linking-risk rows to highlight in summary outputs. The full object keeps the full risk tables. |
build_linking_review() does not recompute anchor, drift, or chain
statistics. It is a synthesis layer that organizes package-native evidence
into one operational review surface with:
a front-door status block,
ranked linking risks,
explicit next actions,
plot routing metadata,
a reporting/export handoff map.
The helper keeps the current conservative interpretation policy: anchor drift and screened links are operational review tools, not automatic proofs of scale equivalence or score comparability.
An object of class mfrm_linking_review.
Use existing package-native outputs in this order:
audit_mfrm_anchors() for pre-fit anchor adequacy.
detect_anchor_drift() for direct wave-to-reference drift screening.
build_equating_chain() for adjacent screened-link review across waves.
overview: which evidence sources were supplied and the current review status.
top_linking_risks: primary operational triage table.
group_view_index: stable wave/link/facet/source-family grouping routes.
plot_map: which existing plotting helper should be used next.
reporting_map: what is covered here versus which manuscript-oriented
helper should be used separately.
This helper is currently intended for the validated RSM / PCM linking
workflow. If the supplied drift/chain sources resolve to bounded GPCM,
the helper stops with a package-level message rather than silently implying
support.
audit_mfrm_anchors(), detect_anchor_drift(),
build_equating_chain(), plot_anchor_drift(), mfrmr_linking_and_dff
d1 <- load_mfrmr_data("study1")
d2 <- load_mfrmr_data("study2")
fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 15)
fit2 <- fit_mfrm(d2, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 15)
audit <- audit_mfrm_anchors(d1, "Person", c("Rater", "Criterion"), "Score")
drift <- detect_anchor_drift(list(Wave1 = fit1, Wave2 = fit2))
chain <- build_equating_chain(list(Wave1 = fit1, Wave2 = fit2))
review <- build_linking_review(anchor_audit = audit, drift = drift, chain = chain)
summary(review)
review$top_linking_risks
review$group_view_index
Build an arbitrary-facet MFRM simulation specification.
build_mfrm_arbitrary_sim_spec(
  n_person = 50,
  facets,
  facet_sd = NULL,
  facets_per_person = NULL,
  score_levels = 5,
  theta_sd = 1,
  noise_sd = 0,
  step_span = 1.4,
  thresholds = NULL,
  group_levels = NULL,
  dif_effects = NULL,
  interaction_effects = NULL,
  model = c("RSM", "PCM", "GPCM")
)
n_person |
Number of persons. A vector creates design-grid choices. |
facets |
Named counts for all non-person facets, or a named list whose elements are count choices. |
facet_sd |
Optional named standard deviations for simulated facet effects. A single unnamed value is reused for every facet. |
facets_per_person |
Optional named assignment counts. Facets not listed are fully crossed within each person; listed facets use deterministic rotating subsets. |
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread of generated common RSM thresholds. |
thresholds |
Optional common RSM thresholds. Must have |
group_levels |
Optional group labels assigned at the person level. |
dif_effects |
Optional DIF effect table. |
interaction_effects |
Optional facet-interaction effect table. |
model |
Measurement model for the arbitrary-facet generator. Version 0.2.0 supports |
This specification is the arbitrary-facet counterpart to build_mfrm_sim_spec().
The older role-based Person x Rater x Criterion generator remains available for PCM/GPCM simulation contracts.
This branch focuses on flexible RSM-based design and bias-screening sensitivity checks with any number of non-person facets.
If facets_per_person contains Rater = 2 and Task = 3, each person receives a deterministic rotating subset of two raters and three tasks. Omitted facets, such as Criteria, are fully crossed with those selected levels.
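The rotating-subset idea can be sketched in base R. This is only an illustration under stated assumptions: the helper rotating_subset() is hypothetical, and the package's actual rotation rule may differ. It shows how Rater = 2 and Task = 3 subsets could rotate across persons while the omitted Criteria facet stays fully crossed.

```r
# Hypothetical helper (not part of mfrmr): choose `k` of `n_levels` levels
# for a person by rotating the starting position deterministically.
rotating_subset <- function(person_index, n_levels, k) {
  start <- (person_index - 1L) %% n_levels
  ((start + seq_len(k) - 1L) %% n_levels) + 1L
}

n_person <- 4L
design <- do.call(rbind, lapply(seq_len(n_person), function(p) {
  raters <- rotating_subset(p, n_levels = 4L, k = 2L)
  tasks <- rotating_subset(p, n_levels = 5L, k = 3L)
  # Criteria is not listed in facets_per_person, so all 3 levels are crossed
  # with the selected raters and tasks.
  expand.grid(Person = p, Rater = raters, Task = tasks, Criteria = seq_len(3L))
}))

# Rows per person: 2 raters x 3 tasks x 3 criteria
table(design$Person)
```

Because the subsets depend only on the person index, rebuilding the design gives the same assignment every time, which is the property the word "deterministic" refers to.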
A typical design-first workflow is to build a specification, inspect it with summarize_mfrm_sim_design() or plot_mfrm_sim_design(), simulate data, and then fit or evaluate the design. If a researcher already has a fitted RSM model, use extract_mfrm_arbitrary_sim_spec() instead so the simulation starts from the observed response skeleton and fitted estimates.
An object of class mfrm_arbitrary_sim_spec.
simulate_mfrm_arbitrary_data(),
extract_mfrm_arbitrary_sim_spec(),
summarize_mfrm_sim_design(),
summarize_mfrm_sim_grid(),
plot_mfrm_sim_grid(),
evaluate_mfrm_bias_detection()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 20,
  facets = c(Rater = 4, Criteria = 3, Task = 5, Occasion = 2),
  facet_sd = c(Rater = .35, Criteria = .25, Task = .30, Occasion = .10),
  facets_per_person = c(Rater = 2, Task = 3),
  score_levels = 5
)
spec$design_grid
design <- summarize_mfrm_sim_design(spec)
design$overview
design$assignment
Build a reproducibility manifest for an MFRM analysis
build_mfrm_manifest(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  population_prediction = NULL,
  unit_prediction = NULL,
  plausible_values = NULL,
  include_person_anchors = FALSE,
  data = NULL
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
population_prediction |
Optional output from
|
unit_prediction |
Optional output from |
plausible_values |
Optional output from |
include_person_anchors |
If |
data |
Optional original analysis data frame. When supplied,
the manifest's |
This helper captures a package-native configuration export. It summarizes analysis settings, source columns, anchoring information, and which downstream outputs are currently available.
A named list with class mfrm_manifest.
Use build_mfrm_manifest() when you want a compact, machine-readable record
of how an analysis was run. Compared with related helpers:
export_mfrm() writes analysis tables only.
build_mfrm_manifest() records settings and available outputs.
build_mfrm_replay_script() creates an executable R script.
export_mfrm_bundle() writes a shareable folder of files.
The returned bundle has class mfrm_manifest and includes:
summary: one-row analysis overview
environment: package/R/platform metadata
model_settings: key-value model settings table
source_columns: key-value data-column table
estimation_control: key-value optimizer settings table
anchor_summary: facet-level anchor summary
anchors: machine-readable anchor table
available_outputs: availability table for diagnostics/bias/PCA/prediction
outputs
settings: manifest build settings
The summary table is the quickest place to confirm that you are looking at
the intended analysis. The model_settings, source_columns, and
estimation_control tables are designed for audit trails and method write-up.
Active latent-regression fits also record their population-model provenance
there, including the fitted scoring basis, stored population_formula, and
person-level contract used by the fitted population model. When categorical
background variables are expanded through stats::model.matrix(),
population_xlevel_variables and population_contrast_variables identify
the variables whose fitted coding must be preserved for replay/scoring.
The available_outputs table is especially useful before building bundles,
because it tells you whether residual PCA, anchors, bias results, or
prediction-side artifacts are already available. A practical reading order is
summary first, available_outputs second, and anchors last when
reproducibility depends on fixed constraints.
Fit a model with fit_mfrm() or run_mfrm_facets().
Compute diagnostics once with diagnose_mfrm() if you want explicit
control over residual PCA.
Build a manifest and inspect summary plus available_outputs.
If you need files on disk, pass the same objects to
export_mfrm_bundle().
For bounded GPCM fits, the manifest records a support_status table and
should be interpreted as a package-native reproducibility bundle for the
caveated GPCM route. FACETS score-side compatibility exports remain outside
scope; use gpcm_capability_matrix() for the current boundary.
export_mfrm_bundle(), build_mfrm_replay_script(),
make_anchor_table(), reporting_checklist()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
manifest <- build_mfrm_manifest(fit, diagnostics = diag)
manifest$summary[, c("Model", "Method", "Observations", "Facets")]
manifest$available_outputs[, c("Component", "Available")]
Build a package-native replay script for an MFRM analysis
build_mfrm_replay_script(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  population_prediction = NULL,
  unit_prediction = NULL,
  plausible_values = NULL,
  data_file = "your_data.csv",
  fit_person_data_file = NULL,
  script_mode = c("auto", "fit", "facets"),
  include_bundle = FALSE,
  bundle_dir = "analysis_bundle",
  bundle_prefix = "mfrmr_replay"
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
population_prediction |
Optional output from
|
unit_prediction |
Optional output from |
plausible_values |
Optional output from
|
data_file |
Path to the analysis data file used in the generated script. |
fit_person_data_file |
Optional CSV filename to read for the fit-level
latent-regression replay person table. When |
script_mode |
One of |
include_bundle |
If |
bundle_dir |
Output directory used when |
bundle_prefix |
Prefix used by the generated bundle exporter call. |
This helper creates a reproducible-download style script using mfrmr's
installed API rather than embedding a separate estimation engine.
The generated script assumes the user has the package installed and provides
a data file at data_file.
Anchor and group-anchor constraints are embedded directly from the fitted object's stored configuration, so the script can replay anchored analyses without manual table reconstruction.
When the supplied fit uses the latent-regression MML branch, the generated
fit-mode script also carries the stored replay-ready person table together
with the corresponding population_formula / person_id /
population_policy arguments needed to recreate the population model.
By default that replay-ready table is embedded inline; when
fit_person_data_file is supplied, the generated script reads it from that
sidecar CSV relative to the replay script location.
This replay layer is intentionally unavailable for bounded GPCM, because
the current bundle/export contract still depends on the diagnostics/reporting
route that remains formalized only for the Rasch-family branch.
A named list with class mfrm_replay_script.
Use build_mfrm_replay_script() when you want a package-native recipe that
another analyst can rerun later. Compared with related helpers:
build_mfrm_manifest() records settings but does not run anything.
build_mfrm_replay_script() produces executable R code.
export_mfrm_bundle() can optionally write the replay script to disk.
The returned object contains:
summary: a one-row overview of the chosen replay mode and whether bundle
export was included
script: the generated R code as a single string
anchors and group_anchors: the exact stored constraints that were
embedded into the script
If ScriptMode is "facets", the script replays the higher-level
run_mfrm_facets() workflow. If it is "fit", the script uses
fit_mfrm() directly.
"auto" is the safest default and follows the structure of the supplied
object.
"fit" is useful when you want a minimal script centered on
fit_mfrm().
"facets" is useful when you want to preserve the higher-level
run_mfrm_facets() workflow, including stored column mapping.
Finalize a fit and diagnostics object.
Generate the replay script with the path you want users to read from.
Write replay$script to disk, or let export_mfrm_bundle() do it for
you.
Rerun the script in a fresh R session to confirm reproducibility.
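The last two steps can be sketched without the package. In this sketch, the replay list is a stand-in that imitates only the script field (a single string) described above, and the file name is hypothetical. With a real replay object, the same writeLines() and Rscript calls would apply.

```r
# Stand-in for the object returned by build_mfrm_replay_script();
# only the `script` field is imitated here.
replay <- list(script = "x <- 1 + 1\ncat('replayed value:', x, '\\n')")

# Write the generated code to disk (hypothetical file name).
script_path <- file.path(tempdir(), "mfrmr_replay.R")
writeLines(replay$script, script_path)

# Rerun it in a separate, clean R process so that nothing from the
# current workspace can leak into the replay.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, c("--vanilla", shQuote(script_path)), stdout = TRUE)
out
```

Running through Rscript --vanilla rather than source() is a deliberate choice here: source() would reuse the current session's loaded packages and objects, which can hide missing dependencies.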
build_mfrm_manifest(), export_mfrm_bundle(), run_mfrm_facets()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
replay <- build_mfrm_replay_script(fit, data_file = "your_data.csv")
replay$summary[, c("ScriptMode", "ResidualPCA", "BiasPairs")]
cat(substr(replay$script, 1, 120))
Build an explicit simulation specification for MFRM design studies
build_mfrm_sim_spec(
  n_person = 50,
  n_rater = 4,
  n_criterion = 4,
  raters_per_person = n_rater,
  design = NULL,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  thresholds = NULL,
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  slope_facet = NULL,
  slopes = NULL,
  facet_names = NULL,
  assignment = c("crossed", "rotating", "resampled", "skeleton"),
  latent_distribution = c("normal", "empirical"),
  empirical_person = NULL,
  empirical_rater = NULL,
  empirical_criterion = NULL,
  assignment_profiles = NULL,
  design_skeleton = NULL,
  group_levels = NULL,
  dif_effects = NULL,
  interaction_effects = NULL,
  population_formula = NULL,
  population_coefficients = NULL,
  population_sigma2 = NULL,
  population_covariates = NULL
)
n_person |
Number of persons/respondents to generate. |
n_rater |
Number of rater facet levels to generate. |
n_criterion |
Number of criterion/item facet levels to generate. |
raters_per_person |
Number of raters assigned to each person. |
design |
Optional named design override supplied as a named list,
named vector, or one-row data frame. Names may use canonical variables
( |
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
rater_sd |
Standard deviation of simulated rater severities. |
criterion_sd |
Standard deviation of simulated criterion difficulties. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread used to generate equally spaced thresholds when
|
thresholds |
Optional threshold specification. Use either a numeric
vector of common thresholds or a data frame with columns |
model |
Measurement model recorded in the simulation specification. |
step_facet |
Step facet used when |
slope_facet |
Slope facet used when |
slopes |
Optional slope specification for |
facet_names |
Optional public names for the two simulated non-person
facet columns. Supply either an unnamed character vector of length 2
in rater-like / criterion-like order, or a named vector with names
|
assignment |
Assignment design. |
latent_distribution |
Latent-value generator. |
empirical_person |
Optional numeric support values used when
|
empirical_rater |
Optional numeric support values used when
|
empirical_criterion |
Optional numeric support values used when
|
assignment_profiles |
Optional data frame with columns
|
design_skeleton |
Optional data frame with columns |
group_levels |
Optional character vector of group labels. |
dif_effects |
Optional data frame of true group-linked DIF effects. |
interaction_effects |
Optional data frame of true interaction effects. |
population_formula |
Optional one-sided formula describing a
person-level latent-regression population model used when generating
person measures, for example |
population_coefficients |
Optional numeric vector of latent-regression
coefficients corresponding to the design matrix implied by
|
population_sigma2 |
Optional residual variance for the latent-regression person distribution. |
population_covariates |
Optional template data frame containing one row
per template person and the background variables referenced by
|
build_mfrm_sim_spec() creates an explicit, portable simulation
specification that can be passed to simulate_mfrm_data(). The goal is to
make the data-generating mechanism inspectable and reusable rather than
relying only on ad hoc scalar arguments.
The resulting object records:
design counts (n_person, n_rater, n_criterion, raters_per_person)
latent spread assumptions (theta_sd, rater_sd, criterion_sd)
optional empirical latent support values for semi-parametric simulation
threshold structure (threshold_table)
optional discrimination structure for bounded GPCM
(slope_table)
assignment design (assignment)
optional empirical assignment profiles (assignment_profiles) with
optional person-level Group labels
optional observed response skeleton (design_skeleton)
with optional person-level Group labels and observation-level Weight
values
optional person-level latent-regression population metadata including
population_formula, population_coefficients, population_sigma2, and
a reusable template of person-level covariates, including model-matrix
xlevel/contrast provenance for categorical covariates
planning_scope, an explicit record that the current planning/forecasting
helpers still target the role-based person x rater-like x criterion-like
design contract rather than a fully arbitrary-facet planner
planning_constraints, an explicit record of which design variables can
currently be changed from that specification without rebuilding it
planning_schema, a combined schema contract bundling the role descriptor,
scope boundary, current mutability map, a facet_manifest, a
schema-only future_facet_table, and a matching
future_design_template, plus a nested future_branch_schema scaffold
for a future arbitrary-facet planning branch
the current design$facets(...) parser now normalizes nested facet-count
input through that bundled future_branch_schema, whose nested
design_schema is now the authoritative schema-only branch object
optional signal tables for DIF and interaction bias
The current generator still targets the package's standard person x rater x
criterion workflow, but the public output names for those two facet roles
can now be customized with facet_names. This naming layer improves public
ergonomics; it does not yet turn the generator into a fully arbitrary-facet
simulator. Internally, helper objects still keep canonical role mappings so
that planning functions can treat the first non-person facet as rater-like
and the second as criterion-like. When threshold values are provided by
StepFacet, the supported step facets are the generated levels of the
chosen public rater-like or criterion-like column.
When model = "GPCM", the same public facet naming rules apply to the
slope table; the current bounded branch keeps slope_facet equal to
step_facet.
If population_formula is supplied, the simulation specification carries a
first-version person-level latent-regression generator. This affects only the
person distribution. The current implementation keeps the non-person facets
in the existing many-facet Rasch generator and resamples rows from
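population_covariates to the requested design size before computing person
measures as theta = X beta + epsilon, with epsilon ~ N(0, population_sigma2),
where X is the design matrix implied by population_formula and beta is
population_coefficients.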
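A minimal base-R sketch of a person-level latent-regression generator of this kind follows. It is not the package's internal code. The covariate names, coefficient values, and the normal residual are illustrative assumptions. Only the argument names population_formula, population_coefficients, population_sigma2, and the resampling of template rows come from the description above.

```r
set.seed(1)

# Hypothetical template of person-level background variables.
template <- data.frame(
  Grade = c(3, 4, 5, 4),
  Program = factor(c("A", "B", "A", "B"))
)
population_formula <- ~ Grade + Program
# Illustrative coefficients, named to match the design-matrix columns.
population_coefficients <- c(`(Intercept)` = -1.0, Grade = 0.25, ProgramB = 0.40)
population_sigma2 <- 0.8
n_person <- 10

# Resample template rows up to the requested design size.
rows <- sample(nrow(template), n_person, replace = TRUE)
covariates <- template[rows, , drop = FALSE]

# Design matrix implied by the one-sided formula.
X <- stats::model.matrix(population_formula, data = covariates)
stopifnot(identical(colnames(X), names(population_coefficients)))

# Person measures: linear predictor plus a normal residual (assumed here).
theta <- drop(X %*% population_coefficients) +
  stats::rnorm(n_person, mean = 0, sd = sqrt(population_sigma2))
summary(theta)
```

The column-name check matters when categorical covariates are involved, because the coefficient vector must line up with the factor coding that model.matrix() produces.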
An object of class mfrm_sim_spec.
This object does not contain simulated data. It is a data-generating
specification that tells simulate_mfrm_data() how to generate them.
extract_mfrm_sim_spec(), simulate_mfrm_data()
spec <- build_mfrm_sim_spec(
  design = list(person = 8, rater = 2, criterion = 2, assignment = 1),
  assignment = "rotating"
)
spec$model
spec$assignment
nrow(spec$threshold_table)
Build a case-level misfit review bundle
build_misfit_casebook(
  fit,
  diagnostics = NULL,
  unexpected = NULL,
  displacement = NULL,
  administration_id = NULL,
  wave_id = NULL,
  top_n = 25
)
fit |
Output from |
diagnostics |
Optional output from |
unexpected |
Optional output from |
displacement |
Optional output from |
administration_id |
Optional scalar identifier describing the current administration or form. It is stored in row-level provenance and summary outputs when supplied. |
wave_id |
Optional scalar identifier for the current wave or occasion. It is stored in row-level provenance and summary outputs when supplied. |
top_n |
Maximum number of rows to keep in compact summary outputs. |
build_misfit_casebook() is a synthesis layer over package-native screening
outputs. It does not invent a new misfit statistic. Instead, it organizes
existing evidence families into one case-level review surface:
element-level Infit / Outfit MnSq misfit from diagnostics$fit
(rows whose Infit or Outfit MnSq falls outside the active MnSq
screening band returned by mfrm_misfit_thresholds())
strict marginal cell screens from diagnostics$marginal_fit$top_cells
strict pairwise screens from diagnostics$marginal_fit$pairwise$top_pairs
unexpected responses from unexpected_response_table()
displacement flags from displacement_table()
The result is an operational review bundle. It is not a formal adjudication
system, and repeated signals across evidence families should be prioritized
over any single isolated case row. In addition to raw case rows, the object
includes stable grouping views such as by_person, by_facet_level,
by_source_family, and by_wave to support operational triage. The
source_support component records which evidence families are currently
supported, caveated, or deferred under the active model.
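The advice to prioritize repeated signals can be illustrated with a small base-R triage sketch. The case table below is invented for illustration. The Person column and the family labels are hypothetical, and only the SourceFamily column name is taken from the package example. The sketch counts how many distinct evidence families flag each person and ranks persons by that count.

```r
# Hypothetical case rows; real rows would come from the casebook object.
cases <- data.frame(
  Person = c("P01", "P01", "P02", "P03", "P03", "P03"),
  SourceFamily = c("element_fit", "unexpected", "unexpected",
                   "element_fit", "displacement", "unexpected"),
  stringsAsFactors = FALSE
)

# Count distinct evidence families per person.
family_counts <- aggregate(
  SourceFamily ~ Person,
  data = cases,
  FUN = function(x) length(unique(x))
)
names(family_counts)[2] <- "n_families"

# Persons flagged by several families come first in the review queue.
triage <- family_counts[order(-family_counts$n_families), ]
triage
```

A person flagged once by a single family would sit at the bottom of this ranking, which matches the guidance that isolated case rows deserve less weight than converging evidence.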
An object of class mfrm_misfit_casebook.
Fit with fit_mfrm().
Build diagnostics with diagnose_mfrm().
Optionally build unexpected_response_table() and displacement_table()
yourself when you want custom thresholds before synthesizing the casebook.
For bounded GPCM, the helper is available with caveat. The casebook inherits
exploratory screening semantics from the underlying residual and strict
marginal sources; it should not be read as a formal inferential case test.
diagnose_mfrm(), unexpected_response_table(),
displacement_table(), plot_unexpected(), plot_displacement(),
plot_marginal_fit(), plot_marginal_pairwise()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "MML", model = "RSM", quad_points = 11)
diag <- diagnose_mfrm(fit, diagnostic_mode = "both", residual_pca = "none")
casebook <- build_misfit_casebook(fit, diagnostics = diag, top_n = 10)
summary(casebook)
casebook$top_cases[, c("CaseID", "SourceFamily", "Direction", "Signal")]
Build a manuscript-oriented table bundle from summary() outputs
build_summary_table_bundle(
  x,
  which = NULL,
  appendix_preset = NULL,
  include_empty = FALSE,
  digits = 3,
  top_n = 10,
  preview_chars = 160
)
x |
An |
which |
Optional character vector selecting a subset of named tables. |
appendix_preset |
Optional appendix-oriented table preset:
|
include_empty |
If |
digits |
Digits forwarded when |
top_n |
Row cap forwarded to compact |
preview_chars |
Character cap forwarded to
|
This helper turns the package's compact summary objects into a reproducible
table bundle for manuscript drafting, appendix handoff, or downstream
formatting. It does not replace apa_table(); instead, it provides a
consistent bridge from summary() to named data.frame components that can
later be rendered with apa_table() or exported directly.
The public entry point validates x and the summary-object contract up
front, so malformed summaries fail with a package-level message instead of
falling through to opaque downstream errors.
The function first normalizes x through the corresponding summary()
method when needed, then records a table_index describing every available
table and returns the selected tables in tables. Optional appendix presets
can be applied at bundle-construction time when you want a conservative
manuscript-facing subset before plotting or export.
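The index-then-select pattern described here can be sketched in base R. This is not the package's implementation. The table names, the index columns (Table, Rows, Cols), and the variable which_tables are hypothetical stand-ins, used only to show how an index of every table can coexist with a filtered set of returned tables.

```r
# Stand-in for the named tables a summary object might carry (hypothetical).
tables <- list(
  overview = data.frame(Model = "RSM", Method = "JML"),
  facet_overview = data.frame(Facet = c("Rater", "Criterion"),
                              Levels = c(4L, 4L)),
  caveats = data.frame()
)

# Index every available table, whether or not it ends up selected.
table_index <- data.frame(
  Table = names(tables),
  Rows = vapply(tables, nrow, integer(1)),
  Cols = vapply(tables, ncol, integer(1)),
  row.names = NULL
)

# Mimic which = ... together with include_empty = FALSE.
which_tables <- c("overview", "caveats")
include_empty <- FALSE
keep <- names(tables) %in% which_tables &
  (include_empty | table_index$Rows > 0)
selected <- tables[keep]

table_index
names(selected)
```

Keeping the index separate from the selection means a reader of the bundle can still see that an empty or unselected table existed, even though it was not returned.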
An object of class mfrm_summary_table_bundle with:
overview
table_index
plot_index
tables
appendix_preset
notes
source_class
summary_class
fit_mfrm() or summary(fit)
diagnose_mfrm() or summary(diag)
describe_mfrm_data() or summary(ds)
reporting_checklist() or summary(chk)
build_apa_outputs() or summary(apa)
evaluate_mfrm_design() or summary(sim_eval)
evaluate_mfrm_signal_detection() or summary(sig_eval)
predict_mfrm_population() or summary(pred)
planning_schema$future_branch_active_branch or summary(...)
run_mfrm_facets() or summary(out)
estimate_bias() or summary(bias)
audit_mfrm_anchors() or summary(audit)
build_linking_review() or summary(review)
build_misfit_casebook() or summary(casebook)
build_weighting_audit() or summary(audit)
predict_mfrm_units() or summary(pred_units)
sample_mfrm_plausible_values() or summary(pv)
overview: one-row metadata about the source summary and table counts.
table_index: table names, dimensions, roles, and manuscript-oriented
descriptions.
plot_index: which returned tables contain numeric content and which
bundle-level plot types can use them directly.
tables: named data.frame objects ready for formatting or export.
appendix_preset: active appendix subset mode ("none" when not used).
notes: short guidance about omitted empty tables or source-level caveats.
fit-level caveats use the analysis_caveats role; pre-fit data
score-support caveats use the score_category_caveats role. Both roles are
classified as diagnostics and stay in recommended appendix subsets.
latent-regression fit summaries expose population_coding in the methods
appendix role so categorical levels, contrasts, and encoded columns can be
documented with the coefficient table.
Build a compact object with summary(...).
Convert it with build_summary_table_bundle(...).
Use bundle$tables[[...]] directly, or hand a selected table to
apa_table() for formatted manuscript output.
If you want a manuscript appendix subset up front, use a preset such as
appendix_preset = "recommended", "compact", or "diagnostics".
summary(), apa_table(), reporting_checklist(),
build_apa_outputs()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
bundle <- build_summary_table_bundle(fit)
bundle$table_index
summary(bundle)$role_summary
Build warning and narrative summaries for visual outputs
build_visual_summaries(
  fit,
  diagnostics,
  threshold_profile = "standard",
  thresholds = NULL,
  summary_options = NULL,
  whexact = FALSE,
  branch = c("original", "facets")
)
fit |
Output from |
diagnostics |
Output from |
threshold_profile |
Threshold profile name ( |
thresholds |
Optional named overrides for profile thresholds. |
summary_options |
Summary options for |
whexact |
Use exact ZSTD transformation. |
branch |
Output branch:
|
This function returns visual-keyed text maps to support dashboard/report rendering without hard-coding narrative strings in UI code.
thresholds can override any profile field by name. Common overrides:
n_obs_min, n_person_min
misfit_ratio_warn, zstd2_ratio_warn, zstd3_ratio_warn
pca_first_eigen_warn, pca_first_prop_warn
summary_options supports:
detail: "standard" or "detailed"
max_facet_ranges: max facet-range snippets shown in visual summaries
top_misfit_n: number of top misfit entries included
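Override-by-name behavior of this kind can be sketched with utils::modifyList(). The default values below are invented for illustration and are not the package's real profile values, which should be inspected with mfrm_threshold_profiles(). Only the field names come from the list above.

```r
# Hypothetical profile values, for illustration only.
profile <- list(
  n_obs_min = 100,
  n_person_min = 30,
  misfit_ratio_warn = 0.10,
  zstd2_ratio_warn = 0.10,
  zstd3_ratio_warn = 0.05,
  pca_first_eigen_warn = 1.5,
  pca_first_prop_warn = 0.10
)

# Named overrides, in the same shape as the `thresholds` argument.
overrides <- c(misfit_ratio_warn = 0.20, pca_first_eigen_warn = 2.0)

# Guard against misspelled field names before merging.
unknown <- setdiff(names(overrides), names(profile))
stopifnot(length(unknown) == 0)

# Overridden fields replace the profile values; all others are kept.
active <- utils::modifyList(profile, as.list(overrides))
active[c("misfit_ratio_warn", "pca_first_eigen_warn", "n_obs_min")]
```

The name check is a design choice of this sketch: without it, a typo in an override name would silently add a new field instead of changing the intended threshold.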
For bounded GPCM, this helper is available as a caveated visual-routing
layer. Fair-average and bias entries use slope-aware GPCM screens and should
be read as exploratory score-side diagnostics rather than Rasch-family
invariance evidence.
An object of class mfrm_visual_summaries with:
warning_map: visual-level warning text vectors
summary_map: visual-level descriptive text vectors
warning_counts, summary_counts: message counts by visual key
plot_payloads: reusable draw-free payloads for comparison,
warning_counts, summary_counts, and optionally
category_probability_surface
public_plot_routes: public helper / draw-free route map for follow-up
crosswalk: FACETS-reference mapping for main visual keys
branch, style, threshold_profile: branch metadata
warning_map: rule-triggered warning text by visual key.
summary_map: descriptive narrative text by visual key.
strict marginal keys appear when diagnose_mfrm(..., diagnostic_mode = "both")
supplies latent-integrated first-order and pairwise screening summaries.
warning_counts / summary_counts: message-count tables for QA checks.
plot_payloads: ready-to-reuse mfrm_plot_data payloads for the bundle's
own comparison/count plots and, when step estimates are available, the
exploratory category_probability_surface payload from
plot(fit, type = "ccc_surface", draw = FALSE). The surface payload
carries category_support, interpretation_guide, and reporting_policy
tables for zero-frequency category and reporting-boundary checks.
public_plot_routes: draw-free helper routes for the dedicated public plot
functions behind each visual family.
support_status / caveat: present for bounded GPCM fits to document
the exploratory support boundary.
inspect defaults with mfrm_threshold_profiles()
choose threshold_profile (strict / standard / lenient)
optionally override selected fields via thresholds
inspect vis$public_plot_routes to choose the dedicated public helper
and vis$plot_payloads for reusable draw-free payloads
cross-check figure placement with
reporting_checklist(fit, diagnostics)$visual_scope and
visual_reporting_template() before writing captions or results text
mfrm_threshold_profiles(), build_apa_outputs(),
plot_marginal_fit(), plot_marginal_pairwise()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML", model = "RSM", maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "both", diagnostic_mode = "both")

# Default strict profile, then a standard profile with selected overrides.
vis <- build_visual_summaries(fit, diag, threshold_profile = "strict")
vis2 <- build_visual_summaries(
  fit, diag,
  threshold_profile = "standard",
  thresholds = c(misfit_ratio_warn = 0.20, pca_first_eigen_warn = 2.0),
  summary_options = list(detail = "detailed", top_misfit_n = 5)
)
vis_facets <- build_visual_summaries(fit, diag, branch = "facets")
vis_facets$branch
summary(vis)

# Draw-free payloads for reuse in custom graphics.
p <- plot(vis, type = "comparison", draw = FALSE)
p2 <- plot(vis, type = "warning_counts", draw = FALSE)
vis$plot_payloads$comparison$data$plot
vis$public_plot_routes[, c("Visual", "PlotHelper", "DrawFreeRoute")]

if (interactive()) {
  plot(
    vis, type = "comparison", draw = TRUE,
    main = "Warning vs Summary Counts (Customized)",
    palette = c(warning = "#cb181d", summary = "#3182bd"),
    label_angle = 45
  )
}
Build a weighting-policy audit between Rasch-family and bounded GPCM fits
build_weighting_audit(
  rasch_fit,
  gpcm_fit,
  theta_range = c(-6, 6),
  theta_points = 101L,
  top_n = 10L
)
| Argument | Description |
|---|---|
| rasch_fit | Output from |
| gpcm_fit | Output from |
| theta_range | Numeric vector of length 2 passed to |
| theta_points | Integer number of theta grid points passed to |
| top_n | Maximum number of rows to keep in compact summary outputs. |
build_weighting_audit() is an operational model-choice review helper. It
is designed for the common question:
what changes when a Rasch-family equal-weighting model is replaced with a
bounded GPCM that allows discrimination-based reweighting?
The helper does not estimate a new model. Instead, it synthesizes four package-native evidence sources:
compare_mfrm() for same-data model comparison
the non-person facet measures from each fit
the bounded GPCM slope table
compute_information() for design-weighted information redistribution
The result is intended for substantive review, not for automatic model
selection. In particular, a better-fitting GPCM should not by itself be
interpreted as a reason to discard an equal-weighting Rasch-family route.
An object of class mfrm_weighting_audit.
Fit an equal-weighting reference model with model = "RSM" or "PCM".
Fit a bounded GPCM on the same prepared response data.
Run build_weighting_audit(rasch_fit, gpcm_fit).
Read summary(audit) before deciding whether the discrimination-based
reweighting is substantively acceptable.
model_comparison: same-data model-comparison bundle from compare_mfrm().
facet_shift: how non-person facet estimates move under bounded GPCM.
slope_profile: which slope_facet levels are upweighted or downweighted.
information_redistribution: within-facet information-share changes
between the Rasch-family fit and bounded GPCM.
top_reweighted_levels: compact triage table for the strongest
slope-facet-level redistribution signals.
This helper is available only for the current bounded GPCM branch. It
requires the package's existing slope_facet == step_facet contract and
should be read as an operational weighting-policy review, not as a formal
validity adjudication.
compare_mfrm(), compute_information(), gpcm_capability_matrix()
toy <- load_mfrmr_data("example_core")

# Equal-weighting reference fit.
rasch_fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML", model = "RSM", quad_points = 9
)

# Bounded GPCM on the same data, with slope_facet == step_facet.
gpcm_fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML", model = "GPCM",
  step_facet = "Criterion", slope_facet = "Criterion",
  quad_points = 9
)

audit <- build_weighting_audit(rasch_fit, gpcm_fit, theta_points = 41)
summary(audit)
audit$top_reweighted_levels
Build a category curve export bundle (preferred alias)
category_curves_report(
  fit,
  theta_range = c(-6, 6),
  theta_points = 241,
  digits = 4,
  include_fixed = FALSE,
  fixed_max_rows = 400
)
| Argument | Description |
|---|---|
| fit | Output from |
| theta_range | Theta/logit range for curve coordinates. |
| theta_points | Number of points on the theta grid. |
| digits | Rounding digits for numeric graph output. |
| include_fixed | If |
| fixed_max_rows | Maximum rows shown in fixed-width graph tables. |
Preferred high-level API for category-probability curve exports. Returns tidy curve coordinates and summary metadata for quick plotting/report integration without calling low-level helpers directly.
A named list with category-curve components. Class:
mfrm_category_curves.
Use this report to inspect:
where each category has highest probability across theta
whether adjacent categories cross in expected order
whether probability bands look compressed (often sparse categories)
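The three checks above can be tried on a toy curve set before reading a real report. The following is a minimal base R sketch that does not call mfrmr: the threshold vector `tau` and the helper `category_probs()` are hypothetical names chosen for illustration, and the formula is the standard Rasch rating-scale category probability rather than the package's internal code.

```r
# Hypothetical step thresholds for a 4-category (0..3) rating scale.
tau <- c(-1.5, 0.0, 1.5)

# Rasch rating-scale category probabilities at one logit value `eta`.
# The log-numerator for category k is sum_{j <= k} (eta - tau_j), with 0 for k = 0.
category_probs <- function(eta, tau) {
  log_num <- c(0, cumsum(eta - tau))
  num <- exp(log_num - max(log_num))  # subtract the max for numerical stability
  num / sum(num)
}

theta_grid <- seq(-6, 6, length.out = 101)
probs <- t(sapply(theta_grid, category_probs, tau = tau))
colnames(probs) <- paste0("Cat", 0:3)

# Most probable category at each theta (0-based category labels).
modal <- max.col(probs, ties.method = "first") - 1L

# With ordered thresholds, each category should be modal somewhere,
# and the modal category should increase along theta.
print(unique(modal))

# Peak probability per category; low peaks hint at compressed bands.
print(round(apply(probs, 2, max), 3))
```

A category that never appears in `modal` is one that is never the most probable response, which is the pattern the report helps to spot.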
Recommended read order:
summary(out) for compact diagnostics.
out$curve_points (or equivalent curve table) for downstream graphics.
plot(out) for a default visual check.
Fit model with fit_mfrm().
Run category_curves_report() with suitable theta_points.
Use summary() and plot(); export tables for manuscripts/dashboard use.
category_structure_report(), rating_scale_table(), plot.mfrm_fit(),
mfrmr_reports_and_tables, mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
out <- category_curves_report(fit, theta_points = 101)
summary(out)
head(out$probabilities[, c("CurveGroup", "Theta", "Category", "Probability")])

# Draw-free payload for custom plotting.
p_cc <- plot(out, draw = FALSE)
p_cc$data$plot
Build a category structure report (preferred alias)
category_structure_report(
  fit,
  diagnostics = NULL,
  theta_range = c(-6, 6),
  theta_points = 241,
  drop_unused = FALSE,
  include_fixed = FALSE,
  fixed_max_rows = 200
)
| Argument | Description |
|---|---|
| fit | Output from |
| diagnostics | Optional output from |
| theta_range | Theta/logit range used to derive transition points. |
| theta_points | Number of grid points used for transition-point search. |
| drop_unused | If |
| include_fixed | If |
| fixed_max_rows | Maximum rows per fixed-width section. |
Preferred high-level API for category-structure diagnostics. This wraps the legacy-compatible bar/transition export and returns a stable bundle interface for reporting and plotting.
A named list with category-structure components. Class:
mfrm_category_structure.
Key components include:
category usage/fit table (count, expected, infit/outfit, ZSTD)
threshold ordering and adjacent threshold gaps
category transition-point table on the requested theta grid
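The threshold-ordering component can be illustrated without the package. The sketch below uses base R only and a hypothetical vector of step-threshold estimates; the column names `Pair`, `Gap`, and `Disordered` are illustrative and are not taken from the report's actual tables.

```r
# Hypothetical step-threshold estimates (logits) for a 5-category scale.
thresholds <- c(Step1 = -2.1, Step2 = -0.4, Step3 = -0.6, Step4 = 1.9)

# Adjacent gaps: threshold k+1 minus threshold k.
gaps <- unname(diff(thresholds))
pair_labels <- paste(names(thresholds)[-1], "-",
                     names(thresholds)[-length(thresholds)])

ordering <- data.frame(
  Pair = pair_labels,
  Gap = gaps,
  Disordered = gaps < 0   # a negative gap means the later step sits lower
)
print(ordering)

any_disordered <- any(ordering$Disordered)
```

A negative gap marks a pair of adjacent thresholds that are out of order, which is often worth cross-checking against sparse category counts.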
Practical read order:
summary(out) for compact warnings and threshold ordering.
out$category_table for sparse/misfitting categories.
out$median_thresholds for adjacent-threshold caveats when zero-count
categories are retained.
plot(out) for quick visual check.
fit_mfrm() -> model.
diagnose_mfrm() -> residual/fit diagnostics (optional argument here).
category_structure_report() -> category health snapshot.
summary() and plot() for draft-oriented review of category structure.
rating_scale_table(), category_curves_report(), plot.mfrm_fit(),
mfrmr_reports_and_tables, mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
out <- category_structure_report(fit)
summary(out)
head(out$category_table[, c("Category", "Count", "Infit", "Outfit")])

# Draw-free payload for custom plotting.
p_cs <- plot(out, draw = FALSE)
p_cs$data$plot
Check residual dimensionality with parallel-analysis thresholds
check_residual_dimensionality(
  x,
  mode = c("both", "overall", "facet"),
  facets = NULL,
  method = c("residual_normal", "permutation", "parametric"),
  reps = 100L,
  quantile = 0.95,
  pca_max_factors = 10L,
  seed = NULL
)
| Argument | Description |
|---|---|
| x | Output from |
| mode | Residual matrix scope: one of "both", "overall", or "facet". |
| facets | Optional facet subset for facet-level matrices. When supplied, the same subset is also used to form the overall combined-facet residual matrix. |
| method | Null-generation method. |
| reps | Number of null replications. |
| quantile | Parallel-analysis quantile used as the decision threshold. The default is 0.95. |
| pca_max_factors | Maximum number of components retained in output, or |
| seed | Optional random seed for reproducible null simulations. |
This function adds a simulation-calibrated layer to the residual PCA tools. It compares each observed residual eigenvalue with eigenvalues obtained under a unidimensional null reference.
The three null methods answer different questions:
"residual_normal" is Horn-style parallel analysis on independent
normal residual matrices with the observed matrix shape and missingness.
"permutation" preserves the empirical column distributions of the
standardized residual matrix, while removing cross-column association.
"parametric" samples categorical responses from the fitted mfrmr
model, then recomputes standardized residual matrices using the fitted
expected scores and variances. It preserves the observed design and the
fitted category-response model, but it does not refit the model in each
replication.
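To make the "permutation" idea concrete, here is a minimal base R sketch under simplifying assumptions: a complete (no missing cells) hypothetical residual matrix `resid_mat`, and illustrative helper names `first_eigen()` and `permute_columns()`. It shows the general technique only and should not be read as the package's implementation.

```r
set.seed(1)

# Hypothetical standardized residual matrix: 40 rows x 6 columns that share
# one extra component, so the columns are correlated.
shared <- rnorm(40)
resid_mat <- sapply(1:6, function(j) shared + rnorm(40))

# Largest eigenvalue of the column correlation matrix.
first_eigen <- function(m) {
  eigen(cor(m), symmetric = TRUE, only.values = TRUE)$values[1]
}

# Permutation null: shuffle each column independently. Each column keeps its
# empirical distribution exactly, while cross-column association is removed.
permute_columns <- function(m) apply(m, 2, sample)

null_first <- replicate(200, first_eigen(permute_columns(resid_mat)))
observed_first <- first_eigen(resid_mat)

c(observed = observed_first,
  null_q95 = unname(quantile(null_first, 0.95)))
```

Because only the row alignment is shuffled, any skew or heavy tails in a column carry over into the null reference, which is the main difference from a normal-theory null.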
The procedure is exploratory. It is not FACETS ZSTD, TAM itemfit ZSTD, or mirt's S-X2 item-fit statistic. It is a residual-structure diagnostic for deciding whether residual components are larger than expected under a chosen null reference.
An object of class mfrm_residual_dimensionality, containing:
observed: observed residual PCA eigenvalue table
null_distribution: replication-level null eigenvalues
comparison: observed eigenvalues joined to null mean, SD, and quantile
settings: method, repetitions, quantile, and PCA settings
For a standardized residual matrix R, mfrmr computes the eigenvalues
of the positive-definite adjusted residual correlation matrix. In parallel
analysis, the observed eigenvalue lambda_j is compared with a null
threshold q_j, the selected quantile of simulated null eigenvalues for
component j. The component is flagged when lambda_j > q_j.
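The flagging rule lambda_j > q_j can be sketched in base R as follows. This is an illustration under stated assumptions: the residual matrix `resid_obs` is simulated and complete, the null matrices are independent standard normal draws of the same shape, and no positive-definite adjustment or missingness handling is attempted. All object names are hypothetical.

```r
set.seed(42)
n_row <- 60
n_col <- 8

# Hypothetical observed standardized residuals with one shared extra component.
extra <- rnorm(n_row)
resid_obs <- sapply(seq_len(n_col), function(j) extra + rnorm(n_row))

eigenvalues <- function(m) {
  eigen(cor(m), symmetric = TRUE, only.values = TRUE)$values
}

lambda <- eigenvalues(resid_obs)

# Null reference: eigenvalues from independent normal matrices of equal shape.
# Rows of `null_eigen` are components, columns are replications.
reps <- 200
null_eigen <- replicate(
  reps,
  eigenvalues(matrix(rnorm(n_row * n_col), n_row, n_col))
)

# q_j: the selected quantile of the null eigenvalues for component j.
q_null <- apply(null_eigen, 1, quantile, probs = 0.95)

comparison <- data.frame(
  Component = seq_len(n_col),
  Observed = lambda,
  NullQuantile = unname(q_null),
  Flagged = lambda > q_null
)
print(comparison)
```

Raising the quantile makes the screen more conservative; with few replications the upper quantiles are noisy, so small `reps` values are best kept for quick checks.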
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology. Educational and Psychological Measurement, 55, 377-393.
Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis (PCA)? Rasch Measurement Transactions, 12(2), 636.
Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.
Chou, Y.-T., & Wang, W.-C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.
Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16, 209-220.
analyze_residual_pca(), plot_residual_dimensionality(),
plot_residual_pca()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 20)

# Small reps keeps the example fast; use more replications in practice.
dim_check <- check_residual_dimensionality(
  fit,
  mode = "overall",
  method = "parametric",
  reps = 5,
  seed = 123
)
dim_check
head(as.data.frame(dim_check))
plot_residual_dimensionality(dim_check, draw = FALSE)$data$data
Produce a side-by-side comparison of multiple fit_mfrm() results using
information criteria, log-likelihood, and parameter counts. When exactly
two models are supplied and the current conservative nesting audit passes,
a likelihood-ratio test is included.
compare_mfrm(..., labels = NULL, warn_constraints = TRUE, nested = FALSE)
| Argument | Description |
|---|---|
| ... | Two or more |
| labels | Optional character vector of labels for each model. If |
| warn_constraints | Logical. If |
| nested | Logical. Set to |
Models should be fit to the same data (same rows, same person/facet columns) for the comparison to be meaningful. The function checks that observation counts match and warns otherwise.
Information-criterion ranking is reported only when all candidate models
use the package's MML estimation path, analyze the same observations, and
converge successfully. Raw AIC and BIC values are still shown for each
model, but Delta_*, weights, and preferred-model summaries are suppressed
when the likelihood basis is not comparable enough for primary reporting.
Nesting: Two models are nested when one is a special case of the other obtained by imposing equality constraints. The most common nesting in MFRM is RSM (shared thresholds) inside PCM (item-specific thresholds). Models that differ only in estimation method (MML vs JML) on the same specification are not nested in the usual sense; use information criteria rather than an LRT for that comparison.
In the current mfrmr model space, the automatic nesting audit is
intentionally conservative. It currently supports two fixed-effect
restrictions under shared data and shared constraints:
RSM nested inside PCM when the PCM fit has an explicit
step_facet;
same-family additive-vs-interaction comparisons when the smaller fit's
facet_interactions set is a subset of the larger fit's set.
Cross-method comparisons, comparisons that change anchors/dummying/centering, and same-family comparisons that do not add fixed interaction terms are not automatically promoted to LRT claims.
The likelihood-ratio test (LRT) is reported only when exactly two
models are supplied, nested = TRUE, the structural audit passes, and the
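difference in the number of parameters is positive:
$$\Lambda = -2\,\bigl(\ell_{\text{restricted}} - \ell_{\text{full}}\bigr) \sim \chi^2_{df}, \qquad df = p_{\text{full}} - p_{\text{restricted}} > 0$$
where $\ell$ is the maximized log-likelihood and $p$ the number of estimated parameters of each model.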
The LRT is asymptotically valid when models are nested and the data are independent. With small samples or boundary conditions (e.g., variance components near zero), treat p-values as approximate.
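The arithmetic behind the ChiSq, df, and p_value fields can be checked by hand. The sketch below uses base R with hypothetical log-likelihoods and parameter counts; it is not output from compare_mfrm(), and the variable names are illustrative.

```r
# Hypothetical maximized log-likelihoods and parameter counts for two
# nested fits (for example, RSM inside PCM on the same data).
loglik_restricted <- -1520.4
npar_restricted <- 58
loglik_full <- -1511.9
npar_full <- 64

# Likelihood-ratio statistic and its degrees of freedom.
chi_sq <- -2 * (loglik_restricted - loglik_full)
df_diff <- npar_full - npar_restricted

# Upper-tail chi-square probability.
p_value <- pchisq(chi_sq, df = df_diff, lower.tail = FALSE)

data.frame(ChiSq = chi_sq, df = df_diff, p_value = p_value)
```

If `df_diff` is zero or negative the test is undefined, which matches the condition that the parameter difference must be positive.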
An object of class mfrm_comparison (named list) with:
table: data.frame of model-level statistics (LogLik, AIC, BIC,
Delta_AIC, AkaikeWeight, Delta_BIC, BICWeight, npar, nobs, Model,
Method, Converged, ICComparable).
lrt: data.frame with likelihood-ratio test result (only when two models
are supplied and nested = TRUE). Contains ChiSq, df, p_value.
evidence_ratios: data.frame of pairwise Akaike-weight ratios (Model1,
Model2, EvidenceRatio). NULL when weights cannot be computed.
preferred: named list with the preferred model label by each criterion.
comparison_basis: list describing whether IC and LRT comparisons were
considered comparable. Includes a conservative nesting_audit.
In addition to raw AIC and BIC values, the function computes:
Delta_AIC / Delta_BIC: difference from the best (minimum) value. A Delta < 2 is typically considered negligible; 4–7 suggests moderate evidence; > 10 indicates strong evidence against the higher-scoring model (Burnham & Anderson, 2002).
AkaikeWeight / BICWeight: model probabilities derived from
exp(-0.5 * Delta), normalised across the candidate set. An
Akaike weight of 0.90 means the model has a 90% probability of
being the best in the candidate set.
Evidence ratios: pairwise ratios of Akaike weights, quantifying the relative evidence for one model over another (e.g., an evidence ratio of 5 means the preferred model is 5 times more likely).
AIC penalises complexity less than BIC; when they disagree, AIC favours the more complex model and BIC the simpler one.
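The Delta, weight, and evidence-ratio definitions above reduce to a few lines of arithmetic. The base R sketch below uses hypothetical AIC values and illustrative column names; it mirrors the formulas in the text, not the internals of compare_mfrm().

```r
# Hypothetical AIC values for three candidate models.
aic <- c(RSM = 3160.8, PCM = 3155.8, PCM_interaction = 3163.1)

# Difference from the best (minimum) value.
delta <- aic - min(aic)

# Akaike weights: exp(-0.5 * Delta), normalised across the candidate set.
weight <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Evidence ratio of the best model against each candidate.
best <- names(which.min(aic))
evidence_ratio <- weight[best] / weight

round(data.frame(
  AIC = aic,
  Delta_AIC = delta,
  AkaikeWeight = weight,
  EvidenceVsBest = unname(evidence_ratio)
), 3)
```

The same steps apply to BIC by substituting BIC values for `aic`. Because the weights are normalised within the supplied set, adding or removing a candidate changes every weight.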
compare_mfrm() is a same-basis model-comparison helper. Its strongest
claims apply only when the models were fit to the same response data,
under a compatible likelihood basis, and with compatible constraint
structure.
Do not treat AIC/BIC differences as primary evidence when
table$ICComparable is FALSE.
Do not interpret the LRT unless nested = TRUE and the structural audit
in comparison_basis$nesting_audit passes.
Same-family additive-vs-interaction fits are considered nested only when
all other structural settings match and the smaller model's
facet_interactions set is a subset of the larger model's set.
Do not assume that nested = TRUE overrides the package's conservative
nesting boundary; unsupported relations remain unsupported.
Do not compare models fit to different datasets, different score codings, or materially different constraint systems as if they were commensurate.
Lower AIC/BIC values indicate better parsimony-accuracy trade-off only
when table$ICComparable is TRUE.
A significant LRT p-value suggests the more complex model provides a meaningfully better fit only when the nesting assumption truly holds.
preferred indicates the model preferred by each criterion.
evidence_ratios gives pairwise Akaike-weight ratios (returned only
when Akaike weights can be computed for at least two models).
When comparing more than two models, interpret evidence ratios cautiously, because they do not adjust for multiple comparisons.
table: first-pass comparison table; start with ICComparable,
Model, Method, AIC, and BIC.
comparison_basis: records whether IC and LRT claims are defensible for
the supplied models. Inspect comparison_basis$nesting_audit$relation
and reason before reading any LRT output.
lrt: nested-model test summary, present only when the requested and
audited conditions are met.
preferred: candidate preferred by each criterion when those summaries
are available.
Inspect comparison_basis before writing conclusions. If comparability is
weak, treat the result as descriptive and revise the model setup (for
example, explicit step_facet, common data, or common constraints) before
using IC or LRT results in reporting.
Fit two models with fit_mfrm() (e.g., RSM and PCM).
Compare with compare_mfrm(fit_rsm, fit_pcm).
Inspect summary(comparison) for AIC/BIC diagnostics and, when
appropriate, an LRT.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461-464.
toy <- load_mfrmr_data("example_core")
fit_rsm <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                    method = "MML", model = "RSM", maxit = 25)
fit_pcm <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                    method = "MML", model = "PCM",
                    step_facet = "Criterion", maxit = 25)
comp <- compare_mfrm(fit_rsm, fit_pcm, labels = c("RSM", "PCM"))
comp$table
comp$evidence_ratios
List retained compatibility aliases and preferred names
compatibility_alias_table(
  scope = c("all", "functions", "arguments", "columns", "plot_metrics")
)
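| Argument | Description |
|---|---|
| scope | Which alias surface to return: one of "all", "functions", "arguments", "columns", or "plot_metrics". |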
This helper is a compact public registry of the compatibility aliases that
mfrmr intentionally keeps visible for older scripts and downstream
handoffs. It is meant to answer two questions quickly:
Which old names are still accepted?
Which package-native names should new code use instead?
Internal soft-deprecated helpers are deliberately excluded here. This table is only for retained user-facing aliases that remain part of the public surface.
A data.frame with one row per retained alias and columns:
Alias
PreferredName
Surface
Lifecycle
RetainedFor
Notes
Call compatibility_alias_table() when reading older scripts or reports.
Use PreferredName when writing new analysis code.
Keep the alias only when an older workflow or external handoff requires it.
mfrmr_compatibility_layer, run_mfrm_facets(), analyze_dff(),
reporting_checklist(), fair_average_table(), plot_fair_average()
compatibility_alias_table()
compatibility_alias_table("functions")
compatibility_alias_table("columns")
Combines per-facet average cluster size with ICC estimates to return
the Kish (1965) design effect Deff = 1 + (m - 1) * rho, where m
is the average number of observations per facet element and rho is
the ICC.
compute_facet_design_effect(
  data,
  facets,
  icc_table = NULL,
  score = NULL,
  person = NULL
)
| Argument | Description |
|---|---|
| data | Data frame in long format. |
| facets | Character vector of facet column names. |
| icc_table | Output from |
| score | Score column name; required when |
| person | Person column; passed through to compute_facet_icc(). |
A data.frame of class mfrm_facet_design_effect with columns
Facet, AvgClusterSize, ICC, DesignEffect, and EffectiveN.
Deff = 1: facet behaves like simple random sampling; no
clustering-induced variance inflation.
Deff > 1: variance of the mean estimate is inflated by a factor
of Deff relative to SRS. EffectiveN = N / Deff is the sample
size one would need under SRS to achieve the same precision. For
rater-mediated designs, Deff well above 1 on the Rater facet
means rater-level clustering is noticeable; consider whether
rater generalisation is warranted.
Reported ICC is pulled from icc_table$ICC (the variance share);
interpretation is the same as in compute_facet_icc().
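A hand calculation makes the Deff and EffectiveN columns easier to read. The base R sketch below uses hypothetical counts and a hypothetical rater ICC; it applies the Kish formula stated above and is not output from compute_facet_design_effect().

```r
# Hypothetical inputs: total observations, number of raters, and rater ICC.
n_obs <- 768
n_raters <- 4
icc_rater <- 0.08

# m: average number of observations per facet element.
avg_cluster_size <- n_obs / n_raters

# Kish design effect and the implied effective sample size.
deff <- 1 + (avg_cluster_size - 1) * icc_rater
effective_n <- n_obs / deff

data.frame(
  Facet = "Rater",
  AvgClusterSize = avg_cluster_size,
  ICC = icc_rater,
  DesignEffect = deff,
  EffectiveN = effective_n
)
```

Even a small ICC produces a large design effect when each facet element carries many observations, because Deff grows linearly in the average cluster size.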
Run compute_facet_icc() to get the variance-component shares.
Feed the result and the data into
compute_facet_design_effect(data, facets, icc_table = icc).
Use Deff as part of the Methods discussion when generalising
over raters or sites. Large Deff values argue for reporting
robust SEs or moving to a hierarchical model.
Kish, L. (1965). Survey Sampling. New York: Wiley.
Park, I., & Lee, H. (2001). The design effect: Do we know all about it? In Proceedings of the American Statistical Association, Survey Research Methods Section (pp. 143-148).
compute_facet_icc(), analyze_hierarchical_structure().
toy <- load_mfrmr_data("example_core")
if (requireNamespace("lme4", quietly = TRUE)) {
  icc <- compute_facet_icc(toy, facets = c("Rater", "Criterion"),
                           score = "Score", person = "Person")
  deff <- compute_facet_design_effect(toy, facets = c("Rater", "Criterion"),
                                      icc_table = icc)
  print(deff)
  # Large DesignEffect -> modest EffectiveN relative to raw N.
}
Fits a random-effects variance-components model
Score ~ 1 + (1 | Person) + (1 | Facet1) + (1 | Facet2) + ...
using lme4::lmer (in Suggests) and returns the proportion of
observed score variance attributable to each facet. This is a
descriptive summary complementary to the Rasch-metric rater
separation/reliability reported elsewhere.
compute_facet_icc(
  data,
  facets,
  score,
  person = NULL,
  reml = TRUE,
  ci_method = c("none", "profile", "boot"),
  ci_level = 0.95,
  ci_boot_reps = 1000L,
  ci_boot_seed = NULL,
  ci_boot_parallel = c("no", "multicore", "snow"),
  ci_boot_ncpus = 1L
)
| Argument | Description |
|---|---|
| data | Data frame in long format. |
| facets | Character vector of facet column names. |
| score | Name of the score column. |
| person | Optional person column. If supplied it is added as a separate random intercept so Person-level variance is partitioned out. |
| reml | Logical; whether to fit with REML. Default TRUE. |
| ci_method | Confidence-interval method for the ICC column. One of "none", "profile", or "boot". |
| ci_level | Confidence level when |
| ci_boot_reps | Number of bootstrap replicates used when |
| ci_boot_seed | Optional integer seed for the bootstrap path ( |
| ci_boot_parallel | Parallelisation strategy for the parametric-bootstrap CI path, passed through to |
| ci_boot_ncpus | Number of CPUs to use for the parallel bootstrap path (ignored when |
A data.frame of class mfrm_facet_icc with one row per
variance component (including a "Residual" row) and columns:
Facet: the grouping factor name (or "Residual").
Variance: REML variance estimate.
ICC: variance share (Variance / sum(Variance)), in [0, 1].
Interpretation: band label according to the facet's scale.
InterpretationScale: "Koo-Li reliability" for the person
facet, "Variance share" for others.
ICC_CI_Lower / ICC_CI_Upper / ICC_CI_Level / ICC_CI_Method:
CI bounds, level, and method (populated when ci_method != "none";
NA_real_ otherwise).
ICC_CI_NReps: bootstrap replicate count when
ci_method = "boot" (absent otherwise).
The Interpretation column uses two scales so the same numeric
ICC reads correctly for each facet role:
For the person facet, higher ICC = better. Koo & Li (2016, p. 161)
bands are applied: < 0.5 Poor, [0.5, 0.75] Moderate,
(0.75, 0.9] Good, > 0.9 Excellent. The strict > boundary at
0.9 follows Koo & Li's wording "values greater than 0.90 indicate
excellent reliability" (so an ICC of exactly 0.9 reads as Good).
For non-person facets (Rater, Criterion, Task, Region, ...) the
same numeric value is a variance share: how much of the total
observed score variance sits at that facet. The bands used here
are different (Trivial share < 0.05, Small share < 0.15,
Moderate share < 0.30, Large share >= 0.30), and a large
rater share is generally bad news (raters disagree about
averages), not good news.
The InterpretationScale column explicitly records which scale
applies to each row, so downstream reporting does not confuse the
two. FACETS (Linacre, 2026) reports rater separation/reliability on
the Rasch metric instead of an ICC; mfrmr surfaces both, with the
Rasch-metric version in diagnostics$reliability and this
variance-share view here.
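The two interpretation scales can be written down as a pair of small banding functions. The base R sketch below is illustrative: `koo_li_band()` and `variance_share_band()` are hypothetical helper names, the variance components are made up, and treating the Residual row on the variance-share scale is an assumption based on the "for others" wording above.

```r
# Person facet: < 0.5 Poor, [0.5, 0.75] Moderate, (0.75, 0.9] Good, > 0.9 Excellent.
koo_li_band <- function(icc) {
  ifelse(icc < 0.5, "Poor",
    ifelse(icc <= 0.75, "Moderate",
      ifelse(icc <= 0.9, "Good", "Excellent")))
}

# Non-person facets: < 0.05 Trivial, < 0.15 Small, < 0.30 Moderate, >= 0.30 Large.
variance_share_band <- function(share) {
  ifelse(share < 0.05, "Trivial share",
    ifelse(share < 0.15, "Small share",
      ifelse(share < 0.30, "Moderate share", "Large share")))
}

# Hypothetical variance components.
vc <- data.frame(
  Facet = c("Person", "Rater", "Criterion", "Residual"),
  Variance = c(0.90, 0.06, 0.12, 0.42)
)
vc$ICC <- vc$Variance / sum(vc$Variance)

is_person <- vc$Facet == "Person"
vc$InterpretationScale <- ifelse(is_person, "Koo-Li reliability", "Variance share")
vc$Interpretation <- ifelse(is_person,
                            koo_li_band(vc$ICC),
                            variance_share_band(vc$ICC))
print(vc)
```

Note how the same number would be labelled differently on the two scales: a share of 0.6 reads as Moderate reliability for persons but would be a Large share for raters.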
Note: Koo & Li (2016) recommend applying the reliability bands to
the 95% confidence interval of the ICC rather than to the point
estimate alone. Set ci_method = "profile" (default "none") to
obtain likelihood-profile CI bounds alongside the point estimate,
or ci_method = "boot" for a parametric bootstrap with
ci_boot_reps replicates. The returned data frame gains
ICC_CI_Lower / ICC_CI_Upper columns so downstream reporting can
apply the band to the CI rather than the point estimate. The
Interpretation column still uses the point estimate so
callers who want CI-aware banding can implement it externally from
the supplied bounds.
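One way to implement CI-aware banding externally is to band both confidence bounds and report the resulting range. The base R sketch below assumes a one-row data frame that mimics the documented ICC_CI_Lower / ICC_CI_Upper columns; the values, `koo_li_band()`, and `ci_band` are hypothetical.

```r
# Koo & Li (2016) bands, with exactly 0.9 reading as Good.
koo_li_band <- function(icc) {
  ifelse(icc < 0.5, "Poor",
    ifelse(icc <= 0.75, "Moderate",
      ifelse(icc <= 0.9, "Good", "Excellent")))
}

# Hypothetical person-facet row with CI bounds.
person_row <- data.frame(ICC = 0.82, ICC_CI_Lower = 0.71, ICC_CI_Upper = 0.91)

band_lower <- koo_li_band(person_row$ICC_CI_Lower)
band_upper <- koo_li_band(person_row$ICC_CI_Upper)

# Report a single band when both bounds agree, otherwise the range.
ci_band <- if (band_lower == band_upper) {
  band_lower
} else {
  paste(band_lower, "to", band_upper)
}

c(point = koo_li_band(person_row$ICC), ci = ci_band)
```

Reporting the range rather than the point band keeps the description honest when the interval straddles a boundary.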
Fit the MFRM model with fit_mfrm() for the Rasch-metric
separation/reliability.
Call compute_facet_icc(data, facets, score, person) to get the
complementary variance-share summary.
Feed into compute_facet_design_effect() to convert ICCs and
average cluster sizes into Kish (1965) design effects.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155-163.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.
compute_facet_design_effect(),
analyze_hierarchical_structure(), detect_facet_nesting(),
facet_small_sample_audit().
toy <- load_mfrmr_data("example_core")
if (requireNamespace("lme4", quietly = TRUE)) {
  icc <- compute_facet_icc(toy, facets = c("Rater", "Criterion"),
                           score = "Score", person = "Person")
  print(icc)
  # Look for:
  # - Person ICC reads as Koo & Li (2016) reliability: < 0.5 poor,
  #   0.5-0.75 moderate, 0.75-0.9 good, > 0.9 excellent.
  # - Rater / Criterion ICC reads as variance share, NOT reliability;
  #   here SMALL values are desirable (raters / items agree), and
  #   shares > 0.10 hint at meaningful systematic facet differences.
  # - `Interpretation` summarises the variance-share band the helper
  #   has assigned to each row.
}
Calculates design-weighted score-variance curves across the latent
trait (theta) for a fitted ordered-category many-facet Rasch model. Returns both
an overall precision curve ($tif) and per-facet-level contribution
curves ($iif) based on the realized observation pattern.
compute_information(fit, theta_range = c(-6, 6), theta_points = 201L)
| Argument | Description |
|---|---|
| fit | Output from |
| theta_range | Numeric vector of length 2 giving the range of theta values. Default c(-6, 6). |
| theta_points | Integer number of points at which to evaluate information. Default 201L. |
For a polytomous Rasch model with K+1 categories, the score variance at theta for one observed design cell is:
\[ I(\theta) = \sum_{k=0}^{K} P_k(\theta)\,\bigl(k - E(\theta)\bigr)^2 \]
where \(P_k(\theta)\) is the category probability and
\(E(\theta) = \sum_{k=0}^{K} k\,P_k(\theta)\) is the
expected score at theta. In mfrmr, these cell-level variances are then
aggregated with weights taken from the realized observation counts in
fit$prep$data.
The resulting total curve is therefore a design-weighted precision screen
rather than a pure textbook test-information function for an abstract fixed
item set. The associated standard error summary is still
\(SE(\theta) = 1 / \sqrt{I(\theta)}\) for positive information values.
In an ordered Rasch-family model, category discrimination is fixed at 1, so
this score-variance representation is the natural conditional information
identity rather than a separate approximation. For binary data it reduces to
the familiar \(P(\theta)\,[1 - P(\theta)]\) form. For PCM, the package
evaluates each observed design cell using the threshold vector associated
with that cell's realized step_facet level. For bounded GPCM, the
same design-weighted score variance is scaled by the squared discrimination
attached to the realized slope_facet level, which is the
item-information identity that
Muraki (1993, Equation 10) derives by applying Samejima's (1974)
polytomous information formula to the GPCM kernel of Muraki (1992).
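The identity described above can be sketched in base R without calling mfrmr. This is only an illustration under stated assumptions: `category_probs()`, `cell_information()`, the two-cell design, and the use of raw observation counts as weights are all hypothetical and are not taken from the package internals.

```r
# Category probabilities for one design cell under an adjacent-category
# kernel. `eta` is the cell location in logits (theta minus the facet
# measures), `taus` are step thresholds, and `a` is the discrimination
# (fixed at 1 for Rasch-family models).
category_probs <- function(eta, taus, a = 1) {
  expo <- c(0, cumsum(a * (eta - taus)))  # category 0 has exponent 0
  w <- exp(expo - max(expo))              # subtract max for stability
  w / sum(w)
}

# Score variance scaled by the squared discrimination.
cell_information <- function(eta, taus, a = 1) {
  p <- category_probs(eta, taus, a)
  k <- seq_along(p) - 1                   # integer scores 0..K
  expected <- sum(k * p)
  a^2 * sum(p * (k - expected)^2)
}

# Hypothetical design: two cells with different locations and different
# observation counts (assumption: counts act directly as weights).
theta_grid <- seq(-6, 6, length.out = 201)
cells <- data.frame(offset = c(-0.5, 0.8), n = c(40, 10))
taus <- c(-1, 0, 1)

total_info <- vapply(theta_grid, function(th) {
  contributions <- vapply(cells$offset,
                          function(d) cell_information(th - d, taus),
                          numeric(1))
  sum(cells$n * contributions)
}, numeric(1))

se <- ifelse(total_info > 0, 1 / sqrt(total_info), NA_real_)
peak_theta <- theta_grid[which.max(total_info)]
```

Because the weights come from the design, changing `cells$n` moves the peak of `total_info` even when the thresholds stay fixed, which is why the curve should be read as a design-weighted screen.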
An object of class mfrm_information (named list) with:
tif: tibble with columns Theta, Information, SE. The
Information column stores the design-weighted precision value.
iif: tibble with columns Theta, Facet, Level, Information,
and Exposure. Here too, Information stores a design-weighted
contribution value retained under that column name for compatibility.
theta_range: the evaluated theta range.
What tif and iif mean here:
In mfrmr, this helper supports ordered-category RSM, PCM, and the
current bounded GPCM fit. The total curve ($tif) is the sum of
design-weighted cell contributions across all non-person facet levels in the
fitted model. The facet-level contribution curves ($iif) keep those
weighted contributions separated, so you can see which observed rater
levels, criteria, or other facet levels are driving precision at different
parts of the scale. For PCM, step-facet-specific thresholds are respected
when each observed design cell is evaluated. For bounded GPCM, those
same cell-level variances are additionally scaled by the squared
discrimination associated with the realized slope_facet level.
It is not a textbook many-facet test-information function for an abstract fixed item set.
It should not be used as if it were design-free evidence about a form's precision independent of the realized observation pattern.
It does not currently extend beyond the ordered-category RSM / PCM /
bounded GPCM family implemented by fit_mfrm().
Use compute_information() when you want a design-weighted precision screen
for an RSM, PCM, or bounded GPCM fit along the latent
continuum. In practice:
start with the total precision curve for overall targeting across the realized observation pattern
inspect facet-level contribution curves when you want to see which raters, criteria, or other facet levels account for more of that design-weighted precision
widen theta_range if you expect extreme measures and want to inspect the
tails explicitly
The defaults (theta_range = c(-6, 6), theta_points = 201) work well for
routine inspection. Expand the range if person or facet measures extend into
the tails, and increase theta_points only when you need a smoother grid
for reporting or custom graphics.
The ordered-category probability structures come from Andrich's RSM
formulation and Masters' PCM. The bounded GPCM information identity
is derived in Muraki
(1993, Equation 10) by applying Samejima's (1974) general polytomous
information formula to the GPCM probability
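kernel of Muraki (1992). For the integer scoring function \(T_k = k\)
used by mfrmr, this reduces to
\(I(\theta) = a^2\,\mathrm{Var}(X \mid \theta)\), the squared discrimination
times the score variance. In mfrmr, those formulas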
are applied to the realized many-facet observation design, so the output
should be read as a design-weighted precision summary rather than as a
design-free abstract test function.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.
Muraki, E. (1992). A generalized partial credit model: Application
of an EM algorithm. Applied Psychological Measurement, 16(2),
159-176. doi:10.1177/014662169201600206 (See Equations 6, 10, and
13 for the probability kernel and the derivative used by all GPCM
helpers in mfrmr.)
Muraki, E. (1993). Information functions of the generalized
partial credit model. Applied Psychological Measurement, 17(4),
351-363. doi:10.1177/014662169301700402 (Equation 10 derives the
item information function for the GPCM by
applying Samejima's (1974) polytomous information formula to the
GPCM kernel; this is the canonical reference for compute_information()
under bounded GPCM.)
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111-121. (Source for the general polytomous information formula that Muraki 1993 specializes to the GPCM.)
$tif: design-weighted precision curve data with theta, Information, and SE.
$iif: design-weighted facet-level contribution curves for the fitted
non-person facets.
Higher information implies more precise measurement at that theta.
SE is inversely related to information.
Peaks in the total curve show the trait region where the realized calibration is most informative.
Facet-level curves help explain which observed facet levels contribute to those peaks; they are not standalone item-information curves and should be read as design contributions.
Theta: point on the latent continuum where the curve is evaluated.
Information: design-weighted precision value at that theta.
SE: approximate 1 / sqrt(Information) summary for positive values.
Exposure: total realized observation weight contributing to a facet-level
curve in $iif.
Compare the precision peak with person/facet locations from a Wright map or
related diagnostics. If you need to decide how strongly SE/CI language can
be used in reporting, follow with precision_audit_report().
Fit a model with fit_mfrm().
Run compute_information(fit).
Plot with plot_information(info, type = "tif").
If needed, inspect facet contributions with
plot_information(info, type = "iif", facet = "Rater").
fit_mfrm(), plot_information()
    toy <- load_mfrmr_data("example_core")
    fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                    method = "JML", model = "RSM", maxit = 25)
    info <- compute_information(fit)
    head(info$tif)
    info$tif$Theta[which.max(info$tif$Information)]
Computes person-level fit statistics for an MFRM bundle, extending
the Infit / Outfit / ZSTD columns that diagnose_mfrm()$measures
already exposes with the standardized log-likelihood lz, a
Snijders-style lz_star for JML fits, and an explicitly named
finite-N heuristic lz_finite_n.
    compute_person_fit_indices(diagnostics, fit = NULL)
diagnostics |
Output from |
fit |
Optional |
A data frame with one row per Person and columns:
Person: Person ID.
N: Number of contributing response opportunities.
LogLik: Sum of log P(X = x | theta) under the fitted
model. Computed from the per-observation category probability
PrObserved (the model probability of the observed category),
not from a Gaussian residual approximation.
lz: Drasgow et al. (1985) standardized log-likelihood, in its proper polytomous form.
lz_star: Snijders-style score-projection corrected statistic,
computed for JML fits by projecting the log-likelihood weights away
from the person-score estimating equation. For MML/EAP fits this
column is NA because EAP posterior means do not satisfy the ML
person-score equation used by the correction.
lz_finite_n: Finite-N heuristic retained for continuity.
This is not the published Snijders statistic.
lz_star_method: Audit label describing whether lz_star
was computed and why it may be unavailable.
Under the conditional-independence assumption of the MFRM, lz is
asymptotically standard normal. Practical reporting thresholds:
|lz| > 1.96 flags a person at the 5% level; |lz| > 2.58 at the
1% level. lz_star should be read as a conditional person-fit
statistic: it corrects for estimated JML person measures but still
treats the fitted non-person parameters as fixed.
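For readers who want to see how a polytomous lz is assembled from category probabilities, the following base-R sketch follows the Drasgow et al. (1985) definition: the observed log-likelihood is centred by its model expectation and scaled by its model standard deviation. The function name `lz_polytomous()` and the small probability matrix are hypothetical; the sketch does not reproduce the lz_star correction and is not the package's internal code.

```r
# `probs`: one row per response opportunity, one column per category
# (model probabilities for a single person). `x`: observed categories
# coded 0..K. Both inputs are illustrative.
lz_polytomous <- function(probs, x) {
  stopifnot(nrow(probs) == length(x))
  log_p <- log(probs)
  log_p[probs == 0] <- 0  # zero-probability categories add nothing to moments
  observed <- sum(log(probs[cbind(seq_along(x), x + 1)]))
  expected <- sum(rowSums(probs * log_p))
  variance <- sum(rowSums(probs * log_p^2) - rowSums(probs * log_p)^2)
  (observed - expected) / sqrt(variance)
}

probs <- rbind(c(0.10, 0.60, 0.30),
               c(0.20, 0.50, 0.30),
               c(0.05, 0.25, 0.70))

lz_typical  <- lz_polytomous(probs, c(1, 1, 2))  # modal categories chosen
lz_aberrant <- lz_polytomous(probs, c(0, 0, 0))  # least likely categories
```

A pattern made of low-probability categories drives lz strongly negative, which is the direction the |lz| > 1.96 screen is mostly used to catch.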
Note: this implementation reads the model category probabilities
directly from the diagnostics bundle. Earlier mfrmr releases used
a Gaussian-residual approximation as a stand-in for the log-probability
of the observed category. That approximation overstated the per-item
variance of the log-probability for polytomous items and so shrank the
reported lz toward zero. Numerical lz values are therefore
not directly comparable across mfrmr releases; treat the values
returned here as the polytomous statistic and re-evaluate any
historical |lz| > 1.96 flagging that was based on the earlier
approximation.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86.
Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331-342.
    toy <- load_mfrmr_data("example_core")
    fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                    method = "JML", maxit = 25)
    diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "legacy")
    pf <- compute_person_fit_indices(diag, fit = fit)
    head(pf)
    # Look for: |lz| > 1.96 (5% level) flags a person whose response
    # pattern is statistically inconsistent with the model; > 2.58 is
    # a 1% flag. `lz_star_method` tells you whether the Snijders-style
    # correction was computed or left unavailable for the active estimator.
Build a data quality summary report (preferred alias)
    data_quality_report(
      fit,
      data = NULL,
      person = NULL,
      facets = NULL,
      score = NULL,
      weight = NULL,
      include_fixed = FALSE
    )
fit |
Output from |
data |
Optional raw data frame used for row-level audit. When omitted,
the report uses the preprocessing row audit stored in |
person |
Optional person column name in |
facets |
Optional facet column names in |
score |
Optional score column name in |
weight |
Optional weight column name in |
include_fixed |
If |
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_data_quality (type = "row_audit", "category_counts",
"missing_rows").
A named list with data-quality report components. Class:
mfrm_data_quality.
summary: retained/dropped row overview.
row_audit: reason-level breakdown for data issues.
category_counts: post-filter category usage.
unknown_elements: facet levels in raw data but not in fitted design.
Run data_quality_report(...) with raw data.
Check row-audit and missing/unknown element sections.
Resolve issues before final estimation/reporting.
fit_mfrm(), describe_mfrm_data(), specifications_report(),
mfrmr_reports_and_tables, mfrmr_compatibility_layer
    toy <- load_mfrmr_data("example_core")
    fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                    method = "JML", maxit = 25)
    out <- data_quality_report(
      fit,
      data = toy,
      person = "Person",
      facets = c("Rater", "Criterion"),
      score = "Score"
    )
    summary(out)
    p_dq <- plot(out, draw = FALSE)
    p_dq$data$plot
Summarize MFRM input data (TAM-style descriptive snapshot)
    describe_mfrm_data(
      data, person, facets, score,
      weight = NULL,
      rating_min = NULL,
      rating_max = NULL,
      keep_original = FALSE,
      missing_codes = NULL,
      include_person_facet = FALSE,
      include_agreement = TRUE,
      rater_facet = NULL,
      context_facets = NULL,
      agreement_top_n = NULL
    )
data |
A data.frame in long format (one row per rating event). |
person |
Column name for person IDs. |
facets |
Character vector of facet column names. |
score |
Column name for observed score. |
weight |
Optional weight/frequency column name. |
rating_min |
Optional minimum category value. Supply with
|
rating_max |
Optional maximum category value. Supply with
|
keep_original |
Keep original category values. Use this with
|
missing_codes |
Optional. |
include_person_facet |
If |
include_agreement |
If |
rater_facet |
Optional rater facet name used for agreement summaries.
If |
context_facets |
Optional facets used to define matched contexts for
agreement. If |
agreement_top_n |
Optional maximum number of agreement pair rows. |
This function provides a compact descriptive bundle similar to the
pre-fit summaries commonly checked in TAM workflows:
sample size, score distribution, per-facet coverage, and linkage counts.
psych::describe() is used for numeric descriptives of score and weight.
Key data-quality checks to perform before fitting:
Sparse categories: any score category with fewer than 10 weighted observations may produce unstable threshold estimates (Linacre, 2002). Consider collapsing adjacent categories.
Unlinked elements: if a facet level has zero overlap with one or
more levels of another facet, the design is disconnected and
parameters cannot be placed on a common scale. Check
linkage_summary for low connectivity.
Extreme scores: persons or facet levels with all-minimum or all-maximum scores yield infinite logit estimates under JML; they are handled via Bayesian shrinkage under MML.
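The sparse-category check can be tried on raw long-format data before any mfrmr call. The sketch below uses a hypothetical data frame with `Score` and `Weight` columns and the cutoff of 10 weighted observations named above; it does not depend on the column layout of the score_distribution component, which is not reproduced here.

```r
# Hypothetical long-format ratings with a frequency weight per row.
ratings <- data.frame(
  Score  = c(0, 1, 1, 2, 2, 2, 3, 3, 4),
  Weight = c(4, 6, 7, 9, 8, 5, 6, 5, 3)
)

# Weighted observation count per score category.
weighted_n <- tapply(ratings$Weight, ratings$Score, sum)

# Categories below the cutoff are candidates for collapsing with a
# neighbouring category.
sparse_cutoff <- 10
sparse_categories <- names(weighted_n)[weighted_n < sparse_cutoff]
```

In this toy data the two boundary categories fall under the cutoff, which is the common situation where collapsing adjacent categories is worth considering.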
A list of class mfrm_data_description with:
overview: one-row run-level summary
missing_by_column: missing counts in selected input columns
missing_rate_summary: per-column missingness rate summary
(one row per input column, with raw and proportion-of-N columns)
score_descriptives: output from psych::describe() for score
weight_descriptives: output from psych::describe() for weight
score_distribution: weighted and raw score frequencies over the prepared
score support. Unused boundary categories are retained when the rating
range was supplied explicitly; unused intermediate categories require
keep_original = TRUE.
facet_level_summary: per-level usage and score summaries
facet_crosstabs: pairwise observation-count crosstabs between
non-person facets (named list keyed "facetA__facetB"); used by
summary(ds)$design_links to flag sparse / disconnected
facet-pair coverage
linkage_summary: person-facet connectivity diagnostics
agreement: observed-score inter-rater agreement bundle
score_support: minimal prepared score-support metadata used by
summary(ds)$caveats
Recommended order:
overview: confirms sample size, facet count, and category span.
The MinWeightedN column shows the smallest weighted observation
count across all facet levels; values below 30 may lead to
unstable parameter estimates.
missing_by_column: identifies immediate data-quality risks.
Any non-zero count warrants investigation before fitting.
score_distribution: checks sparse/unused score categories.
Balanced usage across categories is ideal; heavily skewed
distributions may compress the measurement range.
facet_level_summary and linkage_summary: checks per-level
support and person-facet connectivity. Low linkage ratios
indicate sparse or disconnected design blocks.
agreement: optional observed inter-rater consistency summary
(exact agreement, correlation, mean differences per rater pair).
Run describe_mfrm_data() on long-format input.
Review summary(ds) and plot(ds, ...).
Resolve missingness/sparsity issues before fit_mfrm().
fit_mfrm(), audit_mfrm_anchors()
    toy <- load_mfrmr_data("example_core")
    ds <- describe_mfrm_data(
      data = toy,
      person = "Person",
      facets = c("Rater", "Criterion"),
      score = "Score"
    )
    s_ds <- summary(ds)
    s_ds$overview
    p_ds <- plot(ds, draw = FALSE)
    p_ds$data$plot
Compares facet estimates across two or more calibration waves to identify elements whose difficulty/severity has shifted beyond acceptable thresholds. Useful for monitoring rater drift over time or checking the stability of item banks.
    detect_anchor_drift(
      fits,
      facets = NULL,
      drift_threshold = 0.5,
      flag_se_ratio = 2,
      reference = 1L,
      include_person = FALSE
    )

    ## S3 method for class 'mfrm_anchor_drift'
    print(x, ...)

    ## S3 method for class 'mfrm_anchor_drift'
    summary(object, ...)

    ## S3 method for class 'summary.mfrm_anchor_drift'
    print(x, ...)
fits |
Named list of |
facets |
Character vector of facets to compare (default: all non-Person facets). |
drift_threshold |
Absolute drift threshold for flagging (logits, default 0.5). |
flag_se_ratio |
Drift/SE ratio threshold for flagging (default 2.0). |
reference |
Index or name of the reference fit (default: first). |
include_person |
Include person estimates in comparison. |
x |
An |
... |
Ignored. |
object |
An |
For each non-reference wave, the function extracts facet-level estimates
using make_anchor_table() and computes the element-by-element difference
against the reference wave. Standard errors are obtained from
diagnose_mfrm() applied to each fit. Only elements common to both the
reference and a comparison wave are included. Before reporting drift, the
function removes the weighted common-element link offset between the two
waves so that Drift represents residual instability rather than the
overall shift between calibrations. The function also records how many
common elements survive the screening step within each linking facet and
treats fewer than 5 retained common elements per facet as thin support.
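An element is flagged when either condition is met:
the absolute drift exceeds drift_threshold (default 0.5 logits)
the absolute drift/SE ratio exceeds flag_se_ratio (default 2.0)
The dual-criterion approach guards against flagging elements with large but imprecise estimates, and against missing small but precisely estimated shifts.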
When facets is NULL, all non-Person facets are compared. Providing a
subset (e.g., facets = "Criterion") restricts comparison to those facets
only.
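The offset-removal and flagging logic described above can be sketched in base R. Several details here are assumptions made only for illustration: the two hypothetical estimate tables, inverse-variance weights for the link offset, a root-sum-of-squares combined SE, and strict inequalities for the thresholds. None of these are confirmed as the package's internal choices.

```r
# Hypothetical rater estimates (logits) and standard errors for two waves.
ref <- data.frame(Level = c("R1", "R2", "R3", "R4", "R5"),
                  Est = c(-0.60, -0.20, 0.00, 0.30, 0.50),
                  SE  = c(0.10, 0.12, 0.11, 0.10, 0.13))
wave <- data.frame(Level = c("R1", "R2", "R3", "R4", "R5"),
                   Est = c(-0.35, 0.05, 0.20, 1.30, 0.75),
                   SE  = c(0.11, 0.12, 0.10, 0.12, 0.14))

# Keep only elements present in both waves.
common <- merge(ref, wave, by = "Level", suffixes = c("_Ref", "_Wave"))

raw_diff <- common$Est_Wave - common$Est_Ref
se <- sqrt(common$SE_Ref^2 + common$SE_Wave^2)   # assumed combined SE

# Assumed weighting: inverse-variance weighted mean difference.
w <- 1 / se^2
link_offset <- sum(w * raw_diff) / sum(w)

drift <- raw_diff - link_offset
drift_se_ratio <- drift / se

# Either criterion flags the element, following the description above.
flag <- abs(drift) > 0.5 | abs(drift_se_ratio) > 2
flagged_levels <- common$Level[flag]
```

Most raters in this toy example move by a similar amount, so the common shift is absorbed into `link_offset` and only the rater whose change departs from it remains flagged.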
Object of class mfrm_anchor_drift with components:
drift_table: Tibble of element-level drift statistics.
summary: Drift summary aggregated by facet and wave.
common_elements: Tibble of pairwise common-element counts.
Tibble of common-element counts between each wave and the reference wave (i.e., which elements remain comparable across the entire chain).
Integer count of elements that are
common across every wave; used by summary() to gauge how
robust the chain is to chained linking error.
common_by_facet: Tibble of retained common-element counts by facet.
config: List of analysis configuration.
Use anchor_to_baseline() when your starting point is raw new data plus a
single baseline fit.
Use detect_anchor_drift() when you already have multiple fitted waves
and want a reference-versus-wave comparison.
Use build_equating_chain() when the waves form a sequence and you need
cumulative linking offsets.
$drift_table: one row per element x wave combination, with columns
Facet, Level, Wave, Ref_Est, Wave_Est, LinkOffset, Drift,
SE_Ref, SE_Wave, SE, Drift_SE_Ratio, LinkSupportAdequate, and
Flag. Large drift signals instability after alignment to the
common-element link.
$summary: aggregated statistics by facet and wave: number of elements,
mean/max absolute drift, and count of flagged elements.
$common_elements: pairwise common-element counts in tidy table form.
Small overlap weakens the comparison, and results should be interpreted
cautiously.
$common_by_facet: retained common-element counts by linking facet for
each reference-vs-wave comparison. LinkSupportAdequate = FALSE means the
link rests on fewer than 5 retained common elements in at least one facet.
$config: records the analysis parameters for reproducibility.
A practical reading order is summary(drift) first, then
drift$drift_table, then drift$common_by_facet if overlap looks thin.
Fit separate models for each administration wave.
Combine into a named list: fits <- list(Spring = fit_s, Fall = fit_f).
Call drift <- detect_anchor_drift(fits).
Review summary(drift) and plot_anchor_drift(drift).
Flagged elements may need to be removed from anchor sets or investigated for substantive causes (e.g., rater re-training).
anchor_to_baseline(), build_equating_chain(),
make_anchor_table(), plot_anchor_drift(), mfrmr_linking_and_dff
    d1 <- load_mfrmr_data("study1")
    d2 <- load_mfrmr_data("study2")
    fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score",
                     method = "JML", maxit = 15)
    fit2 <- fit_mfrm(d2, "Person", c("Rater", "Criterion"), "Score",
                     method = "JML", maxit = 15)
    drift <- detect_anchor_drift(list(Wave1 = fit1, Wave2 = fit2))
    summary(drift)
    head(drift$drift_table[, c("Facet", "Level", "Wave", "Drift", "Flag")])
    drift$common_elements
Classifies every ordered pair of facets (optionally including Person)
as crossed, partially nested, near-perfectly nested, or fully nested,
based on a conditional-entropy index. An index near 1 means that
knowing the level of A essentially determines the level of B
(A is nested in B).
    detect_facet_nesting(data, facets, person = NULL, weight_col = NULL)
data |
Data frame in long format (one row per rating). |
facets |
Character vector of facet column names. |
person |
Optional name of the person column (adds Person to the nesting matrix if supplied). |
weight_col |
Optional name of a weight column; if supplied, rows are replicated proportionally when counting element co-occurrences. |
This is a pure descriptive audit of the observed design. It does not affect estimation; fit_mfrm() continues to treat all facets as fixed effects.
A list of class mfrm_facet_nesting with:
pairwise_table: one row per ordered facet pair with
NestingIndex_AinB, NestingIndex_BinA, classification strings,
and Direction.
summary: a one-line summary table with facet counts and whether
any non-crossed structure was detected.
facets: the facet vector that was audited.
"Fully nested": nesting index >= 0.99.
"Near-perfectly nested": 0.95 <= index < 0.99.
"Partially nested": 0.50 <= index < 0.95.
"Crossed": index < 0.50.
The direction column records which facet is nested in which, or
"crossed" when neither direction is above 0.95.
A Direction value of "Rater nested in Region" means that every
rater appears in exactly one region (or very close to it). For
additive fixed-effects MFRM, this is a concern: the severity of a
rater is confounded with region-level variance that the model cannot
partition. Consider reporting the nesting direction explicitly and,
when relevant, refitting without the nested facet or moving to a
hierarchical estimation tool (e.g. lme4::lmer, brms, TAM) to
separate the variance components.
Direction = "crossed" is the most common reading when both nesting
indices are below 0.5; the two facets largely co-occur at multiple
combinations, which is the setting Linacre (1989) assumed.
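A conditional-entropy nesting index of the kind described here can be sketched in base R. The exact formula used by detect_facet_nesting() is not reproduced in this section, so the normalized form 1 - H(B | A) / H(B) below is an assumption, and `entropy()`, `nesting_index()`, and `classify_nesting()` are hypothetical helper names. The classification cutoffs are the ones listed above.

```r
# Shannon entropy of a vector of counts (zero cells are skipped).
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Assumed index: 1 - H(B | A) / H(B). A value of 1 means the level of A
# fully determines the level of B (A nested in B).
nesting_index <- function(a, b) {
  tab <- table(a, b)
  h_b <- entropy(colSums(tab))
  if (h_b == 0) return(NA_real_)  # B has one level; index undefined
  # H(B | A) = sum over levels of A of P(A = a) * H(B | A = a)
  h_b_given_a <- sum(rowSums(tab) / sum(tab) * apply(tab, 1, entropy))
  1 - h_b_given_a / h_b
}

classify_nesting <- function(idx) {
  if (idx >= 0.99) {
    "Fully nested"
  } else if (idx >= 0.95) {
    "Near-perfectly nested"
  } else if (idx >= 0.50) {
    "Partially nested"
  } else {
    "Crossed"
  }
}

# Six raters, each working in exactly one of three regions.
rater  <- rep(paste0("R", 1:6), 20)
region <- rep(c("A", "A", "B", "B", "C", "C"), 20)

idx_rater_in_region <- nesting_index(rater, region)
idx_region_in_rater <- nesting_index(region, rater)
```

The index is asymmetric: knowing the rater fixes the region, but knowing the region still leaves two possible raters, so only one direction reaches the fully nested band.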
Call detect_facet_nesting(data, facets) before fitting.
If any pair is flagged as nested or partially nested, review the
numeric index and the LevelsA/LevelsB counts.
For downstream reporting, use analyze_hierarchical_structure()
to bundle this output with ICC and design-effect summaries, which
build_mfrm_manifest() then records for reproducibility.
McEwen, M. R. (2018). The effects of incomplete rating designs on results from many-facets-Rasch model analyses (Doctoral thesis, Brigham Young University). https://scholarsarchive.byu.edu/etd/6689/
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
facet_small_sample_audit(),
analyze_hierarchical_structure(), compute_facet_icc(),
compute_facet_design_effect(), fit_mfrm() (see "Fixed effects
assumption" in its details).
    toy <- load_mfrmr_data("example_core")
    nesting <- detect_facet_nesting(toy, c("Rater", "Criterion"))
    summary(nesting)

    # Synthetic example: raters fully nested within regions.
    d <- data.frame(
      Person = rep(paste0("P", formatC(1:20, width = 2, flag = "0")), each = 6),
      Rater = rep(paste0("R", 1:6), 20),
      Region = rep(rep(c("A", "A", "B", "B", "C", "C"), 20)),
      Score = sample(0:4, 120, replace = TRUE),
      stringsAsFactors = FALSE
    )
    nest <- detect_facet_nesting(d, c("Rater", "Region"))
    nest$pairwise_table[, c("FacetA", "FacetB", "NestingIndex_AinB", "Direction")]
Compute diagnostics for an mfrm_fit object
    diagnose_mfrm(
      fit,
      interaction_pairs = NULL,
      top_n_interactions = 20,
      whexact = FALSE,
      diagnostic_mode = c("both", "legacy", "marginal_fit"),
      residual_pca = c("none", "overall", "facet", "both"),
      pca_max_factors = 10L
    )
fit |
Output from |
interaction_pairs |
Optional list of facet pairs. |
top_n_interactions |
Number of top interactions. |
whexact |
Logical controlling the ZSTD standardisation of
mean-square fit statistics. |
diagnostic_mode |
Diagnostic basis to compute: |
residual_pca |
Residual PCA mode: |
pca_max_factors |
Maximum number of PCA factors to retain per matrix. |
This function computes a diagnostic bundle used by downstream reporting. It calculates element-level fit statistics, approximate facet separation/reliability summaries, residual-based QC diagnostics, and optionally residual PCA for exploratory residual-structure screening.
diagnostic_mode keeps the legacy residual fit path explicit rather than
silently replacing it. The legacy path is a compatibility-oriented
residual/EAP stack, whereas the strict marginal path targets
latent-integrated first-order category counts. When diagnostic_mode = "both", the output includes a diagnostic_basis guide so downstream
tables and summaries can distinguish these targets.
Choosing diagnostic_mode:
"legacy": use when continuity with historical residual-based workflows is
the priority.
"marginal_fit": use when you want the strict latent-integrated screen
without the extra legacy bundle.
"both": recommended when you want continuity with the legacy residual
stack while making the strict marginal path explicit for RSM, PCM,
and bounded GPCM fits.
For bounded GPCM, the same generalized partial credit kernel now
drives both the residual/probability tables and the strict marginal
category-fit companion. Residual-based MnSq summaries should still be read
as exploratory screening tools rather than strict Rasch-style invariance
tests because discrimination is free, and the strict marginal companion
should likewise be treated as a slope-aware screen rather than a finalized
inferential test family.
Key fit statistics computed for each element:
Infit MnSq: information-weighted mean-square residual; sensitive to on-target misfitting patterns. Expected value = 1.0.
Outfit MnSq: unweighted mean-square residual; sensitive to off-target outliers. Expected value = 1.0.
ZSTD: Wilson-Hilferty cube-root transformation of MnSq to an approximate standard normal deviate.
PTMEA: point-measure correlation (item-rest correlation in MFRM context); positive values confirm alignment with the latent trait.
Misfit flagging guidelines (Bond & Fox, 2015):
MnSq < 0.5: overfit (too predictable; may inflate reliability)
MnSq 0.5–1.5: productive for measurement
MnSq > 1.5: underfit (noise degrades measurement)
|ZSTD| > 2: statistically significant misfit (5% level)
When Infit and Outfit disagree, Infit is generally more informative because it downweights extreme observations. Large Outfit with acceptable Infit typically indicates a few outlying responses rather than systematic misfit.
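The two mean-square statistics and the screening bands above can be reproduced from observed scores, model expected scores, and model variances. The sketch below is base R only; `mean_square_fit()`, `classify_mnsq()`, and the five-observation example are hypothetical, and the ZSTD transformation is left out.

```r
# Infit: squared residuals summed and divided by summed model variance
# (information-weighted). Outfit: plain mean of squared standardized
# residuals (unweighted).
mean_square_fit <- function(observed, expected, variance) {
  resid <- observed - expected
  c(Infit  = sum(resid^2) / sum(variance),
    Outfit = mean(resid^2 / variance))
}

# Screening bands from the guidelines above.
classify_mnsq <- function(mnsq) {
  ifelse(mnsq < 0.5, "overfit",
         ifelse(mnsq > 1.5, "underfit", "productive"))
}

# Hypothetical ratings for one element; the last observation is a large
# surprise on a response with small model variance.
observed <- c(2, 3, 1, 4, 0)
expected <- c(2.2, 2.6, 1.4, 3.1, 2.0)
variance <- c(0.8, 0.9, 0.7, 0.6, 0.3)

fit_stats <- mean_square_fit(observed, expected, variance)
bands <- classify_mnsq(fit_stats)
```

The single low-variance surprise inflates Outfit much more than Infit, which matches the reading that a large Outfit with a milder Infit points to a few outlying responses.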
interaction_pairs controls which facet interactions are summarized.
Each element can be:
a length-2 character vector such as c("Rater", "Criterion"), or
omitted (NULL) to let the function select top interactions automatically.
Residual PCA behavior:
"none": skip PCA (fastest; recommended for initial exploration)
"overall": compute overall residual PCA across all facets
"facet": compute facet-specific residual PCA for each facet
"both": compute both overall and facet-specific PCA
Overall PCA examines the person × combined-facet residual
matrix; facet-specific PCA examines person × facet-level
matrices. These summaries are exploratory screens for residual
structure, not standalone proofs for or against unidimensionality.
Facet-specific PCA can help localise where a stronger residual signal
is concentrated.
An object of class mfrm_diagnostics including:
obs: observed/expected/residual-level table
measures: facet/person fit table (Infit, Outfit, ZSTD, PTMEA)
overall_fit: overall fit summary
fit: element-level fit diagnostics
reliability: facet-level model/real separation and reliability
precision_profile: one-row summary of the active precision tier and its
recommended use
precision_audit: package-native checks for SE, CI, and reliability
facet_precision: facet-level precision summary by distribution basis and
SE mode
facets_chisq: fixed/random facet variability summary
interactions: top interaction diagnostics
interrater: inter-rater agreement bundle (summary, pairs) including
agreement and rater-severity spread indices
unexpected: unexpected-response bundle
fair_average: adjusted-score reference bundle (placeholder only for
bounded GPCM)
displacement: displacement diagnostics bundle
approximation_notes: method notes for SE/CI/reliability summaries
diagnostic_basis: guide to the statistical target of each diagnostic path
marginal_fit: optional strict marginal-fit companion based on
posterior-expected first-order category counts
residual_pca_overall: optional overall PCA object
residual_pca_by_facet: optional facet PCA objects
Practical interpretation often starts with:
overall_fit: global infit/outfit and degrees of freedom.
reliability: facet-level model/real separation and reliability. MML
uses model-based ModelSE values where available; JML keeps these
quantities as exploratory approximations.
fit: element-level misfit scan (Infit, Outfit, ZSTD).
unexpected, fair_average, displacement: targeted QC bundles.
For bounded GPCM, fair_average uses a slope-aware expected-score
construction and carries a caveat; treat it as a GPCM-specific screening
view rather than Rasch-family fair-M invariance evidence.
approximation_notes: method notes for SE/CI/reliability summaries.
Start with overall_fit and reliability, then move to element-level
diagnostics (fit) and targeted bundles (unexpected, displacement,
interrater, facets_chisq, fair_average). For bounded GPCM,
interpret fair_average with the caveat stored on that component.
Consistent signals across multiple components are typically more robust than a single isolated warning. For example, an element flagged for both high Outfit and high displacement is more concerning than one flagged on a single criterion.
SE is kept as a compatibility alias for ModelSE. RealSE is a
fit-adjusted companion defined as ModelSE * sqrt(max(Infit, 1)).
Reliability tables report model and fit-adjusted bounds from observed
variance, error variance, and true variance; JML entries should still be
treated as exploratory. Separation, strata, and reliability follow the
Wright & Masters (1982) conventions:
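$G = \sqrt{\sigma^2_{\text{true}} / \sigma^2_{\text{error}}}$,
$H = (4G + 1) / 3$, and $R = G^{2} / (1 + G^{2})$.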
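The quantities in this paragraph can be sketched in base R, assuming the usual Wright and Masters definitions: separation G is the true SD over the root-mean-square error, strata H = (4G + 1) / 3, and reliability R is true variance over observed variance. The input vectors and the helper name `separation_summary()` are hypothetical, and the use of the sample variance is an assumption rather than a statement about the package internals.

```r
# Hypothetical element measures (logits), model SEs, and Infit values
# for the elements of one facet.
measure  <- c(-0.80, -0.20, 0.10, 0.40, 0.90)
model_se <- c(0.20, 0.18, 0.18, 0.19, 0.22)
infit    <- c(0.90, 1.40, 1.00, 0.70, 1.10)

# Fit-adjusted SE: inflate ModelSE only when Infit exceeds 1.
real_se <- model_se * sqrt(pmax(infit, 1))

# Hypothetical helper: separation, strata, and reliability from
# observed variance, error variance, and true variance.
separation_summary <- function(measure, se) {
  obs_var  <- stats::var(measure)        # observed variance (assumption: sample variance)
  err_var  <- mean(se^2)                 # error variance as mean squared SE
  true_var <- max(obs_var - err_var, 0)  # true variance, floored at zero
  G <- sqrt(true_var / err_var)
  c(Separation = G,
    Strata = (4 * G + 1) / 3,
    Reliability = true_var / obs_var)
}

bounds <- rbind(model = separation_summary(measure, model_se),
                real  = separation_summary(measure, real_se))
round(bounds, 3)
```

Because RealSE is never smaller than ModelSE, the "real" row gives the more conservative separation and reliability.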
Start with diagnose_mfrm(fit, diagnostic_mode = "both", residual_pca = "none").
Inspect summary(diag) and use diagnostic_basis to separate legacy residual evidence from strict marginal evidence.
If needed, rerun with residual PCA ("overall" or "both").
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis.
MESA Press. (G/R/H separation, reliability, and strata
formulas summarized in s_diag$reliability follow this
convention.)
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370. (Source for the broad 0.5-1.5 Infit / Outfit screening band used as the package default; narrower reporting bands are also used in applied literature.)
Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA
Press. (FACETS Tables 6 + 7 correspond to the per-facet
element measures, fit, and chi-square heterogeneity screen
exposed via s_diag$reliability and s_diag$facets_chisq.)
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge. (Reference text for the Rasch-family fit conventions exposed by this helper.)
Linacre, J. M. (2002). What do Infit and Outfit, Mean-square and Standardized mean? Rasch Measurement Transactions, 16(2), 878.
fit_mfrm(), analyze_residual_pca(), build_visual_summaries(),
mfrmr_visual_diagnostics, mfrmr_reporting_and_apa
# Fast smoke run: legacy-only diagnostic mode is enough to confirm
# the bundle has the expected slots. ~1 s on example_core.
toy <- load_mfrmr_data("example_core")
fit_quick <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                      method = "JML", maxit = 15)
diag_quick <- diagnose_mfrm(fit_quick, diagnostic_mode = "legacy",
                            residual_pca = "none")
summary(diag_quick)$overview[, c("Observations", "Facets", "Categories")]

fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, diagnostic_mode = "both", residual_pca = "none")
s_diag <- summary(diag)
s_diag$overview[, c("Observations", "Facets", "Categories")]
s_diag$diagnostic_basis[, c("DiagnosticPath", "Status", "Basis")]
s_diag$key_warnings
# Look for: "No immediate warnings ..." in `key_warnings` is the
# "all clear" signal. Lines starting with "MnSq misfit:" name the
# element + Infit / Outfit values that fell outside the
# active MnSq screening band; review those first. The default band is
# 0.5-1.5, but published and operational misfit bands vary.
s_diag$facets_chisq
# Look for: `FixedProb` < 0.05 means that facet's elements differ
# reliably under the fixed-effect "all elements equal" null. A
# facet with a non-significant chi-square contributes little
# spread to the test scale.
s_diag$interrater
# Look for: ExactAgreement >= ExpectedExactAgreement and
# AgreementMinusExpected >= 0 indicate raters agree at least as
# often as the model expects. Negative values warrant a closer
# look at `diag$interrater$pairs`.

p_qc <- plot_qc_dashboard(fit, diagnostics = diag, draw = FALSE)
p_qc$data$plot

# Optional: include residual PCA in the diagnostic bundle
diag_pca <- diagnose_mfrm(fit, residual_pca = "overall")
pca <- analyze_residual_pca(diag_pca, mode = "overall")
head(pca$overall_table)

# Reporting route:
prec <- precision_audit_report(fit, diagnostics = diag)
summary(prec)
Produces a cell-level interaction table showing Obs-Exp differences, standardized residuals, and screening statistics for each facet-level x group-value cell.
dif_interaction_table(
  fit,
  diagnostics,
  facet,
  group,
  data = NULL,
  min_obs = 10,
  p_adjust = "holm",
  abs_t_warn = 2,
  abs_bias_warn = 0.5
)
fit |
Output from |
diagnostics |
Output from |
facet |
Character scalar naming the facet. |
group |
Character scalar naming the grouping column. |
data |
Optional data frame with the group column. If |
min_obs |
Minimum observations per cell. Cells with fewer than
this many observations are flagged as sparse and their test
statistics set to |
p_adjust |
P-value adjustment method, passed to
|
abs_t_warn |
Threshold for flagging cells by absolute t-value.
Default |
abs_bias_warn |
Threshold for flagging cells by absolute
Obs-Exp average (in logits). Default |
This function uses the fitted model's observation-level residuals
(from the internal compute_obs_table() function) rather than
re-estimating the model. For each facet-level x group-value cell,
it computes:
N: number of observations in the cell
ObsScore: sum of observed scores
ExpScore: sum of expected scores
ObsExpAvg: mean observed-minus-expected difference
Var_sum: sum of model variances
StdResidual: (ObsScore - ExpScore) / sqrt(Var_sum)
t: approximate t-statistic (equal to StdResidual)
df: N - 1
p_value: two-tailed p-value from the t-distribution
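The per-cell quantities listed above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the observation table, its column names (`Observed`, `Expected`, `Var`), the helper `cell_stats()`, and the `p_adjusted` column are all hypothetical, and the thresholds mirror the documented defaults `abs_t_warn = 2` and `abs_bias_warn = 0.5`.

```r
# Hypothetical observation-level table: one row per rating.
obs <- data.frame(
  Level      = rep(c("R1", "R2"), each = 6),
  GroupValue = rep(c("A", "B"), times = 6),
  Observed   = c(3, 2, 4, 2, 3, 1, 2, 3, 2, 4, 1, 3),
  Expected   = c(2.6, 2.4, 3.1, 2.5, 2.8, 2.0, 2.3, 2.6, 2.2, 3.0, 1.9, 2.7),
  Var        = c(0.7, 0.8, 0.6, 0.8, 0.7, 0.9, 0.8, 0.7, 0.8, 0.6, 0.9, 0.7)
)

# Hypothetical helper: statistics for one facet-level x group-value cell.
cell_stats <- function(d) {
  n         <- nrow(d)
  obs_score <- sum(d$Observed)
  exp_score <- sum(d$Expected)
  var_sum   <- sum(d$Var)
  t_stat    <- (obs_score - exp_score) / sqrt(var_sum)
  data.frame(
    Level = d$Level[1], GroupValue = d$GroupValue[1],
    N = n, ObsScore = obs_score, ExpScore = exp_score,
    ObsExpAvg = (obs_score - exp_score) / n,
    Var_sum = var_sum, StdResidual = t_stat, t = t_stat, df = n - 1,
    p_value = 2 * stats::pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
  )
}

by_cell <- split(obs, list(obs$Level, obs$GroupValue), drop = TRUE)
cells <- do.call(rbind, lapply(by_cell, cell_stats))

# Multiplicity adjustment and screening flags (column name is illustrative).
cells$p_adjusted <- stats::p.adjust(cells$p_value, method = "holm")
cells$flag_t     <- abs(cells$t) > 2
cells$flag_bias  <- abs(cells$ObsExpAvg) > 0.5
cells
```

Note that `t` is simply a copy of `StdResidual` referred to a t-distribution with `N - 1` degrees of freedom, which is why very small cells are better treated as sparse than tested.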
Object of class mfrm_dif_interaction with:
table: tibble with per-cell statistics and flags.
summary: tibble summarizing flagged and sparse cell counts.
config: list of analysis parameters.
Use dif_interaction_table() when you want cell-level screening for a
single facet-by-group table. Use analyze_dff() when you want group-pair
contrasts summarized into differential-functioning effect sizes and
method-appropriate classifications.
For plot selection and follow-up diagnostics, see mfrmr_visual_diagnostics.
$table: the full interaction table with one row per cell.
$summary: overview counts of flagged and sparse cells.
$config: analysis configuration parameters.
Cells with |t| > abs_t_warn or |ObsExpAvg| > abs_bias_warn
are flagged in the flag_t and flag_bias columns.
Sparse cells (N < min_obs) have sparse = TRUE and NA statistics.
Fit a model with fit_mfrm().
Run dif_interaction_table(fit, diag, facet = "Rater", group = "Gender", data = df).
Inspect $table for flagged cells.
Visualize with plot_dif_heatmap().
analyze_dff(), analyze_dif(), plot_dif_heatmap(), dif_report(),
estimate_bias()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
int <- dif_interaction_table(fit, diag, facet = "Rater",
                             group = "Group", data = toy, min_obs = 2)
int$summary
head(int$table[, c("Level", "GroupValue", "ObsExpAvg", "flag_bias")])
Produces APA-style narrative text interpreting the results of a differential-
functioning analysis or interaction table. For method = "refit", the
report summarises the number of facet levels classified as negligible (A),
moderate (B), and large (C). For method = "residual", it summarises
screening-positive results, lists the specific levels and their direction,
and includes a caveat about the distinction between construct-relevant
variation and measurement bias.
dif_report(dif_result, ...)
dif_result |
Output from |
... |
Currently unused; reserved for future extensions. |
When dif_result is an mfrm_dff/mfrm_dif object, the report is based on
the pairwise differential-functioning contrasts in $dif_table. When it is an
mfrm_dif_interaction object, the report uses the cell-level
statistics and flags from $table.
For method = "refit", ETS-style magnitude labels are used only when
subgroup calibrations were successfully linked back to a common baseline
scale; otherwise the report labels those contrasts as unclassified because
the refit difference is descriptive rather than comparable on a linked
logit scale. For method = "residual", the report describes
screening-positive versus screening-negative contrasts instead of applying
ETS labels.
Object of class mfrm_dif_report with narrative,
counts, large_dif, and config.
$narrative: character scalar with the full narrative text.
$counts: named integer vector of method-appropriate counts.
$large_dif: tibble of large ETS results (method = "refit") or
screening-positive contrasts/cells (method = "residual").
$config: analysis configuration inherited from the input.
Run analyze_dff() / analyze_dif() or dif_interaction_table().
Pass the result to dif_report().
Print the report or extract $narrative for inclusion in a
manuscript.
The narrative caveat about distinguishing construct-relevant variation from unwanted measurement bias is grounded in:
Eckes, T. (2011). Introduction to Many-Facet Rasch Measurement: Analyzing and Evaluating Rater-Mediated Assessments. Frankfurt am Main: Peter Lang. ISBN 978-3-631-61350-4.
McNamara, T., & Knoch, U. (2012). The Rasch wars: The emergence of Rasch measurement in language testing. Language Testing, 29(4), 555–576. doi:10.1177/0265532211430367
analyze_dff(), analyze_dif(), dif_interaction_table(),
plot_dif_heatmap(), build_apa_outputs()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
dif <- analyze_dff(fit, diag, facet = "Rater", group = "Group", data = toy)
rpt <- dif_report(dif)
cat(rpt$narrative)
Compute displacement diagnostics for facet levels
displacement_table(
  fit,
  diagnostics = NULL,
  facets = NULL,
  anchored_only = FALSE,
  abs_displacement_warn = 0.5,
  abs_t_warn = 2,
  top_n = NULL
)
fit |
Output from |
diagnostics |
Optional output from |
facets |
Optional subset of facets. |
anchored_only |
If |
abs_displacement_warn |
Absolute displacement warning threshold. |
abs_t_warn |
Absolute displacement t-value warning threshold. |
top_n |
Optional maximum number of rows to keep after sorting. |
Displacement is computed as a one-step Newton update:
sum(residual) / sum(information) for each facet level.
This approximates how much a level would move if constraints were relaxed.
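A minimal base-R sketch of the one-step update, using made-up residual and information values. The standard error taken as `1 / sqrt(sum(information))` and the "either threshold" flag rule are assumptions for illustration; the source only states the displacement ratio itself and the two default cutoffs.

```r
# Hypothetical per-rating residuals and information for two facet levels.
ratings <- data.frame(
  Level       = c("R1", "R1", "R1", "R2", "R2", "R2"),
  Residual    = c(0.4, 0.3, 0.5, -0.1, 0.2, -0.1),
  Information = c(0.6, 0.7, 0.7, 0.8, 0.6, 0.6)
)

# Sum residuals and information within each level.
sums <- aggregate(cbind(Residual, Information) ~ Level,
                  data = ratings, FUN = sum)

# One-step Newton displacement: sum(residual) / sum(information).
sums$Displacement <- sums$Residual / sums$Information

# Assumption: SE from the total information; t is the ratio of the two.
sums$DisplacementSE <- 1 / sqrt(sums$Information)
sums$t <- sums$Displacement / sums$DisplacementSE

# Assumption: flag when either documented default cutoff is exceeded.
sums$Flag <- abs(sums$Displacement) > 0.5 | abs(sums$t) > 2
sums
```

In this toy table level R1 has a displacement of 0.6 logits (1.2 / 2.0) and is flagged on the size cutoff, while R2 sits at essentially zero.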
A named list with:
table: displacement diagnostics by level
summary: one-row summary
thresholds: applied thresholds
table: level-wise displacement and flag indicators.
summary: count/share of flagged levels.
thresholds: displacement and t-value cutoffs.
Large absolute displacement in anchored levels suggests potential instability in anchor assumptions.
Run displacement_table(fit, anchored_only = TRUE) for anchor checks.
Inspect summary(disp) then detailed rows.
Visualize with plot_displacement().
The table data.frame contains:
Facet name and element label.
One-step Newton displacement estimate (logits).
Standard error of the displacement.
Displacement / SE ratio.
Current measure estimate and its standard error.
Number of observations involving this level.
Anchor metadata.
Logical; TRUE when displacement exceeds thresholds.
diagnose_mfrm(), unexpected_response_table(), fair_average_table()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
disp <- displacement_table(fit, anchored_only = FALSE)
summary(disp)
p_disp <- plot(disp, draw = FALSE)
p_disp$data$plot
Synthetic many-facet rating datasets in long format. All datasets include one row per observed rating.
A data.frame with 5 columns:
Study label ("Study1" or "Study2").
Person/respondent identifier.
Rater identifier.
Criterion facet label.
Observed category score.
Available data objects:
mfrmr_example_core
mfrmr_example_bias
ej2021_study1
ej2021_study2
ej2021_combined
ej2021_study1_itercal
ej2021_study2_itercal
ej2021_combined_itercal
Naming convention:
study1 / study2: separate simulation studies
combined: row-bind of study1 and study2
_itercal: iterative-calibration variant
Use load_mfrmr_data() for programmatic selection by key.
| Dataset | Rows | Persons | Raters | Criteria |
|---|---|---|---|---|
| study1 | 1842 | 307 | 18 | 3 |
| study2 | 3287 | 206 | 12 | 9 |
| combined | 5129 | 307 | 18 | 12 |
| study1_itercal | 1842 | 307 | 18 | 3 |
| study2_itercal | 3341 | 206 | 12 | 9 |
| combined_itercal | 5183 | 307 | 18 | 12 |
Score range: 1–4 (four-category rating scale).
Person ability is drawn from N(0, 1). Rater severity effects span
approximately -0.5 to +0.5 logits. Criterion difficulty effects span
approximately -0.3 to +0.3 logits. Scores are generated from the
resulting linear predictor plus Gaussian noise, then discretized into
four categories. The _itercal variants use a second iteration of
calibrated rater severity parameters.
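The generating recipe described above can be sketched in base R. This is not the script that produced the shipped datasets: the sample sizes, the fully crossed design, the noise SD of 1, and the cut points are all assumptions chosen only to make the steps concrete.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n_person <- 30
n_rater <- 4
n_criterion <- 3

theta      <- rnorm(n_person, mean = 0, sd = 1)          # person ability ~ N(0, 1)
severity   <- seq(-0.5, 0.5, length.out = n_rater)       # rater severity (logits)
difficulty <- seq(-0.3, 0.3, length.out = n_criterion)   # criterion difficulty (logits)

# Assumption: fully crossed design, one row per observed rating (long format).
design <- expand.grid(Person = seq_len(n_person),
                      Rater = seq_len(n_rater),
                      Criterion = seq_len(n_criterion))

# Linear predictor plus Gaussian noise (noise SD is an assumption).
eta <- theta[design$Person] -
  severity[design$Rater] -
  difficulty[design$Criterion]
latent <- eta + rnorm(nrow(design), mean = 0, sd = 1)

# Discretize into four categories scored 1-4 (cut points are assumptions).
design$Score <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
table(design$Score)
```

Because scores come from a thresholded Gaussian predictor rather than from a Rasch-family response function, fitted models are approximations to the generating process by construction.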
Each dataset is already in long format and can be passed directly to
fit_mfrm() after confirming column-role mapping.
Inspect available datasets with list_mfrmr_data().
Load one dataset using load_mfrmr_data().
Fit and diagnose with fit_mfrm() and diagnose_mfrm().
Simulated for this package with design settings informed by Eckes and Jin (2021). The Eckes & Jin (2021) Method section reports the following design parameters that motivated the synthetic versions shipped here: Study 1 had 307 examinees (149 males, 158 females), 18 raters (4 males, 14 females), and 3 criteria (global impression, task fulfillment, linguistic realization) on a 4-category rating scale (TDN levels rescored 1-4); Study 2 had 206 examinees (66 males, 140 females), 12 raters (1 male, 11 females), and 9 criteria on the same 4-category scale. The packaged datasets reproduce these (examinees, raters, criteria, categories) shapes but use simulated responses, so they are not the real TestDaF data.
Eckes, T., & Jin, K.-Y. (2021). Measuring rater centrality effects in writing assessment: A Bayesian facets modeling approach. Psychological Test and Assessment Modeling, 63(1), 65–94.
data("ej2021_study1", package = "mfrmr")
head(ej2021_study1)
table(ej2021_study1$Study)
Estimate bias across multiple facet pairs
estimate_all_bias(
  fit,
  diagnostics = NULL,
  pairs = NULL,
  include_person = FALSE,
  drop_empty = TRUE,
  keep_errors = TRUE,
  max_abs = 10,
  omit_extreme = TRUE,
  max_iter = 4,
  tol = 0.001
)
fit |
Output from |
diagnostics |
Optional output from |
pairs |
Optional list of facet specifications. Each element should be a
character vector of length 2 or more, for example
|
include_person |
If |
drop_empty |
If |
keep_errors |
If |
max_abs |
Passed to |
omit_extreme |
Passed to |
max_iter |
Passed to |
tol |
Passed to |
This function orchestrates repeated calls to estimate_bias() across
multiple facet pairs and returns a consolidated bundle.
Bias/interaction in MFRM refers to a systematic departure from
the additive model for a specific combination of facet elements
(e.g., a particular rater is unexpectedly harsh on a particular
criterion). See estimate_bias() for the mathematical formulation.
When pairs = NULL, the function builds all 2-way combinations of
modelled facets automatically. For a model with facets Rater,
Criterion, and Task, this yields Rater x Criterion,
Rater x Task, and Criterion x Task.
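The automatic pair construction can be illustrated with `utils::combn()` from base R. Whether the package builds its pairs this way internally is an assumption; the sketch only shows which combinations result for three facets and how an equivalent list could be written by hand for `pairs`.

```r
facets <- c("Rater", "Criterion", "Task")

# All unordered 2-way combinations, as a list of length-2 character vectors.
pairs <- utils::combn(facets, 2, simplify = FALSE)

vapply(pairs, paste, character(1), collapse = " x ")
# "Rater x Criterion" "Rater x Task" "Criterion x Task"
```

With k modelled facets this yields k(k - 1) / 2 pairs, so the batch grows quickly as facets are added.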
The summary table aggregates results across pairs:
Rows: number of interaction cells estimated
Significant: count of screen-positive cells
MeanAbsBias: average absolute bias magnitude (logits)
Per-pair failures (e.g., insufficient data for a sparse pair) are
captured in errors rather than stopping the entire batch.
A named list with class mfrm_bias_collection.
The returned object is a bundle-like list with class
mfrm_bias_collection and components such as:
summary: one row per requested interaction
by_pair: named list of successful estimate_bias() outputs
errors: per-pair error log
settings: resolved execution settings
primary: first successful bias bundle, useful for downstream helpers
Fit with fit_mfrm() and diagnose with diagnose_mfrm(). For
RSM / PCM reporting runs, prefer method = "MML" plus
diagnostic_mode = "both" in the diagnostics call.
Run estimate_all_bias() to compute multi-pair interactions.
Pass the resulting by_pair list into reporting_checklist() or
facet_quality_dashboard().
estimate_bias(), reporting_checklist(), facet_quality_dashboard()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "MML", maxit = 200)
diag <- diagnose_mfrm(fit, residual_pca = "none",
                      diagnostic_mode = "both")
bias_all <- estimate_all_bias(fit, diagnostics = diag)
bias_all$summary[, c("Interaction", "Rows", "Significant")]
Estimate legacy-compatible bias/interaction terms iteratively
estimate_bias(
  fit,
  diagnostics,
  facet_a = NULL,
  facet_b = NULL,
  interaction_facets = NULL,
  max_abs = 10,
  omit_extreme = TRUE,
  max_iter = 4,
  tol = 0.001
)
fit |
Output from |
diagnostics |
Output from |
facet_a |
First facet name. Provide together with |
facet_b |
Second facet name. See |
interaction_facets |
Character vector of two or more facets to model as
one interaction effect. When supplied, this takes precedence over
|
max_abs |
Bound for absolute bias size. |
omit_extreme |
Omit extreme-only elements. |
max_iter |
Iteration cap. |
tol |
Convergence tolerance. |
Bias (interaction) in MFRM refers to a systematic departure from the additive model: a specific rater-criterion (or higher-order) combination produces scores that are consistently higher or lower than predicted by the main effects alone. For example, Rater A might be unexpectedly harsh on Criterion 2 despite being lenient overall.
Mathematically, the bias term for a given rater on a given
criterion modifies the additive linear predictor for that cell.
The function estimates this term from the residuals of the fitted
(additive) model using iterative recalibration in a legacy-compatible style
(Myford & Wolfe, 2003, 2004):
Each iteration updates expected scores using the current bias
estimates, then re-computes the bias. Convergence is reached when
the maximum absolute change in bias estimates falls below tol.
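The recalibration loop described above can be sketched on a toy dichotomous example. This is a simplified illustration, not the package's algorithm: the package works with polytomous ratings, the sign convention (bias added to the linear predictor) is an assumption, and `eta_additive`, `true_bias`, and the cell labels are hypothetical stand-ins for the fitted additive model.

```r
set.seed(42)  # arbitrary seed
n <- 2000
cells <- c("R1:C1", "R1:C2", "R2:C1", "R2:C2")
cell <- sample(cells, n, replace = TRUE)

# Stand-in for the fitted additive part (person - rater - criterion).
eta_additive <- rnorm(n)

# Assumption: bias enters additively; only one cell carries a true effect.
true_bias <- c("R1:C1" = 0, "R1:C2" = 0.8, "R2:C1" = 0, "R2:C2" = 0)
y <- rbinom(n, size = 1, prob = plogis(eta_additive + true_bias[cell]))

bias <- setNames(numeric(length(cells)), cells)
max_iter <- 4
tol <- 0.001

for (iter in seq_len(max_iter)) {
  # Expected scores under the current bias estimates.
  p <- plogis(eta_additive + bias[cell])
  # Per-cell summed residuals and summed model variances.
  num <- tapply(y - p, cell, sum)
  den <- tapply(p * (1 - p), cell, sum)
  # Newton-style update for each cell.
  step <- as.numeric((num / den)[cells])
  bias <- bias + step
  # Stop when the largest change falls below the tolerance.
  if (max(abs(step)) < tol) break
}

round(bias, 2)
```

If the largest `step` on the last pass is still above `tol`, the estimates have not stabilized and a higher `max_iter` would be the natural next try.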
For two-way mode, use facet_a and facet_b (or interaction_facets
with length 2).
For higher-order mode, provide interaction_facets with length >= 3.
An object of class mfrm_bias with:
table: interaction rows with effect size, SE, screening t/p metadata,
reporting-use flags, and fit columns
summary: compact summary statistics
chi_sq: fixed-effect chi-square style screening summary
facet_a, facet_b: first two analyzed facet names (legacy compatibility)
interaction_facets, interaction_order, interaction_mode: full
interaction metadata
iteration: iteration history/metadata
orientation_audit: facet-orientation sign-consistency audit table
mixed_sign: logical flag indicating whether bias-size signs flip
across facets in a way that complicates direction interpretation
direction_note: one-line interpretive note describing the
dominant bias direction (empty when not applicable)
recommended_action: one-line recommended-action label routing
the user to the appropriate follow-up helper
inference_tier: always "screening" in this release; surfaces
the SE/t/Prob inference tier so downstream reporting helpers can
route correctly (a future delta-method release will introduce a
"primary" tier alongside)
optimization_failures: per-cell record of any inner-loop
optimizer failures encountered while estimating the bias
parameters; empty when every cell converged cleanly
estimate_bias() summarizes interaction departures from the additive MFRM.
It is best read as a targeted screening tool for potentially noteworthy
cells or facet combinations that may merit substantive review.
t and Prob. are screening metrics, not formal inferential quantities.
A flagged interaction cell is not, by itself, proof of rater bias or construct-irrelevant variance.
Non-flagged cells should not be over-read as evidence that interaction effects are absent.
Use summary for global magnitude, then inspect table for cell-level
interaction effects.
Prioritize rows with:
larger |Bias Size| (effect on logit scale; values above roughly 0.5 logits are
typically noteworthy)
larger |t| among the screening metrics (|t| above roughly 2 suggests a
screen-positive interaction cell)
smaller Prob. among the screening metrics
A positive Obs-Exp Average means the cell produced higher scores
than the additive model predicts (unexpected leniency); negative
means unexpected harshness.
iteration helps verify whether iterative recalibration stabilized.
If the maximum change on the final iteration is still above tol,
consider increasing max_iter.
Fit and diagnose model.
Run estimate_bias(...) for target interaction facets.
Review summary(bias) and bias$table.
Visualize/report via plot_bias_interaction() and build_fixed_reports().
In bias$table, the most-used columns are:
Bias Size: estimated interaction effect (logit scale)
t and Prob.: screening metrics, not formal inferential quantities
Obs-Exp Average: direction and practical size of observed-vs-expected
gap on the raw-score metric
The chi_sq element provides a fixed-effect heterogeneity screen across all
interaction cells.
Use plot_bias_interaction() to inspect the flagged cells visually, then
integrate the result with DFF, linking, or substantive scoring review before
making formal claims about fairness or invariance.
Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA Press. (FACETS Table 13 corresponds to the bias / interaction estimation that this helper implements.)
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221.
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments (2nd ed.). Peter Lang.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
build_fixed_reports(), build_apa_outputs()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion",
                      max_iter = 2)
s_bias <- summary(bias)
s_bias$overview
# Look for: `MaxAbsBias` < ~0.5 logits and `Significant = 0` mean
# no cell exceeded the screen. The `BonferroniSignificant` /
# `HolmSignificant` columns count cells that survive multiple-
# testing correction; both being 0 is a stronger "no bias"
# signal than the raw screen-positive count alone.
s_bias$top_rows
# Look for: rows with `|t|` > 2 and |Bias Size| > 0.5 logits warrant
# review (large effect AND statistically reliable). Rows with only
# one of those triggered are usually small-cell artefacts.
p_bias <- plot_bias_interaction(bias, draw = FALSE)
p_bias$data$plot
Build an estimation-iteration report (preferred alias)
estimation_iteration_report(
  fit,
  max_iter = 20,
  reltol = NULL,
  include_prox = TRUE,
  include_fixed = FALSE
)
fit |
Output from |
max_iter |
Maximum replay iterations (excluding optional initial row). |
reltol |
Stopping tolerance for replayed max-logit change. |
include_prox |
If |
include_fixed |
If |
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_iteration_report (type = "residual", "logit_change",
"objective").
A named list with iteration-report components. Class:
mfrm_iteration_report.
iterations: trajectory of convergence indicators by iteration.
summary: final status and stopping diagnostics.
optional PROX row: pseudo-initial reference point when enabled.
Run estimation_iteration_report(fit).
Inspect plateau/stability patterns in summary/plot.
Adjust optimization settings if convergence looks weak.
fit_mfrm(), specifications_report(), data_quality_report(),
mfrmr_reports_and_tables, mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
out <- estimation_iteration_report(fit, max_iter = 5)
summary(out)
p_iter <- plot(out, draw = FALSE)
p_iter$data$plot
Evaluate arbitrary-facet interaction-bias screening.
evaluate_mfrm_bias_detection(
  sim_spec,
  bias_targets,
  facet_pairs = NULL,
  design_id = 1,
  reps = 20,
  seed = NULL,
  alpha = 0.05,
  p_adjust = "holm",
  bias_abs_t = 2,
  bias_p_cut = alpha,
  fit_method = c("MML", "JML", "JMLE"),
  maxit = 100,
  quad_points = 21,
  bias_max_iter = 2,
  residual_pca = c("none", "overall", "facet", "both"),
  fit_args = list()
)
sim_spec |
Output from |
bias_targets |
Data frame of known interaction targets. Use either direct facet columns plus |
facet_pairs |
Optional list of pairwise facet vectors to evaluate. Target pairs are always included. |
design_id |
Design rows to evaluate. Use |
reps |
Replications per design. |
seed |
Optional random seed. |
alpha |
Screening alpha used for adjusted p values. |
p_adjust |
Multiplicity adjustment passed to |
bias_abs_t |
Minimum absolute screening t value. |
bias_p_cut |
Optional p-value cutoff. Defaults to |
fit_method |
Estimation method passed to |
maxit |
Maximum iterations passed to |
quad_points |
Quadrature points passed to |
bias_max_iter |
Maximum iterations passed to |
residual_pca |
Residual PCA mode passed to |
fit_args |
Optional additional arguments passed to |
This helper is a simulation-based screening sensitivity check. It does not convert estimate_bias() into a formal inferential test.
BiasScreenRate is the fraction of replications in which the injected target cell passed the selected adjusted-p and absolute-t screening rules. BiasScreenFalsePositiveRate is computed from non-target cells in the same pairwise bias table.
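The two rates can be illustrated with a small base-R sketch. The table below is made up, its column names (rep, is_target, p_adj, t_value) are hypothetical rather than taken from the package output, and the use of non-strict inequalities for the cutoffs is an assumption.

```r
# Hypothetical per-replication screening table (3 cells x 4 replications).
# Column names and values are illustrative only.
screen <- data.frame(
  rep       = rep(1:4, each = 3),
  is_target = rep(c(TRUE, FALSE, FALSE), times = 4),
  p_adj     = c(0.01, 0.40, 0.03,  0.20, 0.60, 0.70,
                0.02, 0.04, 0.90,  0.03, 0.50, 0.80),
  t_value   = c(-3.1, 0.8, 2.4,  -1.2, 0.5, -0.4,
                -2.6, 1.1, 0.1,  -2.2, 0.7, 0.3)
)

alpha      <- 0.05
bias_abs_t <- 2

# A cell is screen-positive only when both rules hold (assumed <= and >=).
screen$flag <- screen$p_adj <= alpha & abs(screen$t_value) >= bias_abs_t

# Share of replications in which the injected target cell was flagged.
screen_rate <- mean(screen$flag[screen$is_target])

# Share of non-target cells flagged in the same tables.
false_positive_rate <- mean(screen$flag[!screen$is_target])
```

Both quantities are plain proportions of screen-positive cells, which is why they are read as screening rates rather than calibrated power or alpha.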
The Effect value is added to the RSM linear predictor for rows matching the target cell. Positive effects increase expected scores; negative effects decrease expected scores after accounting for person and facet main effects.
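As a minimal sketch of how an added effect moves the expected score, the following base-R code evaluates rating-scale category probabilities for one observation. The helper names rsm_probs() and expected_score(), the threshold values, and the sign convention (person measure minus rater severity minus criterion difficulty) are assumptions made for illustration; they are not package functions.

```r
# Rating-scale model category probabilities for one observation.
# eta is the linear predictor; tau holds the step thresholds.
rsm_probs <- function(eta, tau) {
  # Cumulative sums of (eta - tau_j) give the log-numerators for
  # categories 1..m; category 0 has log-numerator 0.
  log_num <- c(0, cumsum(eta - tau))
  p <- exp(log_num - max(log_num))  # subtract the max for numerical stability
  p / sum(p)
}

# Expected score on the 0..m category scale.
expected_score <- function(eta, tau) {
  p <- rsm_probs(eta, tau)
  sum((seq_along(p) - 1) * p)
}

tau      <- c(-0.7, 0, 0.7)     # illustrative thresholds, four categories
eta_base <- 0.5 - 0.2 - 0.1     # person - rater severity - criterion difficulty

e_base <- expected_score(eta_base, tau)
e_neg  <- expected_score(eta_base - 0.7, tau)  # Effect = -0.7 on the target cell
e_pos  <- expected_score(eta_base + 0.7, tau)  # Effect = +0.7 on the target cell
```

Under these assumptions e_neg falls below e_base and e_pos rises above it, which matches the direction described for negative and positive effects.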
When sim_spec was created by extract_mfrm_arbitrary_sim_spec(), the evaluator carries forward the fitted rating range and retained Weight column by default. Pass an explicit fit_args$weight value if a sensitivity run should change that refitting convention.
The evaluator also stores fitted measure estimates, reliability coefficients, and fit statistics for successful replications. Use as.data.frame(eval, component = "estimates"), as.data.frame(eval, component = "reliability"), or as.data.frame(eval, component = "fit_summary") for CSV export or custom visualization.
An object of class mfrm_bias_detection with design grid, per-target results, pair-level summaries, run-level fit summaries, fitted estimates, reliability tables, fit-statistic tables, and settings. These table components are ordinary data frames and can be saved with utils::write.csv() or retrieved with as.data.frame(x, component = "estimates").
estimate_bias(),
extract_mfrm_arbitrary_sim_spec(),
simulate_mfrm_arbitrary_data()
## Not run:
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 16,
  facets = c(Rater = 3, Criteria = 2, Task = 3),
  facets_per_person = c(Rater = 2, Task = 2),
  score_levels = 4
)
targets <- data.frame(Rater = "Rater03", Task = "Task03", Effect = -0.7)

# Repeated simulations estimate how often this injected interaction is
# recovered by the bias-screening workflow.
eval <- evaluate_mfrm_bias_detection(
  spec,
  bias_targets = targets,
  reps = 10,
  seed = 1
)
summary(eval)
plot(eval, metric = "screen_rate")
plot(eval, metric = "reliability", facet = "Rater")
head(as.data.frame(eval, component = "estimates"))
head(as.data.frame(eval, component = "reliability"))
## End(Not run)
Evaluate MFRM design conditions by repeated simulation
evaluate_mfrm_design(
  n_person = c(30, 50, 100),
  n_rater = c(3, 5),
  n_criterion = c(3, 5),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  fit_method = c("JML", "MML"),
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  seed = NULL,
  parallel = c("no", "future")
)
n_person |
Vector of person counts to evaluate. |
n_rater |
Vector of rater counts to evaluate. |
n_criterion |
Vector of criterion counts to evaluate. |
raters_per_person |
Vector of rater assignments per person. |
design |
Optional named design-grid override supplied as a named list,
named vector, or one-row data frame. Names may use canonical variables
( |
reps |
Number of replications per design condition. |
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
rater_sd |
Standard deviation of simulated rater severities. |
criterion_sd |
Standard deviation of simulated criterion difficulties. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread of step thresholds on the logit scale. |
fit_method |
Estimation method passed to |
model |
Measurement model passed to |
step_facet |
Step facet passed to |
maxit |
Maximum iterations passed to |
quad_points |
Quadrature points for |
residual_pca |
Residual PCA mode passed to |
sim_spec |
Optional output from |
seed |
Optional seed for reproducible replications. |
parallel |
Parallelisation strategy for the rep loop within
each design row. |
This helper runs a compact Monte Carlo design study for common rater-by-item many-facet settings.
For each design condition, the function:
generates synthetic data with simulate_mfrm_data()
fits the requested MFRM with fit_mfrm()
computes diagnostics with diagnose_mfrm()
stores recovery and precision summaries by facet
The result is intended for planning questions such as:
how many raters are needed for stable rater separation?
how does raters_per_person affect severity recovery?
when do category counts become too sparse for comfortable interpretation?
This is a parametric simulation study. It does not take one observed design (for example, 4 raters x 30 persons x 3 criteria) and analytically extrapolate what would happen under a different design (for example, 2 raters x 40 persons x 5 criteria). Instead, you specify a design grid and data-generating assumptions (latent spread, facet spread, thresholds, noise, and scoring structure), and the function repeatedly generates synthetic data under those assumptions.
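To make the idea of a design grid concrete, here is a base-R sketch that crosses the default count vectors from the usage above. How the function itself combines these vectors is not stated here, so the full crossing and the fully crossed rating assumption (every rater scores every person) are illustrative assumptions, and the n_obs column is a hypothetical addition.

```r
# Cross the candidate counts into one row per design condition.
design_grid <- expand.grid(
  n_person    = c(30, 50, 100),
  n_rater     = c(3, 5),
  n_criterion = c(3, 5)
)

# Assumption: fully crossed rating, mirroring raters_per_person = n_rater.
design_grid$raters_per_person <- design_grid$n_rater

# Number of ratings generated per replication under that assumption.
design_grid$n_obs <- with(design_grid,
                          n_person * raters_per_person * n_criterion)

design_grid
```

Each row is then one scenario to simulate repeatedly; nothing in the grid is estimated from an observed dataset.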
When you want the simulated conditions to resemble an existing study, use
substantive knowledge or estimates from that study to choose
theta_sd, rater_sd, criterion_sd, score_levels, and related
settings before running the design evaluation.
When sim_spec is supplied, the function uses it as the explicit
data-generating mechanism. This is the recommended route when you want a
design study to stay close to a previously fitted run while still varying the
candidate sample sizes or rater-assignment counts.
If that specification also stores a latent-regression population generator, each replication carries forward the simulated one-row-per-person background data and refits the MML population-model branch. This remains a scenario study under explicit assumptions; it is not a closed-form predictive distribution for one future administration.
Bounded GPCM is available in this design-evaluation helper through the
package's slope-aware simulation contract and downstream exploratory
diagnostics. The current planning layer is still role-based for exactly two
non-person facets (rater-like and criterion-like), even though the
estimation core supports arbitrary facet counts.
Recovery metrics are reported only when the generator and fitted model target
the same facet-parameter contract. In practice this means the same
model, and for PCM, the same step_facet. When these do not align,
recovery fields are set to NA and the output records the reason. Even when
these contract checks pass, the recovery summaries still assume compatible
orientation and anchoring conventions across the generator and fitted model.
An object of class mfrm_design_evaluation with components:
design_grid: evaluated design conditions. When sim_spec carries custom
public facet names, matching design-variable alias columns are included
alongside the canonical internal columns.
results: facet-level replicate results, with the same design-variable
alias columns when applicable.
rep_overview: run-level status and timing, with the same design-variable
alias columns when applicable.
design_descriptor: role-based design-variable metadata used by planning
summaries and plots
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of which design variables remain
mutable under the current simulation specification
planning_schema: combined planner-schema contract bundling the role
descriptor, scope boundary, and current mutability map
settings: simulation settings
ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
Facet-level simulation results include:
Separation ($G = \mathrm{SD}_{\mathrm{adj}} / \mathrm{RMSE}$):
how many statistically distinct strata the facet resolves.
Reliability ($G^2 / (1 + G^2)$): analogous to Cronbach's $\alpha$
for the reproducibility of element ordering.
Strata ($(4G + 1) / 3$): number of distinguishable groups.
Mean Infit and Outfit: average fit mean-squares across elements.
MisfitRate: share of elements flagged by the misfit screen.
SeverityRMSE: root-mean-square error of recovered parameters vs
the known truth after facet-wise mean alignment, so that the
usual Rasch/MFRM location indeterminacy does not inflate recovery
error. This quantity is reported only when the generator and fitted model
target the same facet-parameter contract.
SeverityBias: mean signed recovery error after the same alignment;
values near zero are expected. This is likewise omitted when the
generator/fitted-model contract does not align.
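The facet-level summaries can be sketched in base R using the conventional Rasch definitions: separation as error-adjusted SD over RMSE, reliability as G^2 / (1 + G^2), and strata as (4G + 1) / 3. The helper names facet_summary() and aligned_rmse() are hypothetical, the input values are made up, and the use of var() (the n - 1 sample variance) is an assumption that may differ from the package's internal convention.

```r
# Separation, reliability, and strata from element estimates and their SEs.
facet_summary <- function(est, se) {
  obs_var <- var(est)                # observed variance of element measures
  mse     <- mean(se^2)              # mean error variance
  adj_var <- max(obs_var - mse, 0)   # error-adjusted variance, floored at 0
  G <- sqrt(adj_var) / sqrt(mse)
  c(separation  = G,
    reliability = G^2 / (1 + G^2),
    strata      = (4 * G + 1) / 3)
}

# Recovery RMSE after facet-wise mean alignment: both vectors are centered
# first, so a constant shift in location does not count as error.
aligned_rmse <- function(est, truth) {
  err <- (est - mean(est)) - (truth - mean(truth))
  sqrt(mean(err^2))
}

est   <- c(-0.6, -0.1, 0.2, 0.5)   # illustrative rater estimates
se    <- rep(0.2, 4)               # illustrative standard errors
truth <- c(-0.5, 0.0, 0.1, 0.4)    # illustrative generating values

fs   <- facet_summary(est, se)
rmse <- aligned_rmse(est, truth)
```

Because of the centering step, aligned_rmse(est + 3, truth) returns the same value as aligned_rmse(est, truth), which is the sense in which location indeterminacy does not inflate recovery error.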
Start with summary(x)$design_summary, then plot one focal metric at a time
(for example rater Separation or criterion SeverityRMSE).
Higher separation/reliability is generally better, whereas lower
SeverityRMSE, MeanMisfitRate, and MeanElapsedSec are preferable.
When choosing among designs, look for the point where increasing
n_person or raters_per_person yields diminishing returns in
separation and RMSE; this identifies the cost-effective design
frontier. ConvergedRuns / reps should be near 1.0; low
convergence rates indicate the design is too small for the chosen
estimation method.
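One way to look for diminishing returns is to compute the gain per added person between adjacent designs. The numbers below are made up for illustration, and the data frame is a hypothetical stand-in for a design summary; only the column names n_person and MeanSeparation follow the example output used in this manual.

```r
# Made-up summary values for illustration only.
design_summary <- data.frame(
  n_person       = c(30, 50, 100, 200),
  MeanSeparation = c(1.40, 1.90, 2.30, 2.45),
  ConvergedRuns  = c(7, 9, 10, 10)
)
reps <- 10

# Gain in separation per additional person between adjacent designs.
gain_per_person <- diff(design_summary$MeanSeparation) /
  diff(design_summary$n_person)

# Convergence rate per design; values well below 1 suggest that the
# design is too small for the chosen estimation method.
design_summary$converged_rate <- design_summary$ConvergedRuns / reps
```

A steadily shrinking gain_per_person marks the region where further increases in sample size buy little additional separation.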
This is a Monte Carlo design-evaluation helper. It can visualize how
separation, reliability, strata, RMSE, and fit-screen rates change when
you vary person, rater, criterion, or assignment counts. It is not a
closed-form generalizability-theory D-study calculator; use
mfrm_generalizability() for observed variance-component summaries and
treat analytic G/Phi coefficient planning as outside the current scope.
The simulation logic follows the general Monte Carlo / operating-characteristic
framework described by Morris, White, and Crowther (2019) and the
ADEMP-oriented planning/reporting guidance summarized for psychology by
Siepe et al. (2024). In mfrmr, evaluate_mfrm_design() is a practical
many-facet design-planning wrapper rather than a direct reproduction of one
published simulation study.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartos, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
simulate_mfrm_data(), summary.mfrm_design_evaluation, plot.mfrm_design_evaluation
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  design = list(person = c(8, 12), rater = 2, criterion = 2, assignment = 1),
  reps = 1,
  maxit = 8,
  seed = 123
))
s_eval <- summary(sim_eval)
s_eval$design_summary[, c("Facet", "n_person", "MeanSeparation", "MeanSeverityRMSE")]
p_eval <- plot(sim_eval, facet = "Rater", metric = "separation",
               x_var = "n_person", draw = FALSE)
names(p_eval)
Evaluate legacy and strict marginal diagnostic screening under controlled misfit scenarios
evaluate_mfrm_diagnostic_screening(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  scenarios = c("well_specified", "local_dependence"),
  local_dependence_sd = 0.8,
  local_dependence_facet = NULL,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  seed = NULL
)
n_person |
Vector of person counts to evaluate. |
n_rater |
Vector of rater counts to evaluate. |
n_criterion |
Vector of criterion counts to evaluate. |
raters_per_person |
Vector of rater assignments per person. |
design |
Optional named design-grid override supplied as a named list,
named vector, or one-row data frame. Names may use canonical variables
( |
reps |
Number of replications per design condition and scenario. |
scenarios |
Screening scenarios to evaluate. The current first release
supports |
local_dependence_sd |
Standard deviation of the shared context effect
injected in the |
local_dependence_facet |
Facet that receives the shared
|
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
rater_sd |
Standard deviation of simulated rater severities. |
criterion_sd |
Standard deviation of simulated criterion difficulties. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread of step thresholds on the logit scale. |
model |
Measurement model passed to |
step_facet |
Step facet passed to |
maxit |
Maximum iterations passed to |
quad_points |
Quadrature points for the internal |
residual_pca |
Residual PCA mode passed to |
sim_spec |
Optional output from |
seed |
Optional seed for reproducible replications. |
This helper performs a compact Monte Carlo validation study for the package's current diagnostic architecture.
For each design condition and scenario, the function:
generates synthetic data with simulate_mfrm_data()
fits the model with method = "MML"
computes diagnostics with diagnostic_mode = "both"
stores legacy residual-screen metrics and strict marginal-fit metrics
aggregates the results into scenario_summary and scenario_contrast
The "well_specified" scenario uses the ordinary generator with no injected
extra structure. The "local_dependence" scenario adds a shared
Person x facet random effect, centered within the selected facet levels, so
responses in the same context become correlated without changing the
facet-level mean effect contract. The "latent_misspecification" scenario
keeps the same marginal spread targets but replaces the normal person
distribution with a centered bimodal empirical support distribution, while
leaving the non-person facets on the original scale contract. The
"step_structure_misspecification" scenario uses a PCM generator with
facet-specific threshold tables that intentionally mismatch the fitted step
contract: RSM fits receive criterion-specific thresholds, and PCM fits
receive thresholds indexed by the opposite non-person facet.
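The "local_dependence" mechanism can be sketched in base R as a matrix of shared Person x facet-level effects that is centered within each facet level. This is an illustration under stated assumptions: the criterion labels are hypothetical, the effect is drawn from a normal distribution, and "centered within the selected facet levels" is read here as subtracting each level's mean across persons.

```r
set.seed(1)
n_person            <- 50
facet_levels        <- paste0("C", 1:4)  # hypothetical criterion labels
local_dependence_sd <- 0.8

# One shared effect per Person x facet-level combination. Every rating of
# the same person in the same context would receive the same value, which
# is what induces correlation among those responses.
ld <- matrix(
  rnorm(n_person * length(facet_levels), sd = local_dependence_sd),
  nrow = n_person,
  dimnames = list(NULL, facet_levels)
)

# Center within each facet level so the level means stay unchanged.
ld <- sweep(ld, 2, colMeans(ld))

round(colMeans(ld), 12)
```

After centering, the column means are zero up to floating-point error, so the injected effect adds within-context dependence without shifting the facet-level mean effects.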
This function is intentionally screening-oriented. The strict marginal branch remains exploratory in the current release, so the returned summaries should be used to compare relative sensitivity across scenarios rather than to claim calibrated inferential power.
An object of class mfrm_diagnostic_screening with:
design_grid: evaluated design conditions, including public alias columns
when applicable
results: replicate-level screening metrics for each design and scenario
scenario_summary: aggregated scenario-by-design screening summaries
performance_summary: scenario-by-design screening-performance summary
including runtime, agreement, Type I proxy, and sensitivity proxy columns
scenario_contrast: each misspecification scenario minus the
well-specified baseline when the baseline scenario was evaluated
design_descriptor: role-based design-variable metadata
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
settings: simulation and fitting settings
ademp: simulation-study metadata
notes: short interpretation notes
simulate_mfrm_data(), evaluate_mfrm_design(), diagnose_mfrm()
diag_eval <- evaluate_mfrm_diagnostic_screening(
  design = list(person = 10, rater = 2, criterion = 2, assignment = 2),
  reps = 1,
  maxit = 6,
  seed = 123
)
diag_eval$scenario_summary
diag_eval$scenario_contrast
Evaluate DIF power and bias-screening behavior under known simulated signals
evaluate_mfrm_signal_detection(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  group_levels = c("A", "B"),
  reference_group = NULL,
  focal_group = NULL,
  dif_level = NULL,
  dif_effect = 0.6,
  bias_rater = NULL,
  bias_criterion = NULL,
  bias_effect = -0.8,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  fit_method = c("JML", "MML"),
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  dif_method = c("residual", "refit"),
  dif_min_obs = 10,
  dif_p_adjust = "holm",
  dif_p_cut = 0.05,
  dif_abs_cut = 0.43,
  bias_max_iter = 2,
  bias_p_cut = 0.05,
  bias_abs_t = 2,
  seed = NULL
)
n_person |
Vector of person counts to evaluate. |
n_rater |
Vector of rater counts to evaluate. |
n_criterion |
Vector of criterion counts to evaluate. |
raters_per_person |
Vector of rater assignments per person. |
design |
Optional named design-grid override supplied as a named list,
named vector, or one-row data frame. Names may use canonical variables
( |
reps |
Number of replications per design condition. |
group_levels |
Group labels used for DIF simulation. The first two levels define the default reference and focal groups. |
reference_group |
Optional reference group label used when extracting the target DIF contrast. |
focal_group |
Optional focal group label used when extracting the target DIF contrast. |
dif_level |
Target criterion level for the true DIF effect. Can be an
integer index or a criterion label such as |
dif_effect |
True DIF effect size added to the focal group on the target criterion. |
bias_rater |
Target rater level for the true interaction-bias effect.
Can be an integer index or a label such as |
bias_criterion |
Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design. |
bias_effect |
True interaction-bias effect added to the target
|
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
rater_sd |
Standard deviation of simulated rater severities. |
criterion_sd |
Standard deviation of simulated criterion difficulties. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread of step thresholds on the logit scale. |
fit_method |
Estimation method passed to |
model |
Measurement model passed to |
step_facet |
Step facet passed to |
maxit |
Maximum iterations passed to |
quad_points |
Quadrature points for |
residual_pca |
Residual PCA mode passed to |
sim_spec |
Optional output from |
dif_method |
Differential-functioning method passed to |
dif_min_obs |
Minimum observations per group cell for |
dif_p_adjust |
P-value adjustment method passed to |
dif_p_cut |
P-value cutoff for counting a target DIF detection. |
dif_abs_cut |
Optional absolute contrast cutoff used when counting a
target DIF detection. When omitted, the effective default is |
bias_max_iter |
Maximum iterations passed to |
bias_p_cut |
P-value cutoff for counting a target bias screen-positive result. |
bias_abs_t |
Absolute t cutoff for counting a target bias screen-positive result. |
seed |
Optional seed for reproducible replications. |
This function performs Monte Carlo design screening for two related tasks:
DIF detection via analyze_dff() and interaction-bias screening via
estimate_bias().
For each design condition (combination of n_person, n_rater,
n_criterion, raters_per_person), the function:
Generates synthetic data with simulate_mfrm_data()
Injects one known Group x Criterion DIF effect
(dif_effect logits added to the focal group on the target criterion)
Injects one known Rater x Criterion interaction-bias
effect (bias_effect logits)
Fits and diagnoses the MFRM
Runs analyze_dff() and estimate_bias()
Records whether the injected signals were detected or screen-positive
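The two injection steps above can be pictured as additive shifts on selected rows of a long-format design. The sketch below uses base R only; the labels, the group assignment rule, and the assumption that both shifts enter the linear predictor additively are illustrative and are not taken from the package internals.

```r
# Hypothetical long-format design; labels are illustrative.
d <- expand.grid(
  Person    = 1:4,
  Rater     = c("R1", "R2"),
  Criterion = c("C1", "C2"),
  stringsAsFactors = FALSE
)
d$Group <- ifelse(d$Person %% 2 == 0, "B", "A")  # "B" plays the focal group

dif_effect  <- 0.6
bias_effect <- -0.8

# DIF shift: focal group on the target criterion only.
d$dif_shift <- ifelse(d$Group == "B" & d$Criterion == "C2", dif_effect, 0)

# Interaction-bias shift: one Rater x Criterion cell only.
d$bias_shift <- ifelse(d$Rater == "R2" & d$Criterion == "C2", bias_effect, 0)

# Total shift applied to each row (assumed additive on the logit scale).
d$shift <- d$dif_shift + d$bias_shift
```

Rows that are both focal-group and in the biased cell carry both shifts, which is worth keeping in mind when the DIF target criterion and the bias target criterion coincide.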
Detection criteria:
A DIF signal is counted as "detected" when the target contrast has a
p value within the dif_p_cut cutoff and, when an absolute contrast cutoff
is in force, an absolute contrast that reaches dif_abs_cut. For
dif_method = "refit", dif_abs_cut is interpreted on the logit scale.
For dif_method = "residual", the residual-contrast screening result is
used and the default is to rely on the significance test alone.
Bias results are different: estimate_bias() reports t and Prob. as
screening metrics rather than formal inferential quantities. Here, a bias
cell is counted as screen-positive only when those screening metrics are
available and satisfy both the bias_p_cut and bias_abs_t cutoffs.
First-release GPCM is not yet available in this helper because its
signal-detection path still depends on simulation and diagnostics layers
validated only for RSM / PCM. More broadly, the current planning layer is
still role-based for exactly two non-person facets (rater-like and
criterion-like), even though the estimation core supports arbitrary facet
counts.
Power is the proportion of replications in which the target signal
was correctly detected. For DIF this is a conventional power summary.
For bias, the primary summary is BiasScreenRate, a screening hit rate
rather than formal inferential power.
False-positive rate is the proportion of non-target cells that were
incorrectly flagged. For DIF this is interpreted in the usual testing
sense. For bias, BiasScreenFalsePositiveRate is a screening rate and
should not be read as a calibrated inferential alpha level.
Default effect sizes: dif_effect = 0.6 logits corresponds to a
moderate criterion-linked differential-functioning effect; bias_effect = -0.8
logits represents a substantial rater-criterion interaction. Adjust
these to match the smallest effect size of practical concern for your
application.
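The counting rules and the resulting rates can be sketched in base R as follows. The helper names dif_detected() and bias_screen_positive() are hypothetical, the replication results are made up, and two choices are assumptions: cutoffs are applied with non-strict inequalities, and replications with missing metrics count as non-detections.

```r
# TRUE when the target DIF contrast passes the p cutoff and, if supplied,
# the absolute-contrast cutoff. Missing values count as not detected.
dif_detected <- function(p, contrast, p_cut = 0.05, abs_cut = NULL) {
  ok <- !is.na(p) & p <= p_cut
  if (!is.null(abs_cut)) {
    ok <- ok & !is.na(contrast) & abs(contrast) >= abs_cut
  }
  ok
}

# TRUE when both screening metrics are available and pass their cutoffs.
bias_screen_positive <- function(p, t, p_cut = 0.05, abs_t = 2) {
  !is.na(p) & !is.na(t) & p <= p_cut & abs(t) >= abs_t
}

# Made-up target-cell results from five replications.
p_dif    <- c(0.01, 0.20, 0.03, 0.04, NA)
contrast <- c(0.70, 0.65, 0.30, 0.55, NA)
p_bias   <- c(0.02, 0.04, 0.30, NA, 0.01)
t_bias   <- c(-2.5, -1.8, -2.2, NA, -3.0)

dif_power        <- mean(dif_detected(p_dif, contrast, abs_cut = 0.43))
bias_screen_rate <- mean(bias_screen_positive(p_bias, t_bias))
```

Both summaries are proportions over replications; only the DIF proportion is read as power, while the bias proportion stays a screening hit rate.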
This is again a parametric simulation study. The function does not estimate a new design directly from one observed dataset. Instead, it evaluates detection or screening behavior under user-specified design conditions and known injected signals.
If you want to approximate a real study, choose the design grid and
simulation settings so that they reflect the empirical context of interest.
For example, you may set n_person, n_rater, n_criterion,
raters_per_person, and the latent-spread arguments to values motivated by
an existing assessment program, then study how operating characteristics
change as those design settings vary.
When sim_spec is supplied, the function uses it as the explicit
data-generating mechanism for the latent spreads, thresholds, and assignment
archetype, while still injecting the requested target DIF and bias effects
for each design condition.
If that specification also stores a latent-regression population generator, each replication carries simulated one-row-per-person background data into the MML fit. This remains a screening-oriented Monte Carlo study; it is not a person-level posterior prediction for one observed sample.
An object of class mfrm_signal_detection with:
design_grid: evaluated design conditions. When sim_spec carries custom
public facet names, matching design-variable alias columns are included
alongside the canonical internal columns.
results: replicate-level detection results, with the same
design-variable alias columns when applicable.
rep_overview: run-level status and timing, with the same design-variable
alias columns when applicable.
design_descriptor: role-based design-variable metadata used by planning
summaries and plots
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of which design variables remain
mutable under the current simulation specification
planning_schema: combined planner-schema contract bundling the role
descriptor, scope boundary, and current mutability map
settings: signal-analysis settings
ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
The simulation logic follows the general Monte Carlo / operating-characteristic
framework described by Morris, White, and Crowther (2019) and the
ADEMP-oriented planning/reporting guidance summarized for psychology by
Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a
many-facet screening helper specialized to DIF and interaction-bias use
cases; it is not a direct implementation of one published many-facet Rasch
simulation design.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartos, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
simulate_mfrm_data(), evaluate_mfrm_design(), analyze_dff(), analyze_dif(), estimate_bias()
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  design = list(person = 8, rater = 2, criterion = 2, assignment = 1),
  reps = 1,
  maxit = 5,
  bias_max_iter = 1,
  seed = 123
))
s_sig <- summary(sig_eval)
s_sig$overview
Writes tidy CSV files suitable for import into spreadsheet software or further analysis in other tools.
export_mfrm(
  fit,
  diagnostics = NULL,
  output_dir = ".",
  prefix = "mfrm",
  tables = c("person", "facets", "summary", "steps", "slopes", "measures"),
  overwrite = FALSE
)
fit |
Output from |
diagnostics |
Optional output from |
output_dir |
Directory for CSV files. Created if it does not exist. |
prefix |
Filename prefix (default |
tables |
Character vector of tables to export. Any subset of
|
overwrite |
If |
Invisibly, a data.frame listing written files with columns
Table and Path.
{prefix}_person_estimates.csv: Person ID, Estimate, SD.
{prefix}_facet_estimates.csv: Facet, Level, Estimate, and optionally SE, Infit, Outfit, PTMEA when diagnostics are supplied.
{prefix}_fit_summary.csv: One-row model summary.
{prefix}_step_parameters.csv: Step/threshold parameters.
{prefix}_measures.csv: Full measures table (requires diagnostics).
The returned data.frame tells you exactly which files were written and where. This is convenient for scripted pipelines where the output directory is created on the fly.
Fit a model with fit_mfrm().
Optionally compute diagnostics with diagnose_mfrm() when you want enriched facet or measures exports.
Call export_mfrm(...) and inspect the returned Path column.
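The return contract described above (an invisible data frame with Table and Path columns) can be imitated with a small base-R helper. write_tables() is hypothetical and is not part of mfrmr; its file naming and its refusal to overwrite existing files are assumptions made for this sketch.

```r
# Hypothetical helper, not part of mfrmr: writes each table to
# "<prefix>_<name>.csv" and invisibly returns a Table/Path listing.
write_tables <- function(tables, output_dir, prefix = "mfrm",
                         overwrite = FALSE) {
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  paths <- file.path(output_dir, paste0(prefix, "_", names(tables), ".csv"))
  if (!overwrite && any(file.exists(paths))) {
    stop("Refusing to overwrite existing files; set overwrite = TRUE.")
  }
  for (i in seq_along(tables)) {
    utils::write.csv(tables[[i]], paths[i], row.names = FALSE)
  }
  invisible(data.frame(Table = names(tables), Path = paths))
}

out_dir <- file.path(tempdir(), "mfrm_sketch")
written <- write_tables(
  list(person_estimates = data.frame(Person   = c("P1", "P2"),
                                     Estimate = c(0.3, -0.1))),
  output_dir = out_dir,
  overwrite  = TRUE
)

# A scripted pipeline can check the listing instead of guessing file names.
file.exists(written$Path)
```

Returning the listing invisibly keeps interactive calls quiet while still letting a script capture the paths for later steps.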
fit_mfrm, diagnose_mfrm,
as.data.frame.mfrm_fit
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
out <- export_mfrm(
  fit,
  diagnostics = diag,
  output_dir = tempdir(),
  prefix = "mfrmr_example",
  overwrite = TRUE
)
out$Table
Export an analysis bundle for sharing or archiving
export_mfrm_bundle(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  population_prediction = NULL,
  unit_prediction = NULL,
  plausible_values = NULL,
  summary_tables = NULL,
  output_dir = ".",
  prefix = "mfrmr_bundle",
  include = c("core_tables", "checklist", "dashboard", "apa", "anchors",
              "manifest", "visual_summaries", "predictions",
              "summary_tables", "script", "html"),
  facet = NULL,
  include_person_anchors = FALSE,
  overwrite = FALSE,
  zip_bundle = FALSE,
  zip_name = NULL,
  data = NULL
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
population_prediction |
Optional output from
|
unit_prediction |
Optional output from |
plausible_values |
Optional output from |
summary_tables |
Optional manuscript-summary bundle input. Can be
|
output_dir |
Directory where files will be written. |
prefix |
File-name prefix. |
include |
Components to export. Supported values are
|
facet |
Optional facet for |
include_person_anchors |
If |
overwrite |
If |
zip_bundle |
If |
zip_name |
Optional zip-file name. Defaults to |
data |
Optional original analysis data frame. When supplied,
|
This function creates a package-native analysis download bundle. It reuses
existing mfrmr helpers instead of reimplementing estimation or
diagnostics.
A named list with class mfrm_export_bundle.
The include argument lets you assemble a bundle for different audiences:
"core_tables" for analysts who mainly want CSV output.
"manifest" for a compact analysis record.
"script" for reproducibility and reruns. For latent-regression fits,
this also writes the fit-level replay person-data sidecar when available.
"html" for a light, shareable summary page. When replay sidecars are
present, the HTML shows an artifact index for them rather than embedding
the raw person-level replay table.
"summary_tables" for manuscript-facing CSV exports of validated
summary() surfaces and their compact indexes.
"visual_summaries" when you want warning maps or residual PCA summaries
to travel with the bundle.
Common starting points are:
minimal tables: include = c("core_tables", "manifest")
reporting bundle: include = c("core_tables", "checklist", "dashboard", "apa", "summary_tables", "html")
archival bundle: include = c("core_tables", "manifest", "script", "visual_summaries", "html")
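The starting points above can be kept as named presets and checked for typos before they reach the exporter. The following is a minimal base-R sketch: the list name `include_presets` and its element names are hypothetical, while the supported values are copied from the usage signature. The final call is commented out because it needs a fitted model.

```r
# Supported `include` values, copied from the export_mfrm_bundle() usage.
supported <- c("core_tables", "checklist", "dashboard", "apa", "anchors",
               "manifest", "visual_summaries", "predictions",
               "summary_tables", "script", "html")

# The three starting points listed above, stored under hypothetical names.
include_presets <- list(
  minimal   = c("core_tables", "manifest"),
  reporting = c("core_tables", "checklist", "dashboard", "apa",
                "summary_tables", "html"),
  archival  = c("core_tables", "manifest", "script",
                "visual_summaries", "html")
)

# Guard against typos: every preset entry must be a supported value.
unknown <- lapply(include_presets, setdiff, y = supported)
stopifnot(all(lengths(unknown) == 0))

# Example hand-off (not run here; requires a fitted model `fit`):
# export_mfrm_bundle(fit, include = include_presets$archival,
#                    output_dir = tempdir(), overwrite = TRUE)
```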
Depending on include, the exporter can write:
core CSV tables via export_mfrm()
checklist CSVs via reporting_checklist()
facet-dashboard CSVs via facet_quality_dashboard()
APA text files via build_apa_outputs()
manuscript-summary CSVs via build_summary_table_bundle()
anchor CSV via make_anchor_table()
manifest CSV/TXT via build_mfrm_manifest()
visual warning/summary artifacts via build_visual_summaries()
prediction/forecast CSVs via predict_mfrm_population(),
predict_mfrm_units(), and sample_mfrm_plausible_values()
a package-native replay script via build_mfrm_replay_script()
for latent-regression fits, a replay-side person-data CSV paired with the replay script
a lightweight HTML report that bundles the exported tables/text and, for replay sidecars, an artifact summary instead of raw person-level rows
For latent-regression fits, prediction-side artifacts can carry the fitted
population-model scoring basis when you explicitly supply the corresponding
prediction objects. predict_mfrm_population() remains the scenario-level
forecast helper, whereas predict_mfrm_units() and
sample_mfrm_plausible_values() are the scoring layer.
To keep exports and replay scripts practical, large future-planning schemas
from scenario-level population predictions are not flattened into
*_population_prediction_settings.csv or ADeMP CSVs; the compact simulation
specification files carry the replay-relevant settings instead.
This exporter is intentionally unavailable for bounded GPCM, because the
current bundle surface would otherwise depend on blocked narrative,
fit-based export, and replay semantics from the free-discrimination branch.
The returned object reports both high-level bundle status and the exact files
written. In practice, bundle$summary is the quickest sanity check, while
bundle$written_files is the file inventory to inspect or hand off to other
tools.
Fit a model and compute diagnostics once.
Decide whether the audience needs tables only, or also a manifest, replay script, and HTML summary.
Call export_mfrm_bundle() with a dedicated output directory.
Inspect bundle$written_files or open the generated HTML file.
build_mfrm_manifest(), build_mfrm_replay_script(),
export_mfrm(), reporting_checklist(), export_summary_appendix()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bundle <- export_mfrm_bundle(
  fit,
  diagnostics = diag,
  output_dir = tempdir(),
  prefix = "mfrmr_bundle_example",
  include = c("core_tables", "manifest", "script", "html"),
  overwrite = TRUE
)
bundle$summary[, c("FilesWritten", "HtmlWritten", "ScriptWritten")]
head(bundle$written_files)
Export manuscript appendix tables from validated summary surfaces
export_summary_appendix(
  x,
  output_dir = ".",
  prefix = "mfrmr_appendix",
  include_html = TRUE,
  preset = c("all", "recommended", "compact", "methods", "results", "diagnostics",
    "reporting"),
  overwrite = FALSE,
  zip_bundle = FALSE,
  zip_name = NULL,
  digits = 3,
  top_n = 10,
  preview_chars = 160
)
x |
A supported |
output_dir |
Directory where files will be written. |
prefix |
File-name prefix for written artifacts. |
include_html |
If |
preset |
Appendix table-selection preset:
|
overwrite |
If |
zip_bundle |
If |
zip_name |
Optional zip-file name. Defaults to |
digits |
Digits forwarded when raw objects must be normalized through
|
top_n |
Row cap forwarded when raw objects must be normalized through
|
preview_chars |
Character cap forwarded when APA-output summaries must
be normalized through |
This helper is the narrow public bridge from validated summary() surfaces
to manuscript appendix artifacts. It accepts the same reporting objects that
build_summary_table_bundle() supports, exports their table bundles as CSV,
and optionally assembles a lightweight HTML appendix page.
Fit-level caveats are exported through the analysis_caveats role, and
pre-fit score-support caveats are exported through the
score_category_caveats role. Both roles are classified as diagnostics, so
they remain available under "recommended" and "diagnostics" presets when
the source summary contains caveat rows.
Unlike export_mfrm_bundle(), this helper does not require a fitted model.
It is intended for the stage where compact reporting summaries already exist
and the task is to hand off appendix-ready tables, catalogs, and reporting
maps.
A named list of class mfrm_summary_appendix_export with:
summary
written_files
selection_summary
selection_table_summary
selection_section_table_summary
selection_handoff_table_summary
selection_handoff_preset_summary
selection_handoff_summary
selection_handoff_bundle_summary
selection_handoff_role_summary
selection_handoff_role_section_summary
selection_role_summary
selection_section_summary
selection_catalog
settings
notes
Build summary(...) objects from fit, diagnostics, data description,
reporting checklist, or APA outputs.
Call export_summary_appendix(...) on one object or a named list.
Hand off the written CSV/HTML appendix artifacts to manuscript or QA workflows.
build_summary_table_bundle(), export_mfrm_bundle(),
apa_table()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
appendix <- export_summary_appendix(
  list(fit = fit, diagnostics = diag),
  output_dir = tempdir(),
  prefix = "mfrmr_appendix_example",
  include_html = TRUE,
  overwrite = TRUE
)
appendix$summary
Extract an arbitrary-facet simulation specification from a fitted model.
extract_mfrm_arbitrary_sim_spec(
  fit,
  data = NULL,
  assignment = c("skeleton", "balanced"),
  parameter_source = c("estimates", "resampled"),
  facets_per_person = NULL,
  group = NULL,
  include_weights = TRUE,
  noise_sd = 0,
  dif_effects = NULL,
  interaction_effects = NULL
)
fit |
An |
data |
Optional original long-format data. When supplied with |
assignment |
Simulation assignment mode. |
parameter_source |
Parameter source for simulated truth. |
facets_per_person |
Optional named assignment counts used when |
group |
Optional group column in |
include_weights |
Whether the fitted analysis weights should be carried into skeleton simulations. |
noise_sd |
Optional observation-level noise added during simulation. |
dif_effects |
Optional DIF effect table passed through to the resulting specification. |
interaction_effects |
Optional interaction effect table passed through to the resulting specification. |
This helper connects the model a researcher has already estimated to the arbitrary-facet simulation branch. It is deliberately limited to fitted RSM models in version 0.2.0 because simulate_mfrm_arbitrary_data() uses a common-threshold RSM response generator. Role-based PCM/GPCM simulation remains available through extract_mfrm_sim_spec() and simulate_mfrm_data().
With assignment = "skeleton", the generated data reuse the same retained person-by-facet observation rows used by the fitted model. With parameter_source = "estimates", the fitted person measures, facet measures, and step estimates are used as the data-generating truth. This answers a direct sensitivity question: how would repeated responses behave under this fitted model and this observed design? If the fitted analysis used observation weights, the extracted specification records the retained Weight column so evaluate_mfrm_bias_detection() can refit simulated data with the same weighting unless fit_args$weight overrides it.
Use assignment = "balanced" when the question is not a direct parametric replay of the observed design, but a planning question such as "what if the same fitted severity distribution were used under a more balanced rater-task assignment?" In that case, facets_per_person controls the rebuilt design.
An object of class mfrm_arbitrary_sim_spec.
simulate_mfrm_arbitrary_data(),
build_mfrm_arbitrary_sim_spec(),
extract_mfrm_sim_spec()
spec0 <- build_mfrm_arbitrary_sim_spec(
  n_person = 12,
  facets = c(Rater = 3, Criteria = 2, Task = 2),
  facets_per_person = c(Rater = 2),
  score_levels = 4
)
dat <- simulate_mfrm_arbitrary_data(spec0, seed = 1)
fit <- fit_mfrm(
  dat,
  person = "Person",
  facets = spec0$facet_names,
  score = "Score",
  rating_min = 1,
  rating_max = 4,
  model = "RSM",
  method = "JML",
  maxit = 20
)
fitted_spec <- extract_mfrm_arbitrary_sim_spec(fit)
summarize_mfrm_sim_design(fitted_spec)$overview
fitted_sim <- simulate_mfrm_arbitrary_data(fitted_spec, seed = 2)
head(fitted_sim)
balanced_spec <- extract_mfrm_arbitrary_sim_spec(
  fit,
  assignment = "balanced",
  parameter_source = "resampled",
  facets_per_person = c(Rater = 2)
)
summarize_mfrm_sim_design(balanced_spec)$assignment
Derive a simulation specification from a fitted MFRM object
extract_mfrm_sim_spec(
  fit,
  assignment = c("auto", "crossed", "rotating", "resampled", "skeleton"),
  latent_distribution = c("normal", "empirical"),
  source_data = NULL,
  person = NULL,
  group = NULL
)
fit |
Output from |
assignment |
Assignment design to record in the returned specification.
Use |
latent_distribution |
Latent-value generator to record in the returned
specification. |
source_data |
Optional original source data used to recover additional
non-calibration columns, currently person-level |
person |
Optional person column name in |
group |
Optional group column name in |
extract_mfrm_sim_spec() uses a fitted model as a practical starting point
for later simulation studies. It extracts:
design counts from the fitted data
empirical spread of person and facet estimates
optional empirical support values for semi-parametric draws
fitted threshold values
either a simplified assignment summary ("crossed" / "rotating"),
empirical resampled assignment profiles ("resampled"), or an observed
response skeleton ("skeleton", optionally carrying Group/Weight)
when the fit used the latent-regression branch, the fitted
population_formula, coefficient vector, residual variance, and the
stored person-level covariate table, including model-matrix xlevel and
contrast provenance for categorical covariates
This is intended as a fit-derived parametric starting point, not as a claim that the fitted object perfectly recovers the true data-generating mechanism. Users should review and, if necessary, edit the returned specification before using it for design planning.
First-release GPCM fits are now supported here for direct data generation,
provided that the returned simulation specification stores both a threshold
table and a parallel slope table. Planning, forecasting, reporting, and
package-native replay/export helpers now consume that slope-aware contract
with explicit bounded-GPCM caveats.
If you want to carry person-level group labels into a fit-derived observed
response skeleton, provide the original source_data together with
person and group. Group labels are treated as person-level metadata and
are checked for one-label-per-person consistency before being merged.
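The one-label-per-person rule can be previewed on the source data before calling the extractor. This is a base-R sketch of the idea only, not the package's internal check; the data frame and its column names are hypothetical.

```r
# Hypothetical long-format source data; P3 carries two group labels.
source_data <- data.frame(
  Person = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Group  = c("A",  "A",  "B",  "B",  "A",  "B")
)

# Count distinct group labels per person.
labels_per_person <- tapply(source_data$Group, source_data$Person,
                            function(g) length(unique(g)))

# Persons that violate the one-label-per-person rule.
inconsistent <- names(labels_per_person)[labels_per_person > 1]
inconsistent

# Person-level lookup restricted to consistent persons.
ok <- !source_data$Person %in% inconsistent
person_groups <- unique(source_data[ok, c("Person", "Group")])
person_groups
```

Resolving any persons listed in `inconsistent` before passing `source_data`, `person`, and `group` keeps the merge unambiguous.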
An object of class mfrm_sim_spec.
The returned object is a simulation specification, not a prediction about one future sample. It captures one convenient approximation to the observed design and estimated spread in the fitted run.
build_mfrm_sim_spec(), simulate_mfrm_data()
toy <- simulate_mfrm_data(
  n_person = 8,
  n_rater = 3,
  n_criterion = 2,
  seed = 123
)
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 5)
spec <- extract_mfrm_sim_spec(fit, latent_distribution = "empirical")
spec$assignment
spec$model
head(spec$threshold_table)
Build a compact dashboard for one facet at a time, combining facet severity, misfit, central-tendency screening, and optional bias counts.
facet_quality_dashboard(
  fit,
  diagnostics = NULL,
  facet = NULL,
  bias_results = NULL,
  severity_warn = 1,
  misfit_warn = NULL,
  central_tendency_max = 0.25,
  bias_count_warn = 1L,
  bias_abs_t_warn = 2,
  bias_abs_size_warn = 0.5,
  bias_p_max = 0.05
)
fit |
Output from |
diagnostics |
Optional output from |
facet |
Optional facet name. When |
bias_results |
Optional output from |
severity_warn |
Absolute estimate cutoff used to flag severity outliers. |
misfit_warn |
Mean-square cutoff used to flag misfit. |
central_tendency_max |
Absolute estimate cutoff used to flag central tendency. Levels near zero are marked. |
bias_count_warn |
Minimum flagged-bias row count required to flag a level. |
bias_abs_t_warn |
Absolute |
bias_abs_size_warn |
Absolute bias-size cutoff used when deriving bias-row flags from a raw bias bundle. |
bias_p_max |
Probability cutoff used when deriving bias-row flags from a raw bias bundle. |
The dashboard screens individual facet elements across four complementary criteria:
Severity: elements whose absolute estimate exceeds
severity_warn logits are flagged as unusually harsh or lenient.
Misfit: elements with Infit or Outfit MnSq outside the
active MnSq screening band are flagged. The band defaults to the package
pair returned by mfrm_misfit_thresholds() (broad 0.5-1.5);
pass misfit_warn = 1.5 to request the older symmetric
[1 / misfit_warn, misfit_warn]
form (0.67-1.5).
Central tendency: elements whose absolute estimate is below
central_tendency_max logits
are flagged. Near-zero estimates may indicate a rater who avoids
extreme categories, producing artificially narrow score ranges.
Bias: elements involved in at least bias_count_warn
screen-positive interaction cells (from estimate_bias()) are flagged.
A flag density score counts how many of the four criteria each element triggers. Elements flagged on multiple criteria warrant priority review (e.g., rater retraining, data exclusion).
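The flag density idea can be sketched in base R on a toy table. The data, the column names (`BiasCount`, `FlagDensity`), and the strictness of each inequality are assumptions made for illustration; the thresholds mirror the documented defaults, and the package's own implementation may differ in detail.

```r
# Toy level-level table; all values are invented for illustration.
detail <- data.frame(
  Level     = c("R1", "R2", "R3", "R4"),
  Estimate  = c(1.40, 0.10, -0.60, 0.05),
  Infit     = c(1.10, 1.70, 0.95, 0.40),
  Outfit    = c(1.05, 1.60, 1.00, 0.45),
  BiasCount = c(0L, 2L, 0L, 1L)
)

# Thresholds mirroring the documented defaults.
severity_warn        <- 1
central_tendency_max <- 0.25
bias_count_warn      <- 1L
misfit_band          <- c(0.5, 1.5)  # broad default band quoted above

# One logical column per criterion (inequality strictness is assumed).
flags <- data.frame(
  Severity = abs(detail$Estimate) > severity_warn,
  Misfit = detail$Infit < misfit_band[1] | detail$Infit > misfit_band[2] |
    detail$Outfit < misfit_band[1] | detail$Outfit > misfit_band[2],
  CentralTendency = abs(detail$Estimate) < central_tendency_max,
  Bias = detail$BiasCount >= bias_count_warn
)

# Flag density: number of criteria triggered per level.
detail$FlagDensity <- unname(rowSums(flags))
detail[order(-detail$FlagDensity), c("Level", "FlagDensity")]
```

Sorting by the count, as in the last line, puts levels that trigger several criteria at the top of the review list.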
Default thresholds are designed for moderate-stakes rating contexts. Adjust for your application: stricter thresholds for high-stakes certification, more lenient for formative assessment.
An object of class mfrm_facet_dashboard (also inheriting from
mfrm_bundle and list). The object summarizes one target facet:
overview reports the facet-level screening totals, summary provides
aggregate estimates and flag counts, detail contains one row per facet
level with the computed screening indicators, ranked orders levels by
review priority, flagged keeps only levels requiring follow-up,
bias_sources records which bias-result bundles contributed to the
counts, settings stores the resolved thresholds, and notes gives short
interpretation messages about how to read the dashboard.
The returned object is a bundle-like list with class
mfrm_facet_dashboard and components:
facet: character scalar naming the dashboard's target facet
facet_source: character scalar describing whether the target
facet was inferred from the fit configuration or supplied
explicitly
overview: one-row structural overview
summary: one-row screening summary
detail: level-level detail table. When fit statistics are available,
MisfitDirection separates underfit (above the upper MnSq band),
overfit (below the lower band), mixed, and in_band.
ranked: detail ordered by flag density / severity
flagged: flagged levels only
bias_sources: per-bundle bias aggregation metadata
settings: resolved threshold settings
notes: short interpretation notes
diagnostics: the mfrm_diagnostics bundle the dashboard was
built from (echoed for downstream helpers that need to traverse
the same diagnostics object)
bias_results: the mfrm_bias bundle (or list of bundles)
when bias_results was supplied; NULL otherwise
diagnose_mfrm(), estimate_bias(), plot_qc_dashboard()
toy <- load_mfrmr_data("example_core")
toy <- toy[toy$Person %in% unique(toy$Person)[1:8], ]
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 50)
diag <- diagnose_mfrm(fit, residual_pca = "none")
dash <- facet_quality_dashboard(fit, diagnostics = diag)
summary(dash)
dash$detail[, c("Level", "Infit", "Outfit", "MisfitDirection", "FlagLabel")]
Reports per-level observation counts, SE, and fit statistics for every
level of every facet in a fitted MFRM model, and classifies each level
as "sparse", "marginal", "standard", or "strong" against the
Linacre sample-size bands.
facet_small_sample_audit(
  fit,
  diagnostics = NULL,
  thresholds = c(sparse = 10, marginal = 30, standard = 50)
)
fit |
An |
diagnostics |
Optional |
thresholds |
Named numeric vector of count bands. Defaults are
|
In mfrmr every facet is a fixed effect (see ?fit_mfrm, "Fixed
effects assumption"), so a level with very few ratings contributes an
estimate with wide SE but no shrinkage toward the facet mean. This
helper surfaces those levels up front so users can decide whether to
drop them, pool them, or move to a hierarchical model outside mfrmr.
A list of class mfrm_facet_sample_audit with:
table: one row per (Facet, Level) with N, Estimate, SE,
Infit, Outfit, and SampleCategory.
summary: counts of levels in each sample-size category, by facet.
facet_summary: smallest observed level count per facet.
thresholds: the applied count bands.
"sparse" (n < 10): level-level estimate is unstable; SE will be
wide; consider combining with adjacent levels or treating as
exploratory only.
"marginal" (10 <= n < 30): below Linacre (1994) 95% CI
±1.0 logit threshold; usable for screening only.
"standard" (30 <= n < 50): meets baseline stability; reasonable
for publication if fit statistics are acceptable.
"strong" (n >= 50): well-targeted; facet estimate is robust.
Because mfrmr has no shrinkage by default, sparse and marginal levels
do not "borrow strength" from other levels. Jones and Wind (2018)
report that rater estimates are particularly sensitive to thin
linking; the Facet = "Person" row is usually less of a concern
because the person prior integrates out the uncertainty.
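The count bands listed above map directly onto left-closed intervals. A minimal base-R sketch of that classification, using the documented default thresholds and hypothetical per-level counts (the package's own code path is not shown here):

```r
# Documented default thresholds for facet_small_sample_audit().
thresholds <- c(sparse = 10, marginal = 30, standard = 50)

# Hypothetical per-level observation counts.
n_obs <- c(R1 = 4, R2 = 10, R3 = 29, R4 = 30, R5 = 50, R6 = 120)

# right = FALSE gives left-closed intervals:
# [0, 10), [10, 30), [30, 50), [50, Inf).
sample_category <- cut(
  n_obs,
  breaks = c(0, unname(thresholds), Inf),
  labels = c("sparse", "marginal", "standard", "strong"),
  right  = FALSE
)

data.frame(Level = names(n_obs), N = unname(n_obs),
           SampleCategory = as.character(sample_category))
```

Counts that sit exactly on a boundary (10, 30, 50) fall into the higher band, matching the `10 <= n < 30` style of the definitions.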
Fit with fit_mfrm(); optionally also produce diagnostics
with diagnose_mfrm() if you want per-level Infit/Outfit.
Call facet_small_sample_audit(fit, diagnostics).
Read the facet_summary first: it highlights the worst level
per facet. The summary table gives counts in each band.
If any facet is flagged as sparse or marginal, discuss it in the
Methods section; build_apa_outputs() already adds a sentence
about the band when fit$summary$FacetSampleSizeFlag is set.
Linacre, J. M. (2026). A User's Guide to FACETS, Version 4.5.0. Winsteps.com. https://www.winsteps.com/facets.htm
Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328. https://www.rasch.org/rmt/rmt74m.htm
Jones, E., & Wind, S. A. (2018). Using repeated ratings to improve measurement precision in incomplete rating designs. Journal of Applied Measurement, 19(2), 148-161.
detect_facet_nesting(), analyze_hierarchical_structure(),
compute_facet_icc(), compute_facet_design_effect(),
reporting_checklist().
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
audit <- facet_small_sample_audit(fit)
summary(audit)
# Custom thresholds (e.g. a stricter protocol).
strict <- facet_small_sample_audit(
  fit,
  thresholds = c(sparse = 15, marginal = 40, standard = 100)
)
strict$facet_summary
Build a facet statistics report (preferred alias)
facet_statistics_report(
  fit,
  diagnostics = NULL,
  metrics = c("Estimate", "Infit", "Outfit", "SE"),
  ruler_width = 41,
  distribution_basis = c("both", "sample", "population"),
  se_mode = c("both", "model", "fit_adjusted")
)
fit |
Output from |
diagnostics |
Optional output from |
metrics |
Numeric columns in |
ruler_width |
Width of the fixed-width ruler used for |
distribution_basis |
Which distribution basis to keep in the appended
precision summary: |
se_mode |
Which standard-error mode to keep in the appended precision
summary: |
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_facet_statistics (type = "means", "sds", "ranges").
A named list with facet-statistics components. Class:
mfrm_facet_statistics.
facet-level means/SD/ranges of selected metrics (Estimate, fit indices, SE).
fixed-width ruler rows (M/S/Q/X) for compact profile scanning.
Run facet_statistics_report(fit).
Inspect summary/ranges for anomalous facets.
Cross-check flagged facets with fit and chi-square diagnostics.
The returned bundle now includes:
precision_summary: facet precision/separation indices by
DistributionBasis and SEMode
variability_tests: fixed/random variability tests by facet
se_modes: compact list of available SE modes by facet
diagnose_mfrm(), summary.mfrm_fit(), plot_facets_chisq(),
mfrmr_reports_and_tables
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
out <- facet_statistics_report(fit)
summary(out)
p_fs <- plot(out, draw = FALSE)
p_fs$data$plot
Build facet variability diagnostics with fixed/random reference tests
facets_chisq_table(
  fit,
  diagnostics = NULL,
  fixed_p_max = 0.05,
  random_p_max = 0.05,
  top_n = NULL
)
fit |
Output from |
diagnostics |
Optional output from |
fixed_p_max |
Warning cutoff for fixed-effect chi-square p-values. |
random_p_max |
Warning cutoff for random-effect chi-square p-values. |
top_n |
Optional maximum number of facet rows to keep. |
This helper summarizes facet-level variability with fixed and random chi-square indices for spread and heterogeneity checks.
A named list with:
table: facet-level chi-square diagnostics
summary: one-row summary
thresholds: applied p-value thresholds
table: facet-level fixed/random chi-square and p-value flags.
summary: number of significant facets and overall magnitude indicators.
thresholds: p-value criteria used for flagging.
Use this table together with inter-rater and displacement diagnostics to distinguish global facet effects from local anomalies.
Run facets_chisq_table(fit, ...).
Inspect summary(chi) then facet rows in chi$table.
Visualize with plot_facets_chisq().
The table data.frame contains:
Facet name.
Number of estimated levels in this facet.
Mean and standard deviation of level measures.
Fixed-effect chi-square test (null hypothesis: all levels equal). Significant result means the facet elements differ more than measurement error alone.
Random-effect test (null hypothesis: variation equals that of a random sample from a single population). Significant result suggests systematic heterogeneity beyond sampling variation.
Logical flags for significance.
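To make the fixed-effect test concrete, the sketch below computes the conventional precision-weighted homogeneity statistic for one facet in base R. The measures and standard errors are invented, and treating this formula as what facets_chisq_table() computes internally is an assumption, not something this page confirms.

```r
# Hypothetical rater measures (logits) and their standard errors.
measure <- c(-0.80, -0.10, 0.25, 0.65)
se      <- c(0.20, 0.22, 0.21, 0.25)

# Precision weights: levels with smaller SE count for more.
w <- 1 / se^2

# Homogeneity chi-square under the null that all levels share one measure.
fixed_chisq <- sum(w * measure^2) - sum(w * measure)^2 / sum(w)
fixed_df    <- length(measure) - 1
fixed_p     <- pchisq(fixed_chisq, df = fixed_df, lower.tail = FALSE)

c(ChiSq = fixed_chisq, df = fixed_df, p = fixed_p)
```

The statistic equals the weighted sum of squared deviations from the precision-weighted mean, so it grows when level measures differ by more than their standard errors would allow.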
diagnose_mfrm(), interrater_agreement_table(), plot_facets_chisq()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
chi <- facets_chisq_table(fit)
summary(chi)
p_chi <- plot(chi, draw = FALSE)
p_chi$data$plot
Build a legacy-compatible output-file bundle (GRAPH= / SCORE=)
facets_output_file_bundle(
  fit,
  diagnostics = NULL,
  include = c("graph", "score"),
  theta_range = c(-6, 6),
  theta_points = 241,
  digits = 4,
  include_fixed = FALSE,
  fixed_max_rows = 400,
  write_files = FALSE,
  output_dir = NULL,
  file_prefix = "mfrmr_output",
  overwrite = FALSE
)
fit |
Output from |
diagnostics |
Optional output from |
include |
Output components to include: |
theta_range |
Theta/logit range for graph coordinates. |
theta_points |
Number of points on the theta grid for graph coordinates. |
digits |
Rounding digits for numeric fields. |
include_fixed |
If |
fixed_max_rows |
Maximum rows shown in fixed-width text blocks. |
write_files |
If |
output_dir |
Output directory used when |
file_prefix |
Prefix used for output file names. |
overwrite |
If |
Legacy-compatible output files often include:
graph coordinates for Table 8 curves (GRAPH= / Graphfile=), and
observation-level modeled score lines (SCORE=-style inspection).
This helper returns both as data frames and can optionally write CSV/fixed-width text files to disk.
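As a rough picture of what graph coordinates contain, the base-R sketch below evaluates rating-scale-model category probabilities and expected scores on the default theta grid. The thresholds, the facet location fixed at 0, and the column names are hypothetical; the real graphfile layout is defined by the package, not by this example.

```r
# Theta grid matching the documented defaults
# (theta_range = c(-6, 6), theta_points = 241).
theta <- seq(-6, 6, length.out = 241)

# Hypothetical Rasch-Andrich thresholds for a four-category scale.
tau <- c(-1.2, 0.0, 1.2)

# Category probabilities under the rating-scale model for one theta value,
# with the combined facet location fixed at 0 for illustration.
rsm_probs <- function(theta_i, tau) {
  log_num <- c(0, cumsum(theta_i - tau))  # log-numerators, categories 0..m
  num <- exp(log_num - max(log_num))      # subtract max for stability
  num / sum(num)
}

prob <- t(vapply(theta, rsm_probs, numeric(length(tau) + 1), tau = tau))
colnames(prob) <- paste0("Prob_", 0:length(tau))

# Expected score at each grid point.
expected <- as.vector(prob %*% (0:length(tau)))

graph <- data.frame(Theta = theta, Expected = expected, prob)
head(graph, 3)
```

With the default range and point count the grid step is 0.05 logits, which is usually fine enough for smooth curve plots.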
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_output_bundle (type = "graph_expected", "score_residuals",
"obs_probability").
A named list including:
graphfile / graphfile_syntactic when "graph" is requested
scorefile when "score" is requested
graphfile_fixed / scorefile_fixed when include_fixed = TRUE
written_files when write_files = TRUE
settings: applied options
graphfile: legacy-compatible wide curve coordinates (human-readable labels).
graphfile_syntactic: same curves with syntactic column names for programmatic use.
scorefile: observation-level observed/expected/residual diagnostics.
written_files: audit trail of files produced when write_files = TRUE.
For reproducible pipelines, prefer graphfile_syntactic and keep
written_files in run logs.
For new scripts, prefer category_curves_report() or
category_structure_report() for scale outputs, then use
export_mfrm_bundle() for file handoff. Use
facets_output_file_bundle() only when a legacy-compatible graphfile or
scorefile contract is required.
Fit and diagnose model.
Generate bundle with include = c("graph", "score").
Validate with summary(out) / plot(out).
Export with write_files = TRUE for reporting handoff.
category_curves_report(), diagnose_mfrm(), unexpected_response_table(),
export_mfrm_bundle(), mfrmr_reports_and_tables,
mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
out <- facets_output_file_bundle(fit, diagnostics = diagnose_mfrm(fit, residual_pca = "none"))
summary(out)
p_out <- plot(out, draw = FALSE)
p_out$data$plot
Build a FACETS compatibility-contract audit
facets_parity_report(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  branch = c("facets", "original"),
  contract_file = NULL,
  include_metrics = TRUE,
  top_n_missing = 15L
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
branch |
Contract branch. |
contract_file |
Optional path to a custom contract CSV. |
include_metrics |
If |
top_n_missing |
Number of lowest-coverage contract rows to keep in
|
This function audits produced report components against a compatibility
contract specification (inst/references/facets_column_contract.csv) and
returns:
column-level coverage per contract row
table-level coverage summaries
optional metric-level consistency checks
It is intended for compatibility-layer QA and regression auditing. It does not establish external validity or software equivalence beyond the specific schema/metric contract encoded in the audit file.
Coverage interpretation in overall:
MeanColumnCoverage and MinColumnCoverage are computed across all
contract rows (unavailable rows count as 0 coverage).
MeanColumnCoverageAvailable and MinColumnCoverageAvailable summarize
only rows whose source component is available.
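The difference between the two pairs of summaries can be illustrated with a small base-R sketch. The data frame and its column names (`available`, `coverage`) are hypothetical stand-ins chosen for illustration; they are not the actual `column_audit` schema.

```r
# Hypothetical contract rows: coverage is NA when the source component
# is unavailable.
audit <- data.frame(
  table_id  = c("T1", "T1", "T2", "T3"),
  available = c(TRUE, TRUE, TRUE, FALSE),
  coverage  = c(1.0, 0.5, 0.75, NA)
)

# All contract rows: unavailable rows count as 0 coverage.
cov_all  <- ifelse(audit$available, audit$coverage, 0)
mean_all <- mean(cov_all)
min_all  <- min(cov_all)

# Only rows whose source component is available.
cov_avail  <- audit$coverage[audit$available]
mean_avail <- mean(cov_avail)
min_avail  <- min(cov_avail)
```

In this toy case a single unavailable row pulls the all-rows minimum to 0, while the available-only summaries are unaffected.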
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_parity_report (type = "column_coverage", "table_coverage",
"metric_status", "metric_by_table").
An object of class mfrm_parity_report with:
overall: one-row compatibility-audit summary
column_summary: coverage summary by table ID
column_audit: row-level contract audit
missing_preview: lowest-coverage rows
metric_summary: one-row metric-check summary
metric_by_table: metric-check summary by table ID
metric_audit: row-level metric checks
settings: branch/contract metadata
overall: high-level compatibility-contract coverage and metric-check pass
rates.
column_summary / column_audit: where compatibility-schema mismatches
occur.
metric_summary / metric_audit: numerical consistency checks tied to the
current contract.
missing_preview: quickest path to unresolved compatibility gaps.
Run facets_parity_report(fit, branch = "facets").
Inspect summary(contract_audit) and missing_preview.
Patch upstream table builders, then rerun the compatibility audit.
fit_mfrm(), diagnose_mfrm(), build_fixed_reports(),
mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
contract_audit <- facets_parity_report(fit, diagnostics = diag, branch = "facets")
summary(contract_audit)
p <- plot(contract_audit, draw = FALSE)
Build an adjusted-score reference table bundle
fair_average_table(
  fit,
  diagnostics = NULL,
  facets = NULL,
  totalscore = TRUE,
  umean = 0,
  uscale = 1,
  udecimals = 2,
  reference = c("both", "mean", "zero"),
  label_style = c("both", "native", "legacy"),
  omit_unobserved = FALSE,
  xtreme = 0
)
| Argument | Description |
|---|---|
| fit | Output from fit_mfrm(). |
| diagnostics | Optional output from diagnose_mfrm(). |
| facets | Optional subset of facets. |
| totalscore | Include all observations for score totals |
| umean | Additive score-to-report origin shift. |
| uscale | Multiplicative score-to-report scale. |
| udecimals | Rounding digits used in formatted output. |
| reference | Which adjusted-score reference to keep in formatted outputs: "both", "mean", or "zero". |
| label_style | Column-label style for formatted outputs: "both", "native", or "legacy". |
| omit_unobserved | If |
| xtreme | Extreme-score adjustment amount. |
This function wraps the package's adjusted-score calculations and returns
both facet-wise and stacked tables. Historical display columns such as
Fair(M) Average and Fair(Z) Average are retained for compatibility, and
package-native aliases such as AdjustedAverage,
StandardizedAdjustedAverage, ModelBasedSE, and FitAdjustedSE are
appended to the formatted outputs.
For the Rasch-family RSM / PCM branch, these tables follow the
standard FACETS Linacre construction: fair averages are
Rasch-measure-to-score transformations evaluated in a standardized
mean/zero-facet environment.
Bounded GPCM fits are supported under a slope-aware
element-conditional construction. For each slope-facet element $e$,
the per-row fair-average is the GPCM expected score $E[X \mid \eta]$
computed at that element's own discrimination $a_e$
and threshold structure. Rows for non-slope facets (Person, Rater,
...) use the geometric-mean-one slope ($a = 1$) by the GPCM
identification convention, so those rows remain continuous with
the standard PCM Linacre fair-average and reduce to it exactly
when all slopes equal one.
Fair-average-specific conditional SE columns are computed with a
measure-only delta method. For bounded GPCM, the derivative is
$\partial E[X \mid \eta] / \partial \eta = a_e \, \mathrm{Var}(X \mid \eta)$ for slope-facet rows and
$\mathrm{Var}(X \mid \eta)$ for the geometric-mean-one non-slope rows.
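A measure-only delta-method SE of this kind can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the helper names (`category_probs`, `expected_score`, `score_variance`) and all numeric values are hypothetical, scores are taken as 0..K, and the `umean` / `uscale` reporting transformation is ignored.

```r
# Category probabilities for one element with slope `a` and thresholds
# `tau` (scores 0..K); a = 1 gives the PCM case.
category_probs <- function(eta, tau, a = 1) {
  logits <- c(0, cumsum(a * (eta - tau)))
  p <- exp(logits - max(logits))   # subtract max for numerical stability
  p / sum(p)
}

expected_score <- function(eta, tau, a = 1) {
  p <- category_probs(eta, tau, a)
  sum((0:length(tau)) * p)
}

score_variance <- function(eta, tau, a = 1) {
  p <- category_probs(eta, tau, a)
  k <- 0:length(tau)
  sum(k^2 * p) - sum(k * p)^2
}

tau <- c(-1.2, 0.1, 1.4)   # illustrative thresholds
eta <- 0.3                 # illustrative measure in the reference environment
a <- 1.3                   # illustrative slope
se_measure <- 0.25         # illustrative measure SE

# Analytic derivative a * Var(X | eta), checked against a central difference.
d_analytic <- a * score_variance(eta, tau, a)
h <- 1e-5
d_numeric <- (expected_score(eta + h, tau, a) -
                expected_score(eta - h, tau, a)) / (2 * h)

# Measure-only delta-method SE of the expected score.
se_fair <- abs(d_analytic) * se_measure
```

Only the focal measure SE is propagated here, which is why such values are best read as conditional screening quantities.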
The standard SE, Model S.E., and Real S.E. columns retain the
same meaning as for PCM (scaled facet-measure SEs); see the
"Standard-error caveat" section below before quoting intervals.
A named list with:
by_facet: named list of formatted data.frames
stacked: one stacked data.frame across facets
raw_by_facet: unformatted internal tables
settings: resolved options
stacked: cross-facet table for global comparison.
by_facet: per-facet formatted tables for reporting.
raw_by_facet: unformatted values for custom analyses/plots.
settings: scoring-transformation and filtering options used.
Larger observed-vs-fair gaps can indicate systematic scoring tendencies by specific facet levels.
Run fair_average_table(fit, ...).
Inspect summary(t12) and t12$stacked.
Visualize with plot_fair_average().
The stacked data.frame contains:
Facet name for this row.
Element label within the facet.
Observed raw-score average.
Model-adjusted reference average on the reported score scale.
Standardized adjusted reference average.
Conditional
measure-only delta-method SEs for the corresponding fair-average
values. Package-native aliases are AdjustedAverageConditionalSE
and StandardizedAdjustedAverageConditionalSE.
Package-native aliases for the three average columns above.
Estimated logit measure for this level.
Compatibility alias for the model-based standard error.
Package-native aliases for Model S.E. and Real S.E..
Fit statistics for this level.
The SE, Model S.E., ModelBasedSE, Real S.E., and FitAdjustedSE
columns in this table are the measure-level standard errors of the
underlying facet element (the same SE that would appear in
summary(fit)$facets), rescaled by the fair-average score scale factor
so the units line up with the reported Fair(M) Average / Fair(Z) Average
columns. They are not fair-average SEs.
The Fair(M) Cond. S.E. / Fair(Z) Cond. S.E. columns are
fair-average-specific conditional delta-method SEs, but they propagate
only the focal level's measure SE through the expected-score curve.
They do not propagate the joint covariance of the relevant facet
element, threshold parameters, slopes, and person-measure estimation
through the expected score. mfrmr does not
currently expose that full joint covariance (under MML the person
measure is integrated out of the structural Hessian; under JML no
joint Hessian is built). Treat these columns as conditional screening
intervals, not full model-uncertainty intervals.
Linacre, J. M. (1989). Many-Facet Rasch Measurement. MESA Press.
Linacre, J. M. (1994). Many-facet Rasch Measurement (2nd ed.). MESA Press.
Linacre, J. M. (2026). A user's guide to FACETS, version 4.5.0.
Winsteps.com. https://www.winsteps.com/facets.htm
(FACETS Table 12 corresponds to the fair-average
construction implemented here for RSM / PCM fits; the
slope-aware element-conditional construction for bounded GPCM
is documented in this help page.)
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573. doi:10.1007/BF02293814
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174. doi:10.1007/BF02296272
Muraki, E. (1992). A generalized partial credit model:
Application of an EM algorithm. Applied Psychological
Measurement, 16(2), 159-176. (Cited for the bounded GPCM
slope-aware extension.)
diagnose_mfrm(), unexpected_response_table(), displacement_table()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
t12 <- fair_average_table(fit, udecimals = 2)
t12_native <- fair_average_table(fit, reference = "mean", label_style = "native")
summary(t12)
p_t12 <- plot(t12, draw = FALSE)
p_t12$data$plot
Build a compact direction table from fit_p_table() output. The table
separates underfit, overfit, mixed, and in_band labels so researchers
can report whether misfit is driven by noisy/unpredictable responses
(MnSq above the upper band) or overly predictable responses (MnSq below the
lower band).
fit_direction_summary(
  fit,
  diagnostics = NULL,
  scope = c("element", "person", "category"),
  p_adjust = "holm",
  alpha = 0.05,
  lower = NULL,
  upper = NULL,
  reference = c("mfrmr", "facets"),
  zstd_cap = c("auto", "none", "facets")
)
| Argument | Description |
|---|---|
| fit | Output from fit_mfrm(). |
| diagnostics | Optional output from diagnose_mfrm(). |
| scope | Fit-statistic scope passed to fit_p_table(). |
| p_adjust | Multiplicity adjustment passed to |
| alpha | Screening alpha used for adjusted p-value counts. |
| lower, upper | Optional MnSq screening band. Defaults to |
| reference | Fit-statistic reference passed to fit_p_table(). |
| zstd_cap | ZSTD cap policy passed to fit_p_table(). |
Direction labels are MnSq-band labels. ZSTD and adjusted p values are counted
separately through PFlagN and PFlagRate; they do not define the
underfit/overfit direction. This keeps the substantive interpretation
stable when users compare reference = "mfrmr" with reference = "facets".
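The band-based labelling can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: `classify_direction` is a hypothetical helper, the band limits 0.5 and 1.5 are illustrative values rather than the package defaults, and the `mixed` label is not covered.

```r
# Label one MnSq statistic per level against a screening band.
# Above the band = underfit (noisy); below the band = overfit (too predictable).
classify_direction <- function(mnsq, lower = 0.5, upper = 1.5) {
  ifelse(mnsq > upper, "underfit",
         ifelse(mnsq < lower, "overfit", "in_band"))
}

mnsq <- c(0.42, 0.95, 1.10, 1.73)   # illustrative MnSq values
labels <- classify_direction(mnsq)

# Rates use the number of classified levels as denominator.
rates <- table(factor(labels, levels = c("underfit", "overfit", "in_band"))) /
  length(labels)
```

Because the labels depend only on the band, changing a p-value adjustment or ZSTD policy would leave them untouched in this sketch.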
A data frame of class mfrm_fit_direction_summary with one row per
Scope x Facet x reference combination. Count columns report the number
of levels in each direction; rate columns use the number of classified
levels as denominator. The original fit-p table is stored in the detail
attribute.
fit_p_table(), plot_fit_direction_summary(),
summarize_simulation_misfit()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
dir_tab <- fit_direction_summary(fit, diagnostics = diag)
dir_tab[, c("Facet", "UnderfitRate", "OverfitRate", "AnyMisfitRate")]
This is the package entry point. It wraps mfrm_estimate() and defaults to
method = "MML". Any number of facet columns can be supplied via facets.
fit_mfrm(
  data,
  person,
  facets,
  score,
  rating_min = NULL,
  rating_max = NULL,
  weight = NULL,
  keep_original = FALSE,
  missing_codes = NULL,
  model = c("RSM", "PCM", "GPCM"),
  method = c("MML", "JML", "JMLE"),
  step_facet = NULL,
  slope_facet = NULL,
  facet_interactions = NULL,
  min_obs_per_interaction = 10,
  interaction_policy = c("warn", "error", "silent"),
  anchors = NULL,
  group_anchors = NULL,
  noncenter_facet = "Person",
  dummy_facets = NULL,
  positive_facets = NULL,
  anchor_policy = c("warn", "error", "silent"),
  min_common_anchors = 5L,
  min_obs_per_element = 30,
  min_obs_per_category = 10,
  quad_points = 31,
  maxit = 400,
  reltol = 1e-06,
  mml_engine = c("direct", "em", "hybrid"),
  population_formula = NULL,
  person_data = NULL,
  person_id = NULL,
  population_policy = c("error", "omit"),
  facet_shrinkage = c("none", "empirical_bayes", "laplace"),
  facet_prior_sd = NULL,
  shrink_person = FALSE,
  attach_diagnostics = FALSE,
  checkpoint = NULL
)
| Argument | Description |
|---|---|
| data | A data.frame in long format with one row per observed rating event. |
| person | Column name for the person (character scalar). |
| facets | Character vector of facet column names. |
| score | Column name for the observed ordered category score. Values must be coercible to numeric integer category codes. Fractional values are rejected. Binary |
| rating_min | Optional minimum category value. Supply this with |
| rating_max | Optional maximum category value. Supply this with |
| weight | Optional weight column name. |
| keep_original | Logical. |
| missing_codes | Optional pre-processing step that converts sentinel missing-code values to NA. Replacement counts are recorded in |
| model | "RSM", "PCM", or "GPCM". |
| method | "MML" (default), "JML", or "JMLE". |
| step_facet | Step facet for |
| slope_facet | Slope facet for the bounded |
| facet_interactions | Optional confirmatory two-way interaction terms between non-person facets, supplied as explicit character terms such as "Rater:Criterion". |
| min_obs_per_interaction | Minimum weighted observations recommended for each interaction cell. Cells below this value are flagged in |
| interaction_policy | How to handle sparse interaction cells: "warn", "error", or "silent". |
| anchors | Optional anchor table. |
| group_anchors | Optional group-anchor table. |
| noncenter_facet | One facet to leave non-centered. |
| dummy_facets | Facets to fix at zero. |
| positive_facets | Facets with positive orientation. |
| anchor_policy | How to handle anchor-audit issues: "warn", "error", or "silent". |
| min_common_anchors | Minimum anchored levels per linking facet used in anchor-audit recommendations. |
| min_obs_per_element | Minimum weighted observations per facet level used in anchor-audit recommendations. |
| min_obs_per_category | Minimum weighted observations per score category used in anchor-audit recommendations. |
| quad_points | Integer number of Gauss-Hermite quadrature points used for MML integration over the person distribution. Default is 31. Internal benchmarks show the marginal log-likelihood still drifts by ~0.5-1 logit between |
| maxit | Maximum optimizer iterations. |
| reltol | Optimization tolerance. |
| mml_engine | MML optimization engine for |
| population_formula | Optional one-sided formula for a person-level latent-regression population model, for example ~ Grade + Group. |
| person_data | Optional one-row-per-person data.frame holding background variables for population_formula. |
| person_id | Optional person-ID column in person_data. |
| population_policy | How missing background data are handled for a latent-regression fit. |
| facet_shrinkage | Character. |
| facet_prior_sd | Optional numeric scalar. When supplied, the shrinkage prior variance is fixed at |
| shrink_person | Logical. When |
| attach_diagnostics | Logical. When |
| checkpoint | Optional |
Data must be in long format (one row per observed rating event).
An object of class mfrm_fit (named list) with:
summary: one-row model summary (LogLik, AIC, BIC, convergence)
including public Method, internal MethodUsed, and
MMLEngineRequested, MMLEngineUsed, and EMIterations for MML fits
facets$person: person estimates (Estimate; plus SD for MML)
facets$others: facet-level estimates for each facet
steps: estimated threshold/step parameters as a one-row-per-step
tibble with Estimate only. No SE column is currently
provided; standard errors for steps are not in the structural
Hessian block exposed by this release. Treat the values as
point estimates; for step-structure quality, use the
step-collapse and disordering warnings from diagnose_mfrm() and
category_structure_report().
slopes: estimated discrimination parameters for GPCM fits as
a one-row-per-slope-element tibble with LogEstimate and
Estimate only. No SE column is currently provided; the
identification convention pins the geometric mean of slopes at 1,
and the structural Hessian block exposed by this release does
not include slope SEs. Treat the values as point estimates.
interactions: model-estimated facet interaction effects and metadata
when facet_interactions is supplied
population: population-model metadata. Ordinary fits keep an inactive
scaffold (active = FALSE, posterior_basis = "legacy_mml"). Active
latent-regression fits store the fitted design matrix, regression
coefficients, residual variance, omission audit, the complete-case
estimation table (person_table), and the observed-person-aligned
replay/export provenance table retained before complete-case omission
(person_table_replay), plus stored categorical xlevels / contrasts
for model-matrix replay and scoring, together with
posterior_basis = "population_model".
config: resolved model configuration used for estimation
(includes config$anchor_audit)
prep: preprocessed data/level metadata
opt: raw optimizer result from stats::optim()
fit_mfrm() estimates the many-facet Rasch model (Linacre, 1989).
For a two-facet design (rater $j$, criterion $i$) the model is:
$$\ln\frac{P(X_{nij} = k)}{P(X_{nij} = k - 1)} = \theta_n - \delta_j - \beta_i - \tau_k$$
where $\theta_n$ is person ability, $\delta_j$ rater severity,
$\beta_i$ criterion difficulty, and $\tau_k$ the $k$-th
Rasch-Andrich threshold. Any number of facets may be specified via the
facets argument; each enters as an additive term in the linear
predictor $\eta$.
With model = "RSM", thresholds are shared across all
levels of all facets.
With model = "PCM", each level of step_facet receives its own
threshold vector on the package's shared observed
score scale.
With only two ordered categories, the same adjacent-category
formulation reduces to the usual binary Rasch logit for the single category
boundary:
$$\ln\frac{P(X_{nij} = 1)}{P(X_{nij} = 0)} = \theta_n - \delta_j - \beta_i - \tau_1$$
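The adjacent-category formulation can be turned into category probabilities with a few lines of base R. This is a minimal sketch, not the package's estimation code: `mfrm_category_probs` is a hypothetical helper, scores are taken as 0..K, and the argument names `theta`, `delta`, `beta`, and `tau` stand for person ability, rater severity, criterion difficulty, and thresholds.

```r
# Category probabilities implied by adjacent-category logits
# ln(P_k / P_{k-1}) = eta - tau_k, with eta = theta - delta - beta.
mfrm_category_probs <- function(theta, delta, beta, tau) {
  eta <- theta - delta - beta
  logits <- c(0, cumsum(eta - tau))
  p <- exp(logits - max(logits))   # subtract max for numerical stability
  p / sum(p)
}

# Four ordered categories (three thresholds).
p <- mfrm_category_probs(theta = 0.8, delta = 0.3, beta = -0.2,
                         tau = c(-1, 0, 1))
adjacent_logits <- diff(log(p))    # recovers eta - tau_k

# Two ordered categories: a single threshold gives the binary Rasch logit.
p_bin <- mfrm_category_probs(theta = 0.8, delta = 0.3, beta = -0.2, tau = 0)
```

With a single threshold the probability of the upper category equals `plogis(eta - tau)`, which is the binary reduction described above.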
With method = "MML", person parameters are integrated out using
Gauss-Hermite quadrature and EAP estimates are computed post-hoc.
With method = "JML", all parameters are estimated jointly as fixed
effects. "JMLE" remains an accepted compatibility alias, but package
output now uses "JML" as the public label. See the "Estimation methods"
section of mfrmr-package for details.
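To make the quadrature step concrete, the following base-R sketch builds Gauss-Hermite nodes with the Golub-Welsch eigenvalue method and integrates a binary Rasch response probability over a N(0, 1) ability prior. It is an illustration only: `gauss_hermite` is a hypothetical helper, the difficulty value is arbitrary, and the package's own quadrature code may be organized differently.

```r
# Gauss-Hermite nodes and weights (physicist convention, weight exp(-x^2))
# via the Golub-Welsch eigenvalue method; base R only.
gauss_hermite <- function(n) {
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(i, i + 1)] <- sqrt(i / 2)
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(nodes = e$values[ord], weights = sqrt(pi) * e$vectors[1, ord]^2)
}

gh <- gauss_hermite(31)
theta_nodes <- sqrt(2) * gh$nodes   # rescale nodes to a N(0, 1) prior
prior_w <- gh$weights / sqrt(pi)    # rescaled weights sum to 1

# Marginal probability of the upper category at difficulty b.
b <- 0.5
p_marginal <- sum(prior_w * plogis(theta_nodes - b))
```

Re-running the last lines with a smaller node count is a quick way to see how the approximation changes with `n`.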
mfrmr treats RSM / PCM as the equal-weighting reference route for
operational many-facet measurement. In that Rasch-family branch,
discrimination is fixed, so the scoring model does not differentially
reweight item-facet combinations through estimated slopes.
Bounded GPCM is supported as an alternative when users explicitly accept
discrimination-based reweighting. This often improves model fit, but the
package does not treat better fit alone as a sufficient reason to replace an
equal-weighting Rasch-family model.
The weight argument is separate from that modeling choice. It supplies an
observation-weight column; it does not create a free-form facet-weighting
scheme and does not change the fixed-discrimination contract of RSM /
PCM.
Minimum required columns are:
person identifier (person)
one or more facet identifiers (facets)
observed score (score)
Scores are treated as ordered categories. Non-numeric score labels are dropped with a warning after coercion, whereas fractional numeric scores are rejected with an error instead of being silently truncated.
MFRM assumes conditional independence of observations given the person
and facet parameters (Linacre, 1989). Repeated ratings of the same
person-criterion combination by the same rater violate this assumption.
When such structures may be present, follow fitting with
diagnose_mfrm(fit, diagnostic_mode = "both"); its
strict_pairwise_local_dependence screen is an exploratory check for
residual dependence beyond what the additive linear predictor absorbs.
Binary responses are therefore supported as ordered two-category scores
(for example 0/1 or 1/2) under the same RSM / PCM interface.
If your observed categories do not start at 0, set rating_min/rating_max
explicitly to avoid unintended recoding assumptions. For example, if the
intended instrument is a 1-5 scale but the current sample only uses 2-5,
set rating_min = 1, rating_max = 5 to retain the zero-count category 1
in the score support.
When keep_original = FALSE, observed gaps such as 1, 3, 5 are recoded
internally to a contiguous scale (1, 2, 3) and the mapping is stored in
fit$prep$score_map. To retain zero-count intermediate categories as part
of the original scale, set keep_original = TRUE in addition to supplying
the full rating_min / rating_max range.
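The recoding behavior described above can be mimicked in base R. This sketch only illustrates the idea; the `score_map` data frame built here is a hypothetical stand-in and is not guaranteed to match the layout of `fit$prep$score_map`.

```r
observed <- c(1, 3, 5, 3, 1, 5)   # illustrative scores with gaps at 2 and 4

# Contiguous recoding of the observed support: 1, 3, 5 -> 1, 2, 3.
support <- sort(unique(observed))
score_map <- data.frame(original = support, internal = seq_along(support))
recoded <- score_map$internal[match(observed, score_map$original)]

# Tabulating against the full 1-5 support keeps zero-count categories visible.
counts_full <- table(factor(observed, levels = 1:5))
```

The second tabulation shows why supplying the intended range matters: categories 2 and 4 remain part of the scale with a count of zero instead of disappearing.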
fit_mfrm() follows the Linacre (1989) many-facet Rasch specification:
person ability is integrated out under a N(0, 1) prior (or under the
N(X\beta, \sigma^2) latent-regression population model when
population_formula is supplied), but every facet parameter
(Rater, Criterion, Task, ...) is estimated as a fixed effect
identified by a sum-to-zero constraint. There is no hierarchical
prior, no shrinkage, and no variance component for the facets.
Practical implication: when a facet has very few observed levels (for example 3 raters) or some of its levels have very few ratings (for example 5 ratings per rater), the fixed-effect estimates retain wide SEs, and extreme estimates are not pulled toward the facet mean. Jones and Wind (2018) note that rater estimates in particular are "more sensitive to link reductions" than examinee or task estimates. For a publication-workflow audit of this, use:
facet_small_sample_audit() for per-level N and SE bands against
Linacre (1994) sample-size guidelines.
detect_facet_nesting() and
analyze_hierarchical_structure() when raters are nested in
regions, schools, or other strata that the additive fixed-effects
MFRM cannot partition out.
compute_facet_icc() and
compute_facet_design_effect() for descriptive variance-
component summaries based on lme4 (optional).
fit$summary$FacetSampleSizeFlag summarizes the worst Linacre band
across non-person facet levels ("sparse" < 10, "marginal" < 30,
"standard" < 50, "strong" >= 50).
Joint maximum likelihood (method = "JML" / "JMLE") estimates
both the structural parameters (facets, thresholds, slopes) and
every person measure as fixed parameters in one optimization. This
runs into the incidental-parameter problem of Neyman & Scott (1948):
the structural parameter estimates are inconsistent as the number
of persons grows with the number of items per person held fixed,
carrying a bias of order $1/L$ (where $L$ is the number of
items per person) that does not vanish with sample size. Wright &
Stone (1979) and Wright & Masters (1982, ch. 5) document an
empirical correction $(L - 1)/L$ that approximately removes the
bias for the dichotomous Rasch model; mfrmr does not apply
that correction (no bias_correction argument exists). The JML
branch also does not produce a profile-likelihood Hessian for the
structural parameters: SEs reported under JML are observation-table
approximations and are
marked as exploratory in the diagnostics output.
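The arithmetic of the classical $(L - 1)/L$ correction, where $L$ is the number of items per person, is shown below for orientation. mfrmr does not apply this correction; the values are illustrative, and the factor is documented for the dichotomous Rasch model only.

```r
L <- 12                                    # illustrative items per person
jml_estimates <- c(-1.8, -0.4, 0.6, 1.6)   # illustrative uncorrected logits

# Shrink each estimate toward zero by the factor (L - 1) / L.
corrected <- jml_estimates * (L - 1) / L
shrink_pct <- 100 / L                      # percentage shrinkage applied
```

With 12 items per person the adjustment is about 8 percent, which gives a sense of how the bias scales as $L$ grows.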
Practical recommendation:
Use method = "MML" for any value reported in a manuscript
or operational decision. MML integrates the person measures out
under a population prior and produces consistent structural
estimates with marginal observed-information SEs.
Use method = "JML" only for fast exploratory iteration, the
classical FACETS-style workflow, or contexts where the bias is
tolerable (a large number of items $L$ per person, descriptive
screening, or teaching).
When a third-party CML estimator is needed (the only consistent
Rasch-family estimator under the incidental-parameter setting),
fit with eRm and import via import_erm_fit().
facet_interactions adds confirmatory fixed-effect interaction terms to the
linear predictor. For example, facet_interactions = "Rater:Criterion"
estimates a rater-by-criterion deviation matrix in the same likelihood as
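the main MFRM fit. For rater $j$ and criterion $i$, the additive reference is
$$\eta_{nij} = \theta_n - \delta_j - \beta_i$$
and the interaction extension is
$$\eta_{nij} = \theta_n - \delta_j - \beta_i + \gamma_{ji}$$
where the interaction block is identified by zero marginal sums:
$$\sum_j \gamma_{ji} = 0 \text{ for every } i, \qquad \sum_i \gamma_{ji} = 0 \text{ for every } j.$$
With $J$ levels of the first facet and $I$ levels of the second
facet, this contributes $(J - 1)(I - 1)$ free parameters. Positive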
interaction estimates indicate scores higher than expected under the
additive main-effects model for that facet-level combination; negative
estimates indicate lower-than-expected scores.
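The zero-marginal-sum identification and the resulting parameter count can be checked with a short base-R sketch. The matrix below is random illustrative data, not output from the package, and the variable names are hypothetical.

```r
set.seed(1)
n_j <- 4   # illustrative number of levels in the first facet
n_i <- 3   # illustrative number of levels in the second facet
raw <- matrix(rnorm(n_j * n_i), n_j, n_i)

# Double-centering imposes zero row sums and zero column sums.
gamma <- raw - outer(rowMeans(raw), rep(1, n_i)) -
  outer(rep(1, n_j), colMeans(raw)) + mean(raw)

# Number of free interaction parameters under these constraints.
free_params <- (n_j - 1) * (n_i - 1)
```

For a 4 x 3 block this leaves 6 free parameters out of 12 cells, which is a useful reminder of how quickly interaction terms consume data in sparse designs.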
This is a model-estimated interaction term, not the residual screening
reported by estimate_bias() or estimate_all_bias(). In line with the
MFRM bias-interaction literature, the facet pair should be named explicitly
before fitting. Exploratory use is possible, but should be reported as
screening, with sparse-cell and multiplicity caveats. The current
implementation is intentionally narrow: two-way non-person facet
interactions for RSM and PCM only, estimated as fixed effects. GPCM
interactions, person interactions, higher-order interactions, and
random-effect facet interactions are deferred.
This is ordered binary support, not a separate nominal-response model.
In PCM, a binary fit still uses one threshold per step_facet level on
the shared observed-score scale.
Supported model/estimation combinations in the current release:
model = "RSM" with method = "MML" or "JML"/"JMLE"
model = "PCM" with a designated step_facet (defaults to first facet)
facet_interactions with model = "RSM" or "PCM" for explicit
two-way non-person facet interactions
model = "GPCM" is currently implemented only for the narrow bounded
branch with slope_facet == step_facet; MML and JML fitting, core
summaries, fixed-calibration posterior scoring, compute_information(),
Wright/pathway/CCC fit plots, diagnose_mfrm(), residual-PCA follow-up,
interrater_agreement_table(), unexpected_response_table(),
displacement_table(), measurable_summary_table(),
rating_scale_table(), facet_quality_dashboard(),
reporting_checklist(), category_structure_report(),
category_curves_report(), and graph-only
facets_output_file_bundle() are available. Direct simulation
specifications and data generation are also supported through
build_mfrm_sim_spec(), extract_mfrm_sim_spec(), and
simulate_mfrm_data() when the slope-aware generator contract is stored
explicitly. Fair-average reporting, planning/forecasting, scorefile
exports, and broader APA/QC pipelines should still be treated as
unsupported unless documented otherwise. Use gpcm_capability_matrix() as
the formal boundary statement for the current GPCM scope.
Latent-regression status:
population_formula = NULL keeps the legacy unconditional MML / JML
behavior.
Supplying population_formula activates a first-version latent-regression
branch for method = "MML" only.
The current implementation assumes a one-dimensional conditional-normal
population model, $\theta_n \sim N(x_n^\top \beta, \sigma^2)$, with
person-specific quadrature nodes centered at $x_n^\top \beta$.
Background variables must be supplied in person_data; numeric/logical
columns and categorical factor/character columns are expanded through
stats::model.matrix().
Current overlap with the ConQuest latent-regression documentation is
limited to direct estimation from response data under a unidimensional
MML population model with package-built model-matrix covariates. It
should not be described as parity for arbitrary imported design matrices,
multidimensional models, or the full ConQuest plausible-values workflow.
predict_mfrm_units() and sample_mfrm_plausible_values() can score
latent-regression fits under the fitted population model, but they require
one-row-per-person background data for scored units when the fitted
population model includes covariates. Intercept-only latent-regression
fits (population_formula = ~ 1) can reconstruct that minimal person
table internally during scoring.
For a first latent-regression run, keep the setup explicit:
Put response data in data, with one row per rating event.
Put background variables in person_data, with exactly one row per
person. The ID column must match person, or be supplied through
person_id.
Use method = "MML" and a one-sided formula such as
population_formula = ~ Grade + Group.
Numeric/logical and factor/character predictors are expanded with
stats::model.matrix(). After fitting, inspect
summary(fit)$population_coding to see the fitted levels, contrasts, and
encoded design columns that will be reused for scoring/replay.
Start with population_policy = "error" while preparing data. Use
"omit" only when complete-case removal is intended, and then inspect
summary(fit)$population_overview and summary(fit)$caveats before
reporting results.
Report summary(fit)$population_coefficients as coefficients of the
conditional-normal latent population model, not as a post hoc regression
on EAP or MLE scores.
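The design-matrix expansion mentioned in these steps can be previewed with base R before fitting. The person table below is a hypothetical example; only `stats::model.matrix()` and `stats::complete.cases()` are used, and no mfrmr function is called.

```r
# Hypothetical one-row-per-person background table.
person_data <- data.frame(
  Person = c("P1", "P2", "P3", "P4"),
  Grade  = c(3, 4, 3, 5),
  Group  = factor(c("A", "B", "B", "A"))
)

# One-sided formula, written as it would be passed to population_formula.
X <- stats::model.matrix(~ Grade + Group, data = person_data)
design_columns <- colnames(X)

# Persons with missing predictors are the ones a complete-case policy drops.
complete <- stats::complete.cases(person_data[, c("Grade", "Group")])
```

Checking the encoded columns and the complete-case flags up front makes it easier to decide between `population_policy = "error"` and `"omit"`.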
summary(fit)$population_coefficients reports point estimates of
$\beta$ and $\sigma^2$ only. mfrmr does
not currently compute standard errors, confidence intervals, or
asymptotic z / Wald statistics for the population-model parameters: no
Hessian on $(\beta, \sigma^2)$ is extracted from the
marginal log-likelihood, and no vcov() method is exposed for these
coefficients. Treat the coefficient table as point estimates suitable
for descriptive reporting; do not quote interval bounds, because no SE
column is provided. A marginal-Hessian-based SE for $\hat{\beta}$ is
planned for a future release.
Identification: the latent-regression intercept is identifiable only
under the default noncenter_facet = "Person" (which sum-to-zero-
centers all non-Person facets). If you re-anchor identification on a
non-Person facet, the intercept becomes confounded with the freed
Person-facet mean and the coefficient table becomes unidentified;
mfrmr does not currently warn about this failure mode in the
design-matrix audit.
Anchor inputs are optional:
anchors should contain facet/level/fixed-value information.
group_anchors should contain facet/level/group/group-value information.
Both are normalized internally, so column names can be flexible
(facet, level, anchor, group, groupvalue, etc.).
Anchor audit behavior:
fit_mfrm() runs an internal anchor audit.
invalid rows are removed before estimation.
duplicate rows keep the last occurrence for each key.
anchor_policy controls whether detected issues are warned, treated as
errors, or kept silent.
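The "drop invalid rows, keep the last duplicate" behavior can be illustrated in base R. This is a sketch under assumptions: the column names follow the flexible names listed above, the anchor values are made up, and "invalid" is taken here to mean a non-finite anchor value, which may be narrower than the package's own audit rules.

```r
# Hypothetical anchor table with one invalid row and one duplicated key.
anchors <- data.frame(
  facet  = c("Rater", "Rater", "Criterion", "Rater", "Criterion"),
  level  = c("R1", "R2", "C1", "R1", "C2"),
  anchor = c(0.20, -0.10, 0.50, 0.35, NA)
)

# Remove rows with a non-finite anchor value (assumed notion of "invalid").
valid <- anchors[is.finite(anchors$anchor), ]

# Keep the last occurrence for each facet/level key.
key <- paste(valid$facet, valid$level, sep = "|")
deduped <- valid[!duplicated(key, fromLast = TRUE), ]
```

Here the second `R1` row (0.35) survives and the first (0.20) is discarded.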
Facet sign orientation:
facets listed in positive_facets are treated as +1
all other facets are treated as -1
This affects interpretation of reported facet measures.
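The effect of the sign convention on the linear predictor can be sketched as follows. The facet names and measures are illustrative, and the code is a conceptual example, not the package's internal predictor.

```r
measures <- c(Rater = 0.4, Criterion = -0.3, Task = 0.2)  # illustrative measures
positive_facets <- "Task"                                 # oriented as +1
theta <- 1.0                                              # illustrative person measure

# +1 for facets listed in positive_facets, -1 for all other facets.
signs <- ifelse(names(measures) %in% positive_facets, 1, -1)
eta <- theta + sum(signs * measures)
```

Under the default -1 orientation a larger rater measure lowers the predictor (severity), whereas a facet listed in `positive_facets` raises it.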
For exploratory work, method = "JML" is usually faster than method = "MML",
but it may require a larger maxit to converge on larger datasets.
For MML runs, quad_points is the main accuracy/speed trade-off.
The quad_points argument description is the authoritative reference;
in short:
quad_points = 7 is a lightweight setting for quick iteration.
quad_points = 15 is an intermediate option when runtime matters.
quad_points = 31 is the package default and the publication
tier: the marginal log-likelihood is stable enough for direct
manuscript reporting.
quad_points = 61 (or higher) is reserved for ultra-precise
benchmarking on very narrow score supports.
mml_engine = "direct" remains the most stable general-purpose path.
mml_engine = "em" or "hybrid" currently target RSM / PCM fits
without a latent-regression population model.
Benchmark your own workload before using mml_engine = "em" or
"hybrid" for final reporting; direct remains the safer default when
you have not compared engines for your data.
For RSM and PCM fits only, an opt-in C++ MML backend can be
enabled with options(mfrmr.use_cpp11_backend = TRUE). The
backend implements the same physicist Gauss-Hermite quadrature and
sum-to-zero identification as the pure-R engine, validated against
the pure-R reference at tolerance = 1e-12 on a fixed regression
fixture. It is opt-in for this release; the default flip to ON is
planned for a follow-up release after a cycle of community
testing. GPCM fits stay on the pure-R engine regardless of the
option.
Downstream diagnostics can also be staged:
use diagnose_mfrm(fit, residual_pca = "none") for a quick first pass
add residual PCA only when you need exploratory residual-structure evidence
Downstream diagnostics report ModelSE / RealSE columns and related
reliability indices. For MML, non-person facet ModelSE values are based
on the observed information of the marginal log-likelihood and person rows
use posterior SDs from EAP scoring. For JML, these quantities remain
exploratory approximations and should not be treated as equally formal.
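As a rough guide to how standard errors feed into separation and reliability, the sketch below applies the conventional Rasch definitions to a hypothetical facet table. The data are invented, and the exact variance estimator and SE column used by the package are not stated here, so treat this as an assumption-laden illustration rather than the package's computation.

```r
# Hypothetical rater measures (logits) with standard errors.
rater <- data.frame(
  Level    = paste0("R", 1:5),
  Estimate = c(-0.82, -0.31, 0.05, 0.44, 0.64),
  SE       = c(0.21, 0.19, 0.18, 0.19, 0.22)
)

obs_var  <- stats::var(rater$Estimate)   # observed variance of the measures
err_var  <- mean(rater$SE^2)             # mean-square measurement error
true_var <- max(obs_var - err_var, 0)    # error-adjusted variance

separation  <- sqrt(true_var) / sqrt(err_var)
reliability <- true_var / obs_var

c(separation = separation, reliability = reliability)
```

Swapping a larger SE column into the same formulas lowers both indices, which is why model-based and more conservative SE variants lead to different reliability values.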
For bounded GPCM, residual-based mean-square fit screens are also
best treated as exploratory diagnostics rather than strict Rasch-style
invariance tests, because the discrimination parameter is free.
A typical first-pass read is:
fit$summary for convergence and global fit indicators.
summary(fit) for human-readable overviews.
for RSM / PCM, diagnose_mfrm(fit) for element-level fit,
approximate separation/reliability, and warning tables.
for bounded GPCM, use diagnose_mfrm() and the residual-based
table helpers as exploratory screens, together with posterior scoring /
compute_information() where documented.
Fit the model with fit_mfrm(...).
Validate convergence and scale structure with summary(fit).
For RSM / PCM, run diagnose_mfrm() and proceed to reporting with
build_apa_outputs().
For bounded GPCM, use the fitted object, slope summary,
diagnose_mfrm(), residual-based table helpers, posterior scoring
helpers, and compute_information() while broader downstream
validation is still being completed. Use gpcm_capability_matrix() to
confirm which helper families are currently supported, caveated, blocked,
or deferred.
The ordered-category many-facet formulation follows Linacre (1989), with
the RSM and PCM branches grounded in Andrich (1978) and Masters (1982).
The bounded GPCM branch follows the generalized partial credit
formulation of Muraki (1992) under a package-specific positive
log-slope identification convention. The MML route follows the
quadrature-based marginal-likelihood framework of Bock and Aitkin (1981).
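The adjacent-category structure shared by these models can be sketched in a few lines of base R. The function category_probs() is a hypothetical helper, not an mfrmr export. The sketch assumes a linear predictor in which the person enters with sign +1 and the other facets with sign -1, and it treats the GPCM slope as a plain positive multiplier; the package's internal parameterization (including its log-slope convention) may differ.

```r
category_probs <- function(eta, tau, slope = 1) {
  # eta:   linear predictor for one observation (person minus facet measures)
  # tau:   ordered step/threshold parameters, length K - 1
  # slope: discrimination; 1 gives the Rasch-family (RSM / PCM) case
  stopifnot(length(eta) == 1, slope > 0)
  num <- c(0, cumsum(slope * (eta - tau)))  # category 0 has log-numerator 0
  p <- exp(num - max(num))                  # subtract max for stability
  p / sum(p)
}

theta     <- 0.5    # person measure
rater     <- 0.3    # rater severity (enters with sign -1)
criterion <- -0.2   # criterion difficulty (enters with sign -1)
eta <- theta - rater - criterion
tau <- c(-1, 0, 1)  # shared thresholds (RSM-style); PCM would vary them by item

p_rsm  <- category_probs(eta, tau)
p_gpcm <- category_probs(eta, tau, slope = 1.5)
rbind(RSM = p_rsm, GPCM = p_gpcm)
```

With slope = 1 the log-odds of adjacent categories equal eta - tau[k]. That identity is the defining property of the rating-scale and partial-credit families.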
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159-176.
Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101-139.
diagnose_mfrm(), estimate_bias(), build_apa_outputs(),
gpcm_capability_matrix, mfrmr_workflow_methods,
mfrmr_reporting_and_apa
# Fast smoke run: a JML fit on the bundled `example_core` toy
# dataset finishes in well under a second and returns a populated
# `summary` overview ready for inspection.
toy <- load_mfrmr_data("example_core")
fit_quick <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                      method = "JML", maxit = 15)
fit_quick$summary[, c("Model", "Method", "N", "Converged")]

# Full run with the package default MML estimator (recommended for
# final reporting because person parameters are integrated out under
# an N(0, 1) prior). The default `quad_points = 31` is the
# publication tier; `quad_points = 7` below is an exploratory speed
# setting and should not be used as the final manuscript fit.
fit <- fit_mfrm(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  model = "RSM",
  quad_points = 7,
  maxit = 25
)
fit$summary
s_fit <- summary(fit)
s_fit$overview[, c("Model", "Method", "Converged")]
# Look for: Converged = TRUE. If FALSE, raise `maxit`, relax `reltol`,
# or inspect `summary(fit)$key_warnings` for sparse-cell or
# identification flags.
s_fit$person_overview
# Look for: Mean ~ 0 logits and SD ~ 1 logit are typical when the
# sample is centred on the test difficulty. SD < 0.5 suggests the
# test is too easy / hard for this group; SD > 1.5 suggests strong
# targeting mismatch or extreme-score persons (see `Extreme` flag).
s_fit$targeting
# Look for: |Targeting| < ~0.5 logits is comfortable; larger absolute
# values mean persons sit systematically above or below the facet
# means under the package's sum-to-zero identification.
p_fit <- plot(fit, draw = FALSE)
p_fit$wright_map$data$plot

# JML is available for exploratory / fast iteration passes:
fit_jml <- fit_mfrm(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "JML",
  model = "RSM",
  maxit = 25
)
summary(fit_jml)$overview[, c("Model", "Method", "Converged")]

# Latent regression (MML only) uses person-level background variables:
person_tbl <- unique(toy[c("Person")])
person_tbl$Grade <- seq_len(nrow(person_tbl))
person_tbl$Group <- rep(c("A", "B"), length.out = nrow(person_tbl))
fit_pop <- fit_mfrm(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  population_formula = ~ Grade + Group,
  person_data = person_tbl
)
summary(fit_pop)$population_overview
summary(fit_pop)$population_coding

# Binary responses are supported as ordered two-category scores:
set.seed(1)
binary_toy <- expand.grid(
  Person = paste0("P", 1:30),
  Item = paste0("I", 1:4),
  stringsAsFactors = FALSE
)
theta <- stats::rnorm(length(unique(binary_toy$Person)))
beta <- seq(-0.8, 0.8, length.out = length(unique(binary_toy$Item)))
eta <- theta[match(binary_toy$Person, unique(binary_toy$Person))] -
  beta[match(binary_toy$Item, unique(binary_toy$Item))]
binary_toy$Score <- stats::rbinom(nrow(binary_toy), 1, stats::plogis(eta))
fit_binary <- fit_mfrm(
  data = binary_toy,
  person = "Person",
  facets = "Item",
  score = "Score",
  model = "RSM",
  method = "JML",
  maxit = 50
)
fit_binary$summary[, c("Model", "Categories", "Converged")]

# Next steps after fitting:
diag <- diagnose_mfrm(fit, residual_pca = "none")
chk <- reporting_checklist(fit, diagnostics = diag)
head(chk$checklist[, c("Section", "Item", "DraftReady")])
Builds a compact fit table with TAM-style columns for mfrmr non-person
element-level, person-level, or score-category fit statistics. The output
keeps the familiar
Outfit, Outfit_t, Outfit_p, Infit, Infit_t, and Infit_p
layout, while also adding p-value adjustment columns and the active
MisfitDirection label used elsewhere in mfrmr.
fit_p_table(
  fit,
  diagnostics = NULL,
  scope = c("element", "person", "category"),
  p_adjust = "holm",
  alpha = 0.05,
  lower = NULL,
  upper = NULL,
  reference = c("mfrmr", "facets"),
  zstd_cap = c("auto", "none", "facets")
)
| fit | Output from |
|---|---|
| diagnostics | Optional output from |
| scope | Which fit surface to report: |
| p_adjust | P-value adjustment method passed to |
| alpha | Significance level used for logical p-value flags. |
| lower, upper | Optional MnSq screening band. |
| reference | Fit-standardization reference. |
| zstd_cap | ZSTD cap policy. |
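fit_p_table() is intentionally a reporting and screening table, not a new
model-fit estimator. For a reported set of rows $i \in S$, let
$E_i$ be the model-expected score, $V_i$ the model variance,
$z_i = (x_i - E_i) / \sqrt{V_i}$ the standardized residual, and $w_i$ be the case weight. mfrmr's
mean-square summaries are
$$\mathrm{Outfit} = \frac{\sum_{i \in S} w_i z_i^2}{\sum_{i \in S} w_i}$$
and
$$\mathrm{Infit} = \frac{\sum_{i \in S} w_i V_i z_i^2}{\sum_{i \in S} w_i V_i}.$$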
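The exported Outfit_t and Infit_t columns are mfrmr's existing
standardized transformations (OutfitZSTD, InfitZSTD). With the default
diagnostics these use the Wilson-Hilferty cube-root approximation
$$t = \frac{\mathrm{MnSq}^{1/3} - \left(1 - \frac{2}{9\,\mathrm{df}}\right)}{\sqrt{2 / (9\,\mathrm{df})}},$$
where df is computed separately for outfit and for infit (reported as
DF_Outfit and DF_Infit). The displayed p-values are two-sided normal-tail approximations,
followed by stats::p.adjust() for *_p_adj.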
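The pipeline described above (mean squares, cube-root standardization, two-sided normal p-values, Holm adjustment) can be sketched end to end in base R on simulated dichotomous data. The helper wh_z() and all column names other than those listed in this help page are hypothetical. The degrees of freedom used here (the summed case weights) are an assumption for illustration only and are not claimed to match the package's df rule.

```r
set.seed(42)

wh_z <- function(mnsq, df) {
  # Wilson-Hilferty cube-root transformation of a mean-square to an
  # approximate standard normal deviate.
  (mnsq^(1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
}

n_elem <- 6    # hypothetical number of elements
n_obs  <- 40   # observations per element

rows <- lapply(seq_len(n_elem), function(j) {
  e  <- stats::plogis(stats::rnorm(n_obs))  # model expectation E_i
  x  <- stats::rbinom(n_obs, 1, e)          # observed score x_i
  v  <- e * (1 - e)                         # model variance V_i
  w  <- rep(1, n_obs)                       # case weights w_i
  z2 <- (x - e)^2 / v                       # squared standardized residual
  data.frame(
    Element = paste0("E", j),
    Outfit  = sum(w * z2) / sum(w),         # unweighted-by-variance mean square
    Infit   = sum(w * v * z2) / sum(w * v), # information-weighted mean square
    DF      = sum(w)                        # ASSUMPTION: illustrative df only
  )
})
fit_tab <- do.call(rbind, rows)

fit_tab$Outfit_t     <- wh_z(fit_tab$Outfit, fit_tab$DF)
fit_tab$Outfit_p     <- 2 * stats::pnorm(-abs(fit_tab$Outfit_t))
fit_tab$Outfit_p_adj <- stats::p.adjust(fit_tab$Outfit_p, method = "holm")
fit_tab
```

Holm-adjusted p-values are never smaller than the raw ones, so the adjusted column flags fewer rows than the raw column at the same alpha.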
With reference = "facets", mfrmr recomputes only the df and ZSTD layer.
Let $C_i$ be the model fourth central moment. The
FACETS-style df values for outfit and infit are computed from $C_i$
and are reported in DF_Outfit and DF_Infit.
This branch is intended for FACETS migration and parity checks in RSM/PCM. It does not change the fitted estimates or the MnSq values.
The column names intentionally resemble TAM::tam.fit() output, but the
values are not guaranteed to equal TAM values. TAM's MML fit route is
simulation/posterior based and can evaluate item, facet, or contrast
hypotheses through its fit matrix interface. Likewise, mirt::itemfit()
treats S_X2, X2, G2, and infit as distinct item-fit families; the
mfrmr p-values here are not S_X2 chi-square p-values.
A data frame with TAM-style fit columns plus mfrmr screening columns:
Scope, parameter, Facet, Level, N, Outfit, Outfit_t,
Outfit_p, Outfit_p_adj, Infit, Infit_t, Infit_p, Infit_p_adj,
DF_Outfit, DF_Infit, DFMethod, FitReference, ZSTDCap,
MisfitDirection, MnSqFlag, PFlag, and PAdjustMethod. When
p_adjust = "holm", compatibility aliases Outfit_pholm and
Infit_pholm are also included.
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi:10.18637/jss.v048.i06
mirt itemfit() reference:
https://philchalmers.github.io/mirt/docs/reference/itemfit.html
TAM tam.fit() reference:
https://alexanderrobitzsch.github.io/TAM/reference/tam.fit.html
TAM msq.itemfit() reference:
https://alexanderrobitzsch.github.io/TAM/reference/msq.itemfit.html
diagnose_mfrm(), plot_empirical_fit(), plot_person_fit()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
tab <- fit_p_table(fit, diagnostics = diag)
head(tab[, c("parameter", "Outfit", "Outfit_p", "Outfit_p_adj",
             "Infit", "Infit_p", "Infit_p_adj", "MisfitDirection")])
facets_tab <- fit_p_table(fit, diagnostics = diag, reference = "facets")
head(facets_tab[, c("parameter", "DFMethod", "Outfit_t", "Outfit_p")])
fit_p_table(fit, diagnostics = diag, scope = "person")[1:3, ]

## Not run:
# Optional orientation against mirt/TAM on the same wide item matrix.
# Similar estimates can still yield different fit p values because each
# package uses its own estimation, scoring, and fit-test conventions.
toy$Score0 <- toy$Score - 1L
toy$Item <- paste(toy$Rater, toy$Criterion, sep = "__")
wide <- reshape(toy[, c("Person", "Item", "Score0")],
                idvar = "Person", timevar = "Item", direction = "wide")
rownames(wide) <- wide$Person
resp <- wide[, setdiff(names(wide), "Person")]
names(resp) <- sub("^Score0\\.", "", names(resp))
if (requireNamespace("TAM", quietly = TRUE)) {
  tam_fit <- TAM::tam.mml(resp = resp, irtmodel = "PCM2", verbose = FALSE)
  summary(TAM::tam.fit(tam_fit, progress = FALSE))
}
if (requireNamespace("mirt", quietly = TRUE)) {
  mirt_fit <- mirt::mirt(resp, 1, itemtype = "Rasch", verbose = FALSE)
  mirt::itemfit(mirt_fit, fit_stats = "infit", method = "ML")
  mirt::itemfit(mirt_fit, fit_stats = "S_X2")
}
## End(Not run)
Public capability map for the current GPCM scope in mfrmr.
Use this helper when you need to answer a practical question quickly:
which GPCM workflows are formally supported in the current core package,
which are available only with explicit caveats, and which helpers remain
blocked or deferred.
The matrix is intentionally conservative. It is a release-scope statement,
not a list of every internal code path that happens to run. If a helper is
not yet covered by the current validation boundary, it is listed as
blocked or deferred even when some lower-level components already exist.
gpcm_capability_matrix(
  status = c("all", "supported", "supported_with_caveat", "blocked", "deferred")
)
| status | Which rows to return: |
|---|---|
The current release treats GPCM as a bounded supported scope inside the
core R package:
fitting and core summaries are supported,
posterior-scoring and information helpers are supported,
residual-based diagnostics and strict marginal follow-up are supported as exploratory screens,
direct slope-aware simulation-spec generation is supported,
fair-average, visual-summary, and QC routes are available with explicit GPCM caveats,
APA writer, package-native export/replay bundles, and simulation/refit planning / forecasting helpers are available with explicit caveats,
FACETS score-side compatibility remains outside the validated GPCM
boundary.
Why some helpers remain blocked:
FACETS compatibility-contract score exports depend on Rasch-family
measure-to-score semantics that are not yet generalized to the
free-discrimination GPCM branch;
fair-average, visual-summary, QC, APA, and export helpers are therefore exposed as caveated GPCM routes, not as FACETS/Rasch fair-M invariance or full joint-uncertainty evidence;
planning and forecasting are simulation/refit screening routes under the current role-based person x rater-like x criterion-like contract, not a fully arbitrary-facet planner.
This boundary is aligned with the package's current validation evidence,
including the targeted GPCM recovery snapshot and the public-workflow
regression tests.
A data.frame with one row per public helper family and columns:
Area
Helpers
Status
PrimaryUse
Boundary
Evidence
Call gpcm_capability_matrix() before using GPCM in a new workflow.
Stay on rows marked supported or supported_with_caveat for the
current release.
Treat blocked rows as explicit non-support, not as temporary omissions.
Treat deferred rows as future-extension targets rather than part of the
current package promise.
fit_mfrm(), diagnose_mfrm(), compute_information(),
predict_mfrm_units(), sample_mfrm_plausible_values(),
reporting_checklist(), mfrmr_workflow_methods, mfrmr-package
gpcm_capability_matrix()
gpcm_capability_matrix("supported")
gpcm_capability_matrix("blocked")
eRm fit to an mfrmr-compatible bundle
Extracts item / person parameters from an eRm::PCM() /
eRm::RM() fit. Same caveats as import_mirt_fit().
import_erm_fit(fit, model = c("RSM", "PCM", "GPCM"), item_facet = "Item")
| fit | An object returned by |
|---|---|
| model | Same as |
| item_facet | Name to assign to the item facet. |
An mfrm_imported_fit object.
import_mirt_fit(), import_tam_fit()
mirt fit to an mfrmr-compatible bundle
Extracts item, step, and person parameters from a mirt::mirt()
fit and returns an mfrm_imported_fit object. The returned
object has the public slots summary, facets$person,
facets$others, steps, config, and source that the mfrmr
plot and table helpers expect. With compute_fit = TRUE the
importer also runs mirt::itemfit() and mirt::personfit() so
Infit / Outfit columns are populated, and synthesises a
mfrm_diagnostics-shape diagnostics slot consumable by
downstream plot helpers (Wright map, QC dashboard, etc.).
import_mirt_fit(
  fit,
  model = c("RSM", "PCM", "GPCM"),
  item_facet = "Item",
  compute_fit = FALSE
)
| fit | An object returned by |
|---|---|
| model | One of |
| item_facet | Name to assign to the item facet in the imported bundle (default |
| compute_fit | Logical. When |
An mfrm_imported_fit object. Slots:
summary: Model / method / N / LogLik / AIC / BIC.
facets$person: Person ID, Estimate, SE, Extreme, plus
Infit / Outfit / OutfitZSTD / Zh when compute_fit = TRUE.
facets$others: Item-level estimates and slopes; with
compute_fit = TRUE, also Infit / Outfit / S_X2 / RMSEA / df
from mirt::itemfit().
steps: Per-item threshold parameters extracted from the
IRT parameterisation (b1, ..., b(K-1)).
config: List with the resolved model and item_facet
used for the import; downstream plot and table helpers consult
this to dispatch correctly on the imported bundle.
diagnostics: mfrm_diagnostics-shape bundle when
compute_fit = TRUE; NULL otherwise.
source: Imported-from metadata.
The bundle's bias / DIF / anchor / replay slots are explicitly not populated; full bidirectional import / export is planned for a future release.
import_tam_fit(), import_erm_fit()
TAM fit to an mfrmr-compatible bundle
Extracts item / step / person parameters from a TAM::tam.mml(),
TAM::tam.jml(), or TAM::tam.mml.mfr() fit. The multi-facet
tam.mml.mfr() path is detected automatically and each
non-person facet is mapped onto a row of fit$facets$others
so downstream MFRM helpers (e.g. plot_qc_dashboard()) work
on the imported object.
import_tam_fit(
  fit,
  model = c("RSM", "PCM", "GPCM"),
  item_facet = "Item",
  compute_fit = FALSE
)
| fit | An object returned by |
|---|---|
| model | Same as |
| item_facet | Name to assign to the item facet for the single-facet path. Ignored when the input is a multi-facet |
| compute_fit | Logical. When |
An mfrm_imported_fit object. Slots mirror
import_mirt_fit().
import_mirt_fit(), import_erm_fit()
interaction_effect_table() returns the fixed-effect interaction block
estimated by fit_mfrm() when facet_interactions is supplied. These are
model-estimated deviations from the additive main-effects MFRM, not the
residual screening statistics returned by estimate_bias().
interaction_effect_table(fit)
| fit | An |
|---|---|
The current release supports two-way interactions between non-person facets,
for example facet_interactions = "Rater:Criterion". Each interaction matrix
is identified by zero marginal sums across both participating facets, so the
interaction estimates are separable from the two main effects. Positive values
indicate higher-than-expected scores for the facet-level combination under the
additive model; negative values indicate lower-than-expected scores.
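The zero-marginal-sum identification can be illustrated with a double-centered matrix in base R. This is a sketch of the constraint only, using made-up numbers; it does not show how the package estimates the interaction block.

```r
set.seed(1)

# Hypothetical unconstrained Rater x Criterion cell effects.
raw <- matrix(
  stats::rnorm(12), nrow = 3,
  dimnames = list(paste0("R", 1:3), paste0("C", 1:4))
)

# Double-centering: remove row means, then column means of the result.
# What remains has zero sums across both participating facets.
row_centered <- sweep(raw, 1, rowMeans(raw))
interaction  <- sweep(row_centered, 2, colMeans(row_centered))

round(rowSums(interaction), 12)
round(colSums(interaction), 12)

# Free parameters left after the constraints: (rows - 1) * (cols - 1).
n_free <- (nrow(interaction) - 1) * (ncol(interaction) - 1)
n_free
```

Because every row and column of the interaction block sums to zero, any constant shift of a rater or criterion is absorbed by the main effects rather than by the interaction cells.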
Use this table for confirmatory model review after specifying the facet pair
of substantive interest. For exploratory screening without adding parameters
to the fitted model, use estimate_bias() or estimate_all_bias().
A tibble with one row per interaction cell. Returns an empty tibble when the fit has no model-estimated facet interactions.
fit_mfrm(), estimate_bias(), compare_mfrm()
Build an inter-rater agreement report
interrater_agreement_table(
  fit,
  diagnostics = NULL,
  rater_facet = NULL,
  context_facets = NULL,
  exact_warn = 0.5,
  corr_warn = 0.3,
  include_precision = TRUE,
  top_n = NULL
)
| fit | Output from |
|---|---|
| diagnostics | Optional output from |
| rater_facet | Name of the rater facet. If |
| context_facets | Optional context facets used to match observations for agreement. If |
| exact_warn | Warning threshold for exact agreement. |
| corr_warn | Warning threshold for pairwise correlation. |
| include_precision | If |
| top_n | Optional maximum number of pair rows to keep. |
This helper computes pairwise rater agreement on matched contexts and returns both a pair-level table and a one-row summary. The output is package-native and does not require knowledge of legacy report numbering.
A named list with:
summary: one-row inter-rater summary
pairs: pair-level agreement table
settings: applied options and thresholds
summary: overall agreement level, number/share of flagged pairs.
pairs: pairwise exact agreement, correlation, and direction/size gaps.
settings: applied facet matching and warning thresholds.
Pairs flagged by both low exact agreement and low correlation generally deserve highest calibration priority.
Run with explicit rater_facet (and context_facets if needed).
Review summary(ir) and top flagged rows in ir$pairs.
Visualize with plot_interrater_agreement().
The pairs data.frame contains:
Rater pair identifiers.
Number of matched-context observations for this pair.
Proportion of exact score agreements.
Expected exact agreement under chance.
Proportion of adjacent (+/- 1 category) agreements.
Signed mean score difference (Rater1 - Rater2).
Mean absolute score difference.
Pearson correlation between paired scores.
Logical; TRUE when Exact < exact_warn or Corr < corr_warn.
Raw counts behind the agreement proportions.
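The pair-level quantities listed above can be reproduced by hand for a single rater pair. The base-R sketch below uses invented ratings on matched Person x Criterion contexts. The column names N, MeanDiff, and Flag are hypothetical, and counting exact matches as adjacent (absolute difference of at most 1) is an assumption; the package's own conventions may differ.

```r
# Hypothetical matched ratings: both raters scored the same 12 contexts.
ratings <- data.frame(
  Person    = rep(paste0("P", 1:6), each = 2),
  Criterion = rep(c("C1", "C2"), times = 6),
  RaterA    = c(3, 2, 4, 4, 1, 2, 3, 3, 2, 1, 4, 3),
  RaterB    = c(3, 3, 4, 3, 1, 2, 2, 3, 2, 2, 4, 4)
)

exact_warn <- 0.5   # same defaults as in the usage above
corr_warn  <- 0.3

d <- ratings$RaterA - ratings$RaterB   # signed score difference per context

pair_row <- data.frame(
  Rater1   = "RaterA",
  Rater2   = "RaterB",
  N        = length(d),                # matched-context observations
  Exact    = mean(d == 0),             # proportion of exact agreements
  Adjacent = mean(abs(d) <= 1),        # ASSUMPTION: includes exact matches
  MeanDiff = mean(d),                  # signed mean difference (A - B)
  MAD      = mean(abs(d)),             # mean absolute difference
  Corr     = stats::cor(ratings$RaterA, ratings$RaterB)
)

# Flag rule as documented: low exact agreement OR low correlation.
pair_row$Flag <- pair_row$Exact < exact_warn | pair_row$Corr < corr_warn
pair_row
```

Because the rule uses "or", a pair can be flagged for low correlation even when exact agreement looks acceptable, and vice versa.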
The summary data.frame contains:
Name of the rater facet analyzed.
Number of rater pairs evaluated.
Mean exact agreement across all pairs.
Observed exact agreement minus expected exact agreement.
Mean pairwise correlation.
Count and proportion of flagged pairs.
Severity-spread indices for the rater facet, reported separately from agreement.
diagnose_mfrm(), facets_chisq_table(), plot_interrater_agreement(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
ir <- interrater_agreement_table(fit, rater_facet = "Rater")
# One-row overview: ExactAgreement, ExpectedExactAgreement, MeanCorr,
# RaterSeparation, and RaterReliability are the headline reportable
# statistics.
ir$summary
# Per-pair detail (Rater1 vs Rater2 with Exact, Adjacent, Corr, MAD).
head(ir$pairs)
p_ir <- plot(ir, draw = FALSE)
p_ir$data$plot
List available simulation metrics.
list_mfrm_sim_metrics(x, component = NULL, design_id = NULL)
| x | A simulation specification, simulation summary table, simulation evaluation object, or data frame. |
|---|---|
| component | Optional component to inspect. Defaults to the most useful table for each object class: design-grid summaries for arbitrary specifications, |
| design_id | Optional design rows used when |
A data frame with metric names, source component, role, direction, and suggested-default flags.
plot_mfrm_sim_dashboard(),
summarize_mfrm_sim_grid(),
evaluate_mfrm_design()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 20,
  facets = list(Rater = c(2, 4), Criteria = c(2, 3), Task = c(2, 4)),
  facets_per_person = list(Rater = c(1, 2), Task = 2),
  score_levels = 4
)
list_mfrm_sim_metrics(spec)
List packaged simulation datasets
list_mfrmr_data()
Use this helper when you want to select packaged data programmatically (e.g., inside scripts, loops, or shiny/streamlit wrappers).
Typical pattern:
call list_mfrmr_data() to see available keys.
pass one key to load_mfrmr_data().
Character vector of dataset keys accepted by load_mfrmr_data().
Returned values are canonical dataset keys accepted by load_mfrmr_data().
Capture keys in a script (keys <- list_mfrmr_data()).
Select one key by index or name.
Load data via load_mfrmr_data() and continue analysis.
load_mfrmr_data(), ej2021_data
keys <- list_mfrmr_data()
keys
d <- load_mfrmr_data(keys[1])
head(d)
Load a packaged simulation dataset
load_mfrmr_data(
  name = c("example_core", "example_bias", "study1", "study2", "combined",
           "study1_itercal", "study2_itercal", "combined_itercal")
)
| name | Dataset key. One of values from |
|---|---|
load_mfrmr_data("<key>") is the canonical loader for the packaged
datasets and the entry point used across the package help and
vignettes. The equivalent base-R alternative
data("mfrmr_<key>", package = "mfrmr") remains available for users
who prefer the full data() spelling; both paths return identical
long-format data frames and are supported long-term.
All returned datasets include the core long-format columns
Study, Person, Rater, Criterion, and Score.
Some datasets, such as the packaged documentation examples, also include
auxiliary variables like Group for DIF/bias demonstrations.
A data.frame in long format.
The return value is a plain long-format data.frame, ready for direct use
in fit_mfrm() without additional reshaping.
list valid names with list_mfrmr_data().
load one dataset key with load_mfrmr_data(name).
fit a model with fit_mfrm() and inspect with summary() / plot().
list_mfrmr_data(), ej2021_data
data("mfrmr_example_core", package = "mfrmr")
head(mfrmr_example_core)
d <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  data = d,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "JML",
  maxit = 25
)
summary(fit)
Build an anchor table from fitted estimates
make_anchor_table(fit, facets = NULL, include_person = FALSE, digits = 6)
| fit | Output from |
|---|---|
| facets | Optional subset of facets to include. |
| include_person | Include person estimates as anchors. |
| digits | Rounding digits for anchor values. |
This function exports estimated facet parameters as an anchor table for use in subsequent calibrations. This is the standard approach for linking across administrations: a reference run establishes the measurement scale, and anchored re-analyses place new data on that same scale.
Anchor values should be exported from a well-fitting reference run
with adequate sample size. If the reference model has convergence
issues or large misfit, the exported anchors may propagate
instability. Re-run audit_mfrm_anchors() on the receiving data
to verify compatibility before estimation.
The digits parameter controls rounding precision. Use at least 4
digits for research applications; excessive rounding (e.g., 1 digit)
can introduce avoidable calibration error.
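The effect of the digits setting is easy to quantify: rounding to d decimal places changes a value by at most 0.5 * 10^(-d) logits. The base-R sketch below checks this on simulated estimates. The values are invented and no package function is called.

```r
set.seed(7)

# Hypothetical logit estimates standing in for exported facet measures.
estimates <- stats::rnorm(50, mean = 0, sd = 1.2)

digits <- c(1, 2, 4, 6)
max_abs_error <- vapply(
  digits,
  function(d) max(abs(round(estimates, d) - estimates)),
  numeric(1)
)

# Observed worst-case rounding error next to its theoretical bound.
data.frame(
  digits        = digits,
  max_abs_error = max_abs_error,
  bound         = 0.5 * 10^(-digits)
)
```

At one digit the worst-case shift can approach 0.05 logits per anchor, which is large enough to matter in linking, whereas at four or more digits the rounding error is negligible relative to typical standard errors.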
A data.frame with Facet, Level, and Anchor.
Facet: facet name to be anchored in later runs.
Level: specific element/level name inside that facet.
Anchor: fixed logit value (rounded by digits).
Fit a reference run with fit_mfrm().
Export anchors with make_anchor_table(fit).
Pass selected rows back into fit_mfrm(..., anchors = ...).
fit_mfrm(), audit_mfrm_anchors()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
anchors_tbl <- make_anchor_table(fit)
head(anchors_tbl)
summary(anchors_tbl$Anchor)
Build a measurable-data summary
measurable_summary_table(fit, diagnostics = NULL)
fit |
Output from |
diagnostics |
Optional output from |
This helper consolidates measurable-data diagnostics into a dedicated report bundle: run-level summary, facet coverage, category usage, and subset (connected-component) information.
summary(t5) is supported through summary().
plot(t5) is dispatched through plot() for class
mfrm_measurable (type = "facet_coverage", "category_counts",
"subset_observations").
A named list with:
summary: one-row measurable-data summary
facet_coverage: per-facet coverage summary
category_stats: category-level usage/fit summary
subsets: subset summary table (when available)
summary: overall measurable design status.
facet_coverage: spread/precision by facet.
category_stats: category usage and fit context.
subsets: connectivity diagnostics (fragmented subsets reduce comparability).
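To make the notion of connected subsets concrete, here is a minimal base-R sketch that labels connected components in a small person-by-rater design. It is an illustration of the idea only, not the algorithm mfrmr uses internally, and the data are hypothetical.

```r
# Hypothetical long-format ratings: persons P1-P2 only meet raters R1-R2,
# persons P3-P4 only meet rater R3, so the design splits into two subsets.
ratings <- data.frame(
  Person = c("P1", "P1", "P2", "P3", "P4"),
  Rater  = c("R1", "R2", "R2", "R3", "R3"),
  stringsAsFactors = FALSE
)

# Label every node (person or rater) and start each in its own component.
nodes <- unique(c(paste0("Person:", ratings$Person),
                  paste0("Rater:", ratings$Rater)))
component <- setNames(seq_along(nodes), nodes)

# Merge the components of the two endpoints of every observed rating.
for (i in seq_len(nrow(ratings))) {
  a <- component[[paste0("Person:", ratings$Person[i])]]
  b <- component[[paste0("Rater:", ratings$Rater[i])]]
  if (a != b) component[component == b] <- a
}

# Number of connected subsets and observations per subset.
n_subsets <- length(unique(component))
obs_subset <- component[paste0("Person:", ratings$Person)]
table(obs_subset)
```

With more than one subset, measures in different subsets share no linking observations, which is why fragmented designs reduce comparability.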
Run measurable_summary_table(fit).
Check summary(t5) for subset/connectivity warnings.
Use plot(t5, ...) to inspect facet/category/subset views.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
The summary data.frame (one row) contains:
Total observations and summed weight.
Design dimensions.
Number of connected subsets.
Largest subset coverage.
The facet_coverage data.frame contains:
Facet name.
Number of estimated levels.
Mean standard error across levels.
Mean fit statistics across levels.
Measure range for this facet.
The category_stats data.frame contains:
Score category value.
Observed count and percentage.
Category-level fit.
Expected-observed comparison and low-count flag.
diagnose_mfrm(), rating_scale_table(), describe_mfrm_data(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
t5 <- measurable_summary_table(fit)
summary(t5)
p_t5 <- plot(t5, draw = FALSE)
p_t5$data$plot
Re-fits the rating data underlying an mfrm_fit as a crossed
random-effects model
Score ~ 1 + (1 | Person) + (1 | Facet1) + ... + Residual
via lme4::lmer, and returns the canonical G-theory variance
components plus G / Phi coefficients. Useful when reviewers ask
for a generalizability-theory complement to the Rasch-style
separation / reliability statistics that diagnose_mfrm()
already emits.
mfrm_generalizability(fit, data = NULL, object_facet = "Person", random_facets = NULL, reml = TRUE)
fit |
An |
data |
Optional data frame. When |
object_facet |
Facet that plays the role of the "object of
measurement" – typically |
random_facets |
Character vector of non-person facets to
treat as random conditions of measurement. Default uses every
facet other than |
reml |
Logical, passed to |
An object of class mfrm_generalizability with:
variance_components: One row per random effect plus
residual, with columns Source, Variance, and
ProportionVariance.
coefficients: One-row data frame with G
(generalizability coefficient, relative decision) and
Phi (dependability coefficient, absolute decision).
design: Description of the crossed-random model.
G is appropriate for relative decisions (rank-ordering
persons): G = sigma2(p) / (sigma2(p) + sigma2(Residual)).
Phi is appropriate for absolute decisions (cut-score
classification): Phi = sigma2(p) / (sigma2(p) + sigma2(facet main effects) + sigma2(Residual)).
Reporting bands follow Brennan (2001): G / Phi >= 0.8 for high-stakes decisions, >= 0.7 for routine reporting.
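The two coefficient formulas above can be checked by hand. The sketch below applies them to hypothetical variance components; the numbers are illustrative only and do not come from any mfrmr example dataset.

```r
# Hypothetical variance components (illustrative values only).
vc <- c(Person = 1.20, Rater = 0.15, Criterion = 0.10, Residual = 0.60)

person_var   <- vc[["Person"]]
residual_var <- vc[["Residual"]]
facet_var    <- sum(vc[c("Rater", "Criterion")])  # facet main effects

# G: relative decisions; Phi: absolute decisions.
G   <- person_var / (person_var + residual_var)
Phi <- person_var / (person_var + facet_var + residual_var)
round(c(G = G, Phi = Phi), 3)
```

Because the Phi denominator adds the facet main-effect variances to the G denominator, Phi can never exceed G under these formulas.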
This helper formulates the random-effects model with main effects
only (Score ~ 1 + (1|Person) + (1|Facet1) + ... + Residual); no
explicit (1 | Person:Rater), (1 | Person:Criterion), or
(1 | Rater:Criterion) interaction terms are estimated. All
two-way and higher interaction variance is therefore folded into
the Residual term – the standard one-observation-per-cell
approximation – which can bias G downward when person x facet
interactions are substantively large. The reported Phi does
not apply Brennan (2001) D-study scalings (1/n_r,
1/n_i, 1/(n_r * n_i)); it treats each random source as
contributing one full observation, so it matches the canonical
Phi only when the operational reporting design is also one rating
per condition. For a full p x r x i decomposition with D-study
scaling, treat this output as a screening summary and re-estimate
externally.
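For orientation, the D-study scalings mentioned above can be sketched as follows. This is a crude projection under a loud assumption: the whole Residual is treated as if it were the p x r x i term and divided by n_r * n_i, although confounded two-way interactions would scale differently. The helper phi_projected() and the variance values are hypothetical, and the result is an optimistic screening figure rather than a substitute for a full external G-study.

```r
# Hypothetical variance components (illustrative values only).
vc <- c(Person = 1.20, Rater = 0.15, Criterion = 0.10, Residual = 0.60)

# Crude D-study projection: divide each error source by the number of
# conditions averaged over. Treating the whole Residual as the p x r x i
# term is an assumption; confounded two-way interactions scale differently.
phi_projected <- function(vc, n_r, n_i) {
  abs_error <- vc[["Rater"]] / n_r +
    vc[["Criterion"]] / n_i +
    vc[["Residual"]] / (n_r * n_i)
  vc[["Person"]] / (vc[["Person"]] + abs_error)
}

phi_single <- phi_projected(vc, n_r = 1, n_i = 1)  # one rating per condition
phi_design <- phi_projected(vc, n_r = 4, n_i = 4)  # averaged over 4 x 4
round(c(single = phi_single, design = phi_design), 3)
```

With n_r = n_i = 1 the projection reduces to the unscaled Phi formula, which is the case the helper reports.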
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. Wiley.
Brennan, R. L. (2001). Generalizability theory. Springer.
compute_facet_icc(), diagnose_mfrm()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
if (requireNamespace("lme4", quietly = TRUE)) {
  gt <- mfrm_generalizability(fit)
  gt$variance_components
  # Look for: a Person variance component well above any single
  # non-person facet's variance share. Large rater or criterion
  # variance shares mean those conditions add measurement error
  # relative to person spread.
  gt$coefficients
  # Look for: G >= 0.7 for routine reporting, >= 0.8 for high-stakes.
  # Phi < G means absolute decisions are noisier than relative
  # decisions; review whether facet main effects need anchoring.
}
Returns the lower / upper bounds that mfrmr screens treat as the acceptable mean-square (Infit / Outfit MnSq) band when flagging element-level misfit. Defaults follow Linacre's published 0.5-1.5 broad screening band; both ends can be overridden via R options. The returned band is a configurable screening convention, not a universal definition of misfit. Applied MFRM studies sometimes use narrower bands (for example 0.7-1.3 or 0.75-1.3) when the reporting purpose, sample size, or stakes justify a stricter operational screen.
mfrm_misfit_thresholds(lower = NULL, upper = NULL)
lower |
Optional lower bound. When |
upper |
Optional upper bound. |
Helpers that consume the band include
summary.mfrm_diagnostics() (misfit_flagged block and
key_warnings auto-flag), build_misfit_casebook() (the new
element_fit source family), the bias / misfit narrative inside
build_apa_outputs(), facet_quality_dashboard() when
misfit_warn = NULL, plot_person_fit() when lower / upper
are NULL, and plot_bubble() when fit_range = NULL. Setting the
options once at the top of an analysis script therefore changes every
downstream screen at once. Directional outputs use the same band: MnSq
values above the upper bound are labelled underfit, and values below the
lower bound are labelled overfit.
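The directional labelling described above can be sketched in a few lines of base R. The helper label_mnsq() and the "within band" label are hypothetical names used for illustration; only the underfit / overfit direction is taken from the text.

```r
# Label mean-square values against a screening band.
# Values strictly above `upper` are underfit; strictly below `lower`, overfit.
label_mnsq <- function(mnsq, lower = 0.5, upper = 1.5) {
  stopifnot(lower < upper)
  ifelse(mnsq > upper, "underfit",
         ifelse(mnsq < lower, "overfit", "within band"))
}

labels <- label_mnsq(c(0.42, 0.50, 1.00, 1.50, 1.73))
labels
```

In this sketch, values exactly on a bound are treated as inside the band, which follows the "above" / "below" wording; how mfrmr itself treats exact ties is not stated here.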
A named numeric vector c(lower = ..., upper = ...) with
lower < upper.
Two scalar R options drive the band:
mfrmr.misfit_lower: Lower acceptance bound. Default 0.5.
mfrmr.misfit_upper: Upper acceptance bound. Default 1.5.
Pass scalar arguments to override the options for a single call,
e.g. mfrm_misfit_thresholds(lower = 0.7, upper = 1.3) for the
tighter Bond & Fox (2015) reporting band.
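The precedence between call arguments and the two R options can be pictured with the following sketch. resolve_band() is a hypothetical re-implementation written for illustration; it only assumes the option names and defaults listed above and is not the package's own code.

```r
# Hypothetical resolver: explicit arguments win, then R options, then defaults.
resolve_band <- function(lower = NULL, upper = NULL) {
  if (is.null(lower)) lower <- getOption("mfrmr.misfit_lower", default = 0.5)
  if (is.null(upper)) upper <- getOption("mfrmr.misfit_upper", default = 1.5)
  stopifnot(is.numeric(lower), is.numeric(upper), lower < upper)
  c(lower = lower, upper = upper)
}

# Clear any existing settings, remembering them so they can be restored.
old <- options(mfrmr.misfit_lower = NULL, mfrmr.misfit_upper = NULL)
band_default <- resolve_band()             # falls back to 0.5 / 1.5

options(mfrmr.misfit_lower = 0.7, mfrmr.misfit_upper = 1.3)
band_option <- resolve_band()              # picks up the options
band_call   <- resolve_band(lower = 0.75)  # argument overrides one option

options(old)                               # restore the caller's settings
```

Restoring the saved options at the end keeps a script-level override from leaking into later analyses in the same session.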
summary.mfrm_diagnostics(), build_misfit_casebook(),
facet_quality_dashboard()
mfrm_misfit_thresholds()
old <- options(mfrmr.misfit_lower = 0.7, mfrmr.misfit_upper = 1.3)
mfrm_misfit_thresholds()
options(old)
List literature-based warning threshold profiles
mfrm_threshold_profiles()
Use this function to inspect available profile presets before calling
build_visual_summaries().
profiles contains thresholds used by warning logic
(sample size, fit ratios, PCA cutoffs, etc.).
pca_reference_bands contains literature-oriented descriptive bands used in
summary text.
An object of class mfrm_threshold_profiles with
profiles (strict, standard, lenient) and pca_reference_bands.
profiles: numeric threshold presets (strict, standard, lenient).
pca_reference_bands: narrative reference bands for PCA interpretation.
Review presets with mfrm_threshold_profiles().
Pick a default profile for project policy.
Override only selected fields in build_visual_summaries() when needed.
profiles <- mfrm_threshold_profiles()
s_profiles <- summary(profiles)
s_profiles$overview
Guide to the legacy-compatible wrappers and text/file exports in mfrmr.
Use this page when you need continuity with older compatibility-oriented workflows,
fixed-width reports, or graph/score file style outputs.
This compatibility layer currently applies mainly to diagnostics-based
RSM / PCM workflows. First-release GPCM fits now also support
graph-only compatibility-style exports, while scorefile and
diagnostics-driven compatibility outputs remain limited to RSM / PCM.
Treat this layer as a presentation/contract surface, not as a claim of
FACETS or ConQuest numerical equivalence.
SPSS is treated differently from FACETS and ConQuest: mfrmr currently
supports table/data-frame/CSV handoff for SPSS-oriented reporting workflows,
but it does not generate SPSS syntax, write native SPSS system files, execute
SPSS estimators, or claim SPSS numerical parity.
You are reproducing an older workflow that expects one-shot wrappers.
You need fixed-width text blocks for console, logs, or archival handoff.
You need graphfile or scorefile style outputs for downstream legacy tools.
You are checking column coverage and metric consistency against a compatibility contract.
For standard estimation, use fit_mfrm() plus diagnose_mfrm().
For report bundles, use mfrmr_reports_and_tables.
For manuscript text, use build_apa_outputs() and reporting_checklist().
For visual follow-up, use mfrmr_visual_diagnostics.
run_mfrm_facets(): One-shot legacy-compatible wrapper that fits, diagnoses, and returns key tables in one object.
mfrmRFacets(): Alias for run_mfrm_facets() kept for continuity.
build_fixed_reports(): Fixed-width interaction and pairwise text blocks. Best when a text-only compatibility artifact is required.
facets_output_file_bundle(): Graphfile/scorefile style CSV and fixed-width exports for legacy pipelines.
facets_parity_report(): Column and metric contract audit against the compatibility specification. Use only when an explicit compatibility contract audit is part of the task; the function name is historical and does not by itself imply external FACETS equivalence.
Instead of run_mfrm_facets(), prefer:
fit_mfrm() -> diagnose_mfrm() -> reporting_checklist().
Instead of build_fixed_reports(), prefer:
bias_interaction_report() -> build_apa_outputs().
Instead of facets_output_file_bundle(), prefer:
category_curves_report() or category_structure_report() plus
export_mfrm_bundle().
Instead of facets_parity_report() for routine QA, prefer:
reference_case_audit() for package-native completeness auditing or
reference_case_benchmark() for internal benchmark cases.
Keep compatibility wrappers only where a downstream consumer truly needs the old layout or fixed-width format.
For new scripts, start from package-native bundles and add compatibility outputs only at the export boundary.
Treat compatibility outputs as presentation contracts, not as the primary analysis objects.
Use compatibility_alias_table() when you need to check which aliases are
still retained and which package-native names should be used in new code.
Use reporting_checklist(fit)$software_scope to review the current
FACETS, ConQuest, and SPSS relationship wording for a fitted analysis.
Legacy handoff:
run_mfrm_facets() -> build_fixed_reports() ->
facets_output_file_bundle().
Mixed workflow:
RSM / PCM:
fit_mfrm() -> diagnose_mfrm() -> build_apa_outputs() ->
compatibility export only if required.
bounded GPCM:
fit_mfrm() -> diagnose_mfrm() -> reporting_checklist() ->
graph-only compatibility export only when a legacy handoff truly requires
it.
Compatibility-contract audit:
fit_mfrm() -> diagnose_mfrm() -> facets_parity_report().
For standard reports/tables, see mfrmr_reports_and_tables.
For manuscript-draft reporting, see mfrmr_reporting_and_apa.
For visual diagnostics, see mfrmr_visual_diagnostics.
For linking and DFF workflows, see mfrmr_linking_and_dff.
For end-to-end routes, see mfrmr_workflow_methods.
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:12], , drop = FALSE]
run <- run_mfrm_facets(
  data = toy_small,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  maxit = 10
)
summary(run)
compatibility_alias_table("functions")
fixed <- build_fixed_reports(
  estimate_bias(
    run$fit, run$diagnostics,
    facet_a = "Rater", facet_b = "Criterion",
    max_iter = 1
  ),
  branch = "original"
)
names(fixed)
Compact synthetic many-facet datasets designed for documentation examples.
Both datasets are large enough to avoid tiny-sample toy behavior while
remaining fast in R CMD check examples.
A data.frame with 6 columns:
Example dataset label ("ExampleCore" or "ExampleBias").
Person/respondent identifier.
Rater identifier.
Criterion facet label.
Observed category score on a four-category scale (1–4).
Balanced grouping variable used in DFF/DIF examples ("A" / "B").
Available data objects:
mfrmr_example_core
mfrmr_example_bias
mfrmr_example_core is generated from a single latent trait plus rater and
criterion main effects, making it suitable for general fitting, plotting, and
reporting examples.
mfrmr_example_bias starts from the same basic design but adds:
a known Group x Criterion effect (Group B is advantaged on Language)
a known Rater x Criterion interaction (R04 x Accuracy)
This lets differential-functioning and bias-analysis help pages demonstrate non-null findings.
| Dataset | Rows | Persons | Raters | Criteria | Groups |
|---|---|---|---|---|---|
| example_core | 768 | 48 | 4 | 4 | 2 |
| example_bias | 384 | 48 | 4 | 4 | 2 |
Use mfrmr_example_core for fitting, diagnostics, design-weighted precision curves,
and generic plots/reports.
Use mfrmr_example_bias for analyze_dff(), analyze_dif(), dif_interaction_table(),
plot_dif_heatmap(), and estimate_bias().
Both objects can be loaded either with load_mfrmr_data() or directly via
data("mfrmr_example_core", package = "mfrmr") /
data("mfrmr_example_bias", package = "mfrmr").
Synthetic documentation data generated from rating-scale Rasch facet
designs with fixed seeds in data-raw/make-example-data.R.
data("mfrmr_example_core", package = "mfrmr")
table(mfrmr_example_core$Score)
table(mfrmr_example_core$Group)
Package-native guide to checking connectedness, building anchor-based links,
monitoring drift, and screening differential facet functioning (DFF) in
mfrmr.
"Is the design connected enough to support a common scale?"
Use subset_connectivity_report() and plot(..., type = "design_matrix").
"Which elements can I export as anchors from an existing fit?"
Use make_anchor_table() and audit_mfrm_anchors().
"How do I anchor a new administration to a baseline?"
Use anchor_to_baseline().
"Have common elements drifted across separately fitted waves?"
Use detect_anchor_drift() and plot_anchor_drift().
"Can I synthesize anchor audit, drift, and chain evidence into one review?"
Use build_linking_review().
"Do specific facet levels function differently across groups?"
Use analyze_dff(), plot_dif_heatmap(), and plot_dif_summary().
Fit with fit_mfrm() and diagnose with diagnose_mfrm().
Check connectedness with subset_connectivity_report().
Build or audit anchors with make_anchor_table() and
audit_mfrm_anchors().
Use anchor_to_baseline() when you need to place raw new data onto a
baseline scale.
Use build_equating_chain() only as a screened linking aid across
already fitted waves.
Use detect_anchor_drift() for stability monitoring on separately fitted
waves.
Use build_linking_review() when you need one operational synthesis
object rather than separate anchor/drift/chain tables.
Run analyze_dff() only after checking connectivity and common-scale
evidence.
subset_connectivity_report(): Summarizes connected subsets, bottleneck facets, and design-matrix coverage.
make_anchor_table(): Extracts reusable anchor candidates from a fit.
anchor_to_baseline(): Anchors new raw data to a baseline fit and returns anchored diagnostics plus a consistency check against the baseline scale.
detect_anchor_drift(): Compares fitted waves directly to flag unstable anchor elements.
build_equating_chain(): Accumulates screened pairwise links across a series of administrations or forms.
build_linking_review(): Synthesizes anchor-audit, drift, and screened-chain evidence into one operational review surface.
analyze_dff(): Screens differential facet functioning with residual or refit methods, using screening-only language unless linking and precision support stronger interpretation.
Check connectedness before interpreting subgroup or wave differences.
Use DFF outputs as screening results when common-scale linking is weak.
Always name the facet, facet level, and group pair involved in a DFF contrast. A generic "DIF exists" statement is not interpretable in a many-facet design.
Residual-method DFF classifications are screening labels. ETS A/B/C
labels require refit output whose ClassificationSystem is "ETS".
Treat drift flags as prompts for review, not automatic evidence that an anchor must be removed.
Treat LinkSupportAdequate = FALSE as a weak-link warning: at least one
linking facet retained fewer than 5 common elements after screening.
Rebuild anchors from a defensible baseline rather than chaining unstable links by hand.
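The weak-link rule and the idea of a drift shift can be illustrated with two hand-made anchor tables. Everything below is a hypothetical base-R sketch: the level names and values are invented, and the code does not reproduce the screening that detect_anchor_drift() or build_equating_chain() perform.

```r
# Hypothetical anchor tables from two separately fitted waves.
criteria <- c("Accuracy", "Content", "Language", "Organization")
wave1 <- data.frame(
  Facet  = c(rep("Rater", 6), rep("Criterion", 4)),
  Level  = c(paste0("R0", 1:6), criteria),
  Anchor = c(-0.60, -0.20, 0.00, 0.15, 0.30, 0.35, -0.40, -0.10, 0.20, 0.30),
  stringsAsFactors = FALSE
)
wave2 <- data.frame(
  Facet  = c(rep("Rater", 5), rep("Criterion", 4)),
  Level  = c(paste0("R0", 2:6), criteria),
  Anchor = c(-0.25, 0.05, 0.10, 0.75, 0.30, -0.35, -0.15, 0.25, 0.25),
  stringsAsFactors = FALSE
)

# Elements present in both waves, with the wave-to-wave shift in logits.
common <- merge(wave1, wave2, by = c("Facet", "Level"),
                suffixes = c("_w1", "_w2"))
common$Shift <- common$Anchor_w2 - common$Anchor_w1

# Count common elements per facet and flag facets with fewer than 5.
n_common  <- vapply(split(common$Level, common$Facet), length, integer(1))
weak_link <- n_common < 5

# Largest shifts first, as candidates for review rather than removal.
common[order(-abs(common$Shift)), c("Facet", "Level", "Shift")]
```

Here the criterion facet has only four common elements and would count as a weak link, while the largest rater shift is a prompt for review, not automatic grounds for dropping the anchor.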
Cross-sectional linkage review:
fit_mfrm() -> diagnose_mfrm() -> subset_connectivity_report() ->
plot(..., type = "design_matrix").
Baseline placement review:
make_anchor_table() -> anchor_to_baseline() -> diagnose_mfrm().
Multi-wave drift review:
fit each wave separately -> detect_anchor_drift() ->
build_linking_review() -> plot_anchor_drift().
Group comparison route:
subset_connectivity_report() -> analyze_dff() ->
dif_report() -> plot_dif_heatmap() / plot_dif_summary().
For visual follow-up, see mfrmr_visual_diagnostics.
For report/table selection, see mfrmr_reports_and_tables.
For end-to-end routes, see mfrmr_workflow_methods.
For a longer walkthrough, see
vignette("mfrmr-linking-and-dff", package = "mfrmr").
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")

subsets <- subset_connectivity_report(fit, diagnostics = diag)
subsets$summary[, c("Subset", "Observations", "ObservationPercent")]

dff <- analyze_dff(fit, diag, facet = "Rater", group = "Group", data = toy)
head(dff$dif_table[, c("Level", "Group1", "Group2", "Classification", "ClassificationSystem")])
Package-native guide to moving from fitted model objects to
manuscript-draft text, tables, notes, and revision checklists in mfrmr.
This guide applies fully to diagnostics-based RSM / PCM workflows.
First-release GPCM fits support reporting_checklist(),
precision_audit_report(), build_visual_summaries(),
run_qc_pipeline(), direct curve/graph and residual table helpers, and
caveated build_apa_outputs() plus package-native export/replay bundles.
Use gpcm_capability_matrix() when you need the formal boundary for the
current GPCM reporting path.
Bounded GPCM APA/export outputs are manuscript and reproducibility
scaffolds, not FACETS/Rasch score-side compatibility evidence. Keep the
returned caveats with any fair-average, bias, or conditional-SE language.
This page is for manuscript assembly: prose, caveats, notes/captions, and readiness review. Use mfrmr_reports_and_tables when the task is choosing a specific table/report helper or appendix export format.
"Which parts of this run are draft-complete, and with what caveats?"
Use reporting_checklist().
"How should I phrase the model, fit, and precision sections?"
For RSM / PCM, use build_apa_outputs().
"Which tables should I hand off to a manuscript or appendix?"
Use build_summary_table_bundle(), export_summary_appendix(),
apa_table(), and
facet_statistics_report().
"How do I explain model-based vs exploratory precision?"
Use precision_audit_report() and summary(diagnose_mfrm(...)).
"Which caveats need to appear in the write-up?"
Use reporting_checklist() first, then build_apa_outputs().
"How should I start figure captions or visual-results wording?"
Use visual_reporting_template() for conservative caption and results
sentence starters, then verify availability with
reporting_checklist()$visual_scope.
Fit with fit_mfrm().
Build diagnostics with diagnose_mfrm().
Review precision strength with precision_audit_report() when
inferential language matters.
Run reporting_checklist() to identify missing sections, caveats, and
next actions. Use the "Visual Displays" rows as the figure-routing
layer for the current run.
When strict marginal rows are available, follow up with
plot_marginal_fit() and plot_marginal_pairwise() before finalizing
the narrative around local misfit.
Create manuscript-draft prose and metadata with build_apa_outputs().
For bounded GPCM, retain the returned support_status and caveat and
keep fair-average/bias language at the screening tier.
Convert summary outputs to reusable table bundles with
build_summary_table_bundle(), review the bundle with summary() /
plot(), then convert specific components to handoff tables with
apa_table() or export them directly with export_summary_appendix().
reporting_checklist(): Turns current analysis objects into a
prioritized revision guide with DraftReady, Priority, and
NextAction. DraftReady means "ready to draft with the documented
caveats"; ReadyForAPA is retained as a backward-compatible alias, and
neither field means "formal inference is automatically justified". The
"Visual Displays" rows also mirror the public plot family, so the
checklist doubles as a figure-routing surface.
build_apa_outputs(): Builds shared-contract prose, table notes, captions, and a section map from the current fit and diagnostics.
build_summary_table_bundle(): Turns supported summary() outputs
into named data.frame tables plus an index for manuscript or appendix
handoff, and now also supports bundle-level summary() / plot() for
role coverage and numeric QC.
export_summary_appendix(): Writes those validated summary-table bundles to CSV and optional HTML appendix artifacts without requiring a full fit-based export bundle.
apa_table(): Produces reproducible base-R tables with APA-oriented labels, notes, and captions.
precision_audit_report(): Summarizes whether precision claims are model-based, hybrid, or exploratory.
facet_statistics_report(): Provides facet-level summaries that often feed result tables and appendix material.
build_visual_summaries(): Prepares publication-oriented figure payloads that can be cited from the report text.
visual_reporting_template(): Provides conservative figure placement, caption-starter, results-wording, and overclaim-avoidance guidance for public visual helpers.
Treat reporting_checklist() as the gap finder and
build_apa_outputs() as the writing engine.
Copy apa$report_text (or print apa interactively) for draft
Method / Results prose; use summary(apa) only as a QA/completeness
check before handoff.
Use apa$section_map for section-level editing and
apa$table_figure_notes / apa$table_figure_captions for appendix
notes and captions.
Use the checklist's "Visual Displays" rows to decide whether the next
follow-up should be plot_qc_dashboard(), plot_marginal_fit(),
plot_residual_pca(), plot_bias_interaction(), or another public plot.
Use visual_reporting_template() to draft visual captions and
results-sentence starters, but do not paste the skeletons without checking
the actual fit, diagnostics, and study context.
Phrase formal inferential claims only when the precision tier is model-based.
Keep bias and differential-functioning outputs in screening language unless the current precision layer and linking evidence justify stronger claims.
Treat DraftReady (and the legacy alias ReadyForAPA) as a
drafting-readiness flag, not as a substitute for methodological review.
Rebuild APA outputs after major model changes instead of editing old text by hand.
For bounded GPCM, retain the helper caveats in manuscript prose and
export bundles; APA, visual-summary, QC, and package-native export/replay
routes are available with screening-tier limits.
Manuscript-first route:
fit_mfrm() -> diagnose_mfrm() -> reporting_checklist() ->
build_apa_outputs() -> build_summary_table_bundle() -> summary() /
plot() -> apa_table(), export_summary_appendix(), or
export_mfrm_bundle() with include = c("apa", "summary_tables", "html").
For RSM / PCM final reports, prefer method = "MML" and
diagnostic_mode = "both" in the diagnostics step.
For bounded GPCM, retain the caveats from build_apa_outputs(),
build_mfrm_manifest(), build_mfrm_replay_script(), and
export_mfrm_bundle(), and avoid FACETS/Rasch score-side invariance
language.
Appendix-first route:
facet_statistics_report() -> apa_table() plus
build_visual_summaries() and build_apa_outputs() as parallel
products from the same fit / diagnostics objects.
Precision-sensitive route:
diagnose_mfrm() -> precision_audit_report() ->
reporting_checklist() -> build_apa_outputs().
bounded GPCM route:
diagnose_mfrm() -> precision_audit_report() ->
reporting_checklist() -> build_visual_summaries() /
run_qc_pipeline() -> build_apa_outputs() ->
export_mfrm_bundle(), keeping all GPCM caveats in downstream prose.
For report/table selection, see mfrmr_reports_and_tables.
For end-to-end analysis routes, see mfrmr_workflow_methods.
For visual follow-up, see mfrmr_visual_diagnostics.
For the bounded GPCM support statement, see gpcm_capability_matrix.
For a longer walkthrough, see
vignette("mfrmr-reporting-and-apa", package = "mfrmr").
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")

checklist <- reporting_checklist(fit, diagnostics = diag)
visual_reporting_template("manuscript")[, c("FigureFamily", "CaptionSkeleton")]
head(checklist$checklist[, c("Section", "Item", "DraftReady", "NextAction")])
subset(
  checklist$checklist,
  Section == "Visual Displays",
  c("Item", "Available", "NextAction")
)

apa <- build_apa_outputs(fit, diagnostics = diag)
summary(apa)
apa
apa$section_map[, c("SectionId", "Available")]

tbl <- apa_table(fit, which = "summary")
tbl$caption

bundle <- build_summary_table_bundle(checklist)
bundle$table_index
apa_from_bundle <- apa_table(bundle, which = "section_summary")
apa_from_bundle$caption
Quick guide to choosing the right report or table helper in mfrmr.
Use this page when you know the reporting question but have not yet decided
which bundle, table, or reporting helper to call.
This page is a helper-selection map. For manuscript prose, caveat review,
note/caption handoff, and summary(apa) QA, use
mfrmr_reporting_and_apa.
"How should I document the model setup and run settings?"
Use specifications_report().
"Was data filtered, dropped, or mapped in unexpected ways?"
Use data_quality_report() and describe_mfrm_data().
"Did estimation converge cleanly and how formal is the precision layer?"
Use estimation_iteration_report() and precision_audit_report().
"Which facets are measurable, variable, or weakly separated?"
Use facet_statistics_report(), measurable_summary_table(), and
facets_chisq_table().
"Are score categories functioning in a usable sequence?"
Use rating_scale_table(), category_structure_report(), and
category_curves_report().
"Is the design linked well enough across subsets, forms, or waves?"
Use subset_connectivity_report() and plot_anchor_drift().
"What should go into the manuscript text and tables?"
For RSM / PCM, use reporting_checklist(), build_apa_outputs(),
and build_summary_table_bundle() or export_summary_appendix(). For
bounded GPCM, use the package-native reporting_checklist(),
build_visual_summaries(), run_qc_pipeline(), build_apa_outputs(),
and export_mfrm_bundle() routes with their returned caveats retained;
FACETS score-side compatibility remains outside the validated GPCM
boundary.
Start with specifications_report() and data_quality_report() to
document the run and confirm usable data.
Continue with estimation_iteration_report() and
precision_audit_report() to judge convergence and inferential strength.
Use facet_statistics_report() and subset_connectivity_report() to
describe spread, linkage, and measurability.
Add rating_scale_table(), category_structure_report(), and
category_curves_report() to document scale functioning.
Finish with reporting_checklist() and build_apa_outputs() for
manuscript-oriented output, then build_summary_table_bundle() for
reusable handoff tables or export_summary_appendix() for direct
appendix export. For bounded GPCM, retain the caveats returned by
build_apa_outputs() and export_mfrm_bundle(), and keep
fair-average/bias language at the screening tier.
specifications_report(): Documents model type, estimation method, anchors, and core run settings. Best for method sections and audit trails.
data_quality_report(): Summarizes retained and dropped rows, missingness, and unknown elements. Best for data cleaning narratives.
estimation_iteration_report(): Shows replayed convergence trajectories. Best for diagnosing slow or unstable estimation.
precision_audit_report(): Summarizes whether SE, CI, and
reliability indices are model-based, hybrid, or exploratory. Best for
deciding how strongly to phrase inferential claims.
facet_statistics_report(): Bundles facet summaries, precision summaries, and variability tests. Best for facet-level reporting.
subset_connectivity_report(): Summarizes disconnected subsets and coverage bottlenecks. Best for linking and anchor strategy review.
rating_scale_table(): Gives category counts, average measures, and threshold diagnostics. Best for first-pass category evaluation.
category_structure_report(): Adds transition points and compact category warnings. Best for category-order interpretation.
category_curves_report(): Returns category-probability curve coordinates and summaries. Best for downstream graphics and report drafts.
reporting_checklist(): Turns analysis status into an action list with priorities and next steps. Best for closing reporting gaps.
build_apa_outputs(): Creates manuscript-draft text, notes, captions, and section maps from a shared reporting contract.
build_summary_table_bundle(): Converts supported summary()
outputs into named data.frame tables with a compact index for appendix
or manuscript handoff, and now supports bundle-level summary() /
plot() for QC before export.
export_summary_appendix(): Exports those validated summary-table bundles as CSV and optional HTML appendix artifacts without requiring the broader fit-based export bundle.
apa_table(): Can now take those summary-table bundles directly,
so a selected component can move from summary() to a formatted handoff
table without rebuilding the analysis object path.
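The facet-level precision summaries described above rest on separation-type indices. The following is a minimal base R sketch of the standard Rasch definitions of reliability, separation, and strata. It assumes made-up rater estimates and standard errors; separation_summary() is a hypothetical helper, not an mfrmr function, and the values returned by facet_statistics_report() may be computed differently.

```r
# Hypothetical helper, not part of mfrmr: standard Rasch separation indices
# computed from facet-level estimates and their standard errors.
separation_summary <- function(estimate, se) {
  observed_var <- stats::var(estimate)          # variance of the estimates
  error_var <- mean(se^2)                       # mean-square measurement error
  true_var <- max(observed_var - error_var, 0)  # error-adjusted variance
  reliability <- true_var / observed_var
  separation <- sqrt(true_var / error_var)      # adjusted SD divided by RMSE
  strata <- (4 * separation + 1) / 3            # statistically distinct levels
  c(Reliability = reliability, Separation = separation, Strata = strata)
}

# Made-up rater severities (logits) and standard errors
rater_est <- c(-1, 0, 1)
rater_se <- c(0.5, 0.5, 0.5)
sep <- separation_summary(rater_est, rater_se)
print(round(sep, 3))
```

A facet with low reliability and a separation near zero is the "weakly separated" case: its levels cannot be told apart beyond measurement error, whatever the size of the raw differences.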
Use bundle summaries first, then drill down into component tables.
Treat precision_audit_report() as the gatekeeper for formal inference.
Treat category and bias outputs as complementary layers rather than substitutes for overall fit review.
Treat zero-count score categories as scale-functioning caveats. Boundary
zero-count categories can be retained with explicit rating_min /
rating_max; intermediate zero-count categories require
keep_original = TRUE and make adjacent thresholds weakly identified.
summary(describe_mfrm_data(...)) exposes these in Notes, printed
Caveats, and $caveats; summary(fit) carries full structured caveats
into printed Caveats and $caveats, with Key warnings as a short
triage subset. Summary-table exports use score_category_caveats and
analysis_caveats.
Use reporting_checklist() before build_apa_outputs() when a report
still needs missing diagnostics or clearer caveats.
Run documentation:
fit_mfrm() -> specifications_report() -> data_quality_report().
Precision and facet review:
diagnose_mfrm() -> precision_audit_report() ->
facet_statistics_report().
Scale review:
rating_scale_table() -> category_structure_report() ->
category_curves_report().
Manuscript handoff (RSM / PCM):
reporting_checklist() -> build_apa_outputs() ->
build_summary_table_bundle() -> summary() / plot() -> apa_table()
or export_summary_appendix() /
export_mfrm_bundle() with include = c("apa", "summary_tables").
Bounded GPCM handoff:
reporting_checklist() -> build_visual_summaries() /
run_qc_pipeline() -> build_apa_outputs() /
export_mfrm_bundle() with returned caveats retained ->
build_summary_table_bundle() -> export_summary_appendix().
For visual follow-up, see mfrmr_visual_diagnostics.
For one-shot analysis routes, see mfrmr_workflow_methods.
For manuscript assembly, see mfrmr_reporting_and_apa.
For linking and DFF review, see mfrmr_linking_and_dff.
For legacy-compatible wrappers and exports, see mfrmr_compatibility_layer.
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:12], , drop = FALSE]
fit <- fit_mfrm(
  toy_small,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
spec <- specifications_report(fit)
summary(spec)$overview
prec <- precision_audit_report(fit, diagnostics = diag)
summary(prec)$checks
checklist <- reporting_checklist(fit, diagnostics = diag)
subset(checklist$checklist, Section == "Visual Displays", c("Item", "NextAction"))
apa <- build_apa_outputs(fit, diagnostics = diag)
summary(apa)
apa$section_map[, c("Heading", "Available")]
bundle <- build_summary_table_bundle(checklist)
bundle$table_index
Quick guide to choosing the right base-R diagnostic plot in mfrmr.
Use this page when you know the analysis question but do not yet know
which plotting helper or plot() method to call.
If you are preparing figures for a report, start with
reporting_checklist() and inspect the "Visual Displays" rows first.
Those rows now map directly onto the public plotting family covered on this
page, so the checklist can act as a plot-readiness router rather than just a
manuscript checklist.
This guide is written primarily for diagnostics-based RSM / PCM
workflows. GPCM fits also use the residual-based diagnostics stack
through diagnose_mfrm(), plot_unexpected(), plot_displacement(),
plot_interrater_agreement(), plot_facets_chisq(),
plot_residual_pca(), check_residual_dimensionality(),
plot_residual_dimensionality(), plot_qc_dashboard(),
build_visual_summaries(), and run_qc_pipeline(). They also support the
posterior-scoring, design-weighted-information path via
compute_information() / plot_information(), as well as the Wright /
pathway / CCC fit plots. Two GPCM-specific caveats apply when
interpreting these residual-based screens:
The free discrimination parameter means that mean-square (MnSq) screens
carry weaker invariance evidence than they do under RSM / PCM.
Treat MnSq flags from GPCM as exploratory pointers to cells that
merit closer inspection rather than as Rasch-style violations of
strict invariance.
Slope-aware fair_average_table() and estimate_bias() are available
under GPCM, but their SE columns keep the caveats documented in those
help pages. Package-native build_apa_outputs() and
export_mfrm_bundle() can carry the caveats forward; FACETS score-side
compatibility exports remain outside the validated GPCM boundary.
Use gpcm_capability_matrix() for the formal per-helper boundary
before choosing a GPCM follow-up plot route.
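The first caveat above comes from the free discrimination parameter. As a rough illustration, the sketch below computes category probabilities under the generalized partial-credit form of Muraki (1992) in base R and shows how a slope other than 1 changes the expected-score curve. The function names, thresholds, and slope values are hypothetical, and this is not mfrmr's bounded GPCM implementation.

```r
# Hypothetical helpers, not part of mfrmr.
# Category probabilities for one facet combination with location b,
# thresholds tau, and discrimination (slope) a. With a = 1 this reduces to
# the Rasch-family rating-scale / partial-credit form.
gpcm_probs <- function(theta, b, tau, a = 1) {
  psi <- c(0, cumsum(a * (theta - b - tau)))  # category 0 has exponent 0
  p <- exp(psi - max(psi))                    # subtract max for stability
  p / sum(p)
}

# Model-expected score: sum of category code times category probability
expected_score <- function(theta, b, tau, a = 1) {
  p <- gpcm_probs(theta, b, tau, a)
  sum((seq_along(p) - 1) * p)
}

tau <- c(-1, 0, 1)          # made-up thresholds for a 0-3 scale
theta_grid <- c(-1, 0, 1)
rasch_like <- vapply(theta_grid, expected_score, numeric(1),
                     b = 0, tau = tau, a = 1)
steeper <- vapply(theta_grid, expected_score, numeric(1),
                  b = 0, tau = tau, a = 2)
print(round(rbind(rasch_like, steeper), 3))
```

With the larger slope the expected score moves away from the scale midpoint faster on both sides of theta = 0. Residuals, and therefore MnSq values, are computed against a curve whose steepness is itself estimated, which is why they are read as exploratory pointers under GPCM.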
"Do persons and facet levels overlap on the same logit scale?"
Use plot(fit, type = "wright") or plot_wright_unified().
"Where do score categories transition across theta?"
Use plot(fit, type = "pathway") and plot(fit, type = "ccc").
"Is the design linked well enough across subsets or administrations?"
Use plot(subset_connectivity_report(...), type = "design_matrix") and
plot_anchor_drift().
"Which responses or levels look locally problematic?"
Use plot_unexpected() and plot_displacement().
"Which facet/category cells drive strict marginal misfit?"
Use plot_marginal_fit().
"Which level pairs drive strict local-dependence follow-up?"
Use plot_marginal_pairwise().
"Do raters agree and do facets separate meaningfully?"
Use plot_interrater_agreement() and plot_facets_chisq().
"Is there notable residual structure after the main Rasch dimension?"
Use plot_residual_pca() for scree/loadings; use
check_residual_dimensionality() and plot_residual_dimensionality()
when you need a Horn/Glorfeld-style parallel-analysis threshold.
"Which interaction cells or facet levels drive bias screening results?"
Use plot_bias_interaction().
"Which group-by-facet contrasts drive DFF / DIF screening results?"
Use plot_dif_heatmap() and plot_dif_summary() after
analyze_dff().
"Do person response rows follow the expected Guttman-style
ordering once persons and items are sorted on the logit scale?"
Use plot_guttman_scalogram() as a teaching-oriented screen.
"Do person-level standardized residuals look Gaussian, or are
there heavy tails that warrant follow-up?"
Use plot_residual_qq().
"Is rater severity drifting across waves or training sessions
(assuming the waves are already on a common anchored scale)?"
Use plot_rater_trajectory() together with plot_anchor_drift()
for the linking-scale review.
"How can I get a compact pairwise agreement / correlation overview
when there are too many raters for the bar chart?"
Use plot_rater_agreement_heatmap().
"Are there pairs of facet levels whose residuals co-move beyond the
main-effects MFRM? (Q3-style local-dependence screen)"
Use plot_local_dependence_heatmap().
"How distinguishable is each facet on a single page (separation,
strata, reliability)?"
Use plot_reliability_snapshot().
"Where do persons with the largest residual aggregates accumulate
across facet levels?"
Use plot_residual_matrix().
"How much did empirical-Bayes shrinkage move each facet level?"
Use plot_shrinkage_funnel() on a fit augmented via
apply_empirical_bayes_shrinkage().
"I need one compact triage screen first."
Use plot_qc_dashboard() for RSM / PCM. The bounded GPCM
branch can also call plot_qc_dashboard(); its fair-average panel
uses the slope-aware element-conditional table from diagnose_mfrm()
and should be read with the caveat documented in
fair_average_table(), not as Rasch-family fair-M invariance
evidence.
"Which figures are already supported by my current run?"
Use reporting_checklist() and review the "Visual Displays" rows before
choosing the next plot.
"Where should this figure go in a paper or appendix?"
Use visual_reporting_template() for a static reporting-use table, then
cross-check run-specific availability with reporting_checklist()$visual_scope.
"Do I need a 3D-style category probability surface?"
Use plot(fit, type = "ccc_surface", draw = FALSE) to get a
theta-by-category-by-probability payload for exploratory teaching or
downstream interactive rendering. Keep 2D pathway/CCC plots as the
default reporting figures.
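For the residual-structure question in the list above, a Horn-style parallel-analysis threshold compares observed residual eigenvalues with eigenvalues from pure-noise data of the same shape. The base R sketch below illustrates that idea on simulated residuals. parallel_analysis() is a hypothetical helper and all data are made up; it is not the algorithm behind check_residual_dimensionality(), whose method options may differ.

```r
# Hypothetical helper, not part of mfrmr: Horn-style parallel analysis on a
# matrix of standardized residuals (rows = persons, columns = facet levels).
parallel_analysis <- function(resid_mat, reps = 200, quantile_level = 0.95,
                              seed = 1) {
  set.seed(seed)
  n <- nrow(resid_mat)
  p <- ncol(resid_mat)
  observed <- eigen(stats::cor(resid_mat), symmetric = TRUE,
                    only.values = TRUE)$values
  # Eigenvalues of correlation matrices built from pure-noise data
  random_eigs <- replicate(reps, {
    noise <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
    eigen(stats::cor(noise), symmetric = TRUE, only.values = TRUE)$values
  })
  threshold <- apply(random_eigs, 1, stats::quantile, probs = quantile_level)
  exceeds <- observed > threshold
  # Count leading components that beat the noise threshold
  n_retained <- if (all(exceeds)) p else which(!exceeds)[1] - 1
  list(observed = observed, threshold = threshold, n_retained = n_retained)
}

# Simulated residuals with one leftover shared component (illustrative only)
set.seed(42)
shared <- stats::rnorm(200)
resid_mat <- sapply(1:6, function(j) 0.8 * shared + 0.6 * stats::rnorm(200))
pa <- parallel_analysis(resid_mat, reps = 100)
pa$n_retained
```

Components whose eigenvalues stay below the noise threshold are not distinguishable from chance structure, which is the reason to prefer such a threshold over reading the scree plot by eye.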
If you are drafting a report, run reporting_checklist() first and read
the "Visual Displays" rows as the plot-readiness layer.
Start with plot_qc_dashboard() for one-page triage.
Move to plot_unexpected(), plot_displacement(),
plot_marginal_fit(), plot_marginal_pairwise(), and
plot_interrater_agreement() for flagged local issues.
Use plot(fit, type = "wright"), plot(fit, type = "pathway"),
plot(fit, type = "ccc"), plot_residual_pca(), and
plot_residual_dimensionality() for structural interpretation.
Use plot_bias_interaction(), plot_dif_heatmap(),
plot_dif_summary(), plot_anchor_drift(), and
plot_information() when the checklist or dashboard points to
interaction, differential-functioning, linking, or precision
follow-up.
Use plot(..., draw = FALSE) when you want reusable plotting payloads
instead of immediate graphics.
Use plot(fit, type = "ccc_surface", draw = FALSE) only when you need
a 3D-ready category-probability payload; mfrmr intentionally does not
add a package-native plotly/rgl renderer for this route.
Use preset = "publication" when you want the package's cleaner
manuscript-oriented styling.
This release treats the plotting layer as sufficient when the current run supports all of the following follow-up roles through public helpers:
First-pass triage:
plot_qc_dashboard() or the "Visual Displays" rows from
reporting_checklist().
Structural interpretation:
plot(fit, type = "wright"), plot(fit, type = "pathway"),
plot(fit, type = "ccc"), plot_residual_pca(), and
plot_residual_dimensionality().
Local issue follow-up:
plot_unexpected(), plot_displacement(),
plot_interrater_agreement(), plot_bias_interaction(),
plot_dif_heatmap(), and plot_dif_summary().
Strict marginal follow-up:
plot_marginal_fit() and plot_marginal_pairwise() for
diagnostic_mode = "both".
Reporting/export handoff:
build_visual_summaries() and draw = FALSE routes that return reusable
mfrm_plot_data payloads for downstream review and export. When step
estimates are available, build_visual_summaries() also exposes
$plot_payloads$category_probability_surface.
3D-ready exploratory handoff:
plot(fit, type = "ccc_surface", draw = FALSE) returns a
theta-by-category-by-probability mfrm_plot_data payload. This is not a
default APA/reporting figure and does not load plotly/rgl.
The package currently treats 3D as an exploratory data handoff, not as a
default plotting layer. The supported route is
plot(fit, type = "ccc_surface", draw = FALSE), which returns
surface, categories, category_support, groups, axis_contract,
renderer_contract, interpretation_guide, and reporting_policy tables
inside an mfrm_plot_data object. These tables can be passed to an
external renderer if needed, while category_support and
interpretation_guide should be checked before interpreting retained
zero-frequency categories or adjacent threshold ridges.
Do not replace the standard 2D Wright map, pathway map, CCC plot, heatmap/profile diagnostics, or information curves with 3D figures in routine reports. In particular, 3D Wright maps are discouraged because perspective and occlusion obscure the shared-scale comparison that the Wright map is meant to support.
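To make the idea of a theta-by-category-by-probability table concrete, the sketch below builds one in base R under a rating-scale model (Andrich, 1978) with made-up thresholds. The function name and column names are hypothetical and chosen for illustration; they are not the structure of the surface table returned by plot(fit, type = "ccc_surface", draw = FALSE).

```r
# Hypothetical helper, not part of mfrmr: rating-scale category
# probabilities at one theta for a facet combination located at delta.
rsm_category_probs <- function(theta, delta = 0, tau = c(-1, 0, 1)) {
  psi <- c(0, cumsum(theta - delta - tau))  # category 0 has exponent 0
  p <- exp(psi - max(psi))                  # subtract max for stability
  p / sum(p)
}

theta_grid <- seq(-4, 4, by = 0.5)
tau <- c(-1.5, 0, 1.5)                      # made-up thresholds, 0-3 scale

# Rows = theta grid points, columns = categories 0..3
prob_mat <- t(vapply(theta_grid, rsm_category_probs,
                     numeric(length(tau) + 1), delta = 0, tau = tau))

# Long table: one row per theta x category cell
surface <- data.frame(
  Theta = rep(theta_grid, times = ncol(prob_mat)),
  Category = rep(seq_len(ncol(prob_mat)) - 1L, each = length(theta_grid)),
  Probability = as.vector(prob_mat)
)
head(surface)
```

A long table of this kind can be handed to any external surface renderer, while the same numbers drawn as one line per category give the ordinary 2D CCC plot.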
plot(fit, type = "wright"): Shared logit map of persons, facet levels, and step thresholds. Best for targeting and spread.
plot(fit, type = "pathway"): Expected score by theta, with dominant-category strips. Best for scale progression.
plot(fit, type = "ccc"): Category probability curves. Best for checking whether categories peak in sequence.
plot_unexpected(): Observation-level surprises. Best for case review and local misfit triage.
plot_displacement(): Level-wise anchor movement. Best for anchor robustness and residual calibration tension.
plot_marginal_fit(): Posterior-integrated first-order category residuals. Best for seeing which facet/category cells drive strict marginal flags.
plot_marginal_pairwise(): Posterior-integrated exact/adjacent agreement residuals. Best for exploratory local-dependence follow-up after strict marginal flags.
plot_empirical_fit(): mirt-style observed-vs-expected empirical
fit overlay for a selected facet level. Best after fit_p_table(),
plot_bubble(), or summary(diagnose_mfrm(...)) identifies the level
worth inspecting.
plot_fit_direction_summary(): Compact underfit / overfit /
mixed / in-band rate chart built from fit_direction_summary(). Best
when the report needs to say whether fit problems are mostly high MnSq
underfit, low MnSq overfit, or mixed.
plot_interrater_agreement(): Exact agreement, expected agreement, pairwise correlation, and agreement gaps. Best for rater consistency.
plot_facets_chisq(): Facet variability and chi-square summaries. Best for checking whether a facet contributes meaningful spread.
plot_residual_pca(): Residual structure after the Rasch dimension is removed. Best for exploratory residual-structure review, not as a standalone unidimensionality test.
plot_bias_interaction(): Interaction-bias screening views for cells and facet profiles. Best for systematic departure from the additive main-effects model.
plot_dif_heatmap() / plot_dif_summary(): DFF / DIF screening views for facet-level x group contrasts. Best for showing which facet and group pair is involved before writing substantive interpretations.
plot_anchor_drift(): Anchor drift and screened linking-chain visuals. Best for multi-form or multi-wave linking review after checking retained common-element support.
plot_guttman_scalogram(): Person x facet-level response matrix with unexpected-response overlay. Best for teaching-oriented scalogram intuition and visual triage of where the data depart from the expected ordering.
plot_residual_qq(): Normal Q-Q plot of person-level standardized residual aggregates. Best for checking the tail behavior of residuals as exploratory follow-up after a fit screen.
plot_rater_trajectory(): Per-rater severity trajectory across named waves / occasions. Best for rater-training or drift feedback when the supplied fits have already been placed on a common anchored scale; the helper itself does not perform linking.
plot_rater_agreement_heatmap(): Compact pairwise rater x rater
heatmap of exact agreement (default) or Pearson-style correlation. Best
when the rater count makes the bar-chart form of
plot_interrater_agreement() too busy.
plot_local_dependence_heatmap(): Yen Q3-style heatmap of pairwise residual correlations between facet levels. Best for exploratory local-dependence screening; pairs with very strong off-diagonal residual correlation merit content-level review.
plot_reliability_snapshot(): One-figure facet x reliability /
separation / strata bar overview built from diagnostics$reliability.
Best as a single small figure for "which facets are statistically
distinguishable?".
plot_residual_matrix(): Person x facet-level standardized
residual heatmap. Best as a follow-up to plot_guttman_scalogram() when
the residual sign and magnitude matter, not just the response code.
plot_shrinkage_funnel(): Empirical-Bayes shrinkage caterpillar /
funnel showing raw versus shrunken facet estimates. Best on fits
produced via apply_empirical_bayes_shrinkage() for reviewing how
much each level moved under the prior.
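The raw-versus-shrunken comparison behind a shrinkage funnel can be illustrated with a generic normal-normal empirical-Bayes rule. The base R sketch below assumes made-up rater severities and a method-of-moments prior variance. eb_shrink() is a hypothetical helper, and the estimator used by apply_empirical_bayes_shrinkage() may differ.

```r
# Hypothetical helper, not part of mfrmr: normal-normal empirical-Bayes
# shrinkage of facet-level estimates toward their grand mean.
eb_shrink <- function(estimate, se) {
  grand_mean <- mean(estimate)
  # Method-of-moments prior variance, floored at zero
  prior_var <- max(stats::var(estimate) - mean(se^2), 0)
  weight <- prior_var / (prior_var + se^2)   # 0 = full shrinkage, 1 = none
  data.frame(
    Raw = estimate,
    SE = se,
    Weight = weight,
    Shrunken = grand_mean + weight * (estimate - grand_mean)
  )
}

# Made-up rater severities; the last level has the largest standard error
raw <- c(-1.2, -0.3, 0.1, 0.4, 1.5)
se <- c(0.2, 0.3, 0.3, 0.2, 0.9)
shrunk <- eb_shrink(raw, se)
shrunk$Moved <- shrunk$Shrunken - shrunk$Raw
print(round(shrunk, 3))
```

Levels with large standard errors and extreme raw estimates move furthest toward the grand mean, which produces the funnel shape when movement is plotted against precision.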
For users coming from standard Rasch-measurement software, the closest mfrmr helper for each table or figure family is summarized below. The mapping is approximate: mfrmr is designed for many-facet workflows, so column subsets and column names differ.
plot(fit, type = "wright") and
plot_wright_unified() correspond to FACETS Table 6 / Winsteps
"Person-Item map".
plot(fit, type = "pathway")
and plot(fit, type = "ccc") correspond to Winsteps Table 21
("Probability category curves") and FACETS category-probability
curves.
compute_information() +
plot_information() correspond to Winsteps Table 17 ("Test
characteristic curve, test information function").
diagnose_mfrm() and the Largest
|ZSTD| / MnSq misfit blocks of summary(diag) correspond to
Winsteps Tables 10/13/14 (Misfit order) and FACETS Tables 7/8.
fit_p_table() gives a TAM-style Infit/Outfit p-value table, and
plot_empirical_fit() gives a mirt-style empirical follow-up plot.
fit_direction_summary() and plot_fit_direction_summary() summarize
the same MnSq direction labels as rates.
estimate_bias() +
plot_bias_interaction() correspond to FACETS Table 14
("Bias / Interaction calibration report").
analyze_dff() /
analyze_dif() + plot_dif_heatmap() / plot_dif_summary()
cover the FACETS DIF / bias-by-group route and the Winsteps DIF
(Table 30 group differences) report.
interrater_agreement_table() +
plot_interrater_agreement() / plot_rater_agreement_heatmap()
correspond to FACETS Table 7-style observed-vs-expected agreement
reports.
plot_anchor_drift() and
plot_information() cover the FACETS / Winsteps anchored-run
review route; full equating-chain helpers are exposed via
build_equating_chain().
Wright map: look for gaps between person density and facet/step locations; large gaps indicate weaker targeting.
Pathway / CCC: look for monotone progression and clear category dominance bands; flat or overlapping curves suggest weak category separation.
3D-ready category surface: use as an exploratory view of the same
category-probability information, not as a replacement for the 2D
pathway/CCC figures in reports. Read category_support first when a
retained category has zero observed responses.
Unexpected / displacement: use as screening tools, not final evidence by themselves.
Strict marginal and pairwise local-dependence plots are exploratory
follow-up layers for diagnostic_mode = "both", not standalone
inferential tests.
Fit p-value tables from fit_p_table() use the mfrmr residual
mean-square and ZSTD normal-tail approximation by default. Use
reference = "facets" when a FACETS-migration table should use the
Wright-Masters/FACETS moment df and ZSTD cap. The output uses
TAM-style column names, but it is not TAM::tam.fit()
simulation/posterior fit.
Underfit / overfit direction summaries from fit_direction_summary() are
MnSq-band summaries. ZSTD and p-value flags are reported separately so
changing the df/ZSTD reference does not silently redefine the direction.
plot_empirical_fit() is a descriptive observed-vs-expected bin overlay.
It is not mirt::itemfit(..., fit_stats = "S_X2"), does not condition on
the same sum-score tables, and does not report S_X2, RMSEA.S_X2, or
p.S_X2.
Inter-rater agreement and facet variability address different questions: agreement concerns scoring consistency, whereas variability concerns whether facet elements are statistically distinguishable.
Residual PCA and bias plots should be interpreted as follow-up layers after the main fit screen, not as first-pass diagnostics.
DFF residual-method plots are screening visuals. ETS A/B/C labels
should be claimed only for rows whose refit output reports
ClassificationSystem == "ETS".
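The MnSq-band direction labels mentioned in the notes above can be sketched in base R as follows. The example computes infit and outfit mean-squares from made-up observed scores, expected scores, and model variances, then assigns direction labels. The helper names and the 0.5 to 1.5 band are assumptions for illustration only; the bands and labels used by fit_direction_summary() may differ.

```r
# Hypothetical helpers, not part of mfrmr.
# Infit / outfit mean-squares for one facet level from observed scores,
# model-expected scores, and model variances.
mnsq <- function(observed, expected, variance) {
  resid <- observed - expected
  c(
    Infit = sum(resid^2) / sum(variance),  # information-weighted
    Outfit = mean(resid^2 / variance)      # unweighted mean of squared z
  )
}

# Direction label from a MnSq band; 0.5 to 1.5 is an assumed band.
fit_direction <- function(infit, outfit, lower = 0.5, upper = 1.5) {
  high <- infit > upper | outfit > upper
  low <- infit < lower | outfit < lower
  ifelse(high & low, "mixed",
    ifelse(high, "underfit", ifelse(low, "overfit", "in-band")))
}

m <- mnsq(observed = c(3, 1, 4, 0), expected = c(2.5, 1.5, 2.0, 1.8),
          variance = c(0.8, 0.7, 0.9, 0.6))

# First level uses the computed values; the other three are made up
labels <- fit_direction(
  infit = c(m[["Infit"]], 0.4, 1.0, 1.7),
  outfit = c(m[["Outfit"]], 0.45, 1.1, 0.3)
)
rates <- prop.table(table(factor(labels,
  levels = c("underfit", "overfit", "mixed", "in-band"))))
print(rates)
```

Because the labels depend only on the mean-square band, changing the df or ZSTD reference leaves them untouched, which matches the separation between direction summaries and ZSTD / p-value flags described above.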
Figure-readiness route:
fit_mfrm() -> diagnose_mfrm() -> reporting_checklist() ->
inspect "Visual Displays" rows -> chosen public plot helper.
Quick screening:
fit_mfrm() -> diagnose_mfrm() -> plot_qc_dashboard().
Strict marginal follow-up:
diagnose_mfrm() with diagnostic_mode = "both" ->
plot_marginal_fit() ->
plot_marginal_pairwise().
Fit follow-up:
fit_p_table() -> fit_direction_summary() ->
plot_fit_direction_summary() / plot_empirical_fit() ->
build_misfit_casebook().
Simulation fit-screening:
evaluate_mfrm_design() -> summarize_simulation_misfit() ->
plot_simulation_misfit_rates().
Scale and targeting review:
plot(fit, type = "wright") -> plot(fit, type = "pathway") ->
plot(fit, type = "ccc").
Linking review:
subset_connectivity_report() -> plot(..., type = "design_matrix") ->
plot_anchor_drift().
Interaction review:
estimate_bias() -> plot_bias_interaction() ->
reporting_checklist().
DFF / DIF review:
analyze_dff() -> plot_dif_heatmap() / plot_dif_summary() ->
inspect the explicit facet, level, and group-pair columns before
writing interpretation.
For a longer, plot-first walkthrough, run
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
mfrmr_workflow_methods, mfrmr_reports_and_tables,
mfrmr_reporting_and_apa, mfrmr_linking_and_dff,
gpcm_capability_matrix, visual_reporting_template(),
plot.mfrm_fit(), plot_qc_dashboard(),
plot_unexpected(), plot_displacement(), plot_marginal_fit(),
plot_marginal_pairwise(), plot_interrater_agreement(),
plot_facets_chisq(), plot_residual_pca(), plot_bias_interaction(),
fit_p_table(), plot_empirical_fit(),
plot_dif_heatmap(), plot_dif_summary(), plot_anchor_drift(),
plot_guttman_scalogram(),
plot_residual_qq(), plot_rater_trajectory(),
plot_rater_agreement_heatmap()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
checklist <- reporting_checklist(fit, diagnostics = diag)
visual_reporting_template("manuscript")
subset(
  checklist$checklist,
  Section == "Visual Displays" &
    Item %in% c("QC / facet dashboard", "Strict marginal visuals"),
  c("Item", "Available", "NextAction")
)
qc <- plot_qc_dashboard(fit, diagnostics = diag, draw = FALSE, preset = "publication")
qc$data$plot
p_marg <- plot_marginal_fit(diag, draw = FALSE, preset = "publication")
p_marg$data$preset
wright <- plot(fit, type = "wright", draw = FALSE, preset = "publication")
wright$data$preset
pca <- analyze_residual_pca(diag, mode = "overall")
scree <- plot_residual_pca(pca, plot_type = "scree", draw = FALSE, preset = "publication")
scree$data$preset
dim_check <- check_residual_dimensionality(
  pca,
  mode = "overall",
  method = "residual_normal",
  reps = 5,
  seed = 1
)
dim_plot <- plot_residual_dimensionality(dim_check, draw = FALSE, preset = "publication")
dim_plot$data$preset
Quick reference for end-to-end mfrmr analysis and for checking which
output objects support summary() and plot().
For the clearest default route in RSM / PCM, use
fit_mfrm() with method = "MML" ->
diagnose_mfrm() with diagnostic_mode = "both" ->
reporting_checklist() ->
plot_qc_dashboard() and, when flagged, plot_marginal_fit() /
plot_marginal_pairwise() ->
build_apa_outputs() ->
build_summary_table_bundle() -> apa_table() or
export_summary_appendix().
Use JML only when you explicitly want a faster exploratory pass and are
willing to defer strict marginal follow-up and formal precision language to
a later MML run.
When the main question is scale maintenance rather than manuscript reporting,
branch after diagnose_mfrm() into:
audit_mfrm_anchors() and/or detect_anchor_drift() ->
build_equating_chain() when adjacent-link review is needed ->
build_linking_review() ->
inspect review$group_view_index for stable wave / link / facet rollups and
summary(review)$plot_routes for the next plot helper ->
plot_anchor_drift() or plot(anchor_audit, ...) for the specific flagged
evidence family.
For bounded GPCM, keep anchor/drift helpers as direct exploratory support
only. build_linking_review() remains outside the current formal GPCM
route.
When the main question is which observations, facet levels, or pairwise
structures deserve follow-up, branch after diagnose_mfrm() into:
build_misfit_casebook() ->
inspect casebook$group_view_index, casebook$group_views, and
summary(casebook)$plot_routes for stable person / facet / wave rollups and
the next plot helper ->
plot_unexpected(), plot_displacement(), plot_marginal_fit(), or
plot_marginal_pairwise() according to casebook$plot_map ->
build_summary_table_bundle() / export_summary_appendix() when the
flagged cases need appendix-style reporting support.
build_misfit_casebook() can still be used for bounded GPCM, but it
should be read as an operational exploratory screen rather than as a strict
Rasch-style invariance report.
When the fit uses population_formula = ..., keep the distinction between
the estimator and the forecast helpers explicit:
fit_mfrm() estimates the current narrow latent-regression MML branch.
In the returned fit object, fit$population$person_table is the
complete-case estimation table, while
fit$population$person_table_replay retains the observed-person-aligned
pre-omit background-data table for replay/export provenance.
predict_mfrm_units() and sample_mfrm_plausible_values() can then score
under the fitted population model when scored units also supply
one-row-per-person background data. That scoring-time person_data
contract remains separate from the fit object's stored replay table.
predict_mfrm_population() remains a scenario-level simulation/refit
helper rather than the latent-regression estimator itself.
If the intended rating scale includes categories not observed in the current
data, make that support explicit. For example, use
rating_min = 1, rating_max = 5 for a 1-5 scale with only 2-5 observed.
If an intermediate category is unobserved (for example 1, 2, 4, 5 with no
3), also set keep_original = TRUE if the zero-count category should remain
in the fitted support. summary(describe_mfrm_data(...)) reports retained
zero-count categories in Notes, printed Caveats, and $caveats;
summary(fit) carries full structured rows into printed Caveats and
$caveats, with Key warnings as a short triage subset. Summary-table
exports route those rows through score_category_caveats or
analysis_caveats. Adjacent threshold estimates should still be treated as
weakly identified when an intermediate category is unobserved.
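Before fitting, it can help to see which categories of the intended scale are unobserved and whether they sit at the boundary or in the interior. The base R sketch below does that for a made-up score vector. classify_zero_count() is a hypothetical helper, not an mfrmr function, and it only mirrors the boundary / intermediate distinction described above.

```r
# Hypothetical helper, not part of mfrmr: list unobserved categories of an
# intended rating scale and classify them as boundary or intermediate.
classify_zero_count <- function(score, rating_min, rating_max) {
  support <- seq(rating_min, rating_max)
  counts <- table(factor(score, levels = support))
  observed <- support[as.integer(counts) > 0]
  zero <- support[as.integer(counts) == 0]
  # Boundary: below the lowest or above the highest observed category
  type <- ifelse(zero < min(observed) | zero > max(observed),
                 "boundary", "intermediate")
  data.frame(Category = zero, Type = type, stringsAsFactors = FALSE)
}

# Made-up ratings on an intended 1-5 scale: 1 and 3 are never used
scores <- c(2, 2, 4, 5, 5, 4, 2)
zero_tab <- classify_zero_count(scores, rating_min = 1, rating_max = 5)
print(zero_tab)
```

In this example category 1 is a boundary gap that explicit rating_min / rating_max can retain, while category 3 is an intermediate gap, the case where keep_original = TRUE is needed and the adjacent thresholds should be treated as weakly identified.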
Fit a model with fit_mfrm().
For final reporting, prefer method = "MML" unless you explicitly want
a fast exploratory JML pass.
(Optional) Use run_mfrm_facets() or mfrmRFacets() for a
legacy-compatible one-shot workflow wrapper.
For RSM / PCM, build diagnostics with diagnose_mfrm().
For final reporting, prefer diagnostic_mode = "both" so the legacy
residual path and the strict marginal screen remain visible side by side.
For bounded GPCM, diagnostics are now available through
diagnose_mfrm() together with analyze_residual_pca(),
interrater_agreement_table(), unexpected_response_table(),
displacement_table(), measurable_summary_table(),
rating_scale_table(), facet_quality_dashboard(),
reporting_checklist(), build_visual_summaries(),
run_qc_pipeline(), and plot_qc_dashboard(). Treat those
residual-based summaries as exploratory screens because the
discrimination parameter is free.
Slope-aware fair_average_table() and estimate_bias() are available
with explicit SE caveats; FACETS-compatibility score exports remain
blocked for bounded GPCM.
Posterior scoring with predict_mfrm_units() /
sample_mfrm_plausible_values(), design-weighted information via
compute_information() / plot_information(), Wright/pathway/CCC plots
via plot.mfrm_fit(), direct category reports via
category_structure_report() / category_curves_report(), and direct
data generation through build_mfrm_sim_spec(), extract_mfrm_sim_spec(),
and simulate_mfrm_data() are also available when the simulation
specification stores both thresholds and slopes. Caveated
planning/forecasting, APA, and package-native replay/export bundles are
available under the role-based bounded GPCM contract; FACETS score
exports remain outside the validated GPCM boundary. Use
gpcm_capability_matrix() as the formal capability map before branching
into less common helpers.
(Optional) Estimate interaction bias with estimate_bias(). For bounded
GPCM, read the returned caveat before using the SE / t / probability
columns.
(Optional) Choose a downstream branch:
reporting_checklist() for manuscript/report preparation,
build_weighting_audit() for Rasch-versus-bounded-GPCM weighting
review, or build_misfit_casebook() for operational case review.
build_linking_review() remains RSM / PCM only.
(Optional) Generate reporting bundles. RSM / PCM can use the full
manuscript table route through build_summary_table_bundle(),
apa_table(), export_summary_appendix(), build_fixed_reports(), and
build_visual_summaries(). Bounded GPCM should use the package-native
build_apa_outputs(), build_visual_summaries(), run_qc_pipeline(),
build_mfrm_manifest(), build_mfrm_replay_script(), and
export_mfrm_bundle() routes with the returned caveats retained.
FACETS score-side compatibility exports remain RSM / PCM only.
(Optional, RSM / PCM) Audit report completeness with
reference_case_audit(). Use facets_parity_report() only when you
explicitly need the compatibility layer.
(Optional, RSM / PCM) For operational linking follow-up, combine
audit_mfrm_anchors(), detect_anchor_drift(), and
build_equating_chain() inside build_linking_review() before
exporting appendix-style tables.
(Optional) Check packaged reference cases with
reference_case_benchmark() when you want package-side reference checks.
(Optional) For design planning or future scoring, move to the
simulation/prediction layer:
build_mfrm_sim_spec() / extract_mfrm_sim_spec() ->
evaluate_mfrm_design() / predict_mfrm_population() ->
predict_mfrm_units() / sample_mfrm_plausible_values(). Current
fit-derived simulation specs, design-evaluation helpers, and forecasting
helpers include the bounded GPCM route with explicit caveats and still
target the role-based person x rater-like x criterion-like contract.
Unit scoring can use an ordinary MML fit directly, a latent-regression
MML fit when you also supply one-row-per-person background data for the
scored units, or a JML fit when a post hoc reference-prior EAP layer is
acceptable. Intercept-only latent-regression fits
(population_formula = ~ 1) can reconstruct that minimal person table
from the scored person IDs. Keep predict_mfrm_population()
conceptually separate from that scoring layer: it is a simulation-based
scenario forecast helper, not the latent-regression estimator itself.
Prediction export still requires actual prediction objects in addition to
include = "predictions".
Use summary() for compact text checks and plot() (or dedicated plot
helpers) for base-R visual diagnostics.
Quick first pass:
RSM / PCM: fit_mfrm() -> diagnose_mfrm() -> plot_qc_dashboard() ->
reporting_checklist() when you want the package to route the next figures.
bounded GPCM: fit_mfrm() -> diagnose_mfrm() ->
plot_qc_dashboard() / unexpected_response_table() ->
rating_scale_table() ->
compute_information() -> plot_information() ->
plot.mfrm_fit() / category_curves_report() ->
build_visual_summaries() / run_qc_pipeline(). For bounded GPCM,
keep the caveats visible and use build_apa_outputs(),
build_mfrm_manifest(), build_mfrm_replay_script(), and
export_mfrm_bundle() only as package-native bounded-GPCM routes, not as
FACETS score-side compatibility evidence.
Linking and coverage review:
subset_connectivity_report() -> plot(..., type = "design_matrix") ->
plot_wright_unified().
Manuscript prep:
RSM / PCM:
reporting_checklist() -> inspect the "Visual Displays" and
"Method Section" rows -> build_apa_outputs() ->
build_summary_table_bundle() -> apa_table() or
export_summary_appendix().
First-release GPCM:
reporting_checklist() -> build_visual_summaries() /
run_qc_pipeline() -> build_apa_outputs() /
export_mfrm_bundle() with the returned caveats retained.
Weighting-policy review:
compare_mfrm() -> build_weighting_audit() ->
compute_information() / plot_information() when you want to inspect
whether bounded GPCM is introducing substantively acceptable
discrimination-based reweighting relative to the Rasch-family reference.
Design planning and forecasting:
build_mfrm_sim_spec() or extract_mfrm_sim_spec() ->
evaluate_mfrm_design() -> predict_mfrm_population() ->
predict_mfrm_units() or sample_mfrm_plausible_values() under the
fitted scoring basis (ordinary MML, latent-regression MML with
person-level background data, or JML with the documented post hoc EAP
approximation). Here again, predict_mfrm_population() is the
scenario-level forecast helper, whereas predict_mfrm_units() /
sample_mfrm_plausible_values() are the scoring layer. Prediction export
requires actual prediction objects. First-release GPCM now supports
direct data generation via
build_mfrm_sim_spec(), extract_mfrm_sim_spec(), and
simulate_mfrm_data(), residual diagnostics, direct curve/report
helpers, and the bounded planning/forecasting route with explicit caveats.
The PCM/GPCM planning layer remains role-based for two non-person facets,
while the RSM branch now also exposes build_mfrm_arbitrary_sim_spec(),
extract_mfrm_arbitrary_sim_spec(), simulate_mfrm_arbitrary_data(),
summarize_mfrm_sim_design(), plot_mfrm_sim_design(),
summarize_mfrm_sim_grid(), plot_mfrm_sim_grid(),
list_mfrm_sim_metrics(), plot_mfrm_sim_dashboard(), and
evaluate_mfrm_bias_detection() for arbitrary-facet design inspection,
design-grid tradeoff visualization, user-selected multi-metric dashboards,
and bias-screening sensitivity checks.
This help page is a map, not an estimator:
use it to decide function order,
confirm which objects have summary()/plot() defaults,
identify when dedicated helper functions are needed,
and treat reporting_checklist() as the package's readiness router for
plot and report follow-up.
summary() and plot() routes
mfrm_fit: summary(fit) and plot(fit, ...).
mfrm_diagnostics: summary(diag); plotting via dedicated helpers
such as plot_unexpected(), plot_displacement(), plot_qc_dashboard().
mfrm_bias: summary(bias) and plot_bias_interaction().
mfrm_data_description: summary(ds) and plot(ds, ...).
mfrm_anchor_audit: summary(aud) and plot(aud, ...).
mfrm_misfit_casebook: summary(casebook) and print(casebook), with
grouping views available through casebook$group_view_index and
casebook$group_views, source-specific plotting routed through
summary(casebook)$plot_routes and casebook$plot_map, and
appendix/report handoff available through
build_summary_table_bundle() and export_summary_appendix().
mfrm_weighting_audit: summary(audit) and print(audit), with
information follow-up routed through compute_information() and
plot_information() according to audit$plot_map, and appendix/report
handoff available through build_summary_table_bundle() and
export_summary_appendix().
mfrm_linking_review: summary(review) and print(review), with
grouping views available through review$group_view_index and
review$group_views, and plotting routed through summary(review)$plot_routes,
plot_anchor_drift(), and plot(anchor_audit, ...) according to
review$plot_map.
mfrm_facets_run: summary(run) and plot(run, type = c("fit", "qc"), ...).
apa_table: summary(tbl) and plot(tbl, ...).
mfrm_apa_outputs: print(apa) for concise Method / Results draft text;
use summary(apa) for compact diagnostics of report text.
mfrm_summary_table_bundle: print(bundle) for manuscript-oriented table
index plus named tables from supported summary() outputs,
summary(bundle) for table-role/numeric coverage, and plot(bundle, ...)
for table-size or numeric-column QC.
mfrm_threshold_profiles: summary(profiles) for preset threshold grids.
mfrm_population_prediction: summary(pred) for design-level forecast
tables.
mfrm_unit_prediction: summary(pred) for unit-level posterior summaries
under the fitted scoring basis.
mfrm_plausible_values: summary(pv) for draw-level uncertainty
summaries.
mfrm_bundle families:
summary() and class-aware plot(bundle, ...).
Key bundle classes now also use class-aware summary(bundle):
mfrm_unexpected, mfrm_fair_average, mfrm_displacement,
mfrm_interrater, mfrm_facets_chisq, mfrm_bias_interaction,
mfrm_rating_scale, mfrm_category_structure, mfrm_category_curves,
mfrm_measurable, mfrm_unexpected_after_bias, mfrm_output_bundle,
mfrm_residual_pca, mfrm_specifications, mfrm_data_quality,
mfrm_iteration_report, mfrm_subset_connectivity,
mfrm_facet_statistics, mfrm_parity_report, mfrm_reference_audit,
mfrm_reference_benchmark.
plot.mfrm_bundle() coverage
Default dispatch now covers:
mfrm_unexpected, mfrm_fair_average, mfrm_displacement
mfrm_interrater, mfrm_facets_chisq, mfrm_bias_interaction
mfrm_bias_count, mfrm_fixed_reports, mfrm_visual_summaries
mfrm_category_structure, mfrm_category_curves, mfrm_rating_scale
mfrm_measurable, mfrm_unexpected_after_bias, mfrm_output_bundle
mfrm_residual_pca, mfrm_specifications, mfrm_data_quality
mfrm_iteration_report, mfrm_subset_connectivity, mfrm_facet_statistics
mfrm_parity_report, mfrm_reference_audit, mfrm_reference_benchmark
For unknown bundle classes, use dedicated plotting helpers or custom base-R plots from component tables.
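The fallback for unknown bundle classes can be sketched in base R. This is a minimal illustration, not the package's dispatch code: the class name demo_unknown_bundle, the object demo, and the shortened covered vector are hypothetical, and the component table is invented for the example.

```r
# Hypothetical bundle-like object; neither the class nor the table comes from mfrmr.
demo <- structure(
  list(table = data.frame(Level = c("A", "B", "C"),
                          Value = c(0.2, -0.4, 0.9),
                          stringsAsFactors = FALSE)),
  class = c("demo_unknown_bundle", "list")
)

# A shortened stand-in for the covered classes listed above.
covered <- c("mfrm_unexpected", "mfrm_fair_average", "mfrm_displacement")

# inherits() is TRUE when the object carries any of the covered classes.
route <- if (inherits(demo, covered)) {
  "default plot() dispatch"
} else {
  "custom base-R plot"
}

if (route == "custom base-R plot") {
  grDevices::pdf(NULL)  # null device so the sketch runs without opening a window
  graphics::barplot(demo$table$Value, names.arg = demo$table$Level, ylab = "Value")
  invisible(grDevices::dev.off())
}
route
```

The same check can guard a real workflow: test the class first, then draw from the component table only when no default method applies.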
fit_mfrm(), run_mfrm_facets(), mfrmRFacets(),
diagnose_mfrm(), estimate_bias(), mfrmr_visual_diagnostics,
mfrmr_reports_and_tables, mfrmr_reporting_and_apa,
gpcm_capability_matrix, mfrmr_linking_and_dff,
mfrmr_compatibility_layer,
summary.mfrm_fit(), summary(diag),
summary(), plot.mfrm_fit(), plot()
toy_full <- load_mfrmr_data("example_core")
keep_people <- unique(toy_full$Person)[1:12]
toy <- toy_full[toy_full$Person %in% keep_people, , drop = FALSE]
fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  maxit = 200
)
summary(fit)$next_actions
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
summary(diag)$next_actions
chk <- reporting_checklist(fit, diagnostics = diag)
subset(
  chk$checklist,
  Section == "Visual Displays",
  c("Item", "DraftReady", "NextAction")
)
qc <- plot_qc_dashboard(fit, diagnostics = diag, draw = FALSE, preset = "publication")
qc$data$preset
p_marg <- plot_marginal_fit(diag, draw = FALSE, preset = "publication")
p_marg$data$preset
sc <- subset_connectivity_report(fit, diagnostics = diag)
p_design <- plot(sc, type = "design_matrix", draw = FALSE, preset = "publication")
p_design$data$plot
bundle <- build_summary_table_bundle(chk, appendix_preset = "recommended")
summary(bundle)$role_summary
plot(bundle, type = "appendix_presets", draw = FALSE)$data$plot
Normalize extracted ConQuest overlap files to the mfrmr audit contract
normalize_conquest_overlap_files(
  population_file,
  item_file,
  case_file,
  population_delimiter = c("auto", "comma", "tab", "semicolon", ",", "\t", ";"),
  item_delimiter = c("auto", "comma", "tab", "semicolon", ",", "\t", ";"),
  case_delimiter = c("auto", "comma", "tab", "semicolon", ",", "\t", ";"),
  conquest_population_term = "auto",
  conquest_population_estimate = "auto",
  conquest_item_id = "auto",
  conquest_item_estimate = "auto",
  conquest_case_person = "auto",
  conquest_case_estimate = "auto",
  keep_extra_columns = TRUE
)
population_file |
Path to an extracted ConQuest population-parameter table in CSV/TSV/TXT form. |
item_file |
Path to an extracted ConQuest item-estimate table in CSV/TSV/TXT form. |
case_file |
Path to an extracted ConQuest case-level EAP table in CSV/TSV/TXT form. |
population_delimiter |
Delimiter for |
item_delimiter |
Delimiter for |
case_delimiter |
Delimiter for |
conquest_population_term |
Column in |
conquest_population_estimate |
Column in |
conquest_item_id |
Column in |
conquest_item_estimate |
Column in |
conquest_case_person |
Column in |
conquest_case_estimate |
Column in |
keep_extra_columns |
If |
This helper is a thin file-wrapper around normalize_conquest_overlap_tables().
It is intentionally limited to already extracted tabular files and does not
parse raw ConQuest report text.
The recommended workflow is:
export an exact-overlap bundle with build_conquest_overlap_bundle();
extract the relevant ConQuest tables to CSV/TSV/TXT files;
call normalize_conquest_overlap_files() on those files;
pass the result to audit_conquest_overlap().
Read summary(normalized)$normalization_scope before auditing to confirm
that the files were treated as extracted tables, not raw ConQuest report
text, and to check duplicate-ID / non-numeric-estimate pre-audit flags.
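The delimiter arguments accept "auto", but this page does not describe how that choice is resolved. The sketch below shows one plausible base-R approach, counting candidate delimiters in the header line. The helper guess_delimiter() is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: pick the candidate delimiter that occurs most often
# in the first line of a text file. This is an assumption, not mfrmr code.
guess_delimiter <- function(path, candidates = c(",", "\t", ";")) {
  header <- readLines(path, n = 1L, warn = FALSE)
  counts <- vapply(
    candidates,
    function(d) lengths(regmatches(header, gregexpr(d, header, fixed = TRUE))),
    integer(1)
  )
  candidates[[which.max(counts)]]
}

# Write a small tab-separated table and read it back with the guessed delimiter.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Item\tEst", "I1\t-0.2", "I2\t0.2"), tmp)
delim <- guess_delimiter(tmp)
items <- utils::read.table(tmp, header = TRUE, sep = delim,
                           stringsAsFactors = FALSE)
items
```

Passing an explicit delimiter stays the safer choice whenever the exported file format is known in advance.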
A named list with class mfrm_conquest_overlap_tables.
normalize_conquest_overlap_tables(), audit_conquest_overlap()
bundle <- build_conquest_overlap_bundle()
tmp_dir <- tempdir()
pop_path <- file.path(tmp_dir, "cq_pop.csv")
item_path <- file.path(tmp_dir, "cq_item.tsv")
case_path <- file.path(tmp_dir, "cq_case.csv")
utils::write.csv(
  data.frame(
    Term = bundle$mfrmr_population$Parameter,
    Est = bundle$mfrmr_population$Estimate
  ),
  pop_path,
  row.names = FALSE
)
utils::write.table(
  data.frame(
    Item = bundle$mfrmr_item_estimates$ResponseVar,
    Est = bundle$mfrmr_item_estimates$Estimate
  ),
  item_path,
  sep = "\t",
  row.names = FALSE
)
utils::write.csv(
  data.frame(
    PID = bundle$mfrmr_case_eap$Person,
    EAP = bundle$mfrmr_case_eap$Estimate
  ),
  case_path,
  row.names = FALSE
)
normalized <- normalize_conquest_overlap_files(
  population_file = pop_path,
  item_file = item_path,
  case_file = case_path,
  conquest_population_term = "Term",
  conquest_population_estimate = "Est",
  conquest_item_id = "Item",
  conquest_item_estimate = "Est",
  conquest_case_person = "PID",
  conquest_case_estimate = "EAP"
)
summary(normalized)$normalization_scope
audit <- audit_conquest_overlap(bundle, normalized)
summary(audit)$summary
Normalize extracted ConQuest overlap tables to the mfrmr audit contract
normalize_conquest_overlap_tables(
  conquest_population,
  conquest_item_estimates,
  conquest_case_eap,
  conquest_population_term = "auto",
  conquest_population_estimate = "auto",
  conquest_item_id = "auto",
  conquest_item_estimate = "auto",
  conquest_case_person = "auto",
  conquest_case_estimate = "auto",
  keep_extra_columns = TRUE
)
conquest_population |
Extracted ConQuest population-parameter table as a data.frame. |
conquest_item_estimates |
Extracted ConQuest item-estimate table as a data.frame. |
conquest_case_eap |
Extracted ConQuest case-level EAP table as a data.frame. |
conquest_population_term |
Column in |
conquest_population_estimate |
Column in |
conquest_item_id |
Column in |
conquest_item_estimate |
Column in |
conquest_case_person |
Column in |
conquest_case_estimate |
Column in |
keep_extra_columns |
If |
This helper does not parse raw ConQuest text output. It standardizes already
extracted tables to the contract used by audit_conquest_overlap():
population parameters become columns Parameter, Estimate, and
EstimateNonNumeric;
item estimates become columns ItemID, Estimate, and
EstimateNonNumeric;
case summaries become columns Person, Estimate, and
EstimateNonNumeric.
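The column contract above can be illustrated with a small base-R sketch. The helper standardize_estimates() is hypothetical, and reading EstimateNonNumeric as "the raw value was present but could not be parsed as a number" is an assumption based on the column name, not a documented definition.

```r
# Hypothetical helper: map an extracted table onto <id>, Estimate, and
# EstimateNonNumeric columns. Not the package's implementation.
standardize_estimates <- function(tbl, id_col, est_col, id_name) {
  raw <- as.character(tbl[[est_col]])
  est <- suppressWarnings(as.numeric(raw))  # unparseable values become NA
  out <- data.frame(
    id = as.character(tbl[[id_col]]),
    Estimate = est,
    EstimateNonNumeric = !is.na(raw) & is.na(est),
    stringsAsFactors = FALSE
  )
  names(out)[1] <- id_name
  out
}

# Illustrative item table with one non-numeric estimate.
items <- data.frame(
  Item = c("I1", "I2", "I3"),
  Est = c("-0.2", "0.2", "n/a"),
  stringsAsFactors = FALSE
)
std <- standardize_estimates(items, "Item", "Est", id_name = "ItemID")
std

# A simple duplicate-ID pre-check: 0 means no duplicated IDs.
anyDuplicated(std$ItemID)
```

Keeping the non-numeric flag as a separate column preserves the row for review instead of silently dropping it.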
The resulting object is intentionally conservative. It does not infer
whether item IDs correspond to exported response variables or original item
levels; that matching step remains part of audit_conquest_overlap(), where
the standardized ConQuest tables are compared against a concrete overlap
bundle.
A named list with class mfrm_conquest_overlap_tables.
The returned object has class mfrm_conquest_overlap_tables and includes:
summary: one-row normalization summary
conquest_population: standardized population table
conquest_item_estimates: standardized item table
conquest_case_eap: standardized case table
settings: source-column metadata
notes: interpretation notes
Read summary(normalized)$normalization_scope before auditing to confirm
that the object contains extracted tabular inputs, not parsed raw ConQuest
report text, and to check duplicate-ID / non-numeric-estimate pre-audit
flags.
build_conquest_overlap_bundle(), audit_conquest_overlap()
normalized <- normalize_conquest_overlap_tables(
  conquest_population = data.frame(
    Term = c("(Intercept)", "GroupB", "sigma2"),
    Est = c(0, 0.2, 1)
  ),
  conquest_item_estimates = data.frame(
    Item = c("I1", "I2"),
    Est = c(-0.2, 0.2)
  ),
  conquest_case_eap = data.frame(
    PID = c("P001", "P002"),
    EAP = c(-0.1, 0.1)
  ),
  conquest_population_term = "Term",
  conquest_population_estimate = "Est",
  conquest_item_id = "Item",
  conquest_item_estimate = "Est",
  conquest_case_person = "PID",
  conquest_case_estimate = "EAP"
)
summary(normalized)$normalization_scope
Creates base-R plots for inspecting anchor drift across calibration waves or visualising the cumulative offset in a screened linking chain.
plot_anchor_drift(
  x,
  type = c("drift", "chain", "heatmap", "forest"),
  facet = NULL,
  ci_level = 0.95,
  preset = c("standard", "publication", "compact"),
  draw = TRUE,
  ...
)
x |
An |
type |
Plot type: |
facet |
Optional character vector to filter drift plots to specific facets. |
ci_level |
Confidence level used by |
preset |
Visual preset ( |
draw |
If |
... |
Additional graphical parameters passed to base plotting functions. |
The plot types documented here are:
"drift" (for mfrm_anchor_drift objects): A dot plot of each
element's drift value, grouped by facet. Horizontal reference lines
mark the drift threshold. Red points indicate flagged elements.
"heatmap" (for mfrm_anchor_drift objects): A wave-by-element
heat matrix showing drift magnitude. Darker cells represent larger
absolute drift. Useful for spotting systematic patterns (e.g., all
criteria shifting in the same direction).
"chain" (for mfrm_equating_chain objects): A line plot of
cumulative offsets across the screened linking chain. A flatter line
indicates smaller between-wave shifts; steep segments suggest larger
link offsets that deserve review.
A plotting-data object of class mfrm_plot_data. With
draw = FALSE, result$data$table contains the filtered drift or chain
table, result$data$matrix contains the heatmap matrix when requested,
and the payload includes package-native title, subtitle, legend,
and reference_lines.
Use type = "drift" with an mfrm_anchor_drift object to review flagged
elements directly.
Use type = "heatmap" with an mfrm_anchor_drift object to spot
wave-by-element patterns.
Use type = "chain" with an mfrm_equating_chain object after
build_equating_chain() to inspect cumulative offsets across waves.
Drift is the change in an element's estimated measure between calibration waves, after accounting for the screened common-element link offset. An element is flagged when its absolute drift exceeds a threshold (typically 0.5 logits) and the drift-to-SE ratio exceeds a secondary criterion (typically 2.0), ensuring that only practically noticeable and relatively precise shifts are flagged.
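The two-part flagging rule can be sketched in base R. The data frame, the column names Drift and SE, and the values are invented for illustration. The thresholds are the "typical" values quoted above, and whether the package uses strict or non-strict inequalities is an assumption.

```r
# Illustrative drift table; values and column names are made up.
drift_tbl <- data.frame(
  Facet = c("Rater", "Rater", "Criterion", "Criterion"),
  Level = c("R1", "R2", "C1", "C2"),
  Drift = c(0.10, -0.72, 0.55, 0.61),
  SE = c(0.20, 0.25, 0.40, 0.22),
  stringsAsFactors = FALSE
)

drift_threshold <- 0.5  # logits, the typical value quoted in the text
ratio_threshold <- 2.0  # drift-to-SE ratio, the typical value quoted in the text

drift_tbl$DriftSE <- drift_tbl$Drift / drift_tbl$SE

# Flag only when the shift is both large and relatively precise.
drift_tbl$Flag <- abs(drift_tbl$Drift) > drift_threshold &
  abs(drift_tbl$DriftSE) > ratio_threshold
drift_tbl
```

In this toy table C1 exceeds the size threshold but not the ratio criterion, so it stays unflagged. That is the case the second criterion is meant to filter out.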
In drift and heatmap plots, red or dark-shaded elements exceed both thresholds. Common causes include rater drift over time, item exposure effects, or curriculum changes.
In chain plots, uneven spacing between waves suggests differential
shifts in the screened linking offsets. The y-axis shows cumulative
logit-scale offsets; flatter segments indicate more stable adjacent links.
Steep segments should be checked alongside LinkSupportAdequate and the
retained common-element counts before making longitudinal claims.
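The cumulative offsets in a chain plot can be illustrated with a short base-R sketch. The link table, its column names, and the offsets are invented. Anchoring the first wave at zero is an assumption about the reference point, not a documented convention.

```r
# Illustrative adjacent-wave link offsets in logits; values are made up.
links <- data.frame(
  From = c("W1", "W2", "W3"),
  To = c("W2", "W3", "W4"),
  Offset = c(0.05, 0.40, -0.10),
  stringsAsFactors = FALSE
)

# Cumulative offset per wave, with the first wave fixed at 0 (assumption).
chain <- data.frame(
  Wave = c(links$From[1], links$To),
  Cumulative = c(0, cumsum(links$Offset)),
  stringsAsFactors = FALSE
)
chain

# The steepest segment is the adjacent link with the largest absolute change.
steepest <- which.max(abs(diff(chain$Cumulative)))
links[steepest, c("From", "To", "Offset")]
```

Here the W2 to W3 link is the steep segment, so it is the one to check against link support before making longitudinal claims.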
For drift objects, it is usually best to read summary(x) first
and then use the plot to see where the flagged values sit.
Build a drift or screened-linking object with detect_anchor_drift() or
build_equating_chain().
Start with draw = FALSE if you want the plotting data for custom
reporting.
Use the base-R plot for quick screening and then inspect the underlying tables for exact values.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
detect_anchor_drift(), build_equating_chain(),
plot_dif_heatmap(), plot_bubble(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
people <- unique(toy$Person)
d1 <- toy[toy$Person %in% people[1:12], , drop = FALSE]
d2 <- toy[toy$Person %in% people[13:24], , drop = FALSE]
fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 10)
fit2 <- fit_mfrm(d2, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 10)
drift <- detect_anchor_drift(list(W1 = fit1, W2 = fit2))
drift_plot <- plot_anchor_drift(drift, type = "drift", draw = FALSE)
class(drift_plot)
names(drift_plot$data)
chain <- build_equating_chain(list(F1 = fit1, F2 = fit2))
chain_plot <- plot_anchor_drift(chain, type = "chain", draw = FALSE)
head(chain_plot$data$table)
if (interactive()) {
  plot_anchor_drift(drift, type = "heatmap", preset = "publication")
}
Builds a 2x2 publication composite for an mfrm_fit, suitable for a
"Figure 1" of a Rasch-MFRM analysis. Panels: (1) Wright map, (2)
rater severity profile with CI whiskers, (3) threshold ladder, (4)
a one-line reliability / separation summary block. Each panel reuses
the standalone plot helper so the visual language is consistent
with the rest of the package.
plot_apa_figure_one(
  fit,
  diagnostics = NULL,
  rater_facet = "Rater",
  ci_level = 0.95,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
rater_facet |
Facet name to use as the "rater" axis (default
|
ci_level |
Confidence level for the rater severity panel. |
preset |
Visual preset. |
draw |
If |
Invisibly, an mfrm_plot_data object whose data slot
bundles the four panel payloads under wright, severity,
threshold, summary.
Designed for a single-figure Methods or Results overview. The summary panel prints the model class, sample size, log-likelihood, AIC/BIC, and the largest non-Person facet's separation / reliability if available.
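The rater severity panel draws CI whiskers at ci_level. A minimal sketch of how such whiskers can be computed, assuming a normal approximation (estimate plus or minus z times SE), follows. The severity table is invented, and the package may compute its intervals differently.

```r
ci_level <- 0.95

# Two-sided normal quantile for the requested confidence level.
z <- stats::qnorm(1 - (1 - ci_level) / 2)

# Illustrative rater severity estimates and standard errors (made up).
severity <- data.frame(
  Level = c("R1", "R2", "R3"),
  Estimate = c(-0.40, 0.10, 0.55),
  SE = c(0.15, 0.12, 0.20),
  stringsAsFactors = FALSE
)

# Whisker endpoints under the normal approximation (an assumption).
severity$Lower <- severity$Estimate - z * severity$SE
severity$Upper <- severity$Estimate + z * severity$SE
severity
```

Raising ci_level widens every whisker by the same multiplier, so raters with larger SE values always show proportionally longer whiskers.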
plot.mfrm_fit() (type = "wright"),
plot_rater_severity_profile(), plot_threshold_ladder(),
build_apa_outputs(), visual_reporting_template(),
reporting_checklist(), mfrmr_reporting_and_apa,
mfrmr_visual_diagnostics.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_apa_figure_one(fit, draw = FALSE)
names(p$data)
Plot bias interaction diagnostics (preferred alias)
plot_bias_interaction(
  x,
  plot = c("scatter", "ranked", "heatmap", "abs_t_hist", "facet_profile"),
  diagnostics = NULL,
  facet_a = NULL,
  facet_b = NULL,
  interaction_facets = NULL,
  top_n = 40,
  abs_t_warn = 2,
  abs_bias_warn = 0.5,
  p_max = 0.05,
  sort_by = c("abs_t", "abs_bias", "prob"),
  show_ci = FALSE,
  ci_level = 0.95,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
plot |
Plot type: |
diagnostics |
Optional output from |
facet_a |
First facet name (required when |
facet_b |
Second facet name (required when |
interaction_facets |
Character vector of two or more facets. |
top_n |
Maximum number of ranked rows to keep. |
abs_t_warn |
Warning cutoff for absolute t statistics. |
abs_bias_warn |
Warning cutoff for absolute bias size. |
p_max |
Warning cutoff for p-values. |
sort_by |
Ranking key: |
show_ci |
Logical. When |
ci_level |
Confidence level used when |
main |
Optional plot title override. |
palette |
Optional named color overrides ( |
label_angle |
Label angle hint for ranked/profile labels. |
preset |
Visual preset ( |
draw |
If |
Visualization front-end for bias_interaction_report() with multiple views.
A plotting-data object of class mfrm_plot_data.
"scatter" (default): Scatter plot of bias size (x) vs
screening t-statistic (y). Points colored by flag status. Dashed reference
lines at abs_bias_warn and abs_t_warn. Use for overall triage
of interaction effects.
"ranked": Ranked bar chart of top top_n interactions sorted
by sort_by criterion (absolute t, absolute bias, or probability).
Bars colored red for flagged cells.
"abs_t_hist": Histogram of absolute screening t-statistics across all
interaction cells. Dashed reference line at abs_t_warn. Use for
assessing the overall distribution of interaction effect sizes.
"facet_profile": Per-facet-level aggregation showing mean absolute bias and flag rate. Useful for identifying which individual facet levels drive systematic interaction patterns.
Start with "scatter" or "ranked" for triage, then confirm pattern shape
using "abs_t_hist" and "facet_profile".
Consistent flags across multiple views are stronger screening signals of systematic interaction bias than a single extreme row, but they do not by themselves establish formal inferential evidence.
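The screening cutoffs and the per-level profile can be sketched in base R. The interaction table is invented, the cutoffs are the defaults from the usage above, and combining the three criteria with a logical OR is an assumption. This page does not state how the package combines them.

```r
# Illustrative interaction cells; values and column names are made up.
bias_tbl <- data.frame(
  Rater = c("R1", "R1", "R2", "R2"),
  Criterion = c("C1", "C2", "C1", "C2"),
  Bias = c(0.62, -0.10, -0.55, 0.20),
  t = c(2.6, -0.4, -1.5, 0.9),
  Prob = c(0.01, 0.69, 0.14, 0.37),
  stringsAsFactors = FALSE
)

abs_t_warn <- 2       # default from the usage signature
abs_bias_warn <- 0.5  # default from the usage signature
p_max <- 0.05         # default from the usage signature

# Keep each criterion visible so its contribution can be inspected.
bias_tbl$FlagT <- abs(bias_tbl$t) >= abs_t_warn
bias_tbl$FlagBias <- abs(bias_tbl$Bias) >= abs_bias_warn
bias_tbl$FlagP <- bias_tbl$Prob <= p_max
bias_tbl$AnyFlag <- bias_tbl$FlagT | bias_tbl$FlagBias | bias_tbl$FlagP  # assumption: OR

# Per-level profile: mean absolute bias and flag rate for each rater.
bias_tbl$AbsBias <- abs(bias_tbl$Bias)
profile <- data.frame(
  Rater = sort(unique(bias_tbl$Rater)),
  MeanAbsBias = as.numeric(tapply(bias_tbl$AbsBias, bias_tbl$Rater, mean)),
  FlagRate = as.numeric(tapply(bias_tbl$AnyFlag, bias_tbl$Rater, mean)),
  stringsAsFactors = FALSE
)
profile
```

A cell such as R2 x C1, flagged on size alone, is a weaker screening signal than R1 x C1, which trips all three criteria.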
Estimate bias with estimate_bias() or pass mfrm_fit directly.
Plot with plot = "ranked" for top interactions.
Cross-check using plot = "scatter" and plot = "facet_profile".
bias_interaction_report(), estimate_bias(), plot_displacement()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_bias_interaction(
  fit,
  diagnostics = diagnose_mfrm(fit, residual_pca = "none"),
  facet_a = "Rater",
  facet_b = "Criterion",
  preset = "publication",
  draw = FALSE
)
Produces a Rasch-convention bubble chart where each element is a circle positioned at its measure estimate (x) and fit mean-square (y). Bubble radius reflects approximate measurement precision or sample size.
plot_bubble(
  x,
  diagnostics = NULL,
  fit_stat = c("Infit", "Outfit"),
  view = c("measure", "infit_outfit"),
  bubble_size = NULL,
  facets = NULL,
  fit_range = NULL,
  top_n = 60,
  main = NULL,
  palette = NULL,
  draw = TRUE,
  preset = c("standard", "publication", "compact")
)
x |
Output from |
diagnostics |
Optional output from |
fit_stat |
Fit statistic for the y-axis: |
view |
Layout. |
bubble_size |
Variable controlling bubble radius: |
facets |
Character vector of facets to include. |
fit_range |
Numeric length-2 vector defining the heuristic fit-review
band shown as a shaded region. |
top_n |
Maximum number of elements to plot (default 60). |
main |
Optional custom plot title. |
palette |
Optional named colour vector keyed by facet name. |
draw |
If |
preset |
Visual preset ( |
When x is an mfrm_fit object and diagnostics is omitted,
the function computes diagnostics internally via diagnose_mfrm().
For repeated plotting in the same workflow, passing a precomputed diagnostics
object avoids that extra work.
The x-axis shows element measure estimates on the logit scale
(one logit = one unit change in log-odds of responding in a higher
category). The y-axis shows the selected fit mean-square statistic.
A shaded band between fit_range[1] and fit_range[2]
highlights the active or manually supplied heuristic review range.
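As a worked illustration of the logit scale, the sketch below converts a few logit values to odds and probabilities for a single dichotomous comparison. It is a simplified reading of "log-odds of responding in a higher category" and does not reproduce the package's category-probability computation.

```r
# Logit values for illustration.
theta <- c(-1, 0, 1)

odds <- exp(theta)            # odds of the higher category
prob <- stats::plogis(theta)  # same as odds / (1 + odds)

# Moving up by one logit multiplies the odds by exp(1), about 2.72.
odds_ratio_per_logit <- odds[3] / odds[2]

data.frame(theta = theta, odds = odds, prob = prob)
odds_ratio_per_logit
```

The multiplicative effect on the odds is constant across the scale, while the change in probability is largest near zero logits.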
Bubble radius options:
"SE": inversely proportional to standard error—larger
circles indicate more precisely estimated elements under the current
SE approximation.
"N": proportional to observation count—larger
circles indicate elements with more data.
"equal": uniform size, useful when SE or N differences
distract from the fit pattern.
Person estimates are excluded by default because they typically outnumber facet elements and obscure the display.
Invisibly, an object of class mfrm_plot_data.
Points near the horizontal reference line at 1.0 are closer to model expectation on the selected MnSq scale. Points above the upper band suggest underfit relative to the current review heuristic; these elements may have inconsistent scoring. Points below the lower band suggest overfit relative to the current review heuristic; these may indicate redundancy or restricted range. Points are colored by facet for easy identification.
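The reading rule above can be sketched in base R. This is an illustration only: the table `fit_tbl`, its values, and the band `c(0.5, 1.5)` are assumptions made for the example, not the package's active defaults.

```r
# Hypothetical element-level fit table; values are illustrative only.
fit_tbl <- data.frame(
  Level = c("R01", "R02", "R03", "R04"),
  Infit = c(0.42, 0.97, 1.08, 1.73)
)

# Assumed review band; the band used by plot_bubble() may differ.
fit_range <- c(0.5, 1.5)

# Label each element relative to the band on the selected MnSq scale.
fit_tbl$Direction <- ifelse(
  fit_tbl$Infit > fit_range[2], "underfit",
  ifelse(fit_tbl$Infit < fit_range[1], "overfit", "in_band")
)
print(fit_tbl)
```

Elements labelled "underfit" or "overfit" here correspond to bubbles plotted above or below the shaded band.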
Fit a model with fit_mfrm().
Compute diagnostics once with diagnose_mfrm().
Call plot_bubble(fit, diagnostics = diag) to inspect the most extreme elements.
diagnose_mfrm, plot_unexpected,
plot_fair_average
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
p <- plot_bubble(fit, diagnostics = diag, draw = FALSE)
head(p$data$table[, c("Facet", "Level", "Estimate", "Infit", "Outfit")])
# Look for (default `view = "measure"`): bubbles inside the shaded
# active fit-review band. Bubbles above the band are underfit
# (noisy elements); below the band are overfit (overly predictable).
#
# For the Winsteps Table 30 layout pass `view = "infit_outfit"`:
p_io <- plot_bubble(fit, diagnostics = diag, view = "infit_outfit",
                    draw = FALSE)
p_io$data$view
# Look for: bubbles clustered inside the central active-band square.
# Points outside the upper-right corner have both Infit and Outfit
# above the upper band (consistent underfit); points outside the
# lower-left have both below the lower band (consistent overfit).
# Bubble size in this view defaults to N (observation count) so the
# visual weighting matches how seriously the misfit should be taken.
Draws curves across theta for ordered response categories.
These are cumulative ordered-category curves, not mirt's empirical plot and
not an S-X2 item-fit test.
plot_cumulative_category_curve(
  fit,
  curve_group = NULL,
  theta_range = c(-6, 6),
  theta_points = 241L,
  draw = TRUE
)
fit |
An |
curve_group |
Optional curve group label to retain. |
theta_range |
Numeric length-2 theta range. |
theta_points |
Number of theta grid points. |
draw |
If |
For each non-minimum category threshold $k$, the displayed curve is
$$P(X \ge k \mid \theta) = \sum_{j \ge k} P(X = j \mid \theta).$$
An mfrm_plot_data object with a cumulative data frame.
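As a rough illustration of what a cumulative ordered-category curve contains, the following base-R sketch computes P(X >= k | theta) under a rating-scale parameterization. The helper names `category_probs()` and `cumulative_probs()` and the threshold values in `tau` are hypothetical; they are not part of mfrmr.

```r
# Category probabilities at one theta under a rating-scale
# parameterization with step thresholds tau (hypothetical values).
category_probs <- function(theta, tau) {
  log_num <- c(0, cumsum(theta - tau))   # log numerators for k = 0..m
  num <- exp(log_num - max(log_num))     # stabilised exponentials
  num / sum(num)
}

# P(X >= k | theta) for each non-minimum category k = 1..m.
cumulative_probs <- function(theta, tau) {
  p <- category_probs(theta, tau)
  rev(cumsum(rev(p)))[-1]                # drop the trivial P(X >= 0) = 1
}

tau <- c(-1.2, 0.1, 1.4)                 # assumed thresholds
theta_grid <- seq(-6, 6, length.out = 241)
cum <- t(vapply(theta_grid, cumulative_probs,
                numeric(length(tau)), tau = tau))
colnames(cum) <- paste0("P(X>=", seq_along(tau), ")")
head(round(cum, 4))
```

Each column rises with theta, and the curve for a higher threshold always lies below the curve for a lower one.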
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
cc <- plot_cumulative_category_curve(fit, draw = FALSE)
head(cc$data$cumulative)
Visualizes the interaction between a facet and a grouping variable as a heatmap. Rows represent facet levels, columns represent group values, and cell color indicates the selected metric.
plot_dif_heatmap(
  x,
  metric = c("obs_exp", "t", "contrast"),
  draw = TRUE,
  show_values = TRUE,
  value_digits = 2L,
  flag_threshold = NULL,
  scale_limit = NULL,
  flag_color = "black",
  ...
)
x |
Output from |
metric |
Which metric to plot: |
draw |
If |
show_values |
Logical. If |
value_digits |
Non-negative integer number of digits after the decimal point for cell labels. |
flag_threshold |
Optional non-negative absolute-value threshold. When
supplied, cells with |
scale_limit |
Optional positive scalar for a symmetric color scale
from |
flag_color |
Border color for cells meeting |
... |
Additional graphical parameters passed to |
Invisibly, an mfrm_plot_data payload whose data slot bundles
the row x column metric matrix ($matrix), the source long table
($pairs), and the metric label. Earlier 0.1.x releases returned the
bare matrix; consume $data$matrix to keep code forward-compatible.
Warm colors (red) indicate positive Obs-Exp values (the model underestimates the facet level for that group).
Cool colors (blue) indicate negative Obs-Exp values (the model overestimates).
White/neutral indicates no systematic difference.
The "contrast" view is best for pairwise differential-functioning
summaries, whereas
"obs_exp" and "t" are best for cell-level diagnostics.
Compute interaction with dif_interaction_table() or differential-
functioning contrasts with analyze_dff().
Plot with plot_dif_heatmap(...).
Identify extreme cells or contrasts for follow-up.
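The last step can also be done on the matrix itself. The sketch below pivots a long cell table into a Level x Group matrix and flags extreme cells in base R. The data frame `pairs`, its column names, and the 0.5 threshold are assumptions for illustration; in real use, the matrix would come from the `$data$matrix` slot of the returned payload.

```r
# Hypothetical long table of cell-level Obs-Exp values.
pairs <- data.frame(
  Level  = rep(c("R01", "R02", "R03"), each = 2),
  Group  = rep(c("A", "B"), times = 3),
  ObsExp = c(0.10, -0.08, 0.62, -0.55, -0.03, 0.04)
)

# Pivot to a Level x Group matrix (one value per cell).
m <- tapply(pairs$ObsExp, list(pairs$Level, pairs$Group), mean)

# Flag cells whose absolute value meets an assumed threshold.
flag_threshold <- 0.5
flagged <- abs(m) >= flag_threshold

m
which(flagged, arr.ind = TRUE)
```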
dif_interaction_table(), analyze_dff(), analyze_dif(), dif_report()
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
int <- dif_interaction_table(fit, diag, facet = "Rater", group = "Group",
                             data = toy, min_obs = 2)
heat <- plot_dif_heatmap(int, metric = "obs_exp", draw = FALSE)
dim(heat$data$matrix)
# Look for (`metric = "obs_exp"`): cells near 0 are aligned with
# model expectation; |Obs - Exp| > 0.5 logits is a substantive
# gap. With `metric = "t"` the cell scale becomes a standardized
# residual where |t| > 2 is a screening flag, not a standalone
# hypothesis test. With `metric = "contrast"` the layout switches
# to Level x GroupPair and reads as the pairwise
# differential-functioning contrast (use `analyze_dff()`).
Compact effect-size summary for an analyze_dff() / analyze_dif()
result. Shows each contrast's signed effect size as a horizontal bar
with a vertical reference at zero, coloured by the method-appropriate
classification. ETS-style A / B / C colours are used only when they
are actually available; residual-method screening labels otherwise use
the neutral colour.
plot_dif_summary(
  x,
  top_n = 30L,
  sort_by = c("abs_effect", "effect", "classification"),
  preset = c("standard", "publication", "compact"),
  draw = TRUE,
  ci_level = NULL,
  effect_thresholds = NULL,
  effect_axis_label = NULL
)
x |
Output from |
top_n |
Maximum rows shown (default |
sort_by |
|
preset |
Visual preset. |
draw |
If |
ci_level |
Optional confidence level for approximate normal
intervals drawn from |
effect_thresholds |
Optional numeric vector of absolute effect-size
guide lines to draw at |
effect_axis_label |
Optional x-axis label override. When |
An mfrm_plot_data object whose data slot contains
columns Pair, Effect, SE, Classification, Color.
Bars are anchored at zero. Width corresponds to effect size on the
contrast's native scale. For method = "residual", this is the
observed-minus-expected average screening contrast between groups. For
method = "refit", this is the subgroup parameter difference on the
fitted logit scale when linking support allows a comparable contrast.
The ETS classification (A negligible, B moderate, C large) drives bar
colour only when ClassificationSystem == "ETS"; otherwise the bar
uses the preset's neutral.
analyze_dff(), analyze_dif(), plot_dif_heatmap().
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
dff <- analyze_dff(fit, diagnostics = diag, facet = "Rater",
                   group = "Group", data = toy)
unique(dff$dif_table$ClassificationSystem)
p <- plot_dif_summary(dff, draw = FALSE)
head(p$data$data)
Plot displacement diagnostics using base R
plot_displacement(
  x,
  diagnostics = NULL,
  anchored_only = FALSE,
  facets = NULL,
  plot_type = c("lollipop", "hist"),
  top_n = 40,
  show_ci = FALSE,
  ci_level = 0.95,
  preset = c("standard", "publication", "compact"),
  draw = TRUE,
  ...
)
x |
Output from |
diagnostics |
Optional output from |
anchored_only |
Keep only anchored/group-anchored levels. |
facets |
Optional subset of facets. |
plot_type |
|
top_n |
Maximum levels shown in |
show_ci |
Logical. When |
ci_level |
Confidence level used when |
preset |
Visual preset ( |
draw |
If |
... |
Additional arguments passed to |
Displacement estimates how far a single element's calibration would move if it were freed from its current value while all other parameters stayed fixed. It is computed as:
$$d_j = \frac{\sum_i (X_{ij} - E_{ij})}{\sum_i \mathrm{Var}_{ij}},$$
where the sums run over all observations $i$ involving element $j$.
The standard error is $SE(d_j) = 1 / \sqrt{\sum_i \mathrm{Var}_{ij}}$, and
a t-statistic $t_j = d_j / SE(d_j)$ flags
elements whose observed residual pattern is inconsistent with the
current anchor structure.
Displacement is most informative after anchoring: large values suggest that anchored values may be drifting from the current sample. For non-anchored analyses, displacement reflects residual calibration tension.
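A minimal numeric sketch of a displacement calculation follows, assuming the usual Rasch residual-over-information form (sum of score residuals divided by the sum of model variances). The vectors `obs`, `expected`, and `variance` are made-up values for one element, not package output.

```r
# Hypothetical observation-level quantities for one element:
# observed score, model-expected score, and model variance.
obs      <- c(3, 2, 4, 3, 1, 4)
expected <- c(2.6, 2.1, 3.2, 2.9, 1.8, 3.1)
variance <- c(0.71, 0.80, 0.64, 0.69, 0.75, 0.66)

# Residual sum over information sum gives the shift in logits.
displacement <- sum(obs - expected) / sum(variance)
se           <- 1 / sqrt(sum(variance))
t_stat       <- displacement / se

round(c(displacement = displacement, se = se, t = t_stat), 3)
```

A positive value means the element was scored higher overall than its current calibration predicts.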
A plotting-data object of class mfrm_plot_data.
"lollipop" (default): Dot-and-line chart of displacement values.
X-axis: displacement (logits). Y-axis: element labels. Points
colored red when flagged (default:
logits). Dashed lines at threshold. Ordered by
absolute displacement.
"hist": Histogram of displacement values with Freedman-Diaconis
breaks. Dashed reference lines at threshold. Use for
inspecting the overall distribution shape.
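For readers who want to rebuild the histogram view from a displacement table, base R exposes the Freedman-Diaconis rule directly through `hist(breaks = "FD")`. The simulated `disp` vector and the 0.5-logit `threshold` below are assumptions for illustration only.

```r
set.seed(1)
disp <- rnorm(60, mean = 0, sd = 0.3)   # hypothetical displacement values
threshold <- 0.5                        # assumed flag threshold (logits)

# Freedman-Diaconis breaks, computed without drawing.
h <- hist(disp, breaks = "FD", plot = FALSE)
flagged <- abs(disp) > threshold

h$breaks
sum(flagged)

if (interactive()) {
  plot(h, main = "Displacement", xlab = "Displacement (logits)")
  abline(v = c(-threshold, threshold), lty = 2)   # dashed reference lines
}
```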
Lollipop: top absolute displacement levels; flagged points indicate larger movement from anchor expectations.
Histogram: overall displacement distribution and threshold lines. A symmetric distribution centred near zero indicates good anchor stability; heavy tails or skew suggest systematic drift.
Use anchored_only = TRUE when your main question is anchor robustness.
Run with plot_type = "lollipop" and anchored_only = TRUE.
Inspect distribution with plot_type = "hist".
Drill into flagged rows via displacement_table().
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
displacement_table(), plot_unexpected(), plot_fair_average(),
plot_qc_dashboard(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_displacement(fit, anchored_only = FALSE, draw = FALSE)
if (interactive()) {
  plot_displacement(
    fit,
    anchored_only = FALSE,
    plot_type = "lollipop",
    preset = "publication"
  )
}
Plots observed response behavior against model-expected behavior across person-measure bins for one facet level. This gives a mirt-inspired empirical fit view for many-facet models: points show empirical bin means (or empirical category proportions), and the line shows the corresponding fitted-model expectation in the same bins.
plot_empirical_fit(
  fit,
  diagnostics = NULL,
  facet = NULL,
  level = NULL,
  category = NULL,
  bins = 10L,
  min_bin_n = 5L,
  draw = TRUE,
  preset = c("standard", "publication", "compact"),
  main = NULL
)
fit |
Output from |
diagnostics |
Optional output from |
facet |
Facet to inspect. |
level |
Level within |
category |
Optional score category. When |
bins |
Number of approximately equal-count person-measure bins. |
min_bin_n |
Minimum weighted bin size flagged in the returned table. |
draw |
If |
preset |
Visual preset. |
main |
Optional plot title. |
This is a descriptive empirical overlay, not mirt's S_X2 test and not a
replacement for strict marginal diagnostics. It is designed for the common
research workflow: identify a level with fit_p_table(),
plot_bubble(), or summary(diagnose_mfrm(...)), then inspect where the
observed response curve departs from the fitted model across the person
measure scale.
The plot forms approximately equal-count bins after sorting rows
by PersonMeasure. For the mean-score view, each bin reports
The displayed standard error is
For the category-probability view, is replaced by
, by
, and by
.
This resembles the empirical overlays in mirt::itemfit(empirical.plot=...)
because both compare empirical bin behavior to expected model curves.
However, mirt's S_X2.plot is constructed from conditional sum-score
information used by the S_X2 statistic, while this mfrmr plot bins by
estimated person measure for a selected facet level and does not return a
chi-square statistic, degrees of freedom, RMSEA, or p.S_X2.
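The binning idea can be sketched in base R as follows. This is one plausible unweighted construction under stated assumptions, not the package's exact computation: the simulated `person_measure`, `expected`, and `observed` vectors are stand-ins for row-level metrics, and the weighting and standard-error details are deliberately left out.

```r
set.seed(42)
n <- 200
person_measure <- rnorm(n)
expected <- 1 + 3 * plogis(person_measure)   # stand-in model expectation
observed <- pmin(4, pmax(1, round(expected + rnorm(n, sd = 0.7))))

bins <- 5L
# Equal-count bins after sorting rows by person measure.
ord <- order(person_measure)
bin <- integer(n)
bin[ord] <- ceiling(seq_len(n) * bins / n)

bin_table <- data.frame(
  Bin               = seq_len(bins),
  N                 = as.vector(tapply(observed, bin, length)),
  MeanPersonMeasure = as.vector(tapply(person_measure, bin, mean)),
  Observed          = as.vector(tapply(observed, bin, mean)),
  Expected          = as.vector(tapply(expected, bin, mean))
)
bin_table$Residual <- bin_table$Observed - bin_table$Expected
bin_table
```

Plotting `Observed` as points and `Expected` as a line against `MeanPersonMeasure` gives the same kind of overlay described above.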
An mfrm_plot_data object. The bin_table / data slot contains
Bin, N, MeanPersonMeasure, Observed, Expected, Residual, SE,
StdResidual, and LowN; raw_table contains the filtered row-level
metrics used to build the bins.
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi:10.18637/jss.v048.i06
mirt itemfit() reference:
https://philchalmers.github.io/mirt/docs/reference/itemfit.html
fit_p_table(), plot_bubble(), plot_person_fit(),
plot_qc_dashboard()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
p <- plot_empirical_fit(fit, diagnostics = diag, facet = "Rater",
                        bins = 5, draw = FALSE)
p$data$target
p$data$bin_table
p_cat <- plot_empirical_fit(fit, diagnostics = diag, facet = "Rater",
                            category = 4, bins = 5, draw = FALSE)
p_cat$data$metric
## Not run:
# mirt's empirical plot is item-based; this mfrmr plot is facet-level.
# Use it as an observed-vs-expected diagnostic layer, not as S_X2.
plot_empirical_fit(fit, diagnostics = diag, facet = "Rater",
                   level = "R01", bins = 6)
## End(Not run)
Draws model-implied expected score curves across theta for the active
rating-scale / partial-credit curve groups. This is the user-facing classic
expected-score front door; the underlying coordinates are the same scale
curves returned by category_curves_report().
plot_expected_score_curve(
  fit,
  curve_group = NULL,
  theta_range = c(-6, 6),
  theta_points = 241L,
  draw = TRUE
)
fit |
An |
curve_group |
Optional curve group label to retain. |
theta_range |
Numeric length-2 theta range. |
theta_points |
Number of theta grid points. |
draw |
If |
For each curve group and theta grid point, the expected score is
$$E(X \mid \theta) = \sum_k k \, P(X = k \mid \theta).$$
This is the model-implied expected category score, not an empirical fit smoother.
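As a hedged illustration, the probability-weighted sum of category scores can be evaluated on a theta grid in base R. The helpers `category_probs()` and `expected_score()` and the thresholds in `tau` are hypothetical and assume a rating-scale parameterization with categories scored 0..m.

```r
# Category probabilities at one theta (rating-scale form, assumed).
category_probs <- function(theta, tau) {
  log_num <- c(0, cumsum(theta - tau))
  num <- exp(log_num - max(log_num))
  num / sum(num)
}

# Expected score: probability-weighted sum of category scores 0..m.
expected_score <- function(theta, tau) {
  scores <- seq_along(c(0, tau)) - 1
  sum(scores * category_probs(theta, tau))
}

tau <- c(-1.2, 0.1, 1.4)                     # assumed thresholds
theta_grid <- seq(-6, 6, length.out = 241)   # same grid as the defaults
es <- vapply(theta_grid, expected_score, numeric(1), tau = tau)

range(es)
```

The resulting curve increases monotonically from near the lowest category score to near the highest.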
An mfrm_plot_data object with an expected data frame.
category_curves_report(), plot_test_characteristic_curve(),
plot_cumulative_category_curve()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_expected_score_curve(fit, draw = FALSE)
head(p$data$expected)
Plot facet-equivalence results
plot_facet_equivalence(
  x,
  diagnostics = NULL,
  facet = NULL,
  type = c("forest", "rope"),
  draw = TRUE,
  ...
)
x |
Output from |
diagnostics |
Optional output from |
facet |
Facet to analyze when |
type |
Plot type: |
draw |
If |
... |
Additional graphical arguments passed to base plotting functions. |
plot_facet_equivalence() is a visual companion to
analyze_facet_equivalence(). It does not recompute the equivalence
analysis; it only reshapes and displays the returned results.
Invisibly returns the plotting data. If draw = FALSE, the plotting
data are returned without drawing.
"forest" places each level on the logit scale with its confidence
interval and shades the practical-equivalence region around the weighted
grand mean.
"rope" shows the percentage of each level's uncertainty mass that falls
inside the ROPE.
In the forest plot, the shaded band marks the ROPE
(equivalence_bound around the weighted grand mean).
Levels whose entire confidence interval lies inside this band are
close to the facet grand mean under this descriptive screen. Levels whose
interval extends outside the band are more displaced from the facet average.
Overlapping intervals between two elements suggest they are not
reliably separable, but overlap alone does not establish formal
equivalence—use the TOST results for that.
In the ROPE bar chart, each bar shows the proportion of the element's normal-approximation distribution that falls inside the ROPE-style grand-mean proximity region. Values above 95% mean that nearly all of the element's normal-approximation uncertainty falls near the facet average; values in the 50–95% range are intermediate; lower values suggest the element is meaningfully displaced from that average.
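The share of normal-approximation mass inside a ROPE can be sketched with `pnorm()`. The helper `rope_share()`, the estimates, the standard errors, the precision weighting of the grand mean, and the 0.5-logit bound are all assumptions for illustration rather than the package's settings.

```r
# Share of a Normal(estimate, se) distribution inside center +/- bound.
rope_share <- function(estimate, se, center, bound) {
  pnorm(center + bound, mean = estimate, sd = se) -
    pnorm(center - bound, mean = estimate, sd = se)
}

est <- c(R01 = -0.10, R02 = 0.35, R03 = 1.10)   # hypothetical measures
se  <- c(0.15, 0.20, 0.25)                      # hypothetical SEs

# Precision-weighted grand mean (the weighting scheme is an assumption).
w <- 1 / se^2
center <- sum(w * est) / sum(w)

share <- rope_share(est, se, center, bound = 0.5)
round(100 * share, 1)
```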
Start with type = "forest" to see the facet on the logit scale.
Switch to type = "rope" when you want a ranking of levels by
grand-mean proximity.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
eq <- analyze_facet_equivalence(fit, facet = "Rater")
pdat <- plot_facet_equivalence(eq, type = "forest", draw = FALSE)
c(pdat$facet, pdat$type)
Plot a facet-quality dashboard
plot_facet_quality_dashboard(
  x,
  diagnostics = NULL,
  facet = NULL,
  bias_results = NULL,
  severity_warn = 1,
  misfit_warn = NULL,
  central_tendency_max = 0.25,
  bias_count_warn = 1L,
  bias_abs_t_warn = 2,
  bias_abs_size_warn = 0.5,
  bias_p_max = 0.05,
  plot_type = c("severity", "flags"),
  top_n = 20,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  draw = TRUE,
  ...
)
x |
Output from |
diagnostics |
Optional output from |
facet |
Optional facet name. |
bias_results |
Optional bias bundle or list of bundles. |
severity_warn |
Absolute estimate cutoff used to flag severity outliers. |
misfit_warn |
Mean-square cutoff used to flag misfit. |
central_tendency_max |
Absolute estimate cutoff used to flag central tendency. |
bias_count_warn |
Minimum flagged-bias row count required to flag a level. |
bias_abs_t_warn |
Absolute |
bias_abs_size_warn |
Absolute bias-size cutoff used when deriving bias-row flags from a raw bias bundle. |
bias_p_max |
Probability cutoff used when deriving bias-row flags from a raw bias bundle. |
plot_type |
Plot type, |
top_n |
Number of rows to keep in the plot data. |
main |
Optional plot title. |
palette |
Optional named color overrides. |
label_angle |
Label angle hint for the |
draw |
If |
... |
Reserved for generic compatibility. |
A plotting-data object of class mfrm_plot_data.
facet_quality_dashboard(), summary.mfrm_facet_dashboard()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
p <- plot_facet_quality_dashboard(fit, diagnostics = diag, draw = FALSE)
p$data$plot
Plot facet variability diagnostics using base R
plot_facets_chisq(
  x,
  diagnostics = NULL,
  fixed_p_max = 0.05,
  random_p_max = 0.05,
  plot_type = c("fixed", "random", "variance"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
diagnostics |
Optional output from |
fixed_p_max |
Warning cutoff for fixed-effect chi-square p-values. |
random_p_max |
Warning cutoff for random-effect chi-square p-values. |
plot_type |
|
main |
Optional custom plot title. |
palette |
Optional named color overrides ( |
label_angle |
X-axis label angle for bar-style plots. |
preset |
Visual preset ( |
draw |
If |
Facet chi-square tests assess whether the elements within each facet differ significantly.
Fixed-effect chi-square tests the null hypothesis
$H_0: \delta_1 = \delta_2 = \cdots = \delta_J$ (all element
measures are equal). A flagged result (p-value below fixed_p_max)
suggests detectable between-element spread under the fitted model, but
it should be interpreted alongside design quality, sample size, and other
diagnostics.
Random-effect chi-square tests whether element heterogeneity exceeds what would be expected from measurement error alone, treating element measures as random draws. A flagged result is screening evidence that the facet may not be exchangeable under the current model.
Random variance is the estimated between-element variance component after removing measurement error. It quantifies the magnitude of true heterogeneity on the logit scale.
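To make the fixed-effect test and the variance component concrete, here is a sketch using the standard precision-weighted homogeneity chi-square and a simple moment-style variance estimate. Both are textbook forms assumed for illustration; the package's own computation may differ, and the `est` and `se` values are invented.

```r
est <- c(-0.62, -0.15, 0.08, 0.69)   # hypothetical element measures (logits)
se  <- c(0.18, 0.17, 0.17, 0.19)     # hypothetical standard errors

# Fixed-effect homogeneity chi-square with J - 1 degrees of freedom.
w <- 1 / se^2
chisq_fixed <- sum(w * est^2) - sum(w * est)^2 / sum(w)
df_fixed <- length(est) - 1
p_fixed <- pchisq(chisq_fixed, df = df_fixed, lower.tail = FALSE)

# Moment-style variance component: observed spread minus mean error
# variance, floored at zero.
random_var <- max(0, var(est) - mean(se^2))

round(c(chisq = chisq_fixed, df = df_fixed, p = p_fixed,
        random_var = random_var), 4)
```

A small `p_fixed` together with a `random_var` well above zero points to real spread among elements rather than measurement noise.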
A plotting-data object of class mfrm_plot_data.
"fixed" (default): Bar chart of fixed-effect chi-square by
facet. Bars colored red when the null hypothesis is rejected at
fixed_p_max. A flagged (red) bar means the facet shows spread worth
reviewing under the fitted model.
"random": Bar chart of random-effect chi-square by facet.
Bars colored red when rejected at random_p_max.
"variance": Bar chart of estimated random variance
(squared logits) by facet. Reference line at 0. Larger values
indicate greater true heterogeneity among elements.
Colored flags reflect configured p-value thresholds (fixed_p_max,
random_p_max). For the fixed test, a flagged (red) result suggests
facet spread worth reviewing under the current model. For the random test, a
flagged result is screening evidence that the facet may contribute
non-trivial heterogeneity beyond measurement error.
Review "fixed" and "random" panels for flagged facets.
Check "variance" to contextualize heterogeneity.
Cross-check with inter-rater and element-level fit diagnostics.
facets_chisq_table(), plot_interrater_agreement(), plot_qc_dashboard()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_facets_chisq(fit, draw = FALSE)
if (interactive()) {
  plot_facets_chisq(
    fit,
    draw = TRUE,
    plot_type = "fixed",
    preset = "publication",
    main = "Facet Chi-square (Customized)",
    palette = c(fixed_ok = "#2b8cbe", fixed_flag = "#cb181d"),
    label_angle = 45
  )
}
Plot fair-average diagnostics using base R
plot_fair_average(
  x,
  diagnostics = NULL,
  facet = NULL,
  metric = c("AdjustedAverage", "StandardizedAdjustedAverage", "FairM", "FairZ"),
  plot_type = c("difference", "scatter"),
  top_n = 40,
  show_ci = FALSE,
  ci_level = 0.95,
  draw = TRUE,
  preset = c("standard", "publication", "compact"),
  ...
)
x |
Output from |
diagnostics |
Optional output from |
facet |
Optional facet name for level-wise lollipop plots. |
metric |
Adjusted-score metric. Accepts legacy names ( |
plot_type |
|
top_n |
Maximum levels shown for |
show_ci |
Logical. When |
ci_level |
Confidence level used when |
draw |
If |
preset |
Visual preset ( |
... |
Additional arguments passed to |
Fair-average plots compare observed scoring tendency against model-based fair metrics.
FairM is the model-predicted mean score for each element, adjusting for the ability distribution of persons actually encountered. It answers: "What average score would this rater/criterion produce if all raters/criteria saw the same mix of persons?"
FairZ standardises FairM to a z-score across elements within each facet, making it easier to compare relative severity across facets with different raw-score scales.
Use FairM when the raw-score metric is meaningful (e.g., reporting average ratings on the original 1–4 scale). Use FairZ when comparing standardised severity ranks across facets.
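The within-facet standardisation described for FairZ can be sketched with `ave()`. The data frame `fair` and its values are hypothetical, and the use of the sample standard deviation is an assumption about the scaling.

```r
# Hypothetical fair-average table for two facets.
fair <- data.frame(
  Facet = rep(c("Rater", "Criterion"), each = 3),
  Level = c("R01", "R02", "R03", "C01", "C02", "C03"),
  FairM = c(2.4, 2.9, 3.1, 2.2, 2.8, 3.3)
)

# z-score FairM within each facet (sample SD assumed).
fair$FairZ <- ave(fair$FairM, fair$Facet,
                  FUN = function(v) (v - mean(v)) / sd(v))
fair
```

Within each facet the resulting values have mean 0 and standard deviation 1, so ranks can be compared across facets.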
A plotting-data object of class mfrm_plot_data.
With draw = FALSE, the payload includes title, subtitle,
legend, reference_lines, and the stacked fair-average data.
"difference" (default): Lollipop chart showing the gap between observed and fair-average score for each element. X-axis: Observed - Fair metric. Y-axis: element labels. Points colored teal (lenient, gap >= 0) or orange (severe, gap < 0). Ordered by absolute gap.
"scatter": Scatter plot of fair metric (x) vs observed average (y) with an identity line. Points colored by facet. Useful for checking overall alignment between observed and model-adjusted scores.
Difference plot: ranked element-level gaps (Observed - Fair), useful
for triage of potentially lenient/severe levels.
Scatter plot: global agreement pattern relative to the identity line.
Larger absolute gaps suggest stronger divergence between observed and model-adjusted scoring.
Start with plot_type = "difference" to find largest discrepancies.
Use plot_type = "scatter" to check overall alignment pattern.
Follow up with facet-level diagnostics for flagged levels.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
fair_average_table(), plot_unexpected(), plot_displacement(),
plot_qc_dashboard(), mfrmr_visual_diagnostics
toy_full <- load_mfrmr_data("example_core")
toy_people <- unique(toy_full$Person)[1:12]
toy <- toy_full[toy_full$Person %in% toy_people, , drop = FALSE]
fit <- suppressWarnings(
  fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
           method = "JML", maxit = 10)
)
p <- plot_fair_average(fit, metric = "AdjustedAverage", draw = FALSE)
if (interactive()) {
  plot_fair_average(fit, metric = "AdjustedAverage", plot_type = "difference")
}
Visualize the direction table returned by fit_direction_summary(). The
default plot is a stacked base-R bar chart; with draw = FALSE, the function
returns a stable mfrm_plot_data payload for custom graphics.
plot_fit_direction_summary(
  x,
  diagnostics = NULL,
  facet = NULL,
  directions = c("underfit", "overfit", "mixed", "in_band"),
  value = c("rate", "count"),
  draw = TRUE,
  ...
)
x |
Output from |
diagnostics |
Optional diagnostics passed through when |
facet |
Optional facet filter. |
directions |
Direction labels to include. |
value |
Plot rates or counts. |
draw |
If |
... |
Passed to |
An mfrm_plot_data object with summary and long-form data
slots.
fit_direction_summary(), fit_p_table(),
plot_simulation_misfit_rates()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
p <- plot_fit_direction_summary(fit, diagnostics = diag, draw = FALSE)
p$data$data
Draws a person x item (or person x facet-level) matrix coloured by observed category, with rows ordered by person measure and columns ordered by location measure. Unexpected responses (those that fall far from the expected category at a given theta) are highlighted with a heavy border so the visual reads as a Rasch-convention Guttman scalogram.
plot_guttman_scalogram(
  fit,
  diagnostics = NULL,
  column_facet = NULL,
  top_n_persons = 40L,
  highlight_unexpected = TRUE,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
column_facet |
Facet name used for the columns. Default
|
top_n_persons |
Maximum number of persons shown (default
|
highlight_unexpected |
Logical. When |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object whose data slot bundles the
scalogram matrix and the optional unexpected-response overlay.
unexpected_response_table() for the case-level review of
the cells flagged in the overlay;
plot_rater_agreement_heatmap() for a complementary rater-pair
view of the same residual structure;
diagnose_mfrm() for the underlying diagnostics bundle.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_guttman_scalogram(fit, draw = FALSE)
dim(p$data$matrix)
# Look for: a clean monotone "staircase" of higher scores in the
# upper-right triangle and lower scores in the lower-left, once
# rows are sorted by person ability. Cells circled by the
# unexpected-response overlay break the staircase and warrant
# case-level review with `unexpected_response_table()`.
Visualize the design-weighted precision curve and optionally
per-facet-level contribution curves from compute_information().
plot_information(
  x,
  type = c("tif", "iif", "se", "both"),
  facet = NULL,
  draw = TRUE,
  ...
)
x |
Output from |
type |
|
facet |
For |
draw |
If |
... |
Additional graphical parameters. |
Invisibly, an mfrm_plot_data object.
"tif": overall design-weighted precision across theta.
"se": approximate standard error across theta.
"both": precision and approximate SE together, useful for presentations.
"iif": facet-level contribution curves for one selected facet in a
supported RSM, PCM, or bounded GPCM fit.
Use "tif" for a quick overall read on precision.
Use "se" when standard-error language is easier to communicate than
precision.
Use "both" when you want both views in one figure.
Use "iif" when you want to see which facet levels are shaping the total
precision curve.
The total curve peaks where the realized design is most precise.
SE is derived as 1 / sqrt(precision); lower is better.
Facet-level curves show which facet levels contribute most to that realized precision at each theta.
For bounded GPCM, those contributions include the squared
discrimination scaling implied by the fitted slope_facet.
If the precision peak sits far from the bulk of person measures, the realized design may be poorly targeted.
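The link between the precision curve and the SE curve can be illustrated in base R. The sketch below is not the package's computation: it assumes a toy set of dichotomous Rasch items with hypothetical difficulties `b`, sums their item information `p * (1 - p)` over a theta grid, and converts the total to an approximate SE with `1 / sqrt(precision)`.

```r
# Illustrative only: hypothetical dichotomous Rasch items, not mfrmr output.
theta <- seq(-4, 4, by = 0.5)          # ability grid
b <- c(-1, 0, 0.5, 1.5)                # hypothetical item difficulties

# Rasch success probability for every theta (rows) and item (columns).
p <- plogis(outer(theta, b, "-"))

# Item information is p * (1 - p); the total curve is the row sum.
precision <- rowSums(p * (1 - p))

# Approximate standard error: SE falls where precision rises.
se <- 1 / sqrt(precision)

curve <- data.frame(theta = theta, precision = precision, se = se)
curve[which.max(curve$precision), ]    # grid point where this toy design is most precise
```

Because SE is a monotone transform of precision, the theta with the highest precision is also the theta with the lowest SE, which is why the `"tif"` and `"se"` views carry the same information in different units.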
draw = FALSE returns an mfrm_plot_data object. The underlying plotting
data are stored in $data$plot. For type = "tif", "se", or "both",
those rows come from x$tif. For type = "iif", the returned rows come
from x$iif filtered to the requested facet.
Compute information with compute_information().
Plot with plot_information(info) for the total precision curve.
Use plot_information(info, type = "iif", facet = "Rater") for
facet-level contributions.
Use draw = FALSE when you want reusable plotting payloads for custom
graphics or reporting helpers.
compute_information(), fit_mfrm()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", model = "RSM", maxit = 25)
info <- compute_information(fit)
tif_data <- plot_information(info, type = "tif", draw = FALSE)
head(tif_data$data$plot)
iif_data <- plot_information(info, type = "iif", facet = "Rater", draw = FALSE)
head(iif_data$data$plot)
Plot inter-rater agreement diagnostics using base R
plot_interrater_agreement(
  x,
  diagnostics = NULL,
  rater_facet = NULL,
  context_facets = NULL,
  exact_warn = 0.5,
  corr_warn = 0.3,
  plot_type = c("exact", "corr", "difference"),
  top_n = 20,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
diagnostics |
Optional output from |
rater_facet |
Name of the rater facet when |
context_facets |
Optional context facets when |
exact_warn |
Warning threshold for exact agreement. |
corr_warn |
Warning threshold for pairwise correlation. |
plot_type |
|
top_n |
Maximum pairs displayed for bar-style plots. |
main |
Optional custom plot title. |
palette |
Optional named color overrides ( |
label_angle |
X-axis label angle for bar-style plots. |
preset |
Visual preset ( |
draw |
If |
Inter-rater agreement plots summarize pairwise consistency for a chosen rater facet. Agreement statistics are computed over observations that share the same person and context-facet levels, ensuring that comparisons reflect identical rating targets.
Exact agreement is the proportion of matched observations where both raters assigned the same category score. The expected agreement line shows the proportion expected by chance given each rater's marginal category distribution, providing a baseline.
Pairwise correlation is the Pearson correlation between scores assigned by each rater pair on matched observations.
The difference plot decomposes disagreement into systematic bias (mean signed difference on x-axis: positive = Rater 1 more severe) and total inconsistency (mean absolute difference on y-axis). Points near the origin indicate both low bias and low inconsistency.
The context_facets parameter specifies which facets define "the
same rating target" (e.g., Criterion). When NULL, all non-rater
facets are used as context.
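The agreement statistics described above can be reproduced by hand on a small set of matched scores. The base-R sketch below uses hypothetical scores for two raters; the chance-expected agreement is computed as the product of the two raters' marginal category proportions, which is one reading of the description and may differ from the package's own calculation.

```r
# Illustrative only: hypothetical matched scores from two raters.
r1 <- c(2, 3, 3, 1, 4, 2, 3, 4)
r2 <- c(2, 3, 2, 1, 4, 3, 3, 3)
cats <- 1:4

# Exact agreement: share of matched observations with identical scores.
exact <- mean(r1 == r2)

# Chance-expected agreement from each rater's marginal distribution
# (product-of-marginals assumption; the package may compute this differently).
p1 <- prop.table(table(factor(r1, levels = cats)))
p2 <- prop.table(table(factor(r2, levels = cats)))
expected <- sum(p1 * p2)

# Pairwise Pearson correlation on the matched observations.
corr <- cor(r1, r2)

# Difference view: signed mean (direction) and absolute mean (magnitude).
mean_diff <- mean(r1 - r2)
mean_abs_diff <- mean(abs(r1 - r2))

round(c(exact = exact, expected = expected, corr = corr,
        mean_diff = mean_diff, mean_abs_diff = mean_abs_diff), 3)
```

A pair can show a signed mean difference near zero and still have a large absolute difference, which is why the difference plot places the two quantities on separate axes.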
A plotting-data object of class mfrm_plot_data.
"exact" (default): Bar chart of exact agreement proportion by
rater pair. Expected agreement overlaid as connected circles.
Horizontal reference line at exact_warn. Bars colored red when
observed agreement falls below the warning threshold.
"corr": Bar chart of pairwise Pearson correlation by rater
pair. Reference line at corr_warn. Ordered by correlation
(lowest first). Low correlations suggest inconsistent rank
ordering of persons between raters.
"difference": Scatter plot. X-axis: mean signed score
difference (Rater 1 - Rater 2); positive values indicate
Rater 1 is more severe. Y-axis: mean absolute difference
(overall disagreement magnitude). Points colored red when
flagged. Vertical reference at 0.
Pairs below exact_warn and/or corr_warn should be prioritized for
rater calibration review. On the difference plot, points far from the
origin along the x-axis indicate systematic bias; points high on the
y-axis indicate large inconsistency regardless of direction.
Select rater facet and run "exact" view.
Confirm with "corr" view.
Use "difference" to inspect directional disagreement.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
interrater_agreement_table(), plot_facets_chisq(),
plot_qc_dashboard(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_interrater_agreement(fit, rater_facet = "Rater", draw = FALSE)
if (interactive()) {
  plot_interrater_agreement(
    fit,
    rater_facet = "Rater",
    draw = TRUE,
    plot_type = "exact",
    main = "Inter-rater Agreement (Customized)",
    palette = c(ok = "#2b8cbe", flag = "#cb181d"),
    label_angle = 45,
    preset = "publication"
  )
}
Convenience wrapper around plot_person_fit() using the classic KIDMAP-style
naming that many Rasch users recognize. The returned payload is identical to
plot_person_fit().
plot_kidmap(fit, diagnostics = NULL, ...)
fit |
An |
diagnostics |
Optional |
... |
Passed to |
An mfrm_plot_data object.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
km <- plot_kidmap(fit, draw = FALSE)
head(km$data$data)
Builds an N x N heatmap of pairwise standardized residuals between facet levels, computed from the diagnostics observation table. Cells with large absolute values flag pairs of facet elements (e.g. two raters, two items) whose residuals co-move more than the main-effects MFRM expects, which is the standard Yen Q3-style indicator of local response dependence.
plot_local_dependence_heatmap(
  fit,
  diagnostics = NULL,
  facet = "Rater",
  min_pairs = 5L,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
facet |
Facet whose levels are placed on both axes (default
|
min_pairs |
Minimum number of shared response opportunities
required to retain a pair. Pairs below the threshold are shown
as |
preset |
Visual preset. |
draw |
If |
This helper complements plot_marginal_pairwise(): the marginal
version uses posterior-integrated agreement residuals on a
top-N pair list, while this view shows every pair on a shared color
scale so an analyst can scan for diagonal blocks or hotspots.
An mfrm_plot_data whose data slot bundles the symmetric
residual matrix, the long-form pairs table, and the threshold
used.
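A Q3-style matrix of this kind can be sketched in base R. The example below assumes the cells are pairwise correlations of standardized residuals across persons, with pairs that share fewer than `min_pairs` observations blanked out; the residuals are simulated and the computation is an illustration of the idea, not the package's implementation.

```r
# Illustrative only: simulated standardized residuals, not mfrmr output.
set.seed(1)
z <- matrix(rnorm(30 * 4), nrow = 30,
            dimnames = list(NULL, paste0("R", 1:4)))  # 30 persons x 4 raters
z[1:27, "R4"] <- NA                                   # R4 rated only 3 persons

min_pairs <- 5L

# Number of persons rated by both members of each rater pair.
shared <- crossprod(ifelse(is.na(z), 0, 1))

# Pairwise residual correlations using only jointly observed persons.
q3 <- cor(z, use = "pairwise.complete.obs")

# Blank out thinly supported pairs and the uninformative diagonal.
q3[shared < min_pairs] <- NA
diag(q3) <- NA

round(q3, 2)
```

Blanking sparse pairs keeps a correlation based on a handful of shared persons from appearing as a hotspot on the shared color scale.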
plot_marginal_pairwise(), plot_qc_dashboard(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_local_dependence_heatmap(fit, draw = FALSE)
dim(p$data$matrix)
# Look for: |off-diagonal correlation| < 0.2 is the typical
# acceptable regime; values >= 0.3 (Yen 1984 / Marais 2013
# guideline) flag pairs that may share dependence beyond the
# main-effects MFRM. Inspect those cells in `diag$obs`.
Plot strict marginal-fit follow-up cells using base R
plot_marginal_fit(
  x,
  diagnostics = NULL,
  plot_type = c("std_residual", "prop_diff"),
  top_n = 20,
  facet = NULL,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
diagnostics |
Optional output from |
plot_type |
|
top_n |
Maximum cells shown. |
facet |
Optional facet name used to keep only matching facet-level rows.
When |
main |
Optional custom plot title. |
palette |
Optional named color overrides. Recognized names:
|
label_angle |
X-axis label angle. |
preset |
Visual preset ( |
draw |
If |
This helper visualizes the largest first-order strict marginal-fit cells from
diagnose_mfrm(..., diagnostic_mode = "both") or
diagnostic_mode = "marginal_fit".
The "std_residual" view ranks cells by the absolute standardized residual
from posterior-integrated expected category counts. The "prop_diff" view
ranks the same cells by the signed observed-minus-expected proportion gap.
Use this plot after summary(diagnostics) indicates strict marginal flags.
The display is exploratory: it highlights which facet/category cells deserve
follow-up, but it is not a standalone inferential test.
A plotting-data object of class mfrm_plot_data.
Positive bars mean the observed category usage exceeded the posterior-expected marginal usage for that cell.
Negative bars mean the observed usage fell below the posterior-expected marginal usage.
Red bars indicate the current strict marginal warning rule was triggered by
|StdResidual| >= abs_z_warn.
Fit with fit_mfrm() using method = "MML" for RSM / PCM.
Run diagnose_mfrm() with diagnostic_mode = "both".
Use plot_marginal_fit() to inspect the largest strict marginal cells.
Follow up with rating_scale_table() or substantive design review.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
diagnose_mfrm(), rating_scale_table(), plot_marginal_pairwise(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML", maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
p <- plot_marginal_fit(diag, draw = FALSE, preset = "publication")
p$data$preset
if (interactive()) {
  plot_marginal_fit(
    diag,
    plot_type = "prop_diff",
    draw = TRUE,
    preset = "publication"
  )
}
Plot strict pairwise local-dependence follow-up using base R
plot_marginal_pairwise(
  x,
  diagnostics = NULL,
  metric = c("exact", "adjacent"),
  top_n = 20,
  facet = NULL,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
diagnostics |
Optional output from |
metric |
|
top_n |
Maximum level pairs shown. |
facet |
Optional facet name used to keep only matching pairwise rows. |
main |
Optional custom plot title. |
palette |
Optional named color overrides. Recognized names: |
label_angle |
X-axis label angle. |
preset |
Visual preset ( |
draw |
If |
This helper visualizes the strict pairwise local-dependence follow-up derived from posterior-integrated expected exact and adjacent agreement.
The "exact" view ranks level pairs by the absolute exact-agreement
standardized residual. The "adjacent" view uses the adjacent-agreement
standardized residual instead. Both are exploratory corroboration screens for
strict marginal-fit flags.
A plotting-data object of class mfrm_plot_data.
Positive bars mean the observed agreement exceeded the posterior-expected agreement for that level pair.
Negative bars mean the observed agreement fell below the posterior-expected agreement.
Red bars indicate the pair exceeded the current strict-warning threshold.
Fit with fit_mfrm() using method = "MML" for RSM / PCM.
Run diagnose_mfrm() with diagnostic_mode = "both".
Use plot_marginal_pairwise() to inspect level pairs behind pairwise
local-dependence flags.
Corroborate with legacy diagnostics, design review, and substantive interpretation before making claims.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
diagnose_mfrm(), plot_marginal_fit(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML", maxit = 200
)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "both")
p <- plot_marginal_pairwise(diag, draw = FALSE, preset = "publication")
p$data$preset
if (interactive()) {
  plot_marginal_pairwise(
    diag,
    metric = "adjacent",
    draw = TRUE,
    preset = "publication"
  )
}
Plot a multi-metric simulation dashboard.
plot_mfrm_sim_dashboard(
  x,
  metrics = NULL,
  x_var = NULL,
  group_var = NULL,
  panel_var = NULL,
  facet = NULL,
  component = NULL,
  design_id = NULL,
  draw = TRUE,
  scales = c("free_y", "fixed"),
  ...
)
x |
A simulation specification, grid summary, simulation evaluation object, or data frame. |
metrics |
Character vector of numeric metric columns to plot. If omitted, a conservative default set is chosen from |
x_var |
Design or data column used for the horizontal axis. Defaults to |
group_var |
Optional design or data column used for grouped lines. |
panel_var |
Optional design or data column used for panels in addition to metric panels. |
facet |
Optional facet filter when the source table has a |
component |
Optional component to plot. See |
design_id |
Optional design rows used when |
draw |
Whether to draw a base R multi-panel plot. |
scales |
Y-axis scaling. |
... |
Additional arguments passed to the base plotting call. |
This dashboard separates two planning stages. Before fitting, arbitrary design-grid summaries can plot workload and connectivity metrics such as MeanObsPerPerson, Observations, and MinPairCoverage. After repeated fitting, evaluation objects can plot empirical performance metrics such as MeanReliability, MeanSeparation, MeanSeverityRMSE, MeanUnderfitRate, or ConvergenceRate.
The returned mfrm_plot_data payload contains a long-form data frame with .metric, .value, .x, .group, and .panel columns so users can build their own ggplot2, lattice, or spreadsheet visualizations.
In practice, useful dashboards usually combine one metric from each decision category:
response burden/connectivity: Observations, MeanObsPerPerson, MinPairCoverage, CompletePairCoverageRate
measurement precision: MeanReliability, MeanSeparation, MeanStrata
recovery error: MeanSeverityRMSE, MeanSeverityBias
fit and flag direction: MeanUnderfitRate, MeanOverfitRate, MeanMnSqMisfitRate
computational stability: ConvergenceRate, MeanElapsedSec
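The long-form layout described above (`.metric`, `.value`, `.x`, `.group`, `.panel`) is easy to mimic when prototyping custom graphics. The base-R sketch below reshapes a hypothetical wide grid summary into that layout; the design columns and all values are invented for illustration, and only the column naming follows the documented payload.

```r
# Illustrative only: a hypothetical wide grid summary with invented values.
wide <- data.frame(
  n_Rater = c(2, 4, 2, 4),
  n_Task = c(2, 2, 4, 4),
  n_Criteria = c(2, 2, 2, 2),
  MeanObsPerPerson = c(8, 16, 16, 32),
  MinPairCoverage = c(0.5, 1.0, 0.5, 1.0)
)
metrics <- c("MeanObsPerPerson", "MinPairCoverage")

# Stack one block of rows per metric, keeping the design columns as
# the x-axis, grouping, and panel variables.
long <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(
    .metric = m,
    .value = wide[[m]],
    .x = wide$n_Rater,
    .group = wide$n_Task,
    .panel = wide$n_Criteria,
    stringsAsFactors = FALSE
  )
}))

# One metric at a time is then a simple filter on the stacked table.
long[long$.metric == "MinPairCoverage", ]
```

Keeping one row per metric and design cell means the same table can feed base graphics, ggplot2 facets, or a spreadsheet pivot without further reshaping.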
Invisibly, an mfrm_plot_data object.
list_mfrm_sim_metrics(),
plot_mfrm_sim_grid(),
evaluate_mfrm_design()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 20,
  facets = list(Rater = c(2, 4), Criteria = c(2, 3), Task = c(2, 4)),
  facets_per_person = list(Rater = c(1, 2), Task = 2),
  score_levels = 4
)
dash <- plot_mfrm_sim_dashboard(
  spec,
  metrics = c("MeanObsPerPerson", "MinPairCoverage"),
  x_var = "n_Rater",
  group_var = "n_Task",
  panel_var = "n_Criteria",
  draw = FALSE
)
head(dash$data$data)
Plot arbitrary-facet simulation design diagnostics.
plot_mfrm_sim_design(
  x,
  type = c("load", "person_load", "pair_coverage"),
  facet = NULL,
  pair = NULL,
  draw = TRUE,
  ...
)
x |
Output from |
type |
Plot payload type: facet-level load, person-level load, or pairwise coverage. |
facet |
Optional facet filter for |
pair |
Optional two-facet character vector for |
draw |
Whether to draw a base R plot. |
... |
Reserved for future extensions. |
Invisibly, an mfrm_plot_data object.
summarize_mfrm_sim_design(),
plot_mfrm_sim_grid()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 10,
  facets = c(Rater = 3, Criteria = 2, Task = 2),
  facets_per_person = c(Rater = 2)
)
plot_mfrm_sim_design(spec, draw = FALSE)
Plot arbitrary-facet simulation design grid tradeoffs.
plot_mfrm_sim_grid(
  x,
  x_var = NULL,
  metric = c("Observations", "MeanObsPerPerson", "MinPairCoverage",
             "MeanPairCoverage", "CompletePairCoverageRate",
             "MinFacetPersonShare"),
  group_var = NULL,
  panel_var = NULL,
  design_id = NULL,
  draw = TRUE,
  ...
)
x |
Output from |
x_var |
Design-grid column for the horizontal axis. Defaults to |
metric |
Grid metric to plot. |
group_var |
Optional design-grid column used to draw separate lines or point groups, such as |
panel_var |
Optional design-grid column used to split the returned payload, and the base plot when |
design_id |
Optional design rows to summarize when |
draw |
Whether to draw a base R plot. |
... |
Reserved for future extensions. |
Use this function for planning questions where one facet count changes together with other facet counts. For example, set x_var = "n_Rater", group_var = "n_Task", and panel_var = "n_Criteria" to inspect whether adding raters increases person workload, improves pair coverage, or simply creates a larger response burden under different task/criteria choices.
Invisibly, an mfrm_plot_data object. The payload contains the full grid summary and a plotting table with .x, .y, .group, and .panel columns for custom graphics.
summarize_mfrm_sim_grid(),
plot_mfrm_sim_dashboard(),
list_mfrm_sim_metrics(),
plot_mfrm_sim_design()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 20,
  facets = list(Rater = c(2, 4), Criteria = c(2, 3), Task = c(2, 4)),
  facets_per_person = list(Rater = c(1, 2), Task = 2),
  score_levels = 4
)
plot_mfrm_sim_grid(
  spec,
  x_var = "n_Rater",
  metric = "Observations",
  group_var = "n_Task",
  panel_var = "n_Criteria",
  draw = FALSE
)
Per-person diagnostic bubble plot inspired by FACETS Table 6 / KIDMAP
summaries. Each bubble represents one person at the intersection of
Infit (x) and Outfit (y), sized by total observations and coloured by
the active MnSq screening band: green when both Infit and Outfit fall
in [lower, upper], amber when one statistic is outside, red when both
are outside.
plot_person_fit(
  fit,
  diagnostics = NULL,
  lower = NULL,
  upper = NULL,
  top_n_label = 12L,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
lower |
Lower fit threshold. |
upper |
Upper fit threshold. |
top_n_label |
Maximum number of persons whose label is drawn
next to the bubble (largest |Infit-1| + |Outfit-1|). Default |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object whose data slot contains
columns Person, Infit, Outfit, N, Status, and
MisfitDirection. Status keeps the plot-colour contract
(in_band, one_outside, both_outside), while MisfitDirection
separates underfit (above the upper MnSq band), overfit (below the
lower band), mixed, and in_band.
The default band is the active package MnSq screening band returned by
mfrm_misfit_thresholds(). The package default is the broad 0.5-1.5
convention, but applied studies may use narrower or broader bands by
purpose and sample context. Persons in the green centre are inside the
current screening band; amber and red corners are candidates for misfit
review. Read p$data$data$MisfitDirection to distinguish underfit
(MnSq above the upper band), overfit (MnSq below the lower band), and
mixed high/low patterns before moving to unexpected_response_table() for
case-level follow-up.
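The Status and MisfitDirection labels described above can be sketched as a small base-R classifier. The function name `classify_fit()` is hypothetical and not part of the package; it assumes the 0.5-1.5 default band, treats values exactly on a bound as in band, and reads "mixed" as one statistic above the upper bound while the other is below the lower bound.

```r
# Illustrative only: hypothetical helper, not the package's implementation.
classify_fit <- function(infit, outfit, lower = 0.5, upper = 1.5) {
  high <- infit > upper | outfit > upper    # any statistic above the band
  low  <- infit < lower | outfit < lower    # any statistic below the band

  # Count how many of the two statistics fall outside [lower, upper].
  n_out <- (infit < lower | infit > upper) + (outfit < lower | outfit > upper)
  status <- c("in_band", "one_outside", "both_outside")[n_out + 1L]

  direction <- ifelse(high & low, "mixed",
               ifelse(high, "underfit",
               ifelse(low, "overfit", "in_band")))

  data.frame(Infit = infit, Outfit = outfit,
             Status = status, MisfitDirection = direction)
}

classify_fit(infit = c(1.0, 1.7, 0.4, 1.8), outfit = c(0.9, 1.2, 0.3, 0.4))
```

The two labels answer different questions: Status says how many statistics left the band (the colour), while MisfitDirection says which way they left it (the follow-up to choose).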
diagnose_mfrm(), unexpected_response_table(),
build_misfit_casebook().
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_person_fit(fit, draw = FALSE)
head(p$data$data)
table(p$data$data$MisfitDirection, useNA = "ifany")
Plot a base-R QC dashboard
plot_qc_dashboard(
  fit,
  diagnostics = NULL,
  threshold_profile = "standard",
  thresholds = NULL,
  abs_z_min = 2,
  prob_max = 0.3,
  rater_facet = NULL,
  interrater_exact_warn = 0.5,
  interrater_corr_warn = 0.3,
  fixed_p_max = 0.05,
  random_p_max = 0.05,
  top_n = 20,
  draw = TRUE,
  preset = c("standard", "publication", "compact")
)
fit |
Output from |
diagnostics |
Optional output from |
threshold_profile |
Threshold profile name ( |
thresholds |
Optional named threshold overrides. |
abs_z_min |
Absolute standardized-residual cutoff for unexpected panel. |
prob_max |
Maximum observed-category probability cutoff for unexpected panel. |
rater_facet |
Optional rater facet used in inter-rater panel. |
interrater_exact_warn |
Warning threshold for inter-rater exact agreement. |
interrater_corr_warn |
Warning threshold for inter-rater correlation. |
fixed_p_max |
Warning cutoff for fixed-effect facet chi-square p-values. |
random_p_max |
Warning cutoff for random-effect facet chi-square p-values. |
top_n |
Maximum elements displayed in displacement panel. |
draw |
If |
preset |
Visual preset ( |
The dashboard draws nine QC panels in a 3 x 3 grid:
| Panel | What it shows | Key reference lines |
|---|---|---|
| 1. Category counts | Observed (bars) vs model-expected counts (line) | -- |
| 2. Infit vs Outfit | Scatter of element MnSq values | active lower, 1.0, upper MnSq bands |
| 3. \|ZSTD\| histogram | Distribution of absolute standardised residuals | \|ZSTD\| = 2 |
| 4. Unexpected responses | Standardised residual vs observed-category probability | abs_z_min, prob_max |
| 5. Fair-average gaps | Boxplots of (Observed - FairM) per facet | zero line |
| 6. Displacement | Top absolute displacement values | logits |
| 7. Inter-rater agreement | Exact agreement with expected overlay per pair | interrater_exact_warn |
| 8. Fixed chi-square | Fixed-effect chi-square per facet | fixed_p_max |
| 9. Separation & Reliability | Bar chart of separation index per facet | -- |
threshold_profile controls warning overlays. Three built-in profiles
are available: "strict", "standard" (default), and "lenient".
Use thresholds to override any profile value with named entries.
For bounded GPCM, the dashboard reuses the residual-based diagnostics
stack and the slope-aware fair-average table carried by diagnose_mfrm().
Interpret that panel as a GPCM-specific screening view with the caveats
documented in fair_average_table(), not as Rasch-family fair-M invariance
evidence.
A plotting-data object of class mfrm_plot_data.
This function draws a fixed 3 x 3 panel grid (no plot_type
argument). For individual panel control, use the dedicated helpers:
plot_unexpected(), plot_fair_average(), plot_displacement(),
plot_interrater_agreement(), plot_facets_chisq().
Recommended panel order for fast review:
Category counts + Infit/Outfit (row 1): first-pass model screening. Category bars should roughly track the expected line; Infit/Outfit points are reviewed against the active MnSq band. Points above the upper band indicate underfit; points below the lower band indicate overfit.
Unexpected responses + Displacement (row 2): element-level outliers. Sparse points and small displacements are desirable.
Inter-rater + Chi-square (row 3): facet-level comparability. Read these as screening panels: higher agreement suggests stronger scoring consistency, and significant fixed chi-square indicates detectable facet spread under the current model.
Separation/Reliability (row 3): approximate screening precision. Higher separation indicates more statistically distinct strata under the current SE approximation.
Treat this dashboard as a screening layer; follow up with dedicated helpers
(plot_unexpected(), plot_displacement(), plot_interrater_agreement(),
plot_facets_chisq()) for detailed diagnosis.
Fit and diagnose model.
Run plot_qc_dashboard() for one-page triage.
Drill into flagged panels using dedicated functions.
plot_unexpected(), plot_fair_average(), plot_displacement(), plot_interrater_agreement(), plot_facets_chisq(), build_visual_summaries()
# Fast smoke run: build the payload only (no graphics device).
toy <- load_mfrmr_data("example_core")
fit_quick <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                      method = "JML", maxit = 15)
qc_quick <- plot_qc_dashboard(fit_quick, draw = FALSE)
nrow(qc_quick$data$panels)

fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
qc <- plot_qc_dashboard(fit, draw = FALSE)
qc$data$panels$Status
# Look for: a row whose `Status` is "OK" for each panel that
# the run should support. "WARN" / "REVIEW" rows tell you which
# downstream helper to run next (e.g. `plot_unexpected()`,
# `plot_residual_pca()`); the dashboard is a triage screen, not
# a publication figure on its own.
if (interactive()) {
  plot_qc_dashboard(fit, rater_facet = "Rater")
}
Visualizes the output from run_qc_pipeline() as either a traffic-light
bar chart or a detail panel showing values versus thresholds.
plot_qc_pipeline(x, type = c("traffic_light", "detail"), draw = TRUE, ...)
x |
Output from run_qc_pipeline(). |
type |
Plot type: |
draw |
If |
... |
Additional graphical parameters passed to plotting functions. |
Two plot types are provided for visual triage of QC results:
"traffic_light" (default): A horizontal bar chart with one row
per QC check. Bars are coloured green (Pass), amber (Warn), or red
(Fail). Provides an at-a-glance summary of the current QC review state.
"detail": A panel showing each check's observed value and its
pass/warn/fail thresholds. Useful for understanding how close a
borderline result is to the next verdict level.
Invisible verdicts tibble from the QC pipeline.
The pipeline evaluates up to 10 checks (depending on available diagnostics):
Convergence: did the optimizer converge?
Overall Infit: global information-weighted mean-square
Overall Outfit: global unweighted mean-square
Misfit rate: proportion of elements with
Category usage: minimum observations per score category
Disordered steps: whether threshold estimates are monotonic
Separation (per facet): element discrimination adequacy
Residual PCA eigenvalue: first-component eigenvalue (if computed)
Displacement: maximum absolute displacement across elements
Inter-rater agreement: minimum pairwise exact agreement
Green (Pass): the check meets the current threshold-profile criteria.
Amber (Warn): borderline; worth monitoring but not necessarily disqualifying. Review the detail panel to see how close the value is to the fail threshold.
Red (Fail): requires investigation before strong operational or interpretive claims are made from the current run. Common remedies include collapsing categories (for disordered steps), removing outlier raters (for misfit), or increasing sample size (for low separation).
The detail view shows numeric values, making it easy to communicate exact results to stakeholders.
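The pass/warn/fail logic described above can be sketched in base R. This is only an illustration: `classify_check()` is a hypothetical helper, and the cutoffs are invented for the example rather than taken from the package's threshold profiles.

```r
# Hypothetical helper: classify one QC value against warn/fail cutoffs.
# `higher_is_worse = TRUE` suits checks such as maximum absolute displacement.
classify_check <- function(value, warn, fail, higher_is_worse = TRUE) {
  if (!higher_is_worse) {
    # Flip signs so that "larger means worse" holds in both directions.
    value <- -value
    warn <- -warn
    fail <- -fail
  }
  if (value >= fail) "Fail" else if (value >= warn) "Warn" else "Pass"
}

# Illustrative cutoffs only; they are NOT the package's threshold profile.
verdicts <- c(
  Displacement = classify_check(0.35, warn = 0.5, fail = 1.0),
  Separation   = classify_check(1.4, warn = 2.0, fail = 1.0,
                                higher_is_worse = FALSE),
  Agreement    = classify_check(0.25, warn = 0.5, fail = 0.3,
                                higher_is_worse = FALSE)
)
traffic_colours <- c(Pass = "green", Warn = "amber", Fail = "red")
print(data.frame(Check = names(verdicts), Verdict = verdicts,
                 Colour = unname(traffic_colours[verdicts])))
```

Keeping the observed value next to both cutoffs, as the detail view does, shows how far a borderline check sits from the next verdict level.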
run_qc_pipeline(), plot_qc_dashboard(),
build_visual_summaries(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("study1")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
qc <- run_qc_pipeline(fit)
plot_qc_pipeline(qc, draw = FALSE)
Summarizes inter-rater agreement as a symmetric rater x rater
heatmap. Cells are coloured by the chosen agreement metric: exact
agreement proportion by default, or the Pearson-style Corr column
from interrater_agreement_table() when metric = "correlation".
The plot is a compact alternative to plot_interrater_agreement()'s
bar chart when there are more than about six rater pairs to compare.
plot_rater_agreement_heatmap(
  fit,
  diagnostics = NULL,
  rater_facet = "Rater",
  metric = c("exact", "correlation"),
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
rater_facet |
Name of the rater facet (default "Rater"). |
metric |
Column to colour by: |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object whose data slot bundles the
rater x rater matrix and the raw pairwise rows.
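As a rough illustration of what a symmetric rater x rater exact-agreement matrix contains, the sketch below builds one in base R from a small invented rating table. It assumes one score per person-rater pair and ignores other facets; the package's own computation via interrater_agreement_table() may differ.

```r
# Invented long-format ratings: 4 persons scored by 3 raters.
ratings <- data.frame(
  Person = rep(paste0("P", 1:4), times = 3),
  Rater  = rep(c("R1", "R2", "R3"), each = 4),
  Score  = c(2, 3, 1, 4,   2, 3, 2, 4,   1, 3, 1, 2)
)

# Person x rater table of scores (one score per cell in this toy design).
wide <- tapply(ratings$Score, list(ratings$Person, ratings$Rater), mean)
raters <- colnames(wide)

# Proportion of jointly rated persons on which each pair gives the same score.
agree <- matrix(NA_real_, length(raters), length(raters),
                dimnames = list(raters, raters))
for (i in raters) {
  for (j in raters) {
    both <- !is.na(wide[, i]) & !is.na(wide[, j])
    agree[i, j] <- mean(wide[both, i] == wide[both, j])
  }
}
print(agree)
```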
interrater_agreement_table() for the underlying numeric
table; plot_guttman_scalogram() for a complementary
person-by-element view of residual structure;
diagnose_mfrm() for the diagnostics bundle the heatmap
reads from.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_rater_agreement_heatmap(fit, draw = FALSE)
dim(p$data$matrix)
# Look for (default `metric = "exact"`):
# - Off-diagonal cells close to the corresponding entry of
#   `summary(diag)$interrater$ExactAgreement` indicate consistent
#   pair behaviour; cells well below the average mark a pair
#   that disagrees more than the rest.
# - With `metric = "correlation"` the colour scale switches to
#   `[-1, 1]`; positive cells = pairs agree on relative ordering,
#   negative cells = pairs systematically rank persons in opposite
#   directions and are the highest-priority review cases.
Ranks the levels of a chosen rater facet by estimated severity and
draws each level as a horizontal CI whisker around the point
estimate. Optional gentle / strict guidance bands at +/-0.5 and
+/-1.0 logit relative to the centred mean make rater calibration
easy to read for training feedback.
plot_rater_severity_profile(
  fit,
  diagnostics = NULL,
  facet = "Rater",
  ci_level = 0.95,
  show_bands = TRUE,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
facet |
Facet name to plot (default "Rater"). |
ci_level |
Confidence level used for the whiskers (default 0.95). |
show_bands |
Logical. When |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object whose data slot contains
columns Level, Estimate, SE, CI_Lower, CI_Upper,
Band.
The vertical reference line at zero is the sum-to-zero centring
point. Levels well within +/- 0.5 logit (gentle band) are
typically interchangeable in operational scoring; levels outside
+/- 1.0 logit (strict band) deserve targeted training or
anchoring.
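The whisker and band logic above can be illustrated with a small base-R sketch. The severity values and the band labels are invented for the example; the normal-approximation interval is an assumption about how the whiskers might be formed, not a statement of the package's internals.

```r
# Invented severity estimates and standard errors (logits); not package output.
sev <- data.frame(
  Level    = c("R1", "R2", "R3", "R4"),
  Estimate = c(-0.20, 0.35, 1.25, -1.40),
  SE       = c(0.12, 0.15, 0.18, 0.20)
)

# Centre on the mean so that zero is the sum-to-zero reference point.
sev$Estimate <- sev$Estimate - mean(sev$Estimate)

# Normal-approximation whiskers at the requested confidence level.
ci_level <- 0.95
z <- qnorm(1 - (1 - ci_level) / 2)
sev$CI_Lower <- sev$Estimate - z * sev$SE
sev$CI_Upper <- sev$Estimate + z * sev$SE

# Band labels are hypothetical; cutoffs follow the +/-0.5 and +/-1.0 bands.
sev$Band <- cut(abs(sev$Estimate), breaks = c(-Inf, 0.5, 1.0, Inf),
                labels = c("within gentle", "between bands", "outside strict"))

# Rank levels from most lenient to most severe.
sev <- sev[order(sev$Estimate), ]
print(sev)
```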
diagnose_mfrm(), analyze_facet_equivalence(),
plot_facet_equivalence().
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_rater_severity_profile(fit, draw = FALSE)
head(p$data$data)
Plots each rater's severity estimate across a user-supplied
ordering variable (e.g. Session, Wave, AdminDate), producing
one line per rater. When the ordering column is time-like (numeric
or date), the x-axis is drawn on that scale; otherwise the values
are rendered as discrete ordered categories. Useful for rater
training / drift feedback loops.
plot_rater_trajectory(
  fits,
  facet = "Rater",
  ci_level = 0.95,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fits |
A named list of |
facet |
Facet whose levels are tracked (default "Rater"). |
ci_level |
Confidence level for the per-wave CI ribbons
drawn around each trajectory (default 0.95). |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object whose data slot is a long
data.frame with Wave, Level, Estimate, SE, CI_Lower,
CI_Upper columns.
Each wave is fit independently under its own sum-to-zero
identification, so the per-wave severity logits live on separate
scales unless you actively link them. Before interpreting movement
across waves as rater drift, link the waves by either (i) holding
common anchors fixed across fits (see
mfrmr_linking_and_dff for the supported linking route), or
(ii) harmonizing the scale post-hoc with a Stocking-Lord type
transformation and reviewing the result via plot_anchor_drift().
The trajectory plot itself does not perform linking; it only
visualizes the supplied fits on their as-fit scales.
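To see why unlinked waves cannot be read as drift, consider the base-R sketch below. It uses a simple mean-mean shift on common raters, which is a deliberately simpler stand-in for the anchored or Stocking-Lord type routes named above, and all values are invented.

```r
# Invented per-wave rater severities, each centred at zero within its own fit.
wave1 <- c(R1 = -0.6, R2 = 0.1, R3 = 0.5)
wave2 <- c(R1 = -0.9, R2 = -0.2, R3 = 0.2, R4 = 0.9)  # R4 is new and severe

# Raters assumed (for illustration) to be stable across waves.
anchors <- c("R1", "R2", "R3")

# Mean-mean linking constant: put wave 2 on the wave 1 scale.
shift <- mean(wave1[anchors]) - mean(wave2[anchors])
wave2_linked <- wave2 + shift

# Apparent movement before linking versus movement after linking.
raw_change    <- wave2[anchors] - wave1[anchors]
linked_change <- wave2_linked[anchors] - wave1[anchors]
print(rbind(raw_change, linked_change))
```

In this toy case every common rater appears to move by -0.3 logits before linking, purely because the new severe rater shifts the sum-to-zero origin; after linking the change is zero.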
plot_anchor_drift(), mfrmr_linking_and_dff
toy <- load_mfrmr_data("example_core")
fit_a <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                  method = "JML", maxit = 25)
fit_b <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                  method = "JML", maxit = 25)
p <- plot_rater_trajectory(list(T1 = fit_a, T2 = fit_b), draw = FALSE)
head(p$data$data)
# Look for: stable trajectories (small wave-to-wave shifts within
# each rater's CI ribbon) once the waves are anchor-linked. A
# rater whose line drifts >0.5 logits across waves is the typical
# "calibration drift" signal. Without anchor linking the per-wave
# logits are on different scales and the picture cannot be read
# as drift; see the Anchor-linking caveat in the docstring.
Compact facet-level visual of the Wright & Masters (1982)
separation, strata, and reliability indices that
diagnose_mfrm() computes. Helpful as a single small figure for
"are persons / raters / criteria distinguishable?" review.
These are Rasch/FACETS-style separation indices on the fitted logit
scale, not ICCs; use compute_facet_icc() for the complementary
observed-score variance-share view.
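For orientation, the three indices can be sketched from element measures and their standard errors using the textbook Rasch definitions. The measures below are invented, and details such as the variance denominator are assumptions; diagnose_mfrm() may compute them differently.

```r
# Invented element measures (logits) and their standard errors.
measures <- c(-1.8, -0.9, -0.2, 0.4, 1.1, 1.9)
se       <- c(0.42, 0.38, 0.35, 0.35, 0.39, 0.45)

obs_var  <- var(measures)            # observed variance of the measures
mse      <- mean(se^2)               # mean-square measurement error
true_var <- max(obs_var - mse, 0)    # error-adjusted ("true") variance

separation  <- sqrt(true_var / mse)        # G = true SD / RMSE
strata      <- (4 * separation + 1) / 3    # distinguishable levels
reliability <- true_var / obs_var          # equals G^2 / (1 + G^2)

print(round(c(separation = separation, strata = strata,
              reliability = reliability), 3))
```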
plot_reliability_snapshot(
  fit,
  diagnostics = NULL,
  metric = c("reliability", "separation", "strata"),
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
metric |
|
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data whose data slot bundles a tidy
Facet, Metric, Value data frame.
diagnose_mfrm(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_reliability_snapshot(fit, draw = FALSE)
p$data$table
# Look for (default `metric = "reliability"`):
# - >= 0.9 strong, 0.7-0.9 adequate, < 0.7 weak (Wright & Masters 1982).
# - The Person row is the operative reliability for ability scores.
# - Non-Person rows (Rater / Criterion) report the same index but
#   should be read as "are facet elements distinguishable?"; values
#   close to 1 mean facet means differ reliably from each other.
Plot residual dimensionality parallel-analysis output
plot_residual_dimensionality(
  x,
  mode = c("overall", "facet"),
  facet = NULL,
  components = NULL,
  draw = TRUE,
  preset = c("standard", "publication", "compact")
)
x |
Output from check_residual_dimensionality(). |
mode |
|
facet |
Facet to plot when |
components |
Optional component indices to display. |
draw |
If |
preset |
Visual preset ( |
An mfrm_plot_data object. The payload contains the comparison
table used for plotting and can be inspected or saved.
check_residual_dimensionality(), plot_residual_pca()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 20)
dim_check <- check_residual_dimensionality(fit, mode = "overall",
                                           method = "parametric", reps = 5)
plot_residual_dimensionality(dim_check, draw = FALSE)
Visualizes the person x element matrix of standardized residuals
from diagnose_mfrm() as a heatmap. Complements
plot_guttman_scalogram() (which shows raw responses) by exposing
the residual structure directly: large positive cells show
under-prediction, negative cells over-prediction.
plot_residual_matrix(
  fit,
  diagnostics = NULL,
  facet = "Rater",
  top_n_persons = 40L,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
facet |
Facet whose levels become the column axis (default "Rater"). |
top_n_persons |
Cap on the number of rows. Defaults to 40 to keep the figure legible; persons are kept by largest absolute residual mean. |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data whose data slot bundles the residual
matrix (rows = Person, columns = facet level) and the long-form
obs table.
plot_guttman_scalogram(), plot_unexpected(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_residual_matrix(fit, top_n_persons = 12, draw = FALSE)
dim(p$data$matrix)
# Look for: cell values within ~|2| are routine; |residual| > 2
# corresponds roughly to the 5% level and |residual| > 3 lies
# beyond the 1% level under a standard normal reference
# (Wright & Linacre 1994). Persons with multiple high-magnitude
# cells across the same facet level point at scoring drift.
Visualize residual PCA results
plot_residual_pca(
  x,
  mode = c("overall", "facet"),
  facet = NULL,
  plot_type = c("scree", "loadings"),
  component = 1L,
  top_n = 20L,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
mode |
|
facet |
Facet name for |
plot_type |
|
component |
Component index for loadings plot. |
top_n |
Maximum number of variables shown in loadings plot. |
preset |
Visual preset ( |
draw |
If |
x can be either:
output of analyze_residual_pca(), or
a diagnostics object from diagnose_mfrm() (PCA is computed internally), or
a fitted object from fit_mfrm() (diagnostics and PCA are computed internally).
Plot types:
"scree": component vs eigenvalue line plot
"loadings": horizontal bar chart of top absolute loadings
For mode = "facet" and facet = NULL, the first available facet is used.
A named list of plotting data (class mfrm_plot_data) with:
plot: "scree" or "loadings"
mode: "overall" or "facet"
facet: facet name (or NULL)
title: plot title text
data: underlying table used for plotting
plot_type = "scree": look for dominant early components relative
to later components and the unit-eigenvalue reference line. Treat
this as exploratory residual-structure screening, not a standalone
unidimensionality test. Use plot_residual_dimensionality() when
the plot should include a simulated parallel-analysis threshold.
plot_type = "loadings": identifies variables/elements driving each
component; inspect both sign and absolute magnitude.
Facet mode (mode = "facet") helps localize residual structure to a
specific facet after global PCA review.
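The scree and loadings views can be mimicked on a stand-in residual matrix with base R. This sketch simulates residuals and uses an eigen-decomposition of their correlation matrix; it is an assumption about the general technique, not a reproduction of analyze_residual_pca().

```r
set.seed(42)

# Stand-in residual matrix: 60 persons x 6 elements of roughly N(0, 1) noise.
resid_mat <- matrix(rnorm(60 * 6), nrow = 60,
                    dimnames = list(NULL, paste0("E", 1:6)))

# Inject a shared secondary component into the first two elements.
shared <- rnorm(60)
resid_mat[, 1:2] <- resid_mat[, 1:2] + 1.5 * shared

# PCA via the correlation matrix; eigenvalues sum to the number of columns.
eig <- eigen(cor(resid_mat), symmetric = TRUE)
scree <- data.frame(Component = seq_along(eig$values),
                    Eigenvalue = eig$values,
                    AboveUnit = eig$values > 1)
print(scree)

# Loadings on the first component, ordered by absolute size.
loadings1 <- eig$vectors[, 1] * sqrt(eig$values[1])
names(loadings1) <- colnames(resid_mat)
print(round(loadings1[order(-abs(loadings1))], 2))
```

The unit-eigenvalue line is only a rough reference here; a simulated parallel-analysis threshold is the stricter comparison.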
Run diagnose_mfrm() with residual_pca = "overall" or "both".
Build PCA object via analyze_residual_pca() (or pass diagnostics directly).
Use scree plot first, then loadings plot for targeted interpretation.
analyze_residual_pca(), diagnose_mfrm(),
check_residual_dimensionality(), plot_residual_dimensionality()
toy_full <- load_mfrmr_data("example_core")
toy_people <- unique(toy_full$Person)[1:24]
toy <- toy_full[match(toy_full$Person, toy_people, nomatch = 0L) > 0L, , drop = FALSE]
fit <- suppressWarnings(
  fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
           method = "JML", maxit = 15)
)
diag <- diagnose_mfrm(fit, residual_pca = "overall")
pca <- analyze_residual_pca(diag, mode = "overall")
plt <- plot_residual_pca(pca, mode = "overall", plot_type = "scree", draw = FALSE)
head(plt$data)
plt_load <- plot_residual_pca(
  pca, mode = "overall", plot_type = "loadings", component = 1, draw = FALSE
)
head(plt_load$data)
if (interactive()) {
  plot_residual_pca(pca, mode = "overall", plot_type = "scree",
                    preset = "publication")
}
Produces a Q-Q plot of per-person standardized residuals. Under the fitted Rasch-family model the residuals are approximately N(0, 1), so deviations from the reference line diagnose distributional misfit that mean-square summaries may miss.
plot_residual_qq(
  fit,
  diagnostics = NULL,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
diagnostics |
Optional |
preset |
Visual preset. |
draw |
If |
An mfrm_plot_data object with a data slot containing
Person, Theoretical, Sample columns.
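A table with these three columns can be assembled in base R as sketched below. The per-person residual aggregates are simulated stand-ins, and the use of ppoints() for plotting positions is an assumption rather than the package's documented choice.

```r
set.seed(1)

# Stand-in for per-person standardized residual aggregates.
person_resid <- rnorm(30)

# Sort the sample and pair it with standard normal quantiles.
ord <- order(person_resid)
qq <- data.frame(
  Person      = paste0("P", ord),
  Theoretical = qnorm(ppoints(length(person_resid))),
  Sample      = person_resid[ord]
)
head(qq)

if (interactive()) {
  plot(qq$Theoretical, qq$Sample, xlab = "Theoretical", ylab = "Sample")
  abline(0, 1, lty = 2)  # y = x reference line
}
```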
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_residual_qq(fit, draw = FALSE)
head(p$data$data)
# Look for: points hugging the y = x reference line. Heavy upper-
# right tails indicate persons whose residual aggregates exceed
# the standard normal expectation; pair with `plot_unexpected()`
# for case-level follow-up. This is an exploratory screen; do
# not treat tail behaviour as a definitive normality test.
Visualizes empirical-Bayes shrinkage by drawing one row per facet level with the raw (pre-shrinkage) and shrunken estimates plus the shrinkage factor. Rows are ordered by absolute shrinkage so the levels that move most under the prior appear at the top.
plot_shrinkage_funnel(
  fit,
  facet = NULL,
  top_n = 30L,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
facet |
Facet to draw (default: first non-person facet with shrinkage columns present). |
top_n |
Maximum number of rows to draw (default 30). |
preset |
Visual preset. |
draw |
If |
Requires a fit produced via apply_empirical_bayes_shrinkage() or
a fit_mfrm(..., facet_shrinkage = "empirical_bayes") run, so that
fit$facets$others carries Estimate, ShrunkEstimate, and
ShrinkageFactor columns.
An mfrm_plot_data whose data slot bundles the long
Level, RawEstimate, ShrunkEstimate, ShrinkageFactor
table.
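The relationship between raw estimates, shrunken estimates, and a shrinkage factor can be sketched with a generic normal-normal empirical-Bayes calculation. Everything here is an assumption for illustration: the values are invented, and the factor convention (near 1 means pulled almost fully to the mean) is chosen to match the reading given in the example comments, not confirmed against apply_empirical_bayes_shrinkage().

```r
# Invented raw facet-level estimates and standard errors (logits).
raw <- c(R1 = -1.2, R2 = 0.1, R3 = 0.3, R4 = 0.8)
se  <- c(R1 = 0.60, R2 = 0.15, R3 = 0.15, R4 = 0.20)

grand_mean <- mean(raw)

# Method-of-moments estimate of between-level variance, floored at zero.
tau2 <- max(var(raw) - mean(se^2), 0)

# Assumed convention: factor near 1 = estimate pulled almost fully to the mean.
shrink_factor <- se^2 / (se^2 + tau2)
shrunk <- (1 - shrink_factor) * raw + shrink_factor * grand_mean

funnel <- data.frame(Level = names(raw), RawEstimate = raw,
                     ShrunkEstimate = shrunk, ShrinkageFactor = shrink_factor)

# Order rows so the levels that move most appear at the top.
funnel <- funnel[order(-abs(funnel$RawEstimate - funnel$ShrunkEstimate)), ]
print(funnel, row.names = FALSE)
```

The noisiest level (largest standard error) receives the largest factor and moves furthest toward the mean.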
apply_empirical_bayes_shrinkage(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
fit_eb <- apply_empirical_bayes_shrinkage(fit)
p <- plot_shrinkage_funnel(fit_eb, draw = FALSE)
head(p$data$table)
# Look for: short segments (Raw and Shrunken close together) =
# little pooling. Long segments fanning toward the centre = the
# prior pulled the estimate strongly; this is most pronounced for
# small-N levels. ShrinkageFactor near 1 means most of the
# movement was driven by the prior rather than the data.
Plot directional misfit rates from summarize_simulation_misfit(). With
draw = FALSE, the function returns a tidy plotting payload for custom
graphics.
plot_simulation_misfit_rates(
  x,
  facet = NULL,
  x_var = NULL,
  group_var = NULL,
  directions = c("underfit", "overfit", "mixed"),
  draw = TRUE
)
x |
Output from |
facet |
Optional facet filter. |
x_var |
Design variable for the x-axis. When |
group_var |
Optional design variable used in labels. |
directions |
Direction rates to include. Use |
draw |
If |
An mfrm_plot_data object.
summarize_simulation_misfit(), evaluate_mfrm_design(),
plot_fit_direction_summary()
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  n_person = c(20, 30),
  n_rater = 3,
  n_criterion = 2,
  raters_per_person = 2,
  reps = 1,
  maxit = 10,
  seed = 42
))
plot_simulation_misfit_rates(sim_eval, draw = FALSE)
Computes the expected total and mean score over the observed many-facet
design while varying the person measure theta. Unlike
plot_expected_score_curve(), this summarizes the realized design as a
whole rather than one threshold/criterion curve group.
plot_test_characteristic_curve(
  fit,
  theta_range = c(-6, 6),
  theta_points = 201L,
  draw = TRUE
)
fit |
An |
theta_range |
Numeric length-2 theta range. |
theta_points |
Number of theta grid points. |
draw |
If |
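For each observed design cell d, the helper computes the model-expected
score $E(X_d \mid \theta)$ from the fitted rating-scale, partial-credit, or
bounded GPCM structure, then aggregates with the observed exposure weight
$w_d$:
$$\mathrm{TCC}(\theta) = \sum_d w_d \, E(X_d \mid \theta)$$
ExpectedMeanScore divides this total by $\sum_d w_d$.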
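A minimal base-R sketch of this aggregation, assuming a rating-scale structure, is shown below. The helper `expected_score()`, the design cells, the weights, and the `ExpectedTotalScore` column name are all hypothetical; only `ExpectedMeanScore` is a name used above.

```r
# Expected score for one design cell under a rating-scale model.
# `location` is the summed severity/difficulty of the cell's facet levels.
expected_score <- function(theta, location, tau) {
  # Category 0 has log-numerator 0; category k adds (theta - location - tau_j).
  log_num <- c(0, cumsum(theta - location - tau))
  p <- exp(log_num - max(log_num))
  p <- p / sum(p)
  sum((seq_along(p) - 1) * p)
}

tau <- c(-1.0, 0.0, 1.0)            # three thresholds, scores 0..3
cells <- data.frame(
  location = c(-0.5, 0.0, 0.5),     # invented cell locations (logits)
  weight   = c(10, 20, 10)          # invented observed exposure counts
)

theta_grid <- seq(-6, 6, length.out = 201)
tcc <- data.frame(
  Theta = theta_grid,
  ExpectedTotalScore = vapply(theta_grid, function(th) {
    cell_scores <- vapply(cells$location,
                          function(loc) expected_score(th, loc, tau),
                          numeric(1))
    sum(cells$weight * cell_scores)  # exposure-weighted sum over cells
  }, numeric(1))
)
tcc$ExpectedMeanScore <- tcc$ExpectedTotalScore / sum(cells$weight)
head(tcc)
```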
An mfrm_plot_data object with a tcc data frame.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
tcc <- plot_test_characteristic_curve(fit, draw = FALSE)
head(tcc$data$tcc)
Renders the Rasch-Andrich threshold structure as a vertical ladder per
step-facet level. Each tick is a tau_k; lines connecting adjacent
thresholds are coloured to make disordered crossings (tau_{k+1} < tau_k) visually obvious. For RSM there is one ladder; for PCM (and
bounded GPCM) there is one ladder per step_facet level.
plot_threshold_ladder(
  fit,
  highlight_disorder = TRUE,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
fit |
An |
highlight_disorder |
Logical. When |
preset |
Visual preset ( |
draw |
If |
An mfrm_plot_data object with a data slot containing
columns Group, Step, Threshold, Disordered for each ladder
row.
Within each ladder, thresholds should ascend monotonically. A disordered crossing (highlighted in the fail colour) suggests that the corresponding category is rarely the most likely response over any logit interval, and is a common trigger for category-collapsing decisions.
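The disorder check itself is simple to express. The sketch below flags a step whenever its threshold falls below the previous one within the same ladder; the threshold values are invented, and the table only mirrors the Group, Step, Threshold, Disordered columns named above.

```r
# Invented thresholds for two ladders (e.g. two step-facet levels under PCM).
thresholds <- data.frame(
  Group     = rep(c("Criterion A", "Criterion B"), each = 3),
  Step      = rep(1:3, times = 2),
  Threshold = c(-1.2, 0.1, 1.3,   -0.4, -0.9, 1.1)
)

# Within each ladder, flag step k when tau_k < tau_{k-1}.
# ave() returns 0/1 here, so convert back to logical.
thresholds$Disordered <- as.logical(ave(
  thresholds$Threshold, thresholds$Group,
  FUN = function(tau) c(FALSE, diff(tau) < 0)
))
print(thresholds)

# Ladders containing at least one disordered crossing.
unique(thresholds$Group[thresholds$Disordered])
```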
category_structure_report(), category_curves_report(),
plot.mfrm_fit() (type = "ccc").
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_threshold_ladder(fit, draw = FALSE)
head(p$data$data)
Plot unexpected responses using base R
plot_unexpected(
  x,
  diagnostics = NULL,
  abs_z_min = 2,
  prob_max = 0.3,
  top_n = 100,
  rule = c("either", "both"),
  plot_type = c("scatter", "severity"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  preset = c("standard", "publication", "compact"),
  draw = TRUE
)
x |
Output from |
diagnostics |
Optional output from |
abs_z_min |
Absolute standardized-residual cutoff. |
prob_max |
Maximum observed-category probability cutoff. |
top_n |
Maximum rows used from the unexpected table. |
rule |
Flagging rule ( |
plot_type |
|
main |
Optional custom plot title. |
palette |
Optional named color overrides ( |
label_angle |
X-axis label angle for |
preset |
Visual preset ( |
draw |
If |
This helper visualizes flagged observations from unexpected_response_table().
An observation is "unexpected" when its standardised residual and/or
observed-category probability exceed user-specified cutoffs.
The severity index is a composite ranking metric that combines the
absolute standardised residual $|Z|$ and the negative log
probability $-\log P_{\mathrm{obs}}$ of the observed category. Higher severity
indicates responses that are more surprising under the fitted model.
The rule parameter controls flagging logic:
"either": flag if $|Z| \ge$ abs_z_min or
$P_{\mathrm{obs}} \le$ prob_max.
"both": flag only if both conditions hold simultaneously.
Under common thresholds, many well-behaved runs will produce relatively few flagged observations, but the flagged proportion is design- and model-dependent. Treat the output as a screening display rather than a calibrated goodness-of-fit test.
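The two flagging rules can be compared on a handful of invented observations. In this sketch the inclusive comparisons and the severity composite (absolute residual plus negative base-10 log probability) are assumptions made for illustration; the package's exact definitions may differ.

```r
# Invented standardized residuals and observed-category probabilities.
obs <- data.frame(
  StdResidual = c(0.4, 2.3, -1.1, -2.6, 1.7),
  ObsProb     = c(0.55, 0.08, 0.28, 0.35, 0.12)
)
abs_z_min <- 2
prob_max  <- 0.3

big_z <- abs(obs$StdResidual) >= abs_z_min   # large residual
low_p <- obs$ObsProb <= prob_max             # improbable observed category

obs$FlagEither <- big_z | low_p   # rule = "either"
obs$FlagBoth   <- big_z & low_p   # rule = "both"

# One possible composite: |z| plus -log10(p). Assumed, not the package formula.
obs$Severity <- abs(obs$StdResidual) - log10(obs$ObsProb)

print(obs[order(-obs$Severity), ])
```

The "either" rule flags four of the five rows here, while "both" keeps only the single row that is extreme on both criteria.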
A plotting-data object of class mfrm_plot_data.
"scatter" (default): X-axis: standardized residual $Z$.
Y-axis: $-\log P_{\mathrm{obs}}$ (negative log of
observed-category probability; higher = more surprising).
Points colored orange when the observed score is higher than
expected, teal when lower. Dashed lines mark abs_z_min and
prob_max thresholds. Clusters of points in the upper corners
indicate systematic misfit patterns worth investigating.
"severity": Ranked bar chart of the composite severity index
for the top_n most unexpected responses. Bar length reflects
the combined unexpectedness; labels identify the specific
person-facet combination. Use for QC triage and case-level
prioritization.
Scatter plot: farther from zero on x-axis = larger residual mismatch; higher y-axis = lower observed-category probability. A uniform scatter with few points beyond the threshold lines indicates fewer locally surprising responses under the current thresholds.
Severity plot: focuses on the most extreme observations for targeted case review. Look for recurring persons or facet levels among the top entries; repeated appearances may signal rater misuse, scoring errors, or model misspecification.
Fit model and run diagnose_mfrm().
Start with "scatter" to assess global unexpected pattern.
Switch to "severity" for case prioritization.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
unexpected_response_table(), plot_fair_average(), plot_displacement(),
plot_qc_dashboard(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
p <- plot_unexpected(fit, abs_z_min = 1.5, prob_max = 0.4, top_n = 10, draw = FALSE)
if (interactive()) {
  plot_unexpected(
    fit,
    abs_z_min = 1.5,
    prob_max = 0.4,
    top_n = 10,
    plot_type = "severity",
    preset = "publication",
    main = "Unexpected Response Severity (Customized)",
    palette = c(higher = "#d95f02", lower = "#1b9e77", bar = "#2b8cbe"),
    label_angle = 45
  )
}
Produces a shared-logit variable map showing person ability distribution alongside measure estimates for every facet in side-by-side columns on the same scale.
plot_wright_unified(
  fit,
  diagnostics = NULL,
  bins = 20L,
  show_thresholds = TRUE,
  top_n = 30L,
  show_ci = FALSE,
  ci_level = 0.95,
  draw = TRUE,
  preset = c("standard", "publication", "compact"),
  palette = NULL,
  label_angle = 45,
  ...
)
fit |
Output from |
diagnostics |
Optional output from |
bins |
Integer number of bins for the person histogram. Default 20L. |
show_thresholds |
Logical; if |
top_n |
Maximum number of facet/step points retained for labeling. |
show_ci |
Logical; if |
ci_level |
Confidence level used when |
draw |
If |
preset |
Visual preset ( |
palette |
Optional named color overrides passed to the shared Wright-map drawer. |
label_angle |
Rotation angle for group labels on the facet panel. |
... |
Additional graphical parameters. |
This unified map arranges:
Column 1: Person measure distribution (horizontal histogram)
Shared facet/step panel: facet levels and optional threshold positions on the same vertical logit axis
Range and interquartile overlays for each facet group to show spread
This is the package's most compact targeting view when you want one display that shows where persons, facet levels, and category thresholds sit relative to the same latent scale.
The logit scale on the y-axis is shared, allowing direct visual comparison of all facets and persons.
Invisibly, a list with persons, facets, and thresholds
data used for the plot.
Facet levels at the same height on the map have similar measures (for example, similar difficulty or severity).
The person histogram shows where examinees cluster relative to the facet scale.
Thresholds (if shown) indicate category boundary positions.
Large gaps between the person distribution and facet locations can signal targeting problems.
Fit a model with fit_mfrm().
Plot with plot_wright_unified(fit).
Compare person distribution with facet level locations.
Use show_thresholds = TRUE when you want the category structure in the
same view.
Use plot_wright_unified() when your main question is targeting or coverage
on the shared logit scale. Use plot_information() when your main question
is measurement precision across theta.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
fit_mfrm(), plot.mfrm_fit(), mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:12], , drop = FALSE]
fit <- fit_mfrm(toy_small, "Person", c("Rater", "Criterion"), "Score", method = "JML", model = "RSM", maxit = 10)
map_data <- plot_wright_unified(fit, draw = FALSE)
names(map_data)
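The targeting comparison described above can also be illustrated without the package. The following is a minimal base-R sketch under stated assumptions: `person_measures` and `facet_measures` are simulated, hypothetical logit values rather than output from plot_wright_unified(), and the two summaries are only one simple way to quantify a targeting gap.

```r
set.seed(1)

# Hypothetical logit measures; in practice these would come from a fitted model.
person_measures <- rnorm(200, mean = 1.2, sd = 1)
facet_measures <- c(Rater_A = -0.4, Rater_B = 0.1, Rater_C = 0.3,
                    Crit_1 = -0.6, Crit_2 = 0.6)

# Bin persons on the logit axis into 20 equal-width bins (mirrors `bins = 20L`).
breaks <- seq(floor(min(person_measures)), ceiling(max(person_measures)),
              length.out = 21)
person_hist <- hist(person_measures, breaks = breaks, plot = FALSE)

# Targeting summary 1: distance between the person mean and the facet mean.
targeting_gap <- mean(person_measures) - mean(facet_measures)

# Targeting summary 2: share of persons outside the range covered by facet levels.
share_outside <- mean(person_measures < min(facet_measures) |
                      person_measures > max(facet_measures))

round(c(gap = targeting_gap, share_outside = share_outside), 2)
```

A large positive gap together with a high share of persons above the highest facet level would correspond to the "large gaps" pattern described in the interpretation notes.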
Plot an APA/FACETS table object using base R
## S3 method for class 'apa_table'
plot(
  x,
  y = NULL,
  type = c("numeric_profile", "first_numeric"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| y | Reserved for generic compatibility. |
| type | Plot type: |
| main | Optional title override. |
| palette | Optional named color overrides. |
| label_angle | Axis-label rotation angle for bar-type plots. |
| draw | If |
| ... | Reserved for generic compatibility. |
Quick visualization helper for numeric columns in apa_table() output.
It is intended for table QA and exploratory checks, not final publication
graphics.
A plotting-data object of class mfrm_plot_data.
"numeric_profile": compares column means to spot scale/centering mismatches.
"first_numeric": checks distribution shape of the first numeric column.
Build table with apa_table().
Run summary(tbl) for metadata.
Use plot(tbl, type = "numeric_profile") for quick numeric QC.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
tbl <- apa_table(fit, which = "summary")
p <- plot(tbl, draw = FALSE)
p2 <- plot(tbl, type = "first_numeric", draw = FALSE)
if (interactive()) {
  plot(
    tbl,
    type = "numeric_profile",
    main = "APA Numeric Profile (Customized)",
    palette = c(numeric_profile = "#2b8cbe", grid = "#d9d9d9"),
    label_angle = 45
  )
}
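The idea behind "numeric_profile" (comparing column means to spot scale or centering mismatches) can be approximated in base R. This is a minimal sketch, not the package's implementation: `tbl_df` is a hypothetical data frame standing in for the table behind an apa_table() object, and its column names are illustrative.

```r
# Hypothetical stand-in for the data behind an APA table (not mfrmr output).
tbl_df <- data.frame(
  Facet = c("Rater", "Criterion", "Person"),
  Measure = c(0.10, -0.25, 0.40),
  SE = c(0.12, 0.15, 0.30),
  Infit = c(0.98, 1.05, 1.10)
)

# Keep only numeric columns; label columns such as `Facet` are skipped.
is_num <- vapply(tbl_df, is.numeric, logical(1))
col_means <- colMeans(tbl_df[, is_num, drop = FALSE], na.rm = TRUE)

# A bar chart of column means makes scale or centering mismatches visible.
if (interactive()) {
  barplot(col_means, las = 2, main = "Column means (sketch)")
}

col_means
```

Columns whose means sit on very different scales (for example, logit measures next to mean-square fit values) are easier to spot in this view than in the printed table.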
Plot an anchor-audit object
## S3 method for class 'mfrm_anchor_audit'
plot(
  x,
  y = NULL,
  type = c("issue_counts", "facet_constraints", "level_observations"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| y | Reserved for generic compatibility. |
| type | Plot type: |
| main | Optional title override. |
| palette | Optional named colors. |
| label_angle | X-axis label angle for bar plots. |
| draw | If |
| ... | Reserved for generic compatibility. |
Base-R visualization helper for anchor audit outputs.
A plotting-data object of class mfrm_plot_data.
"issue_counts": volume of each issue class.
"facet_constraints": anchored/grouped/free mix by facet.
"level_observations": observation support across levels.
Run audit_mfrm_anchors().
Start with plot(aud, type = "issue_counts").
Inspect constraint and support plots before fitting.
audit_mfrm_anchors(), make_anchor_table()
toy <- load_mfrmr_data("example_core")
aud <- audit_mfrm_anchors(toy, "Person", c("Rater", "Criterion"), "Score")
p <- plot(aud, draw = FALSE)
Plot report/table bundles with base R defaults
## S3 method for class 'mfrm_bundle'
plot(x, y = NULL, type = NULL, ...)
| Argument | Description |
|---|---|
| x | A bundle object returned by mfrmr table/report helpers. |
| y | Reserved for generic compatibility. |
| type | Optional plot type. Available values depend on bundle class. |
| ... | Additional arguments forwarded to class-specific plotters. |
plot() dispatches by bundle class:
mfrm_unexpected -> plot_unexpected()
mfrm_fair_average -> plot_fair_average()
mfrm_displacement -> plot_displacement()
mfrm_interrater -> plot_interrater_agreement()
mfrm_facets_chisq -> plot_facets_chisq()
mfrm_bias_interaction -> plot_bias_interaction()
mfrm_bias_count -> bias-count plots (cell counts / low-count rates)
mfrm_fixed_reports -> pairwise-contrast diagnostics
mfrm_visual_summaries -> warning/summary message count plots
mfrm_category_structure -> default base-R category plots
mfrm_category_curves -> default ogive/CCC plots
mfrm_rating_scale -> category-counts/threshold plots
mfrm_measurable -> measurable-data coverage/count plots
mfrm_unexpected_after_bias -> post-bias unexpected-response plots
mfrm_output_bundle -> graph/score output-file diagnostics
mfrm_residual_pca -> residual PCA scree/loadings via plot_residual_pca()
mfrm_specifications -> facet/anchor/convergence plots
mfrm_data_quality -> row-audit/category/missing-row plots
mfrm_iteration_report -> replayed-iteration trajectories
mfrm_subset_connectivity -> subset-observation/connectivity plots
mfrm_facet_statistics -> facet statistic profile plots
mfrm_export_bundle / mfrm_summary_appendix_export -> export handoff
plots (formats, artifact_groups, selection_tables,
selection_handoff, selection_handoff_bundles,
selection_handoff_roles,
selection_handoff_role_sections, selection_bundles,
selection_roles, selection_sections)
If a class is outside these families, use dedicated plotting helpers or custom base R graphics on component tables.
A plotting-data object of class mfrm_plot_data.
The returned object is plotting data (mfrm_plot_data) that captures
the selected route and payload; set draw = TRUE for immediate base graphics.
Create bundle output (e.g., unexpected_response_table()).
Inspect routing with summary(bundle) if needed.
Call plot(bundle, type = ..., draw = FALSE) to obtain reusable plot data.
summary(), plot_unexpected(), plot_fair_average(), plot_displacement()
toy_full <- load_mfrmr_data("example_core")
toy_people <- unique(toy_full$Person)[1:12]
toy <- toy_full[toy_full$Person %in% toy_people, , drop = FALSE]
fit <- suppressWarnings(
  fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 10)
)
t4 <- unexpected_response_table(fit, abs_z_min = 1.5, prob_max = 0.4, top_n = 5)
p <- plot(t4, draw = FALSE)
vis <- build_visual_summaries(fit, diagnose_mfrm(fit, residual_pca = "none"))
p_vis <- plot(vis, type = "comparison", draw = FALSE)
spec <- specifications_report(fit)
p_spec <- plot(spec, type = "facet_elements", draw = FALSE)
if (interactive()) {
  plot(
    t4,
    type = "severity",
    draw = TRUE,
    main = "Unexpected Response Severity (Customized)",
    palette = c(higher = "#d95f02", lower = "#1b9e77", bar = "#2b8cbe"),
    label_angle = 45
  )
  plot(
    vis,
    type = "comparison",
    draw = TRUE,
    main = "Warning vs Summary Counts (Customized)",
    palette = c(warning = "#cb181d", summary = "#3182bd"),
    label_angle = 45
  )
}
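Class-based routing of the kind listed above relies on ordinary S3 dispatch. The sketch below shows the general mechanism only; the generic `route_bundle()` and the `demo_*` classes are hypothetical names invented for illustration and are not part of mfrmr.

```r
# Hypothetical generic: picks a route from the first matching class.
route_bundle <- function(x, ...) UseMethod("route_bundle")

# One method per bundle family (illustrative classes, not mfrmr internals).
route_bundle.demo_unexpected <- function(x, ...) "unexpected-response route"
route_bundle.demo_fair_average <- function(x, ...) "fair-average route"

# Fallback for classes outside the known families.
route_bundle.default <- function(x, ...) {
  stop("No plot route for class: ", paste(class(x), collapse = "/"))
}

# The specific class comes first, the shared bundle class second.
b1 <- structure(list(table = data.frame()),
                class = c("demo_unexpected", "demo_bundle"))
b2 <- structure(list(table = data.frame()),
                class = c("demo_fair_average", "demo_bundle"))

route_bundle(b1)
route_bundle(b2)

# An object outside the families triggers the fallback error.
unknown <- tryCatch(route_bundle(1), error = function(e) "no route")
unknown
```

The fallback mirrors the advice above: when a class is outside the supported families, a dedicated plotting helper or custom base graphics is the safer route.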
Plot a data-description object
## S3 method for class 'mfrm_data_description'
plot(
  x,
  y = NULL,
  type = c("score_distribution", "facet_levels", "missing"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| y | Reserved for generic compatibility. |
| type | Plot type: |
| main | Optional title override. |
| palette | Optional named colors ( |
| label_angle | X-axis label angle for bar plots. |
| draw | If |
| ... | Reserved for generic compatibility. |
This method draws quick pre-fit quality views from describe_mfrm_data():
score distribution balance
facet-level structure size
missingness by selected columns
A plotting-data object of class mfrm_plot_data.
"score_distribution": bar chart of weighted observation counts per
score category. Y-axis is WeightedN (sum of weights for each
category). Categories with very few observations (< 10) may produce
unstable threshold estimates. A roughly uniform or unimodal
distribution is ideal; heavy floor/ceiling effects compress the
measurement range.
"facet_levels": bar chart showing the number of distinct levels
per facet. Useful for verifying that the design structure matches
expectations (e.g., expected number of raters or criteria). Very
large numbers of levels increase computation time and may require
higher maxit in fit_mfrm().
"missing": bar chart of missing-value counts per input column.
Columns with non-zero counts should be investigated before
fitting—rows with missing scores, persons, or facet IDs are
dropped during estimation.
Run describe_mfrm_data() before fitting.
Inspect summary(ds) and plot(ds, type = "missing").
Check category/facet balance with other plot types.
Fit model after resolving obvious data issues.
describe_mfrm_data(), plot()
toy <- load_mfrmr_data("example_core")
ds <- describe_mfrm_data(toy, "Person", c("Rater", "Criterion"), "Score")
p <- plot(ds, draw = FALSE)
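The three pre-fit views can be mimicked with base R to see what each one summarizes. This is a minimal sketch under stated assumptions: `ratings` is a hypothetical long-format data frame (not package data), and the `Weight` column is an illustrative name for observation weights.

```r
# Hypothetical long-format ratings: 24 persons x 3 raters, unit weights.
ratings <- data.frame(
  Person = rep(sprintf("P%02d", 1:24), each = 3),
  Rater = rep(c("R1", "R2", "R3"), times = 24),
  Score = rep(0:3, times = c(4, 25, 30, 13)),
  Weight = 1
)
ratings$Score[5] <- NA  # introduce one missing score

# "missing" view: missing-value count per input column.
missing_counts <- colSums(is.na(ratings))

# "score_distribution" view: weighted observation count per score category.
complete <- ratings[!is.na(ratings$Score), ]
weighted_n <- tapply(complete$Weight, complete$Score, sum)

# Categories with fewer than 10 weighted observations deserve a closer look.
sparse_categories <- names(weighted_n)[weighted_n < 10]

# "facet_levels" view: number of distinct levels per facet.
facet_levels <- vapply(ratings[c("Person", "Rater")],
                       function(col) length(unique(col)), integer(1))

missing_counts
weighted_n
sparse_categories
facet_levels
```

Here the lowest category is the only one that falls under the threshold of 10 mentioned above, which is the kind of pattern the score-distribution plot is meant to surface before fitting.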
Plot a design-simulation study
## S3 method for class 'mfrm_design_evaluation'
plot(
  x,
  facet = c("Rater", "Criterion", "Person"),
  metric = c("separation", "reliability", "infit", "outfit", "misfitrate",
    "mnsqmisfitrate", "underfitrate", "overfitrate", "mixedmisfitrate",
    "inbandrate", "severityrmse", "severitybias", "convergencerate",
    "elapsedsec", "mincategorycount"),
  x_var = c("n_person", "n_rater", "n_criterion", "raters_per_person"),
  group_var = NULL,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| facet | Facet to visualize. |
| metric | Metric to plot. |
| x_var | Design variable used on the x-axis. When |
| group_var | Optional design variable used for separate lines. The same alias rules as |
| draw | If |
| ... | Reserved for generic compatibility. |
This method is designed for quick design-planning scans rather than polished publication graphics.
Useful first plots are:
rater metric = "separation" against x_var = "n_person"
criterion metric = "severityrmse" against x_var = "n_person"
when you want aligned recovery error rather than raw location shifts
rater metric = "convergencerate" against x_var = "raters_per_person"
rater metric = "underfitrate" or metric = "overfitrate" when
directional MnSq-band misfit is the planning target
If draw = TRUE, invisibly returns a plotting-data list. If
draw = FALSE, returns that list directly. The returned list includes
resolved canonical variables (x_var, group_var) together with public
labels (x_label, group_label), design_variable_aliases, and
design_descriptor, plus planning_scope, planning_constraints, and
planning_schema.
evaluate_mfrm_design(), summary.mfrm_design_evaluation
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  n_person = c(8, 12),
  n_rater = 2,
  n_criterion = 2,
  raters_per_person = 1,
  reps = 1,
  maxit = 8,
  seed = 123
))
p <- plot(sim_eval, facet = "Rater", metric = "separation", x_var = "n_person", draw = FALSE)
c(p$facet, p$x_var)
Renders the directed nesting index as a heatmap between facet pairs, highlighting fully nested relationships close to 1. The colour scale runs from 0 (crossed, white / cold) to 1 (fully nested, dark).
## S3 method for class 'mfrm_facet_nesting'
plot(x, preset = c("standard", "publication", "compact"), ...)
| Argument | Description |
|---|---|
| x | An |
| preset | Plot preset. |
| ... | Reserved. |
Invisibly, the matrix rendered.
detect_facet_nesting(),
analyze_hierarchical_structure().
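To make the 0-to-1 reading of the heatmap concrete, the sketch below computes one possible directed index in base R. The formula used here, 1 - H(B | A) / H(B) with H denoting Shannon entropy, is an assumption chosen for illustration because it equals 1 when every level of A occurs with a single level of B and 0 when the two facets are fully crossed and balanced; the package's own definition is not reproduced here and may differ. The `design` data frame and `nesting_index()` helper are hypothetical.

```r
# Assumed directed index: 1 - H(b | a) / H(b).
# Undefined (NaN) when `b` has a single level, because H(b) is then zero.
nesting_index <- function(a, b) {
  tab <- table(a, b)
  p_ab <- tab / sum(tab)
  p_a <- rowSums(p_ab)
  p_b <- colSums(p_ab)
  h_b <- -sum(p_b * log(p_b))
  cond <- p_ab / p_a                       # P(b | a), computed row-wise
  keep <- p_ab > 0
  h_b_given_a <- -sum(p_ab[keep] * log(cond[keep]))
  1 - h_b_given_a / h_b
}

# Hypothetical design: raters crossed with criteria, raters nested in sites.
design <- expand.grid(Rater = paste0("R", 1:4), Criterion = paste0("C", 1:3),
                      stringsAsFactors = FALSE)
design$Site <- ifelse(design$Rater %in% c("R1", "R2"), "S1", "S2")

facets <- c("Rater", "Criterion", "Site")
idx <- matrix(NA_real_, length(facets), length(facets),
              dimnames = list(from = facets, to = facets))
for (i in facets) for (j in facets) {
  if (i != j) idx[i, j] <- nesting_index(design[[i]], design[[j]])
}
round(idx, 2)

# Heatmap with white for 0 (crossed) and dark for 1 (fully nested).
if (interactive()) {
  image(seq_along(facets), seq_along(facets), idx, zlim = c(0, 1),
        col = gray(seq(1, 0, length.out = 11)), axes = FALSE,
        xlab = "from", ylab = "to")
  axis(1, at = seq_along(facets), labels = facets)
  axis(2, at = seq_along(facets), labels = facets)
}
```

The index is directed: under this assumed formula, knowing the rater pins down the site, but knowing the site only narrows down the rater, so the two cells of a facet pair need not match.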
Per-level observation counts rendered as a horizontal bar chart
coloured by the Linacre sample-size band assigned in
facet_small_sample_audit(). Vertical dashed lines mark the
sparse / marginal / standard thresholds so reviewers see where
every facet level sits relative to the Linacre (1994) guidance.
## S3 method for class 'mfrm_facet_sample_audit'
plot(x, top_n = NULL, preset = c("standard", "publication", "compact"), ...)
| Argument | Description |
|---|---|
| x | An |
| top_n | Optional integer; trim the y-axis to the |
| preset | One of |
| ... | Reserved. |
Invisibly, the data.frame used for the plot.
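The banded bar chart can be sketched in base R to show how counts map to bands and reference lines. Everything numeric here is a placeholder: `level_n` holds hypothetical per-level counts, and the cut points in `cuts` are illustrative values only, not the thresholds used by facet_small_sample_audit().

```r
# Hypothetical per-level observation counts.
level_n <- c(Rater_A = 48, Rater_B = 22, Rater_C = 7, Crit_1 = 35, Crit_2 = 12)

# Placeholder cut points; substitute the thresholds reported by the audit.
cuts <- c(sparse = 10, marginal = 30)

# Assign each level to a band: below 10, 10 to below 30, 30 and above.
band <- cut(level_n, breaks = c(-Inf, cuts, Inf),
            labels = c("sparse", "marginal", "standard"), right = FALSE)

band_cols <- c(sparse = "#d95f02", marginal = "#7570b3", standard = "#1b9e77")

# Horizontal bars coloured by band, with dashed lines at the cut points.
if (interactive()) {
  ord <- order(level_n)
  barplot(level_n[ord], horiz = TRUE, las = 1,
          col = band_cols[as.character(band[ord])],
          xlab = "Observations per level")
  abline(v = cuts, lty = 2)
}

data.frame(Level = names(level_n), N = unname(level_n), Band = band)
```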
Plot outputs from a legacy-compatible workflow run
## S3 method for class 'mfrm_facets_run'
plot(x, y = NULL, type = c("fit", "qc"), ...)
| Argument | Description |
|---|---|
| x | A |
| y | Unused. |
| type | Plot route: |
| ... | Additional arguments passed to the selected plot function. |
This method is a router for fast visualization from a one-shot workflow result:
type = "fit" for model-level displays.
type = "qc" for multi-panel quality-control diagnostics.
A plotting object from the delegated plot route.
Returns the plotting object produced by the delegated route:
plot.mfrm_fit() for "fit" and plot_qc_dashboard() for "qc".
Run run_mfrm_facets().
Start with plot(out, type = "fit", draw = FALSE).
Continue with plot(out, type = "qc", draw = FALSE) for diagnostics.
run_mfrm_facets(), plot.mfrm_fit(), plot_qc_dashboard(),
mfrmr_visual_diagnostics, mfrmr_workflow_methods
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:12], , drop = FALSE]
out <- run_mfrm_facets(
  data = toy_small,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  maxit = 10
)
p_fit <- plot(out, type = "fit", draw = FALSE)
p_fit$wright_map$data$plot
p_qc <- plot(out, type = "qc", draw = FALSE)
p_qc$data$plot
Plot fitted MFRM results with base R
## S3 method for class 'mfrm_fit'
plot(
  x,
  type = NULL,
  facet = NULL,
  top_n = 30,
  theta_range = c(-6, 6),
  theta_points = 241,
  title = NULL,
  palette = NULL,
  label_angle = 45,
  show_ci = FALSE,
  ci_level = 0.95,
  group = NULL,
  draw = TRUE,
  preset = c("standard", "publication", "compact"),
  ...
)
| Argument | Description |
|---|---|
| x | An |
| type | Plot type. Omit |
| facet | Optional facet name for |
| top_n | Maximum number of facet/step locations retained for compact displays. |
| theta_range | Numeric length-2 range for pathway, CCC, and category-surface payloads. |
| theta_points | Number of theta grid points used for pathway, CCC, and category-surface payloads. |
| title | Optional custom title. |
| palette | Optional color overrides. |
| label_angle | Rotation angle for x-axis labels where applicable. |
| show_ci | If |
| ci_level | Confidence level used when |
| group | Optional grouping for |
| draw | If |
| preset | Visual preset ( |
| ... | Additional arguments ignored for S3 compatibility. |
This S3 plotting method provides the core fit-family visuals for
mfrmr. When type is omitted, it returns the Wright map alone as
an mfrm_plot_data object (the most useful single figure for a
first inspection). Pass type = "bundle" (or "all" / "default")
to obtain the legacy three-plot mfrm_plot_bundle containing a
Wright map, pathway map, and category characteristic curves. The
returned object always carries machine-readable metadata through
the mfrm_plot_data contract, even when the plot is drawn
immediately.
type = "wright" shows persons, facet levels, and step thresholds on
a shared logit scale. type = "pathway" shows expected score traces
and dominant-category regions across theta. type = "ccc" shows
category response probabilities. type = "ccc_surface" or
type = "category_surface" returns a 3D-ready category-probability
surface payload for external rendering; it deliberately does not add a
plotly/rgl dependency or replace the 2D CCC/pathway reporting figures. The
payload includes category_support, interpretation_guide, and
reporting_policy tables so retained zero-frequency categories and
manuscript-use boundaries remain visible to beginners. The remaining
types ("facet", "person", "step", "shrinkage") provide
compact location-specific displays.
Invisibly, an mfrm_plot_data object (default and for any
single type), or an mfrm_plot_bundle when
type = "bundle" / "all" / "default".
Fit a model with fit_mfrm().
Use plot(fit) to inspect the Wright map at a glance.
Switch to type = "pathway", "ccc", or "shrinkage" for the
relevant follow-up figure, or type = "bundle" for the
three-plot overview when preparing a FACETS-style summary.
For a plot-selection guide and extended examples, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
fit_mfrm(), plot_wright_unified(), plot_bubble(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "JML", model = "RSM", maxit = 25
)
wright <- plot(fit, draw = FALSE)
wright$data$plot
# Look for: persons clustered against the facet / step rows on the
# shared logit axis. Large gaps between the person density and
# the step / facet rails indicate weak targeting; ceiling /
# floor stripes mean the test is too easy / hard.
bundle <- plot(fit, type = "bundle", draw = FALSE)
bundle$wright_map$data$plot
# Look for: pathway curves rising in the expected order with
# visible dominant-category bands; CCC curves peaking sequentially
# without one category being completely overlapped by neighbours.
surface <- plot(fit, type = "ccc_surface", draw = FALSE)
head(surface$data$surface)
surface$data$category_support
# Look for: every retained category having `Observed > 0`; categories
# with zero observations are returned as a placeholder slice and
# should not be interpreted as a real score region.
surface$data$interpretation_guide
if (interactive()) {
  plot(
    fit,
    type = "wright",
    preset = "publication",
    title = "Customized Wright Map",
    show_ci = TRUE,
    label_angle = 45
  )
  plot(
    fit,
    type = "pathway",
    title = "Customized Pathway Map",
    palette = c("#1f78b4")
  )
  plot(
    fit,
    type = "ccc",
    title = "Customized Category Characteristic Curves",
    palette = c("#1b9e77", "#d95f02", "#7570b3")
  )
}
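For readers who want to see what the CCC and pathway payloads represent numerically, the sketch below evaluates the standard rating-scale model (Andrich, 1978) on a theta grid matching the default `theta_range` and `theta_points`. It is an independent base-R illustration, not the package's internal code: the helper `rsm_probs()` is hypothetical, and the `delta` and `tau` values are made up rather than estimated.

```r
# Category probabilities under the rating-scale model for one facet combination.
# `delta` is an overall location; `tau` holds the step thresholds (illustrative).
rsm_probs <- function(theta, delta, tau) {
  # Column k holds (theta - delta - tau_k); category 0 contributes 0.
  eta <- outer(theta, c(0, tau), function(t, s) t - delta - s)
  eta[, 1] <- 0
  # Log-numerator of category k is the cumulative sum up to step k.
  log_num <- t(apply(eta, 1, cumsum))
  # Subtract the row maximum before exponentiating for numerical stability.
  num <- exp(log_num - apply(log_num, 1, max))
  num / rowSums(num)
}

theta <- seq(-6, 6, length.out = 241)
probs <- rsm_probs(theta, delta = 0, tau = c(-1.5, 0, 1.5))
colnames(probs) <- paste0("Cat", 0:3)

# CCC view: one probability curve per category.
# Pathway view: expected score and the dominant (most probable) category.
expected_score <- as.vector(probs %*% 0:3)
dominant <- apply(probs, 1, which.max) - 1

if (interactive()) {
  matplot(theta, probs, type = "l", lty = 1,
          xlab = "theta", ylab = "Probability")
  lines(theta, expected_score / 3, lty = 2)  # rescaled expected score
}

head(round(probs, 3))
```

With ordered thresholds, as in this example, each category is the most probable one over some interval of theta, which is the "peaking sequentially" pattern the example comments point to.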
Plot a future arbitrary-facet planning active branch
## S3 method for class 'mfrm_future_branch_active_branch'
plot(
  x,
  y = NULL,
  type = c("profile_metrics", "load_balance", "coverage", "readiness_tiers",
    "table_rows", "role_tables", "appendix_roles", "appendix_sections",
    "appendix_presets", "selection_handoff_presets", "selection_tables",
    "selection_handoff", "selection_handoff_bundles", "selection_handoff_roles",
    "selection_handoff_role_sections", "selection_bundles", "selection_roles",
    "selection_sections"),
  appendix_preset = c("recommended", "compact", "all", "methods", "results",
    "diagnostics", "reporting"),
  selection_value = c("count", "fraction"),
  draw = TRUE,
  main = NULL,
  palette = NULL,
  label_angle = 45,
  ...
)
| Argument | Description |
|---|---|
| x | Output from the future-branch active planning scaffold stored in |
| y | Unused placeholder for generic compatibility. |
| type | Plot type: |
| appendix_preset | Appendix preset used for |
| selection_value | For |
| draw | If |
| main | Optional title override. |
| palette | Optional named color overrides. |
| label_angle | Axis-label rotation angle. |
| ... | Reserved for generic compatibility. |
A plotting-data object of class mfrm_plot_data.
summary.mfrm_future_branch_active_branch()
Plot DIF/bias screening simulation results
## S3 method for class 'mfrm_signal_detection'
plot(
  x,
  signal = c("dif", "bias"),
  metric = c("power", "false_positive", "estimate", "screen_rate",
    "screen_false_positive"),
  x_var = c("n_person", "n_rater", "n_criterion", "raters_per_person"),
  group_var = NULL,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| signal | Whether to plot DIF or bias screening results. |
| metric | Metric to plot. For |
| x_var | Design variable used on the x-axis. When |
| group_var | Optional design variable used for separate lines. The same alias rules as |
| draw | If |
| ... | Reserved for generic compatibility. |
If draw = TRUE, invisibly returns plotting data. If draw = FALSE,
returns that plotting-data list directly. The returned list includes
resolved canonical variables (x_var, group_var) together with public
labels (x_label, group_label), design_variable_aliases,
design_descriptor, planning_scope, planning_constraints,
planning_schema,
display_metric, and interpretation_note so
callers can label bias-side plots as screening summaries rather than
formal power/error-rate displays.
evaluate_mfrm_signal_detection(), summary.mfrm_signal_detection
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  n_person = 8,
  n_rater = 2,
  n_criterion = 2,
  raters_per_person = 1,
  reps = 1,
  maxit = 5,
  bias_max_iter = 1,
  seed = 123
))
plot(sig_eval, signal = "dif", metric = "power", x_var = "n_person", draw = FALSE)
Plot a summary-table bundle for manuscript QC
## S3 method for class 'mfrm_summary_table_bundle'
plot(
  x,
  y = NULL,
  type = c("table_rows", "role_tables", "appendix_roles", "appendix_sections",
    "appendix_presets", "selection_handoff_presets", "selection_tables",
    "selection_handoff", "selection_handoff_bundles", "selection_handoff_roles",
    "selection_handoff_role_sections", "selection_bundles", "selection_roles",
    "selection_sections", "numeric_profile", "first_numeric"),
  which = NULL,
  selection_value = c("count", "fraction"),
  appendix_preset = c("recommended", "compact", "all", "methods", "results",
    "diagnostics", "reporting"),
  main = NULL,
  palette = NULL,
  label_angle = 45,
  draw = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | Output from |
| y | Reserved for generic compatibility. |
| type | Plot type: |
| which | Optional table selector used for numeric plot types. |
| selection_value | For |
| appendix_preset | Appendix preset used for |
| main | Optional title override. |
| palette | Optional named color overrides. |
| label_angle | Axis-label rotation angle for bar-type plots. |
| draw | If |
| ... | Reserved for generic compatibility. |
This helper keeps summary-bundle plotting conservative. It either visualizes
the bundle's own bundle-level indexes ("table_rows", "role_tables",
"appendix_roles", "appendix_sections", "appendix_presets") or routes a
selected table through apa_table() and plot.apa_table() for numeric QC.
A plotting-data object of class mfrm_plot_data.
"table_rows": compares returned table sizes to show where reporting mass sits.
"role_tables": shows how many returned tables belong to each reporting role.
"appendix_roles": shows how returned tables contribute to conservative
appendix routing by reporting role.
"appendix_sections": shows how returned tables are distributed across
methods/results/diagnostics/reporting sections.
"appendix_presets": shows how many tables the current bundle contributes
to the conservative appendix presets.
"selection_handoff_presets": shows plot-ready appendix handoff counts by
preset for workflow-only appendix routing surfaces in the bundle.
"selection_tables" / "selection_handoff" /
"selection_handoff_bundles" /
"selection_handoff_roles" / "selection_handoff_role_sections" /
"selection_bundles" /
"selection_roles" / "selection_sections": show workflow-only appendix
selection surfaces already materialized inside the bundle.
"numeric_profile" / "first_numeric": reuse the same numeric QC logic as
plot.apa_table() but start from a summary-table bundle.
build_summary_table_bundle(), apa_table(), plot.apa_table()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
bundle <- build_summary_table_bundle(fit)
plot(bundle, draw = FALSE)
plot(bundle, type = "numeric_profile", which = "facet_overview", draw = FALSE)
Build a precision audit report
precision_audit_report(fit, diagnostics = NULL)
| Argument | Description |
|---|---|
| fit | Output from |
| diagnostics | Optional output from |
This helper summarizes how mfrmr derived SE, CI, and reliability values
for the current run. It is package-native and is intended to help users
distinguish model-based precision paths from exploratory ones without
requiring external software conventions.
A named list with:
profile: one-row precision overview
checks: package-native precision audit checks
approximation_notes: detailed method notes
settings: resolved model and method labels
precision_audit_report() is a reporting gatekeeper for precision claims.
It tells you how the package derived uncertainty summaries for the current
run and how cautiously those summaries should be written up.
It does not, by itself, validate the measurement model or substantive conclusions.
A favorable precision tier does not override convergence, fit, linking, or design problems elsewhere in the analysis.
profile: one-row overview of the active precision tier and recommended use.
checks: package-native audit checks for SE ordering, reliability ordering,
coverage of sample/population summaries, and SE source labels.
approximation_notes: method notes copied from diagnose_mfrm().
Use the profile$PrecisionTier and checks table to decide whether SE, CI,
and reliability language can be phrased as model-based, should be qualified
as hybrid, or should remain exploratory in the final report.
Run diagnose_mfrm() for the fitted model.
Build precision_audit_report(fit, diagnostics = diag).
Use summary() to see whether the run supports model-based reporting
language or should remain in exploratory/screening mode.
diagnose_mfrm(), facet_statistics_report(), reporting_checklist()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
out <- precision_audit_report(fit, diagnostics = diag)
summary(out)
Forecast population-level MFRM operating characteristics for one future design
predict_mfrm_population(
  fit = NULL,
  sim_spec = NULL,
  n_person = NULL,
  n_rater = NULL,
  n_criterion = NULL,
  raters_per_person = NULL,
  design = NULL,
  reps = 50,
  fit_method = NULL,
  model = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  seed = NULL
)
fit |
Optional output from |
sim_spec |
Optional output from |
n_person |
Number of persons/respondents in the future design. Defaults to the value stored in the base simulation specification. |
n_rater |
Number of rater facet levels in the future design. Defaults to the value stored in the base simulation specification. |
n_criterion |
Number of criterion/item facet levels in the future design. Defaults to the value stored in the base simulation specification. |
raters_per_person |
Number of raters assigned to each person in the future design. Defaults to the value stored in the base simulation specification. |
design |
Optional named design override supplied as a named list,
named vector, or one-row data frame. Names may use canonical variables
( |
reps |
Number of replications used in the forecast simulation. |
fit_method |
Estimation method used inside the forecast simulation. When
|
model |
Measurement model used when refitting the forecasted design. Defaults to the model recorded in the base simulation specification. |
maxit |
Maximum iterations passed to |
quad_points |
Quadrature points for |
residual_pca |
Residual PCA mode passed to |
seed |
Optional seed for reproducible replications. |
predict_mfrm_population() is a scenario-level forecasting helper built
on top of evaluate_mfrm_design(). It is intended for questions such as:
what separation/reliability would we expect if the next administration had 60 persons, 4 raters, and 2 ratings per person?
how much Monte Carlo uncertainty remains around those expected summaries?
The function deliberately returns aggregate operating characteristics (for example mean separation, reliability, recovery RMSE, convergence rate) rather than future individual true values for one respondent or one rater.
If fit is supplied, the function first constructs a fit-derived parametric
starting point with extract_mfrm_sim_spec() and then evaluates the
requested future design under that explicit data-generating mechanism. This
should be interpreted as a fit-based forecast under modeling assumptions, not
as a guaranteed out-of-sample prediction.
When that fit-derived or manually built simulation specification stores an
active latent-regression population generator, the helper still operates at
the design / operating-characteristic level. It repeatedly simulates
person-level covariates and responses, refits the MML population-model
branch, and summarizes the resulting facet-level behavior. This is distinct
from the fitted-model posterior scoring provided by predict_mfrm_units().
The bounded GPCM branch is supported here with the same caveats as
evaluate_mfrm_design(): forecasts are scenario-level operating
characteristics under an explicit slope-aware generator, and the planning
layer still targets the role-based person x rater-like x criterion-like
design contract rather than a fully arbitrary-facet planner.
An object of class mfrm_population_prediction with components:
design: requested future design
forecast: facet-level forecast table
overview: run-level overview
simulation: underlying evaluate_mfrm_design() result
sim_spec: simulation specification used for the forecast
facet_names: public non-person facet names carried by the simulation
specification
design_variable_aliases: public aliases for
n_person/n_rater/n_criterion/raters_per_person
design_descriptor: role-based description of design variables carried
from the underlying planning object
planning_scope: explicit record of the current planning contract,
including a facet_manifest and future-planner scaffold marker
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract carrying the role
table, current boundary, mutability map, facet manifest, and a
schema-only future facet-count table
settings: forecasting settings
ademp: simulation-study metadata
notes: interpretation notes
forecast contains facet-level expected summaries for the requested
future design.
Mcse* columns quantify Monte Carlo uncertainty from using a finite number
of replications.
design_variable_aliases and design_descriptor carry the same public
naming metadata used by the underlying planning object. They rename the
standard two non-person facet roles for presentation, but they do not turn
the current planner into a fully arbitrary-facet simulator.
If sim_spec$population$active = TRUE, the forecast summarizes repeated
latent-regression MML refits under that stored person-level generator; it
is still a scenario forecast rather than direct posterior scoring for one
observed sample.
simulation stores the full design-evaluation object in case you want to
inspect replicate-level behavior.
This helper does not produce definitive future person measures or rater severities for one concrete sample. It forecasts design-level behavior under the supplied or derived parametric assumptions.
The forecast is implemented as a one-scenario Monte Carlo / operating-
characteristic study following the general guidance of Morris, White, and
Crowther (2019) and the ADEMP-oriented reporting framework discussed by
Siepe et al. (2024). In mfrmr, this function is a practical wrapper for
future-design planning rather than a direct implementation of a published
many-facet forecasting procedure.
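The Monte Carlo standard error reported in the Mcse* columns can be illustrated with a minimal base R sketch. It assumes the usual definition for a mean over replications (standard deviation across replications divided by the square root of the number of replications, as in Morris et al., 2019); the replicate values below are simulated placeholders, not package output, and the exact computation inside mfrmr is not confirmed here.

```r
# Hypothetical replicate-level results: one separation estimate per replication.
set.seed(123)
reps <- 50
separation <- rnorm(reps, mean = 2.1, sd = 0.3)

# Monte Carlo summary: mean across replications and its standard error.
mean_separation <- mean(separation)
mcse_separation <- sd(separation) / sqrt(reps)

c(MeanSeparation = mean_separation, McseSeparation = mcse_separation)
```

Under this definition, quadrupling reps roughly halves the Monte Carlo standard error. With a single replication, sd() returns NA in base R, so a one-replication run in this sketch carries no usable uncertainty estimate.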
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartos, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
build_mfrm_sim_spec(), extract_mfrm_sim_spec(),
evaluate_mfrm_design(), summary.mfrm_population_prediction
spec <- build_mfrm_sim_spec(
  n_person = 16,
  n_rater = 3,
  n_criterion = 2,
  raters_per_person = 2,
  assignment = "rotating"
)
pred <- predict_mfrm_population(
  sim_spec = spec,
  design = list(person = 18),
  reps = 1,
  maxit = 5,
  seed = 123
)
s_pred <- summary(pred)
s_pred$forecast[, c("Facet", "MeanSeparation", "McseSeparation")]
Score future or partially observed units under the fitted scoring basis
predict_mfrm_units(
  fit,
  new_data,
  person = NULL,
  facets = NULL,
  score = NULL,
  weight = NULL,
  person_data = NULL,
  person_id = NULL,
  population_policy = c("error", "omit"),
  interval_level = 0.95,
  n_draws = 0,
  seed = NULL
)
fit |
Output from |
new_data |
Long-format data for the future or partially observed units to be scored. |
person |
Optional person column in |
facets |
Optional facet-column mapping for |
score |
Optional score column in |
weight |
Optional weight column in |
person_data |
Optional one-row-per-person data.frame with the
background variables required by a latent-regression fit. Ignored for
ordinary fixed-calibration scoring. For intercept-only latent-regression
fits ( |
person_id |
Optional person-ID column in |
population_policy |
How missing background data are handled when
|
interval_level |
Posterior interval level returned in |
n_draws |
Optional number of quadrature-grid posterior draws to return per scored person. Use 0 to skip draws. |
seed |
Optional seed for reproducible posterior draws. |
predict_mfrm_units() is the individual-unit companion to
predict_mfrm_population(). It uses the fitted calibration and, when
available, the fitted one-dimensional population model to score new or
partially observed persons via Expected A Posteriori (EAP) summaries on a
quadrature grid.
When the original fit uses ordinary method = "MML", the posterior
summaries are taken under that fitted MML calibration. When the original fit
uses the latent-regression MML branch, the scoring prior is the fitted
conditional normal population model, so the returned summaries are
population-model-aware posterior EAP estimates. When the original fit uses
method = "JML", mfrmr applies the fitted facet/step parameters with a
standard normal reference prior on the quadrature grid, so the returned
person scores remain fixed-calibration EAP summaries rather than direct JML
estimates from the fitting step.
When the fitted population model is intercept-only (population_formula = ~ 1), predict_mfrm_units() still uses the fitted population-model basis,
but it can reconstruct the minimal scored-person table internally because no
background covariates are needed beyond the person IDs in new_data.
The current bounded GPCM branch is included in this scoring layer,
so fitted GPCM objects can be used for the same fixed-calibration
posterior summaries. This does not imply that every downstream diagnostic or
reporting helper has already been generalized to GPCM.
This is appropriate for questions such as:
what posterior location/uncertainty do these partially observed new respondents have under the existing calibration?
how uncertain are those scores, given the observed response pattern?
All non-person facet levels in new_data must already exist in the fitted
calibration. The function does not recalibrate the model, update facet
estimates, or treat overlapping person IDs as the same latent units from the
training data. Person IDs in new_data are treated as labels for the rows
being scored.
When n_draws > 0, the returned draws component contains discrete
quadrature-grid posterior draws that can be used as approximate plausible
values under the fitted scoring basis. They should be interpreted as
posterior uncertainty summaries, not as deterministic future truth values.
For JML fits, this scoring stage is intentionally post hoc: mfrmr uses
the fitted facet and step parameters from the joint-likelihood fit, then
adds a standard normal reference prior only for the scoring layer so that
new or partially observed units can be summarized on a quadrature grid.
This is a practical fixed-calibration EAP procedure, not a claim that the
original JML fit itself estimated a population model.
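The fixed-calibration EAP idea described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the helper rsm_probs(), the calibration values, the equally spaced grid, and the cumulative-probability interval rule are all hypothetical choices made for the example.

```r
# Category probabilities for one observation under a rating-scale model.
# theta: person location; delta: summed facet locations; tau: step parameters.
rsm_probs <- function(theta, delta, tau) {
  num <- c(0, cumsum(theta - delta - tau))  # log-numerators for categories 0..m
  p <- exp(num - max(num))                  # subtract max for numerical stability
  p / sum(p)
}

# Hypothetical fixed calibration (illustrative values only).
tau <- c(-1.0, 0.0, 1.0)          # three steps, so categories 0..3
delta <- c(0.3, -0.2, 0.1, 0.4)   # facet-location sum for each new response
score <- c(2, 3, 2, 3)            # observed categories for one new person

# Equally spaced grid with standard normal reference prior weights (assumption).
grid <- seq(-4, 4, length.out = 41)
prior <- dnorm(grid)
prior <- prior / sum(prior)

# Likelihood of the whole response pattern at each grid node.
lik <- vapply(grid, function(th) {
  prod(mapply(function(d, x) rsm_probs(th, d, tau)[x + 1], delta, score))
}, numeric(1))

# Posterior over the grid, then EAP, posterior SD, and a 95% grid interval.
post <- prior * lik / sum(prior * lik)
eap <- sum(grid * post)
psd <- sqrt(sum((grid - eap)^2 * post))
cdf <- cumsum(post)
lower <- grid[which(cdf >= 0.025)[1]]
upper <- grid[which(cdf >= 0.975)[1]]

# Discrete posterior draws on the grid (plausible-value-style).
set.seed(1)
draws <- sample(grid, size = 5, replace = TRUE, prob = post)

round(c(Estimate = eap, SD = psd, Lower = lower, Upper = upper), 2)
```

The facet and step parameters stay fixed throughout; only the person location is summarized, which is why new non-person facet levels cannot be scored this way. Because the draws are taken from grid nodes, they are discrete and their resolution depends on the grid spacing.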
An object of class mfrm_unit_prediction with components:
estimates: posterior summaries by person
draws: optional quadrature-grid posterior draws
audit: row-level preparation audit for new_data
population_audit: optional person-level omission audit for
latent-regression scoring
input_data: cleaned canonical scoring rows retained from new_data
person_data: cleaned or supplied person-level background data used for
latent-regression scoring; NULL otherwise
settings: scoring settings
notes: interpretation notes
estimates contains posterior EAP summaries for each person in
new_data.
Lower and Upper are quadrature-grid posterior interval bounds at the
requested interval_level.
SD is posterior uncertainty under the fitted scoring basis used for
scoring.
draws, when requested, contains approximate plausible values on the
fitted quadrature grid.
population_audit, when present, records whether scored persons were
omitted because their background data were incomplete for a
latent-regression fit.
This helper does not update the original calibration, estimate new non-person facet levels, or produce deterministic future person true values. It scores new response patterns under the fitted calibration and, when applicable, the fitted one-dimensional population model.
The posterior summaries follow the usual quadrature-based EAP scoring
framework used in item response modeling under calibrated parameters
(for example Bock & Aitkin, 1981). When fit uses the latent-regression
branch, mfrmr scores under the fitted conditional normal population model
in the general plausible-values spirit discussed by Mislevy (1991). Optional
posterior draws are exposed as quadrature-grid plausible-value-style
summaries for practical many-facet scoring rather than as a claim of full
ConQuest numerical equivalence. When the source fit is JML, the same
literature supports the quadrature-based scoring layer, but the standard
normal prior is a
package-level reference prior introduced for post hoc scoring rather than an
estimated population distribution.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177-196.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159-176.
predict_mfrm_population(), fit_mfrm(),
summary.mfrm_unit_prediction
toy <- load_mfrmr_data("example_core")
keep_people <- unique(toy$Person)[1:18]
toy_fit <- suppressWarnings(
  fit_mfrm(
    toy[toy$Person %in% keep_people, , drop = FALSE],
    "Person", c("Rater", "Criterion"), "Score",
    method = "MML", quad_points = 5, maxit = 15
  )
)
raters <- unique(toy$Rater)[1:2]
criteria <- unique(toy$Criterion)[1:2]
new_units <- data.frame(
  Person = c("NEW01", "NEW01", "NEW02", "NEW02"),
  Rater = c(raters[1], raters[2], raters[1], raters[2]),
  Criterion = c(criteria[1], criteria[2], criteria[1], criteria[2]),
  Score = c(2, 3, 2, 4)
)
pred_units <- predict_mfrm_units(toy_fit, new_units, n_draws = 0)
summary(pred_units)$estimates[, c("Person", "Estimate", "Lower", "Upper")]
Print an APA reporting bundle
## S3 method for class 'mfrm_apa_outputs'
print(x, include_notes = FALSE, include_captions = FALSE, qa = FALSE, ...)
x |
Output from |
include_notes |
Logical. If |
include_captions |
Logical. If |
qa |
Logical. If |
... |
Optional arguments passed to |
Typing an mfrm_apa_outputs object at the console prints the concise
Method / Results draft stored in x$report_text. Use summary(x) or
print(x, qa = TRUE) for the structured QA surface with content checks,
component counts, and section availability.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 15)
diag <- diagnose_mfrm(fit, residual_pca = "none", diagnostic_mode = "legacy")
apa <- build_apa_outputs(fit, diag)
apa
summary(apa)
print(apa, qa = TRUE, top_n = 2)
Print APA narrative text with preserved line breaks
## S3 method for class 'mfrm_apa_text'
print(x, ...)
x |
Character text object from |
... |
Reserved for generic compatibility. |
Prints APA narrative text with preserved paragraph breaks using cat().
This is preferred over bare print() when you want readable multi-line
report output in the console.
The input object (invisibly).
The printed text is the same content stored in
build_apa_outputs(...)$report_text, but with explicit paragraph breaks.
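The difference between bare print() and cat()-based printing can be seen with a minimal base R sketch. The class name demo_text and the narrative string are hypothetical stand-ins; this is not the package's own method, only an illustration of the same pattern (print through cat(), return the input invisibly).

```r
# Hypothetical narrative text with a paragraph break (not package output).
txt <- "Method. A many-facet model was fitted.\n\nResults. Rater separation was adequate."

# Bare print() on a character string shows one quoted string with \n escapes.
print(txt)

# A minimal print method for a hypothetical class: cat() renders the line
# breaks, and the input is returned invisibly.
print.demo_text <- function(x, ...) {
  cat(unclass(x), "\n", sep = "")
  invisible(x)
}

demo <- structure(txt, class = "demo_text")
demo
```

Returning the object invisibly keeps the method usable in pipelines and assignments without printing the text a second time.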
Generate apa <- build_apa_outputs(...).
Print readable narrative with apa$report_text.
Use summary(apa) to check completeness before manuscript use.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "both")
apa <- build_apa_outputs(fit, diag)
apa$report_text
Computes a Q3-style index inspired by Yen (1984) – the Pearson
correlation of standardized residuals between every pair of
levels of a chosen facet – from a diagnose_mfrm() bundle. Under
the conditional-independence assumption of the MFRM, |Q3| should be
small for every pair; large absolute values flag pairs of facet
elements (e.g. two raters or two items) whose residuals co-move
more than the main-effects model expects.
q3_statistic(
  fit,
  diagnostics = NULL,
  facet = "Rater",
  min_pairs = 5L,
  yen_threshold = 0.2,
  marais_threshold = 0.3,
  relative_offset = 0.2
)
fit |
An |
diagnostics |
Optional |
facet |
Facet whose levels are paired (default |
min_pairs |
Minimum number of shared response opportunities
required to retain a pair. Pairs below the threshold drop out
of the table (mirrors |
yen_threshold |
Community-convention flag threshold (default
|
marais_threshold |
Stricter community-convention threshold
(default |
relative_offset |
Screening offset for the relative-flag rule
|
An object of class mfrm_q3 containing:
pairs: A data frame with one row per facet-level pair
and columns Level1, Level2, Q3, N, AbsQ3,
YenFlag, MaraisFlag, RelativeFlag, and a textual
Interpretation summarising which thresholds were exceeded.
summary: One-row tibble with MeanQ3, MaxAbsQ3,
and the three flagged-pair counts.
thresholds: The thresholds used, for reproducibility.
facet: The facet whose levels were paired.
This implementation differs from Yen's (1984) original definition in two respects that together affect threshold interpretation.
(1) Standardized vs raw residuals. Yen (1984, eqs. 7-8, p. 127)
defines Q3 = cor(d_i, d_j) where d_{ik} = u_{ik} - P_hat_{ik} is
the raw residual. mfrmr uses standardized residuals
Z = (u - P_hat) / sqrt(Var(u)) because that is what
diagnose_mfrm() stores. Standardization down-weights high-variance
observations and changes the sampling distribution of the resulting
correlation; the published critical values (Chen & Thissen, 1997;
Christensen et al., 2017) were derived for raw-residual Q3.
(2) Mean-aggregation. When the facet being paired (e.g. Rater)
has multiple residual rows per (Person, Level) cell because of
additional facets in the design (e.g. multiple Criterion rows per
Person-Rater cell), the standardized residuals are first
mean-aggregated to one value per (Person, Level) cell, and the
Pearson correlation is taken over those mean-aggregated residuals.
Yen's original formulation takes the correlation directly over
per-(Person, Item) residuals, without aggregation. Mean-aggregation
reduces noise but also shrinks the effective sample size and can pull
correlations toward the cell mean.
For both reasons, treat the values returned here as a screening summary rather than a direct substitute for the published Q3 thresholds. For a formal local-dependence test under raw-residual Q3, use a parametric bootstrap as recommended by Christensen et al. (2017).
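The two-stage computation described above (mean-aggregate, then correlate) can be sketched in base R. The residual table is simulated and its column names are assumptions made for the example; the relative-flag rule is left out because it is not spelled out here, and the package's internal implementation may differ in detail.

```r
set.seed(42)
# Hypothetical standardized residuals: 30 persons x 3 raters x 2 criteria.
resid <- expand.grid(
  Person = paste0("P", 1:30),
  Rater = c("R1", "R2", "R3"),
  Criterion = c("C1", "C2"),
  stringsAsFactors = FALSE
)
resid$StdResidual <- rnorm(nrow(resid))

# Step 1: mean-aggregate to one value per (Person, Rater) cell.
cell <- aggregate(StdResidual ~ Person + Rater, data = resid, FUN = mean)

# Step 2: reshape to a Person x Rater matrix (NA where a cell is unobserved).
wide <- tapply(cell$StdResidual, list(cell$Person, cell$Rater), mean)

# Step 3: Pearson correlation and shared-person count for every rater pair.
lv <- combn(colnames(wide), 2)
pairs <- data.frame(Level1 = lv[1, ], Level2 = lv[2, ])
pairs$N <- apply(lv, 2, function(p) sum(complete.cases(wide[, p])))
pairs$Q3 <- apply(lv, 2, function(p) {
  cor(wide[, p[1]], wide[, p[2]], use = "pairwise.complete.obs")
})

# Step 4: drop thin pairs, then apply the two absolute screening thresholds.
pairs <- pairs[pairs$N >= 5, ]
pairs$AbsQ3 <- abs(pairs$Q3)
pairs$YenFlag <- pairs$AbsQ3 > 0.2
pairs$MaraisFlag <- pairs$AbsQ3 > 0.3
pairs
```

Because each correlation here is taken over at most 30 aggregated cells, even independent residuals can exceed 0.2 by chance, which is one reason the flags are better read as screening signals than as tests.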
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. doi:10.1177/014662168400800201
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for
item pairs using item response theory. Journal of Educational
and Behavioral Statistics, 22(3), 265-289. (Origin of the
commonly cited |Q3| > 0.20 cutoff.)
Marais, I. (2013). Local dependence. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch models in health (pp. 111-130). London: ISTE / Wiley.
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen's Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. doi:10.1177/0146621616677520
plot_local_dependence_heatmap(), diagnose_mfrm()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
q3 <- q3_statistic(fit)
q3$summary
# Look for: MaxAbsQ3 < 0.20 (Chen & Thissen 1997 community cutoff) is
# the comfortable regime; values above 0.30 are commonly considered
# strict-flag worthy (Marais, 2013, summarising literature). For a
# formal test, use a parametric bootstrap (Christensen et al., 2017).
# The summary's flag counts give a quick triage; inspect `q3$pairs`
# for the offending level pairs and follow up with content review.
head(q3$pairs)
Build a rating-scale diagnostics report
rating_scale_table(
  fit,
  diagnostics = NULL,
  whexact = FALSE,
  drop_unused = FALSE
)
fit |
Output from |
diagnostics |
Optional output from |
whexact |
Use exact ZSTD transformation for category fit. |
drop_unused |
If |
This helper provides category usage/fit statistics and threshold summaries
for reviewing score-category functioning.
The category usage portion is a global observed-score screen. In PCM fits
with a step_facet, threshold diagnostics should be interpreted within each
StepFacet rather than as one pooled whole-scale verdict.
Typical checks:
sparse category usage (Count, ExpectedCount)
category fit (Infit, Outfit, ZStd)
threshold ordering within each StepFacet
(threshold_table$Estimate, GapFromPrev)
A named list with:
category_table: category-level counts, expected counts, fit, and ZSTD
threshold_table: model step/threshold estimates
summary: one-row summary (usage and threshold monotonicity)
caveats: structured score-support warning/review rows
diagnostic_mode: character scalar carried from
diagnostics$diagnostic_mode ("legacy", "both", or
"marginal_fit"); used by downstream reporting helpers to
pick the correct expected-count basis
marginal_fit: list bundle from diagnostics$marginal_fit when
strict marginal fit was computed, otherwise NULL. Carries
the raw OverallRMSD / OverallMaxAbsStdResidual / per-cell
tables that feed the MarginalOverallRMSD columns in
summary.
Start with summary:
UsedCategories close to total Categories suggests that most score
categories are represented in the observed data.
very small MinCategoryCount indicates potential instability.
ThresholdMonotonic = FALSE indicates disordered thresholds within at
least one threshold set. In PCM fits, inspect threshold_table by
StepFacet before drawing scale-wide conclusions.
Then inspect:
category_table for global category-level misfit/sparsity.
threshold_table for adjacent-step gaps and ordering within each
StepFacet.
Fit model: fit_mfrm().
Build diagnostics: diagnose_mfrm().
Run rating_scale_table() and review summary().
Use plot() to visualize category profile quickly.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
The category_table data.frame contains:
Score category value.
Observed count and percentage of total.
Mean person measure for respondents in this category.
Category-level fit statistics.
Standardized fit values.
Expected count and observed-expected difference.
Logical; TRUE if count is below minimum threshold.
Fit-based warning flags.
Structured score-support caveats for retained zero-count categories.
The threshold_table data.frame contains:
Step label (e.g., "1-2", "2-3").
Estimated threshold/step difficulty (logits).
Threshold family identifier when the fit uses facet-specific threshold sets.
Difference from the previous threshold within the same
StepFacet when thresholds are facet-specific. Gaps below
1.4 logits may indicate category underuse; gaps above 5.0 may
indicate wide unused regions (Linacre, 2002).
Logical flag repeated within each threshold set.
For PCM fits, read this within StepFacet, not as a pooled item-bank
verdict.
Adjacent score-category support metadata. Thresholds adjacent to retained zero-count categories are flagged for cautious interpretation.
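The within-StepFacet threshold checks can be sketched in base R. The table below is hypothetical (its values and the Monotonic, NarrowGap, and WideGap column names are invented for the illustration); only StepFacet, Estimate, GapFromPrev, and the 1.4 / 5.0 logit bands come from the descriptions above.

```r
# Hypothetical threshold table for a PCM fit with two threshold sets.
threshold_table <- data.frame(
  StepFacet = rep(c("Criterion1", "Criterion2"), each = 3),
  Step = rep(c("1-2", "2-3", "3-4"), times = 2),
  Estimate = c(-1.8, 0.1, 1.7, -0.9, -1.1, 2.0)
)

# Gap from the previous threshold, computed within each StepFacet.
threshold_table$GapFromPrev <- ave(
  threshold_table$Estimate, threshold_table$StepFacet,
  FUN = function(x) c(NA, diff(x))
)

# A threshold set is ordered when every within-set gap is positive.
# ave() returns 0/1 here, so compare with 1 to recover a logical flag.
threshold_table$Monotonic <- ave(
  threshold_table$GapFromPrev, threshold_table$StepFacet,
  FUN = function(g) all(g[!is.na(g)] > 0)
) == 1

# Screening bands: narrow (< 1.4 logits) and wide (> 5.0 logits) gaps.
threshold_table$NarrowGap <- !is.na(threshold_table$GapFromPrev) &
  threshold_table$GapFromPrev < 1.4
threshold_table$WideGap <- !is.na(threshold_table$GapFromPrev) &
  threshold_table$GapFromPrev > 5.0
threshold_table
```

In this toy table the second threshold set is disordered (its second threshold sits below the first), so its rows carry Monotonic = FALSE while the first set stays clean. That is the pattern to look for before drawing any scale-wide conclusion.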
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573. doi:10.1007/BF02293814
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174. doi:10.1007/BF02296272
Linacre, J. M. (2002). What do Infit and Outfit, mean-square and
standardized mean? Rasch Measurement Transactions, 16(2), 878.
(Source for the broad 0.5-1.5 mean-square screening convention and
the threshold-gap heuristics used in summary(t8)$summary; applied
misfit bands remain purpose- and sample-dependent.)
Wind, S. A. (2023). Detecting rating scale malfunctioning with the
partial credit model and generalized partial credit model.
Educational and Psychological Measurement, 83(5), 953-983.
doi:10.1177/00131644221116292 (Recent simulation evidence on
PCM- and GPCM-based rating-scale diagnostics; useful for
interpreting the summary(t8)$summary flags in the bounded
GPCM route.)
diagnose_mfrm(), measurable_summary_table(), plot.mfrm_fit(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
t8 <- rating_scale_table(fit)
summary(t8)
summary(t8)$summary
p_t8 <- plot(t8, draw = FALSE)
p_t8$data$plot
Recode missing-code sentinels to NA
Convenience helper that replaces the standard non-NA missing-code
sentinels used in SPSS / SAS / FACETS exports (99, 999, -1,
"N", "NA", "n/a", ".", "") with NA across the columns
you select. This is useful before calling fit_mfrm() on data exported
with those conventions.
recode_missing_codes(
  data,
  columns = NULL,
  codes = c("99", "999", "-1", "N", "NA", "n/a", ".", ""),
  numeric_codes = TRUE,
  verbose = FALSE
)
data |
A data frame. |
columns |
Character vector of column names to recode. Defaults
to |
codes |
Character vector of code values to convert to |
numeric_codes |
Logical; if |
verbose |
Logical; if |
The input data with the specified missing sentinels
replaced by NA. An mfrm_missing_recoding attribute records the
per-column replacement counts for audit logs.
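The recode-and-audit pattern can be sketched in base R. The helper recode_sentinels() and the recoding_counts attribute name are hypothetical and exist only for this illustration; the package function may handle types and codes differently.

```r
# Hypothetical helper (not the package function): replace sentinel codes with NA
# in the selected columns and record how many values were replaced per column.
recode_sentinels <- function(data, columns, codes = c("99", "999", "-1", ".", "")) {
  counts <- integer(0)
  for (col in columns) {
    # Compare on the character representation so numeric and text codes match.
    hit <- !is.na(data[[col]]) & as.character(data[[col]]) %in% codes
    data[[col]][hit] <- NA
    counts[col] <- sum(hit)
  }
  attr(data, "recoding_counts") <- counts
  data
}

dat <- data.frame(
  Person = paste0("P", 1:5),
  Score = c(1, 99, 2, -1, 3)
)
cleaned <- recode_sentinels(dat, columns = "Score")
cleaned$Score
attr(cleaned, "recoding_counts")
```

Restricting the recode to named columns matters: a value such as 99 or -1 can be a legitimate observation in another column, so the per-column counts are worth checking against the data dictionary before fitting.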
describe_mfrm_data(), fit_mfrm().
dat <- data.frame(
  Person = paste0("P", 1:5),
  Rater = c("R1", "R1", "R2", "R2", "R2"),
  Score = c(1, 99, 2, -1, 3)
)
cleaned <- recode_missing_codes(dat, columns = "Score")
cleaned$Score
attr(cleaned, "mfrm_missing_recoding")
Recommend a design condition from simulation results
recommend_mfrm_design(
  x,
  facets = c("Rater", "Criterion"),
  min_separation = 2,
  min_reliability = 0.8,
  max_severity_rmse = 0.5,
  max_misfit_rate = 0.1,
  min_convergence_rate = 1,
  prefer = c("n_person", "raters_per_person", "n_rater", "n_criterion")
)
x |
Output from |
facets |
Facets that must satisfy the planning thresholds. |
min_separation |
Minimum acceptable mean separation. |
min_reliability |
Minimum acceptable mean reliability. |
max_severity_rmse |
Maximum acceptable severity recovery RMSE. |
max_misfit_rate |
Maximum acceptable mean misfit rate. |
min_convergence_rate |
Minimum acceptable convergence rate. |
prefer |
Ranking priority among design variables. Earlier entries are
optimized first when multiple designs pass. Custom public aliases from
|
This helper converts a design-study summary into a simple planning table.
A design is marked as recommended when all requested facets satisfy all
selected thresholds simultaneously.
If multiple designs pass, the helper returns the smallest one according to
prefer (by default: fewer persons first, then fewer ratings per person,
then fewer raters, then fewer criteria).
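The pass-then-rank logic can be sketched in base R. The design table and its metric column names (MinSeparation, MinReliability, ConvergenceRate) are hypothetical placeholders; each row is assumed to already hold the worst value across the requested facets, and only three of the thresholds are applied to keep the example short.

```r
# Hypothetical design-level summary (one row per simulated design).
design_table <- data.frame(
  n_person = c(30, 30, 60, 60),
  raters_per_person = c(1, 2, 1, 2),
  n_rater = 4,
  n_criterion = 3,
  MinSeparation = c(1.6, 2.1, 2.2, 2.8),
  MinReliability = c(0.72, 0.82, 0.83, 0.89),
  ConvergenceRate = c(1, 1, 1, 1)
)

# A design passes only when every threshold is met simultaneously.
design_table$Pass <- design_table$MinSeparation >= 2 &
  design_table$MinReliability >= 0.8 &
  design_table$ConvergenceRate >= 1

# Rank passing designs in the default preference order:
# fewer persons, then fewer ratings per person, then fewer raters and criteria.
passing <- design_table[design_table$Pass, ]
ranked <- passing[order(passing$n_person, passing$raters_per_person,
                        passing$n_rater, passing$n_criterion), ]
recommended <- ranked[1, ]
recommended
```

In this toy grid the smallest passing design has 30 persons with 2 ratings each. Putting raters_per_person ahead of n_person in the ordering would instead select the 60-person, single-rating design, which shows how much the recommendation depends on the preference order.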
A list of class mfrm_design_recommendation with:
facet_table: facet-level threshold checks, including design-variable
alias columns when applicable
design_table: design-level aggregated checks, including design-variable
alias columns when applicable
recommended: the first passing design after ranking
thresholds: thresholds used in the recommendation
design_variable_aliases: accepted public aliases for design variables
design_descriptor: role-based design-variable metadata
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
caveats: structured warning rows for situations where the
recommendation rests on weak evidence (e.g., no design met every
threshold; the recommended design is at the boundary of the
evaluated grid; only one rep was simulated). Empty tibble()
when no caveats apply.
Review summary.mfrm_design_evaluation() and plot.mfrm_design_evaluation().
Use recommend_mfrm_design(...) to identify the smallest acceptable design.
evaluate_mfrm_design(), summary.mfrm_design_evaluation, plot.mfrm_design_evaluation
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  n_person = c(8, 12),
  n_rater = 2,
  n_criterion = 2,
  raters_per_person = 1,
  reps = 1,
  maxit = 8,
  seed = 123
))
rec <- recommend_mfrm_design(sim_eval)
rec$recommended
Build a package-native reference audit for report completeness
reference_case_audit(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  reference_profile = c("core", "compatibility"),
  include_metrics = TRUE,
  top_n_attention = 15L
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
reference_profile |
Audit profile. |
include_metrics |
If |
top_n_attention |
Number of lowest-coverage components to keep in
|
This function repackages the internal contract audit into package-native terminology so users can review output completeness without needing external manual/table numbering. It reports:
component-level schema coverage
numerical consistency checks for derived report tables
the highest-priority attention items for follow-up
It is an internal completeness audit for package-native outputs, not an external validation study.
Use reference_profile = "core" for ordinary mfrmr workflows.
Use reference_profile = "compatibility" only when you explicitly want to
inspect the compatibility layer.
An object of class mfrm_reference_audit.
overall: one-row internal audit summary with schema coverage and metric
pass rate.
component_summary: per-component coverage summary.
attention_items: quickest list of components needing review.
metric_summary / metric_checks: numerical consistency status.
facets_parity_report(), diagnose_mfrm(), build_fixed_reports()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
audit <- reference_case_audit(fit, diagnostics = diag)
summary(audit)
Benchmark packaged reference cases
reference_case_benchmark(
  cases = c("synthetic_truth", "synthetic_latent_regression",
            "synthetic_bias_contract", "study1_itercal_pair",
            "study2_itercal_pair", "combined_itercal_pair"),
  method = "MML",
  model = "RSM",
  quad_points = 7,
  maxit = 40,
  reltol = 1e-06,
  mml_engine = c("direct", "em", "hybrid")
)
cases |
Reference cases to run. Defaults to the standard
|
method |
Estimation method passed to |
model |
Model family passed to |
quad_points |
Quadrature points for |
maxit |
Maximum optimizer iterations passed to |
reltol |
Convergence tolerance passed to |
mml_engine |
MML optimization engine passed to |
This function checks mfrmr against the package's curated reference case
families:
synthetic_truth: checks whether recovered facet measures align with the
known generating values from the package's synthetic design.
synthetic_latent_regression: checks whether the first-version
latent-regression MML branch recovers known population coefficients,
residual latent variance, criterion ordering, and posterior-shift
direction from a synthetic overlap case.
synthetic_latent_regression_omit: checks whether the population-model
complete-case omission policy is reflected in the fitted metadata,
response-row audit, active person estimates, and replay provenance.
synthetic_conquest_overlap_dry_run: builds the narrow ConQuest-overlap
bundle for the latent-regression synthetic case, round-trips package tables
through the normalization/audit helpers, and confirms the package-side
workflow without claiming that ConQuest itself was executed.
synthetic_gpcm: checks whether the bounded GPCM branch recovers
known criterion-specific slopes, row-centered step parameters, and
criterion ordering from a synthetic overlap case. This case
currently requires model = "GPCM" and is intended for method = "MML".
synthetic_bias_contract: checks whether package bias tables and
pairwise local comparisons satisfy the identities documented in the bias
help workflow.
*_itercal_pair: compares a baseline packaged dataset with its iterative
recalibration counterpart to review fit stability, facet-measure
alignment, and linking coverage together.
The resulting object is intended as a reference-case check for package
behavior. It does not by itself establish
external validity against FACETS, ConQuest, or published calibration
studies, and it does not assume any familiarity with external table
numbering or printer layouts.
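As a rough illustration of what a known-truth recovery check involves, the sketch below compares hypothetical generating severities with stand-in estimates. The metric names (`rmse`, `bias`, `correlation`, `same_order`) are assumptions made for this example; they are not the documented columns of recovery_checks.

```r
set.seed(42)

# Hypothetical generating rater severities (logits), centered at zero.
truth <- c(R1 = -0.8, R2 = -0.2, R3 = 0.3, R4 = 0.7)

# Stand-in for recovered estimates: truth plus small noise, re-centered
# so both vectors share the same origin before comparison.
estimate <- truth + rnorm(length(truth), sd = 0.1)
estimate <- estimate - mean(estimate)

# Simple recovery metrics on the common, centered scale.
recovery <- data.frame(
  rmse        = sqrt(mean((estimate - truth)^2)),
  bias        = mean(estimate - truth),
  correlation = cor(estimate, truth),
  same_order  = identical(order(estimate), order(truth))
)
recovery
```

Centering both vectors first matters because facet measures are identified only up to the chosen origin; without it, a pure location shift would be counted as recovery error.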
When specialized latent-regression omission or ConQuest-overlap package-side
cases are requested, summary(bench) prints preview rows from
population_policy_checks and conquest_overlap_checks alongside the
reference notes so the package-versus-external validation boundary remains
visible.
An object of class mfrm_reference_benchmark.
overview: one-row reference-case summary.
case_summary: pass/warn/fail triage by reference case.
fit_runs: fitted-run metadata (fit, precision tier, convergence, and
latent-regression population-model/posterior-basis fields, including
categorical-coding details when present).
design_checks: exact design recovery checks for each dataset.
recovery_checks: known-truth recovery metrics for the synthetic cases,
including the latent-regression reference case.
bias_checks: source-backed bias/local-measure identity checks.
pair_checks: paired-dataset stability screens for the iterated cases.
linking_checks: common-element audits for paired calibration datasets.
conquest_overlap_checks: package-side checks for the
ConQuest-overlap bundle/normalization/audit workflow; this remains a
package-side check until actual ConQuest output tables are supplied.
population_policy_checks: complete-case omission checks for population
model benchmark fixtures.
source_profile: source-backed rules used by the reference checks.
bench <- reference_case_benchmark(
  cases = "synthetic_truth",
  method = "JML",
  maxit = 30
)
summary(bench)
Build an auto-filled MFRM reporting checklist
reporting_checklist(
  fit,
  diagnostics = NULL,
  bias_results = NULL,
  hierarchical_structure = NULL,
  include_references = TRUE
)
fit |
Output from |
diagnostics |
Optional output from |
bias_results |
Optional output from |
hierarchical_structure |
Optional output from
|
include_references |
If |
This helper ports the app-level reporting checklist into a package-native bundle. It does not try to judge substantive reporting quality; instead, it checks whether the fitted object and related diagnostics contain the evidence typically reported in MFRM write-ups.
Checklist items are grouped into seven core sections:
Method section
Global fit
Facet-level statistics
Element-level statistics
Rating scale diagnostics
Bias/interaction analysis
Visual displays
When a fit uses the latent-regression population-model branch, the checklist
also adds a Population Model section covering coefficient reporting,
categorical model-matrix coding, complete-case omissions, posterior-basis
wording, and ConQuest scope wording.
The output is designed for manuscript preparation, audit trails, and reproducible reporting workflows.
A named list with checklist tables. Class:
mfrm_reporting_checklist.
reporting_checklist() is a manuscript-preparation guide. It tells you
which reporting elements are already present in the current analysis
objects and which still need to be generated or documented. The primary
draft-status column is DraftReady; ReadyForAPA is retained as a
backward-compatible alias.
It is not a single run-level pass/fail decision for publication.
DraftReady = TRUE / ReadyForAPA = TRUE does not certify formal
inferential adequacy.
Missing bias rows may simply mean bias_results were not supplied.
checklist: one row per reporting item with Available = TRUE/FALSE.
DraftReady = TRUE means the item can be drafted into a report with the
package's documented caveats. ReadyForAPA is a backward-compatible alias
of the same flag; neither field certifies formal inferential adequacy.
section_summary: available items by section.
software_scope: external-software relationship summary for mfrmr,
FACETS, ConQuest, and SPSS-style tabular handoffs.
visual_scope: plotting-route summary that separates report-default
2D figures from exploratory surface/3D-ready payloads, including a short
InterpretationCheck for the main user-facing caveat. For bounded
GPCM, this table also exposes SupportStatus and ModelCaveat so the
run-specific plotting boundary can be retained in reports and handoffs.
references: core background references when requested.
support_status: bounded-GPCM support contract, when applicable.
caveat: bounded-GPCM visual/reporting caveat, when applicable.
Review the rows with Available = FALSE or DraftReady = FALSE, then add
the missing diagnostics, bias results, or narrative context before calling
build_apa_outputs() for draft text generation. For RSM / PCM
reporting runs, the preferred route is an MML fit plus
diagnose_mfrm(..., diagnostic_mode = "both") so the checklist can see the
legacy and strict marginal screens together. For bounded GPCM, keep
support_status, caveat, and visual_scope$ModelCaveat attached to any
copied checklist, visual, or APA text.
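The review step above amounts to filtering the checklist table. The sketch below uses a hand-built data.frame with hypothetical rows and assumes that Available and DraftReady are logical columns; a real checklist may carry more columns or a different encoding.

```r
# Hypothetical checklist rows; a real checklist table may differ.
checklist <- data.frame(
  Section    = c("Method section", "Global fit",
                 "Bias/interaction analysis", "Visual displays"),
  Item       = c("Estimation method", "Infit/Outfit summary",
                 "Bias table", "QC dashboard"),
  Available  = c(TRUE, TRUE, FALSE, TRUE),
  DraftReady = c(TRUE, TRUE, FALSE, FALSE)
)

# Rows that still need diagnostics, bias results, or narrative context.
todo <- checklist[!checklist$Available | !checklist$DraftReady, , drop = FALSE]

# Count of draft-ready items per section.
ready_by_section <- aggregate(DraftReady ~ Section, data = checklist, FUN = sum)

todo[, c("Section", "Item")]
```

In this toy table the bias table and the QC dashboard are the two items left to close before drafting.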
reporting_checklist() is the manuscript/reporting branch of the package.
Use it when the question is "what is still missing from the report?" rather
than "which observations or links need follow-up?" For operational review:
Use build_misfit_casebook() after diagnose_mfrm() when you need ranked
misfit cases and grouping views for local follow-up.
Use build_linking_review() after anchor/drift/chain helpers when you
need operational linking triage rather than manuscript-oriented reporting
tables.
Fit with fit_mfrm(). For RSM / PCM reporting runs, prefer
method = "MML".
Compute diagnostics with diagnose_mfrm(). For RSM / PCM, prefer
diagnostic_mode = "both".
Run reporting_checklist() to see which reporting elements are already
available from the current analysis objects.
If the issue is operational rather than manuscript-facing, branch to
build_misfit_casebook() or build_linking_review() instead of treating
reporting_checklist() as the single review hub.
build_apa_outputs(), build_visual_summaries(),
specifications_report(), data_quality_report(),
build_misfit_casebook(), build_linking_review()
# Fast smoke run: a JML fit + legacy-only diagnostic produces a
# populated checklist in well under a second.
toy <- load_mfrmr_data("example_core")
fit_quick <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                      method = "JML", maxit = 15)
diag_quick <- diagnose_mfrm(fit_quick, residual_pca = "none",
                            diagnostic_mode = "legacy")
chk_quick <- reporting_checklist(fit_quick, diagnostics = diag_quick)
head(chk_quick$checklist[, c("Section", "Item", "DraftReady")])
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "MML", maxit = 200)
diag <- diagnose_mfrm(fit, residual_pca = "both", diagnostic_mode = "both")
chk <- reporting_checklist(fit, diagnostics = diag)
summary(chk)
# Look for: a high `Ready` / `Total` ratio in the summary block.
# Sections with `Ready = 0` need follow-up before submitting
# (typically diagnostic_mode = "both" or a residual-PCA pass).
apa <- build_apa_outputs(fit, diag)
head(chk$checklist[, c("Section", "Item", "DraftReady", "NextAction")])
# Look for: every row where `DraftReady = "yes"` can be drafted
# under the documented caveats. `"no"` rows include a concrete
# `NextAction` step (e.g. "run plot_qc_dashboard()") so the gap can
# be closed without re-reading the methodology guide.
nchar(apa$report_text)
This helper mirrors mfrmRFacets.R behavior as a package API and keeps
legacy-compatible defaults (model = "RSM", method = "JML"), while allowing
users to choose compatible estimation options.
run_mfrm_facets(
  data,
  person = NULL,
  facets = NULL,
  score = NULL,
  weight = NULL,
  keep_original = FALSE,
  model = c("RSM", "PCM"),
  method = c("JML", "JMLE", "MML"),
  step_facet = NULL,
  anchors = NULL,
  group_anchors = NULL,
  noncenter_facet = "Person",
  dummy_facets = NULL,
  positive_facets = NULL,
  quad_points = 15,
  maxit = 400,
  reltol = 1e-06,
  mml_engine = c("direct", "em", "hybrid"),
  top_n_interactions = 20L
)
mfrmRFacets(
  data,
  person = NULL,
  facets = NULL,
  score = NULL,
  weight = NULL,
  keep_original = FALSE,
  model = c("RSM", "PCM"),
  method = c("JML", "JMLE", "MML"),
  step_facet = NULL,
  anchors = NULL,
  group_anchors = NULL,
  noncenter_facet = "Person",
  dummy_facets = NULL,
  positive_facets = NULL,
  quad_points = 15,
  maxit = 400,
  reltol = 1e-06,
  mml_engine = c("direct", "em", "hybrid"),
  top_n_interactions = 20L
)
data |
A data.frame in long format. |
person |
Optional person column name. If |
facets |
Optional facet column names. If |
score |
Optional score column name. If |
weight |
Optional weight column name. |
keep_original |
Passed to |
model |
MFRM model ( |
method |
Estimation method ( |
step_facet |
Step facet for PCM mode; passed to |
anchors |
Optional anchor table (data.frame). |
group_anchors |
Optional group-anchor table (data.frame). |
noncenter_facet |
Non-centered facet passed to |
dummy_facets |
Optional dummy facets fixed at zero. |
positive_facets |
Optional facets with positive orientation. |
quad_points |
Quadrature points for MML; passed to |
maxit |
Maximum optimizer iterations. |
reltol |
Optimization tolerance. |
mml_engine |
MML optimization engine passed to |
top_n_interactions |
Number of rows for interaction diagnostics. |
run_mfrm_facets() is intended as a one-shot workflow helper:
fit -> diagnostics -> key report tables.
Returned objects can be inspected with summary() and plot().
A list with components:
fit: fit_mfrm() result
diagnostics: diagnose_mfrm() result
iteration: estimation_iteration_report() result
fair_average: fair_average_table() result
rating_scale: rating_scale_table() result
run_info: run metadata table
mapping: resolved column mapping
method = "JML" (default): legacy-compatible joint estimation
route; the default preserves the FACETS-style output continuity
that existing one-shot scripts expect. For new analysis scripts,
prefer fit_mfrm(..., method = "MML") – MML is the package-wide
recommended route because person parameters are integrated out
under an N(0, 1) prior and per-person posterior SEs are available.
method = "JMLE": explicit JMLE label; internally equivalent to the
JML route.
method = "MML": marginal maximum likelihood route using
quad_points. Use mml_engine = "em" or "hybrid" only for
RSM / PCM fits when you want the staged MML alternatives.
model = "PCM" is supported; set step_facet when facet-specific step
structure is needed.
plot(out, type = "fit") delegates to plot.mfrm_fit() and returns
fit-level visual bundles (e.g., Wright/pathway/CCC).
plot(out, type = "qc") delegates to plot_qc_dashboard() and returns
a QC dashboard plot object.
Start with summary(out):
check convergence and iteration count in overview.
confirm resolved columns in mapping.
Then inspect:
out$rating_scale for category/threshold behavior.
out$fair_average for observed-vs-model scoring tendencies.
out$diagnostics for misfit/reliability/interactions.
Run run_mfrm_facets() with explicit column mapping.
Check summary(out) and summary(out$diagnostics).
Visualize with plot(out, type = "fit") and plot(out, type = "qc").
Export selected tables for reporting (out$rating_scale, out$fair_average).
For new scripts, prefer the package-native route:
fit_mfrm() -> diagnose_mfrm() -> reporting_checklist() ->
build_apa_outputs().
Use run_mfrm_facets() when you specifically need the legacy-compatible
one-shot wrapper.
fit_mfrm(), diagnose_mfrm(), estimation_iteration_report(),
fair_average_table(), rating_scale_table(), mfrmr_visual_diagnostics,
mfrmr_workflow_methods, mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:12], , drop = FALSE]
# Legacy-compatible default: RSM + JML
out <- run_mfrm_facets(
  data = toy_small,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  maxit = 6
)
out$fit$summary[, c("Model", "Method", "MethodUsed")]
s <- summary(out)
s$overview[, c("Model", "Method", "Converged")]
p_fit <- plot(out, type = "fit", draw = FALSE)
p_fit$wright_map$data$plot
# Optional: MML route
if (interactive()) {
  out_mml <- run_mfrm_facets(
    data = toy_small,
    person = "Person",
    facets = c("Rater", "Criterion"),
    score = "Score",
    method = "MML",
    quad_points = 5,
    maxit = 6
  )
  out_mml$fit$summary[, c("Model", "Method", "MethodUsed")]
}
Integrates convergence, model fit, reliability, separation, element misfit, unexpected responses, category structure, connectivity, inter-rater agreement, and DIF/bias into a single pass/warn/fail report.
run_qc_pipeline(
  fit,
  diagnostics = NULL,
  threshold_profile = "standard",
  thresholds = NULL,
  rater_facet = NULL,
  include_bias = TRUE,
  bias_results = NULL
)
fit |
Output from |
diagnostics |
Output from |
threshold_profile |
Threshold preset: |
thresholds |
Named list to override individual thresholds. |
rater_facet |
Character name of the rater facet for inter-rater check (auto-detected if NULL). |
include_bias |
If |
bias_results |
Optional pre-computed bias results from |
The pipeline evaluates 10 quality checks and assigns a verdict
(Pass / Warn / Fail) to each. The overall status is the most severe
verdict across all checks. Diagnostics are computed automatically via
diagnose_mfrm() if not supplied.
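The "most severe verdict wins" rule can be expressed with an ordered factor. This is a minimal sketch with four hypothetical check results; it is not the pipeline's internal code.

```r
# Hypothetical per-check verdicts (only four of the ten checks shown).
verdicts <- c(
  Convergence  = "Pass",
  `Global fit` = "Warn",
  Reliability  = "Pass",
  Connectivity = "Fail"
)

# Order the levels by severity so max() returns the worst verdict.
severity <- factor(verdicts, levels = c("Pass", "Warn", "Fail"), ordered = TRUE)
overall <- as.character(max(severity))
overall
```

A single failing check is therefore enough to set the headline status to "Fail", however many other checks pass.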
Reliability and separation are used here as QC signals. In mfrmr,
Reliability / Separation are model-based facet indices and
RealReliability / RealSeparation provide more conservative lower bounds.
For MML, these rely on model-based ModelSE values for non-person facets;
for JML, they remain exploratory approximations.
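For orientation, the conventional Rasch definitions of separation and reliability can be computed from a set of measures and their standard errors as below. The input values are hypothetical, and whether mfrmr uses exactly these formulas (for example, sample versus population variance) is an assumption rather than something this page confirms.

```r
# Hypothetical rater severities (logits) and their standard errors.
measures <- c(-1.2, -0.4, 0.1, 0.5, 1.0)
se       <- c(0.25, 0.22, 0.21, 0.22, 0.24)

observed_var <- var(measures)        # observed variance of the measures
error_var    <- mean(se^2)           # mean-square measurement error
true_var     <- max(observed_var - error_var, 0)

# Separation: "true" spread in units of root-mean-square error.
separation <- sqrt(true_var / error_var)

# Reliability: share of observed variance that is not error.
reliability <- true_var / observed_var

c(separation = separation, reliability = reliability)
```

When the true variance is not truncated at zero, the two indices are linked by reliability = separation^2 / (1 + separation^2), so the Separation and Reliability thresholds in a QC profile are two views of the same quantity.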
Three threshold presets are available via threshold_profile:
| Aspect | strict | standard | lenient |
|---|---|---|---|
| Global fit warn | 1.3 | 1.5 | 1.7 |
| Global fit fail | 1.5 | 2.0 | 2.5 |
| Reliability pass | 0.90 | 0.80 | 0.70 |
| Separation pass | 3.0 | 2.0 | 1.5 |
| Misfit warn (pct) | 3 | 5 | 10 |
| Unexpected fail | 3 | 5 | 10 |
| Min cat count | 15 | 10 | 5 |
| Agreement pass | 60 | 50 | 40 |
| Bias fail (pct) | 5 | 10 | 15 |
Individual thresholds can be overridden via the thresholds argument
(a named list keyed by the internal threshold names shown above).
The element-misfit row uses misfit_low and misfit_high for the MnSq
band and reports both the band and percentage criteria in qc$verdicts.
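Overriding individual thresholds on top of a preset behaves like a named-list merge. In the sketch below the values come from the "standard" column of the table above, but the key names are hypothetical; consult mfrm_threshold_profiles() for the real internal names.

```r
# "standard" preset values from the table; key names are illustrative only.
standard <- list(
  global_fit_warn     = 1.5,
  global_fit_fail     = 2.0,
  reliability_pass    = 0.80,
  separation_pass     = 2.0,
  misfit_warn_pct     = 5,
  unexpected_fail_pct = 5,
  min_cat_count       = 10,
  agreement_pass      = 50,
  bias_fail_pct       = 10
)

# User overrides replace matching entries and leave the rest untouched.
overrides <- list(reliability_pass = 0.85, min_cat_count = 15)
effective <- utils::modifyList(standard, overrides)

effective[c("reliability_pass", "separation_pass", "min_cat_count")]
```

Recording the merged list alongside the profile name is what makes a QC verdict reproducible later.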
For bounded GPCM, this pipeline is available as an exploratory screening
route. The returned object includes support_status = "supported_with_caveat" and a caveat field; interpret fair-average and
bias checks as slope-aware GPCM screens, not as Rasch-family invariance
evidence.
Object of class mfrm_qc_pipeline with verdicts, overall status,
details, and recommendations.
The 10 checks are:
Convergence: Did the model converge?
Global fit: Infit/Outfit MnSq within the current review band.
Reliability: Minimum non-person facet model reliability index.
Separation: Minimum non-person facet model separation index.
Element misfit: Percentage of elements with Infit/Outfit outside the current review band.
Unexpected responses: Percentage of observations with large standardized residuals.
Category structure: Minimum category count and threshold ordering.
Connectivity: All observations in a single connected subset.
Inter-rater agreement: Exact agreement percentage for the rater facet (if applicable).
Functioning/Bias screen: Percentage of interaction cells that cross the screening threshold (if interaction results are available).
$overall: character string "Pass", "Warn", or "Fail".
$verdicts: tibble with columns Check, Verdict, Value, and
Threshold for each of the 10 checks.
$details: character vector of human-readable detail strings.
$raw_details: named list of per-check numeric details for
programmatic access.
$recommendations: character vector of actionable suggestions for
checks that did not pass.
$config: records the threshold profile and effective thresholds.
Fit a model: fit <- fit_mfrm(...).
Optionally compute diagnostics and bias:
diag <- diagnose_mfrm(fit);
bias <- estimate_bias(fit, diag, ...).
Run the pipeline: qc <- run_qc_pipeline(fit, diag, bias_results = bias).
Check qc$overall for the headline verdict.
Review qc$verdicts for per-check details.
Follow qc$recommendations for remediation.
Visualize with plot_qc_pipeline().
diagnose_mfrm(), estimate_bias(),
mfrm_threshold_profiles(), plot_qc_pipeline(),
plot_qc_dashboard(), build_visual_summaries()
toy <- load_mfrmr_data("study1")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
qc <- run_qc_pipeline(fit)
qc
summary(qc)
qc$verdicts
Sample approximate plausible values under fitted posterior scoring
sample_mfrm_plausible_values(
  fit,
  new_data,
  person = NULL,
  facets = NULL,
  score = NULL,
  weight = NULL,
  person_data = NULL,
  person_id = NULL,
  population_policy = c("error", "omit"),
  n_draws = 5,
  interval_level = 0.95,
  seed = NULL
)
fit |
Output from |
new_data |
Long-format data for the future or partially observed units to be scored. |
person |
Optional person column in |
facets |
Optional facet-column mapping for |
score |
Optional score column in |
weight |
Optional weight column in |
person_data |
Optional one-row-per-person data.frame with the
background variables required by a latent-regression fit. Ignored for
ordinary fixed-calibration scoring. Intercept-only latent-regression fits
can reconstruct the minimal scored-person table internally. This is the
scoring-time table for |
person_id |
Optional person-ID column in |
population_policy |
How missing background data are handled when
|
n_draws |
Number of posterior draws per person. Must be a positive integer. |
interval_level |
Posterior interval level passed to
|
seed |
Optional seed for reproducible posterior draws. |
sample_mfrm_plausible_values() is a thin public wrapper around
predict_mfrm_units() that exposes the fixed-calibration posterior draws as
a standalone object. It is useful when downstream workflows want repeated
latent-value imputations rather than just one posterior EAP summary.
In the current mfrmr implementation these are approximate plausible
values drawn from the fitted quadrature-grid posterior under the scoring
basis implied by fit. For ordinary MML fits this is the fitted marginal
calibration; for latent-regression MML fits it is the fitted conditional
normal population model for the scored persons; for JML fits it is the
fixed facet/step calibration together with a standard normal reference prior
on the quadrature grid. They should be interpreted as posterior uncertainty
summaries for the scored persons, not as deterministic future truth values
and not as a claim of full many-facet plausible-values equivalence with
population-model software.
In other words, the JML path here is a practical scoring approximation
layered on top of the fitted joint-likelihood calibration, whereas the
latent-regression MML path uses the fitted one-dimensional conditional
normal population model. Neither path should be described as a full
many-facet plausible-values system with all ConQuest-style extensions.
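The idea of drawing from a quadrature-grid posterior under a fixed calibration can be sketched in base R. Everything here is an assumption made for illustration: the grid, the thresholds, the severities, and the helper `rsm_prob()` are hypothetical and are not the internals of predict_mfrm_units().

```r
set.seed(1)

# Rating-scale category probabilities for categories 0..m at one theta.
rsm_prob <- function(theta, severity, tau) {
  num <- exp(c(0, cumsum(theta - severity - tau)))
  num / sum(num)
}

nodes <- seq(-4, 4, length.out = 21)   # quadrature grid for theta
prior <- dnorm(nodes)                  # standard normal reference prior
prior <- prior / sum(prior)

tau   <- c(-1, 0, 1)                   # hypothetical fixed step thresholds
sev   <- c(0.2, -0.3)                  # fixed severities for two ratings
score <- c(2, 3)                       # observed categories on a 0..3 scale

# Likelihood of the observed ratings at each grid node.
lik <- vapply(nodes, function(th) {
  prod(mapply(function(s, x) rsm_prob(th, s, tau)[x + 1], sev, score))
}, numeric(1))

# Normalized posterior over the grid, its mean (EAP), and five draws.
posterior <- prior * lik / sum(prior * lik)
eap   <- sum(nodes * posterior)
draws <- sample(nodes, size = 5, replace = TRUE, prob = posterior)

list(eap = eap, draws = draws)
```

Because draws are taken from the grid nodes themselves, their resolution is limited by the number of quadrature points, which is one reason to treat them as approximate.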
An object of class mfrm_plausible_values with components:
values: one row per person per draw
estimates: companion posterior EAP summaries
audit: row-preparation audit
population_audit: optional person-level omission audit for
latent-regression scoring
input_data: cleaned canonical scoring rows retained from new_data
person_data: cleaned or supplied person-level background data used for
latent-regression scoring; NULL otherwise
settings: scoring settings
notes: interpretation notes
values contains one row per person per draw.
estimates contains the companion posterior EAP summaries from
predict_mfrm_units().
summary() reports draw counts and empirical draw summaries by person.
This helper does not update the calibration, estimate new non-person facet levels, or provide exact future true values. It samples from the fixed-grid posterior implied by the existing fixed calibration.
The underlying posterior scoring follows the usual quadrature-based EAP
framework of Bock and Aitkin (1981). The interpretation of multiple
posterior draws as plausible-value-style summaries follows the general logic
discussed by Mislevy (1991), while the current implementation remains a
practical fixed-calibration approximation rather than a full published
many-facet plausible-values method. For JML source fits, the quadrature
posterior uses a package-level standard normal reference prior for this
post hoc scoring layer.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177-196.
predict_mfrm_units(), summary.mfrm_plausible_values
toy <- load_mfrmr_data("example_core")
keep_people <- unique(toy$Person)[1:18]
toy_fit <- suppressWarnings(
  fit_mfrm(
    toy[toy$Person %in% keep_people, , drop = FALSE],
    "Person", c("Rater", "Criterion"), "Score",
    method = "MML",
    quad_points = 5,
    maxit = 15
  )
)
new_units <- data.frame(
  Person = c("NEW01", "NEW01"),
  Rater = unique(toy$Rater)[1],
  Criterion = unique(toy$Criterion)[1:2],
  Score = c(2, 3)
)
pv <- sample_mfrm_plausible_values(toy_fit, new_units, n_draws = 3, seed = 1)
summary(pv)$draw_summary
Lightweight accessor that returns the per-facet empirical-Bayes
shrinkage table stored on a fit when facet_shrinkage != "none".
Returns NULL (with a message) when no shrinkage has been applied
so callers can probe without error.
shrinkage_report(fit)
fit |
An |
A data.frame with one row per facet (and optionally
"Person") or NULL when shrinkage has not been applied.
apply_empirical_bayes_shrinkage(), fit_mfrm().
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25,
                facet_shrinkage = "empirical_bayes")
shrinkage_report(fit)
Simulate arbitrary-facet RSM data.
simulate_mfrm_arbitrary_data(
  sim_spec,
  design_id = 1,
  design = NULL,
  seed = NULL,
  dif_effects = NULL,
  interaction_effects = NULL
)
sim_spec |
Output from |
design_id |
Design row to use from |
design |
Optional one-row design override with the same columns as |
seed |
Optional random seed. |
dif_effects |
Optional DIF effect table. The table must include |
interaction_effects |
Optional interaction effect table. The table must include |
The arbitrary-facet generator samples the latent person and facet values, then applies row-matched dif_effects and interaction_effects as logit shifts before sampling ordered categories under a common-threshold RSM.
Higher simulated facet effects behave as more severe or more difficult levels, matching the sign convention used by fit_mfrm().
Specifications created by extract_mfrm_arbitrary_sim_spec() may reuse a fitted response skeleton, fitted person/facet estimates, retained weights, and the fitted rating range rather than generated labels and parameters.
Category labels are generated as 1:score_levels for design-first specifications. For fit-derived specifications, the fitted rating_min and rating_max are reused, so scales such as 0:3 remain on their original score metric.
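To make the "logit shift, then sample under a common-threshold RSM" step concrete, the following base-R sketch computes rating-scale category probabilities from a linear predictor and one shared threshold vector. It is an illustration under stated assumptions, not the package's internal code: the helper name rsm_category_probs(), the threshold values, and the size and sign of the shift are all hypothetical.

```r
# Category probabilities under a common-threshold rating scale model (RSM).
# eta: linear predictor on the logit scale; tau: step thresholds (length K - 1).
# Hypothetical helper for illustration; not part of the mfrmr API.
rsm_category_probs <- function(eta, tau) {
  # Log-numerator of the lowest category is 0; each higher category adds
  # one more (eta - tau_k) term.
  log_num <- c(0, cumsum(eta - tau))
  num <- exp(log_num - max(log_num))  # subtract the max for numerical stability
  num / sum(num)
}

tau <- c(-1, 0, 1)  # assumed thresholds for a 4-category scale

# Baseline predictor, and the same predictor after a hypothetical
# row-matched shift of -0.6 logits.
p_base  <- rsm_category_probs(eta = 0.5, tau = tau)
p_shift <- rsm_category_probs(eta = 0.5 - 0.6, tau = tau)

# Expected scores on the 1:score_levels label metric.
expected_base  <- sum(1:4 * p_base)
expected_shift <- sum(1:4 * p_shift)

# Sample one ordered category from the baseline distribution.
set.seed(1)
score <- sample(1:4, size = 1, prob = p_base)
```

Lowering the predictor moves probability mass toward the lower categories, which is how an injected effect becomes visible in the simulated scores.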
A long-format data.frame with Study, Person, arbitrary facet columns, and Score. Attributes mfrm_truth and mfrm_simulation_spec store the generating values and reusable design metadata.
build_mfrm_arbitrary_sim_spec(),
extract_mfrm_arbitrary_sim_spec(),
fit_mfrm(),
summarize_mfrm_sim_design()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 12,
  facets = c(Rater = 3, Criteria = 2, Task = 3),
  facets_per_person = c(Rater = 2, Task = 2),
  score_levels = 4
)
sim <- simulate_mfrm_arbitrary_data(spec, seed = 1)
head(sim)
attr(sim, "mfrm_truth")$design
Simulate long-format many-facet Rasch data for design studies
simulate_mfrm_data(
  n_person = 50,
  n_rater = 4,
  n_criterion = 4,
  raters_per_person = n_rater,
  design = NULL,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  group_levels = NULL,
  dif_effects = NULL,
  interaction_effects = NULL,
  seed = NULL,
  model = c("RSM", "PCM", "GPCM"),
  step_facet = "Criterion",
  slope_facet = NULL,
  thresholds = NULL,
  slopes = NULL,
  assignment = NULL,
  sim_spec = NULL
)
n_person |
Number of persons/respondents. |
n_rater |
Number of rater facet levels. |
n_criterion |
Number of criterion/item facet levels. |
raters_per_person |
Number of raters assigned to each person. |
design |
Optional named design override supplied as a named list,
named vector, or one-row data frame. When |
score_levels |
Number of ordered score categories. |
theta_sd |
Standard deviation of simulated person measures. |
rater_sd |
Standard deviation of simulated rater severities. |
criterion_sd |
Standard deviation of simulated criterion difficulties. |
noise_sd |
Optional observation-level noise added to the linear predictor. |
step_span |
Spread of step thresholds on the logit scale. |
group_levels |
Optional character vector of group labels. When supplied,
a balanced |
dif_effects |
Optional data.frame describing true group-linked DIF
effects. Must include |
interaction_effects |
Optional data.frame describing true non-group
interaction effects. Must include at least one design column such as
|
seed |
Optional random seed. |
model |
Measurement model recorded in the simulation setup. The current
public generator supports |
step_facet |
Step facet used when |
slope_facet |
Slope facet used when |
thresholds |
Optional threshold specification. Use either a numeric
vector of common thresholds or a data frame with columns |
slopes |
Optional slope specification used when |
assignment |
Assignment design. |
sim_spec |
Optional output from |
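This function generates synthetic MFRM data from the Rasch model. The data-generating process is:
Draw person abilities: $\theta_n \sim N(0, \text{theta\_sd}^2)$
Draw rater severities: $\delta_j \sim N(0, \text{rater\_sd}^2)$
Draw criterion difficulties: $\beta_i \sim N(0, \text{criterion\_sd}^2)$
Generate evenly-spaced step thresholds spanning $\pm$ step_span/2
For each observation, compute the linear predictor
$\eta = \theta_n - \delta_j - \beta_i + \varepsilon$, where
$\varepsilon \sim N(0, \text{noise\_sd}^2)$ (optional)
Compute category probabilities under the recorded measurement model
(RSM, PCM, or bounded GPCM) and sample the response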
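The steps above can be sketched end to end in base R for the RSM case. This is a minimal illustration under stated assumptions rather than the package's generator: it uses a fully crossed design, centers the drawn values, reads the threshold rule as equally spaced between -step_span/2 and +step_span/2, and subtracts rater severity and criterion difficulty from the person measure. The helper sample_rsm() is hypothetical.

```r
set.seed(123)

# Settings mirroring the documented argument names.
n_person <- 40; n_rater <- 4; n_criterion <- 4; score_levels <- 4
theta_sd <- 1; rater_sd <- 0.35; criterion_sd <- 0.25
noise_sd <- 0; step_span <- 1.4

# Steps 1-3: draw centered person, rater, and criterion values.
center <- function(x) x - mean(x)
theta <- center(rnorm(n_person, 0, theta_sd))
delta <- center(rnorm(n_rater, 0, rater_sd))
beta  <- center(rnorm(n_criterion, 0, criterion_sd))

# Step 4: equally spaced thresholds (assumed to span +/- step_span / 2).
tau <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)

# Step 5: linear predictor on a fully crossed design, with optional noise.
sim <- expand.grid(Person = seq_len(n_person),
                   Rater = seq_len(n_rater),
                   Criterion = seq_len(n_criterion))
eta <- theta[sim$Person] - delta[sim$Rater] - beta[sim$Criterion] +
  rnorm(nrow(sim), 0, noise_sd)

# Step 6: RSM category probabilities, then one sampled score per row.
sample_rsm <- function(eta_i) {
  log_num <- c(0, cumsum(eta_i - tau))
  p <- exp(log_num - max(log_num))
  sample(seq_len(score_levels), size = 1, prob = p / sum(p))
}
sim$Score <- vapply(eta, sample_rsm, integer(1))

head(sim)
```

Because the generating values are kept in theta, delta, and beta, the simulated scores can later be compared against estimates in a parameter-recovery check.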
Latent-value generation is explicit:
latent_distribution = "normal" draws centered normal person/rater/
criterion values using the supplied standard deviations
latent_distribution = "empirical" resamples centered support values
recorded in sim_spec$empirical_support
if sim_spec$population$active = TRUE, person measures are generated from
the stored latent-regression population model and template person
covariates rather than from theta_sd
When dif_effects is supplied, the specified logit shift is added to
the linear predictor for the focal group on the target facet level,
creating a known DIF signal. Similarly, interaction_effects injects a known
bias into specific facet-level combinations.
The generator targets the common two-facet rating design (persons ×
raters × criteria). raters_per_person
controls the incomplete-block structure: when less than n_rater,
each person is assigned a rotating subset of raters to keep coverage
balanced and reproducible.
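One deterministic way to build such a rotating subset is shown below. The modular rotation rule and the helper name rotating_raters() are assumptions made for illustration; the package's exact rotation scheme is not reproduced here.

```r
# Assign each person a rotating, deterministic subset of raters.
# Hypothetical helper; the rotation rule is an assumption for illustration.
rotating_raters <- function(n_person, n_rater, raters_per_person) {
  stopifnot(raters_per_person <= n_rater)
  offsets <- seq_len(raters_per_person) - 1L
  do.call(rbind, lapply(seq_len(n_person), function(p) {
    # Person p starts at rater p (wrapping around) and takes the next
    # raters_per_person raters in order.
    data.frame(Person = p, Rater = ((p - 1L + offsets) %% n_rater) + 1L)
  }))
}

assignment <- rotating_raters(n_person = 8, n_rater = 4, raters_per_person = 2)

# Rater workload stays balanced when n_person is a multiple of n_rater.
rater_load <- table(assignment$Rater)
rater_load
```

Because the rule depends only on the person index, the same assignment is reproduced on every run without a random seed.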
Threshold handling is intentionally explicit:
if thresholds = NULL, common equally spaced thresholds are generated
from step_span
if thresholds is a numeric vector, it is used as one common threshold set
if thresholds is a data frame, threshold values may vary by StepFacet
(currently Criterion or Rater)
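The difference between one common threshold set and thresholds that vary by step facet can be sketched as follows. The equal-spacing rule is read here as spanning -step_span/2 to +step_span/2, which is an assumption, and the facet-specific thresholds are held in a plain named list rather than in the package's data-frame layout, whose columns are not reproduced here. All values and the helper category_probs() are hypothetical.

```r
score_levels <- 4
step_span <- 1.4

# thresholds = NULL: one common, equally spaced set derived from step_span.
common_tau <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)

# Facet-specific thresholds (partial-credit style), one vector per criterion.
# Illustrative values only.
tau_by_criterion <- list(
  C1 = common_tau,
  C2 = common_tau + c(-0.2, 0, 0.2),
  C3 = c(-1.0, 0.1, 0.6)
)

# Category probabilities for a given predictor and threshold vector.
category_probs <- function(eta, tau) {
  log_num <- c(0, cumsum(eta - tau))
  num <- exp(log_num - max(log_num))
  num / sum(num)
}

# Same predictor, different threshold sets: one row of probabilities per criterion.
probs <- t(vapply(tau_by_criterion,
                  function(tau) category_probs(eta = 0, tau = tau),
                  numeric(score_levels)))
round(probs, 3)
```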
For bounded GPCM, the generator now requires an explicit slope
contract in parallel with the threshold table. The current public branch
keeps slope_facet == step_facet and uses the internal category_prob_gpcm()
helper for
response sampling. Design-planning and forecasting helpers reuse this
slope-aware contract as caveated simulation/refit screening routes.
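For orientation, a generalized partial-credit category probability can be written in a few lines of base R. This sketch follows the usual GPCM form in which a slope multiplies each (predictor minus threshold) term; it is not the internal category_prob_gpcm() helper, and it does not attempt to show how the package bounds the slopes. The function name and all values are hypothetical.

```r
# Generalized partial-credit category probabilities.
# slope = 1 reduces to the Rasch-family (RSM / PCM) form.
# Hypothetical helper for illustration; not the package's internal function.
gpcm_category_probs <- function(eta, tau, slope = 1) {
  log_num <- c(0, cumsum(slope * (eta - tau)))
  num <- exp(log_num - max(log_num))  # stabilized softmax
  num / sum(num)
}

tau <- c(-0.7, 0, 0.7)  # assumed thresholds for one step-facet level

p_rasch <- gpcm_category_probs(eta = 1, tau = tau, slope = 1)
p_steep <- gpcm_category_probs(eta = 1, tau = tau, slope = 1.5)

round(rbind(slope_1.0 = p_rasch, slope_1.5 = p_steep), 3)
```

With a larger slope the same predictor concentrates more probability in the top category, which is why slopes and thresholds need to be specified together.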
Assignment handling is also explicit:
"crossed" uses the full person x rater x criterion design
"rotating" assigns a deterministic rotating subset of raters per person
"resampled" reuses empirical person-level rater profiles stored in
sim_spec$assignment_profiles, optionally carrying over person-level
Group
"skeleton" reuses an observed person-by-rater-by-criterion response
skeleton stored in sim_spec$design_skeleton, optionally carrying over
Group and Weight
For more controlled workflows, build a reusable simulation specification
first via build_mfrm_sim_spec() or derive one from an observed fit with
extract_mfrm_sim_spec(), then pass it through sim_spec.
Returned data include attributes:
mfrm_truth: simulated true parameters (for parameter-recovery checks)
mfrm_truth$signals: injected DIF and interaction signal tables
mfrm_truth$slope_table: simulated discrimination table for bounded
GPCM
mfrm_population_data: generated one-row-per-person background data when
the simulation specification stores an active latent-regression generator,
including model-matrix xlevel and contrast provenance for categorical
covariates
mfrm_simulation_spec: generation settings (for reproducibility)
A long-format data.frame with core columns Study, Person,
two simulated non-person facet columns, and Score. By default those
facet columns are Rater and Criterion; when sim_spec records custom
public names, those names are used instead. If group labels are simulated
or reused from an observed response skeleton, a Group column is
included. If a weighted response skeleton is reused, a Weight column is
also included.
Higher theta values in mfrm_truth$person indicate higher person measures.
Higher values in mfrm_truth$facets$Rater indicate more severe raters.
Higher values in mfrm_truth$facets$Criterion indicate more difficult criteria.
mfrm_truth$signals$dif_effects and mfrm_truth$signals$interaction_effects
record any injected detection targets.
Generate one design with simulate_mfrm_data().
Fit with fit_mfrm() and diagnose with diagnose_mfrm().
For repeated design studies, use evaluate_mfrm_design().
evaluate_mfrm_design(), fit_mfrm(), diagnose_mfrm()
sim <- simulate_mfrm_data(
  n_person = 40,
  n_rater = 4,
  n_criterion = 4,
  raters_per_person = 2,
  seed = 123
)
head(sim)
names(attr(sim, "mfrm_truth"))
Build a specification summary report (preferred alias)
specifications_report(
  fit,
  title = NULL,
  data_file = NULL,
  output_file = NULL,
  include_fixed = FALSE
)
fit |
Output from |
title |
Optional analysis title. |
data_file |
Optional data-file label (for reporting only). |
output_file |
Optional output-file label (for reporting only). |
include_fixed |
If |
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_specifications (type = "facet_elements",
"anchor_constraints", "convergence").
A named list with specification-report components. Class:
mfrm_specifications.
header / data_spec: run identity and model settings.
facet_labels: facet sizes and labels.
convergence_control: optimizer configuration and status.
Generate specifications_report(fit).
Verify model settings and convergence metadata.
Use the output as methods and run-documentation support in reports.
fit_mfrm(), data_quality_report(), estimation_iteration_report(),
mfrmr_reports_and_tables, mfrmr_compatibility_layer
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
out <- specifications_report(fit, title = "Toy run")
summary(out)
p_spec <- plot(out, draw = FALSE)
p_spec$data$plot
Build a subset connectivity report (preferred alias)
subset_connectivity_report(
  fit,
  diagnostics = NULL,
  top_n_subsets = NULL,
  min_observations = 0
)
fit |
Output from |
diagnostics |
Optional output from |
top_n_subsets |
Optional maximum number of subset rows to keep. |
min_observations |
Minimum observations required to keep a subset row. |
summary(out) is supported through summary().
plot(out) is dispatched through plot() for class
mfrm_subset_connectivity (type = "subset_observations",
"facet_levels", or "linking_matrix" / "coverage_matrix" /
"design_matrix").
A named list with subset-connectivity components. Class:
mfrm_subset_connectivity.
summary: number and size of connected subsets.
subset table: whether data are fragmented into disconnected components.
facet-level columns: where connectivity bottlenecks occur.
Run subset_connectivity_report(fit).
Confirm near-single-subset structure when possible.
Use results to justify linking/anchoring strategy.
diagnose_mfrm(), measurable_summary_table(), data_quality_report(),
mfrmr_linking_and_dff, mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
out <- subset_connectivity_report(fit)
summary(out)
p_sub <- plot(out, draw = FALSE)
p_design <- plot(out, type = "design_matrix", draw = FALSE)
p_sub$data$plot
p_design$data$plot
out$summary[, c("Subset", "Observations", "ObservationPercent")]
Summarize an arbitrary-facet simulation design.
summarize_mfrm_sim_design(x, design_id = 1, design = NULL)
x |
Simulated data from |
design_id |
Design row used when |
design |
Optional one-row design override used when |
The summary reports how many observations each facet level receives, how many levels of each facet are assigned to each person, and how completely each facet pair is covered. This is intended to make incomplete crossing visible before fitting or running repeated simulations.
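The three quantities named above (facet load, person load, and pair coverage) can be computed by hand on a small design skeleton. The sketch below uses base R only; the toy skeleton and the variable names are hypothetical and do not reflect the structure of the returned mfrm_sim_design_summary object.

```r
# Toy incomplete design: 6 persons, 3 raters, 3 tasks.
# Each person is seen by 2 raters on a single task.
design <- data.frame(
  Person = rep(1:6, each = 2),
  Rater  = c(1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1),
  Task   = rep(rep(c("A", "B", "C"), each = 2), times = 2)
)

# Facet load: observations received by each rater.
facet_load <- table(design$Rater)

# Person load: number of distinct rater levels assigned to each person.
levels_per_person <- tapply(design$Rater, design$Person,
                            function(x) length(unique(x)))

# Pair coverage: share of Rater x Task cells with at least one observation.
pair_cells <- table(design$Rater, design$Task)
pair_coverage <- mean(pair_cells > 0)

facet_load
levels_per_person
pair_cells
pair_coverage
```

Here only six of the nine Rater x Task cells are observed, which is the kind of incomplete crossing the summary is meant to surface before fitting.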
A list of class mfrm_sim_design_summary with overview, facet-load, person-load, pair-coverage, and assignment tables.
plot_mfrm_sim_design(),
summarize_mfrm_sim_grid(),
simulate_mfrm_arbitrary_data()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 10,
  facets = c(Rater = 3, Criteria = 2, Task = 2),
  facets_per_person = c(Rater = 2)
)
summarize_mfrm_sim_design(spec)
Summarize all rows in an arbitrary-facet simulation design grid.
summarize_mfrm_sim_grid(sim_spec, design_id = NULL)
sim_spec |
Output from |
design_id |
Optional design rows to summarize. Use |
This grid summary is the multi-design companion to summarize_mfrm_sim_design(). It is useful when a planning grid varies n_Rater together with other facet counts such as n_Task, n_Criteria, or assignment counts such as Rater_per_person.
The returned table keeps the original design-grid columns and adds workload and coverage metrics. MeanObsPerPerson describes person-level rating load; MinPairCoverage and MeanPairCoverage summarize how completely each pair of non-person facets is crossed within the generated skeleton.
A data frame of class mfrm_sim_grid_summary.
plot_mfrm_sim_grid(),
plot_mfrm_sim_dashboard(),
list_mfrm_sim_metrics(),
summarize_mfrm_sim_design()
spec <- build_mfrm_arbitrary_sim_spec(
  n_person = 20,
  facets = list(Rater = c(2, 4), Criteria = c(2, 3), Task = c(2, 4)),
  facets_per_person = list(Rater = c(1, 2), Task = 2),
  score_levels = 4
)
grid <- summarize_mfrm_sim_grid(spec)
grid[, c("design_id", "n_Rater", "n_Criteria", "n_Task", "Observations")]
Extract or aggregate the directional misfit rates from
evaluate_mfrm_design(). Unlike the legacy MeanMisfitRate column, which
reports the proportion of levels flagged by the ZSTD misfit screen, this
helper separates MnSq-band directions: underfit, overfit, mixed, and in-band.
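A directional classification of this kind can be sketched in base R. The 0.5 to 1.5 mean-square band is a commonly cited screening convention and is used here only as an assumed default; the rule that "mixed" means one statistic above the band and the other below it is likewise an assumption, and the helper name and column names are hypothetical.

```r
# Classify each level by where its Infit / Outfit mean-squares fall
# relative to an assumed screening band.
classify_fit_direction <- function(infit, outfit, lower = 0.5, upper = 1.5) {
  under <- infit > upper | outfit > upper  # noisier than the model expects
  over  <- infit < lower | outfit < lower  # more deterministic than expected
  ifelse(under & over, "mixed",
    ifelse(under, "underfit",
      ifelse(over, "overfit", "in-band")))
}

fit_tab <- data.frame(
  Level  = c("R1", "R2", "R3", "R4"),
  Infit  = c(1.02, 1.71, 0.41, 1.62),
  Outfit = c(0.97, 1.58, 0.45, 0.44)
)
fit_tab$Direction <- classify_fit_direction(fit_tab$Infit, fit_tab$Outfit)

# Directional rates, keeping all four directions even when a count is zero.
directions <- c("underfit", "overfit", "mixed", "in-band")
rates <- prop.table(table(factor(fit_tab$Direction, levels = directions)))

fit_tab
rates
```

Keeping the directions separate avoids the ambiguity of a single pooled misfit rate, where underfit and overfit would be counted together.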
summarize_simulation_misfit(x, by = NULL, digits = NULL)
x |
Output from |
by |
Optional grouping variables. |
digits |
Optional number of digits for numeric columns. |
A data frame of class mfrm_simulation_misfit_summary.
evaluate_mfrm_design(), plot_simulation_misfit_rates(),
fit_direction_summary()
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  n_person = 20,
  n_rater = 3,
  n_criterion = 2,
  raters_per_person = 2,
  reps = 1,
  maxit = 10,
  seed = 42
))
summarize_simulation_misfit(sim_eval)
Summarize an APA/FACETS table object
## S3 method for class 'apa_table'
summary(object, digits = 3, top_n = 8, ...)
object |
Output from |
digits |
Number of digits used for numeric summaries. |
top_n |
Maximum numeric columns shown in |
... |
Reserved for generic compatibility. |
Compact summary helper for QA of table payloads before manuscript export.
An object of class summary.apa_table.
overview: table size/composition and missingness.
numeric_profile: quick distribution summary of numeric columns.
caption/note: text metadata readiness.
Build table with apa_table().
Run summary(tbl) and inspect overview.
Use plot.apa_table() for quick numeric checks if needed.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
tbl <- apa_table(fit, which = "summary")
summary(tbl)
Summarize an anchor-audit object
## S3 method for class 'mfrm_anchor_audit'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for numeric rounding. |
top_n |
Maximum rows shown in issue previews. |
... |
Reserved for generic compatibility. |
This summary provides a compact pre-estimation audit of anchor and group-anchor specifications.
An object of class summary.mfrm_anchor_audit.
Recommended order:
issue_counts: primary triage table (non-zero issues first).
facet_summary: anchored/grouped/free-level balance by facet.
level_observation_summary and category_counts: sparse-cell diagnostics.
recommendations: concrete remediation suggestions.
If issue_counts is non-empty, treat anchor constraints as provisional and
resolve issues before final estimation.
Run audit_mfrm_anchors() with intended anchors/group anchors.
Review summary(aud) and recommendations.
Revise anchor tables, then call fit_mfrm().
audit_mfrm_anchors(), fit_mfrm()
toy <- load_mfrmr_data("example_core")
aud <- audit_mfrm_anchors(toy, "Person", c("Rater", "Criterion"), "Score")
summary(aud)
Summarize APA report-output bundles
## S3 method for class 'mfrm_apa_outputs'
summary(object, top_n = 3, preview_chars = 160, ...)
object |
Output from |
top_n |
Maximum non-empty lines shown in each component preview. |
preview_chars |
Maximum characters shown in each preview cell. |
... |
Reserved for generic compatibility. |
This summary is a diagnostics layer for APA text products, not a replacement for the full narrative.
It reports component completeness, line/character volume, and a compact preview for quick QA before manuscript insertion.
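The kind of volume-and-preview check described here can be approximated in base R as follows. The helper preview_component() and the returned field names are hypothetical; the sketch only illustrates how top_n and preview_chars might bound a preview and does not mirror the package's output layout.

```r
# Summarize one text component: count non-empty lines and characters,
# then build a short preview from the first non-empty lines.
# Hypothetical helper for illustration only.
preview_component <- function(text, top_n = 3, preview_chars = 160) {
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))
  non_empty <- lines[nzchar(trimws(lines))]
  list(
    n_lines = length(non_empty),
    n_chars = sum(nchar(non_empty)),
    preview = substr(paste(head(non_empty, top_n), collapse = " | "),
                     1, preview_chars)
  )
}

draft <- paste(
  "Method.",
  "",
  "A many-facet model was fitted.",
  "Raters differed in severity.",
  "Fit was acceptable.",
  sep = "\n"
)

out <- preview_component(draft, top_n = 2, preview_chars = 40)
out$n_lines
out$preview
```

An empty or very short component shows up immediately as a low line count, which is the screening use the summary is designed for.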
An object of class summary.mfrm_apa_outputs.
overview: total coverage across standard text components.
components: per-component density and mention checks
(including residual-PCA mentions).
sections: package-native section coverage table.
content_checks: contract-based alignment checks for APA drafting readiness.
overview$DraftContractPass: the primary contract-completeness flag for
draft text components.
overview$ReadyForAPA: a backward-compatible alias of that contract flag,
not a certification of inferential adequacy.
overview$AnalysisReady: a stricter manuscript-readiness flag that also
requires model convergence and a formal precision tier.
preview: first non-empty lines for fast visual review.
Build outputs via build_apa_outputs().
Run summary(apa) to screen for empty/short components.
Use apa$report_text, apa$table_figure_notes,
and apa$table_figure_captions as draft components for final-text review.
build_apa_outputs(), summary()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "both")
apa <- build_apa_outputs(fit, diag)
summary(apa)
Summarize an mfrm_bias object in a user-friendly format
## S3 method for class 'mfrm_bias'
summary(object, digits = 3, top_n = 10, p_cut = 0.05, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of strongest bias rows to keep. |
p_cut |
Significance cutoff used for counting flagged rows. |
... |
Reserved for generic compatibility. |
This method returns a compact interaction-bias summary:
interaction facets/order and analyzed cell counts
effect-size profile (|bias| mean/max, significant cell count)
fixed-effect chi-square block
iteration-end convergence indicators
top rows ranked by absolute t
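The effect-size profile and the |t| ranking can be illustrated on a mock interaction table. Everything below is a hypothetical sketch: the column names, the bias values, and the use of a normal approximation for the p-values are assumptions, and the package's own bias table may differ in all three respects.

```r
# Mock interaction-bias table (illustrative values only).
bias_tab <- data.frame(
  Rater     = c("R1", "R1", "R2", "R2", "R3", "R3"),
  Criterion = rep(c("C1", "C2"), times = 3),
  Bias      = c(0.10, -0.85, 0.40, 0.05, -0.30, 0.95),
  SE        = c(0.30, 0.32, 0.31, 0.29, 0.33, 0.35)
)

# t statistic and a two-sided p-value (normal approximation, assumed here).
bias_tab$t <- bias_tab$Bias / bias_tab$SE
bias_tab$p <- 2 * pnorm(-abs(bias_tab$t))

p_cut <- 0.05
top_n <- 3

# Effect-size profile: cell count, |bias| mean/max, and flagged-cell count.
overview <- data.frame(
  Cells       = nrow(bias_tab),
  MeanAbsBias = mean(abs(bias_tab$Bias)),
  MaxAbsBias  = max(abs(bias_tab$Bias)),
  Flagged     = sum(bias_tab$p < p_cut)
)

# Strongest interaction rows, ranked by absolute t.
top_rows <- head(bias_tab[order(-abs(bias_tab$t)), ], top_n)

overview
top_rows
```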
An object of class summary.mfrm_bias with:
overview: interaction facets/order, cell counts, and effect-size profile
chi_sq: fixed-effect chi-square block
final_iteration: end-of-iteration status row
top_rows: highest-|t| interaction rows
notes: short interpretation notes
overview: interaction order, analyzed cells, and effect-size profile.
chi_sq: fixed-effect test block.
final_iteration: end-of-loop status from the bias routine.
top_rows: strongest bias contrasts by |t|.
Estimate interactions with estimate_bias().
Check summary(bias) for screen-positive and unstable cells.
Use bias_interaction_report() or plot_bias_interaction() for details.
estimate_bias(), bias_interaction_report()
toy <- load_mfrmr_data("example_bias")
toy <- toy[toy$Person %in% unique(toy$Person)[1:8], ]
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 50)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion", max_iter = 1)
summary(bias)
Summarize report/table bundles in a user-friendly format
## S3 method for class 'mfrm_bundle'
summary(object, digits = 3, top_n = 10, ...)
object |
Any report bundle produced by |
digits |
Number of digits for printed numeric values. |
top_n |
Number of preview rows shown from the main table component. |
... |
Reserved for generic compatibility. |
This method provides a compact summary for bundle-like outputs (for example: unexpected-response, fair-average, chi-square, and category report objects). It extracts:
object class and available components
one-row summary table when available
preview rows from the main data component
resolved settings/options
Branch-aware summaries are provided for:
mfrm_bias_count (branch = "original" / "facets")
mfrm_fixed_reports (branch = "original" / "facets")
mfrm_visual_summaries (branch = "original" / "facets")
Additional class-aware summaries are provided for:
mfrm_unexpected, mfrm_fair_average, mfrm_displacement
mfrm_interrater, mfrm_facets_chisq, mfrm_bias_interaction
mfrm_rating_scale, mfrm_category_structure, mfrm_category_curves
mfrm_measurable, mfrm_unexpected_after_bias, mfrm_output_bundle
mfrm_residual_pca, mfrm_specifications, mfrm_data_quality
mfrm_iteration_report, mfrm_subset_connectivity, mfrm_facet_statistics
mfrm_parity_report, mfrm_reference_benchmark
An object of class summary.mfrm_bundle.
overview: class, component count, and selected preview component.
summary: one-row aggregate block when supplied by the bundle.
preview: first top_n rows from the main table-like component.
settings: resolved option values if available.
validation_scope: internal-versus-external validation scope when
summarizing mfrm_reference_benchmark.
conquest_command_scope: ConQuest command-template scope when summarizing
mfrm_conquest_overlap_bundle.
conquest_output_contract: requested ConQuest outputs and audit handoff
when summarizing mfrm_conquest_overlap_bundle.
normalization_scope: extracted-table normalization scope when summarizing
mfrm_conquest_overlap_tables.
audit_scope: supplied-table audit scope when summarizing
mfrm_conquest_overlap_audit.
conquest_overlap_checks / population_policy_checks: specialized
benchmark check previews when summarizing mfrm_reference_benchmark.
Generate a bundle table/report helper output.
Run summary(bundle) for compact QA.
Drill into specific components via $ and visualize with plot(bundle, ...).
unexpected_response_table(), fair_average_table(), plot()
toy_full <- load_mfrmr_data("example_core")
toy_people <- unique(toy_full$Person)[1:12]
toy <- toy_full[toy_full$Person %in% toy_people, , drop = FALSE]
fit <- suppressWarnings(
  fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
           method = "JML", maxit = 10)
)
t4 <- unexpected_response_table(fit, abs_z_min = 1.5, prob_max = 0.4, top_n = 5)
summary(t4)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion", max_iter = 2)
t11 <- bias_count_table(bias, branch = "facets")
summary(t11)
Summarize a data-description object
## S3 method for class 'mfrm_data_description'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for numeric rounding. |
top_n |
Maximum rows shown in preview blocks. |
... |
Reserved for generic compatibility. |
This summary is intended as a compact pre-fit quality snapshot for manuscripts and analysis logs.
An object of class summary.mfrm_data_description.
overview: design/sample counts
missing: top columns by missingness
score_distribution: compact score-usage table, including zero-count
categories retained by the prepared score support
facet_overview: facet-level coverage summary
agreement: inter-rater agreement summary when available
reporting_map: manuscript-oriented guide to what is covered here versus
which companion outputs should be consulted
caveats: structured warning/review rows for score-support issues;
print(summary(ds)) shows a compact Caveats block when rows are present
Recommended read order:
overview: sample size, persons/facets/categories.
missing: missingness hotspots by selected input columns.
score_distribution: category usage balance.
notes / printed Caveats: retained zero-count score categories and
related score-support caveats; intermediate unused categories should be
treated as threshold-functioning warnings before model fitting.
facet_overview: coverage per facet (minimum/maximum weighted counts).
agreement: observed-score inter-rater agreement (when available).
Very low MinWeightedN in facet_overview is a practical warning for
unstable downstream facet estimates.
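Two of the checks in this read order, retained zero-count score categories and the minimum weighted count per facet level, are easy to reproduce by hand. The base-R sketch below uses a mock data set; the column names and the assumed score support of 0:3 are hypothetical and are not taken from the package's output.

```r
# Mock long-format ratings with an optional weight column.
dat <- data.frame(
  Rater  = c("R1", "R1", "R2", "R2", "R3", "R1", "R2", "R1"),
  Score  = c(0, 1, 1, 3, 3, 0, 1, 3),
  Weight = c(1, 1, 1, 1, 0.5, 1, 1, 1)
)

# Score usage over the full support, so unused categories stay visible as 0.
score_support <- 0:3
score_distribution <- table(factor(dat$Score, levels = score_support))

# Intermediate categories with zero observations (threshold-functioning warning).
is_zero  <- as.vector(score_distribution) == 0
interior <- seq_along(score_support) > 1 &
  seq_along(score_support) < length(score_support)
unused_intermediate <- score_support[is_zero & interior]

# Weighted observation count per rater level and its minimum.
weighted_n <- tapply(dat$Weight, dat$Rater, sum)
min_weighted_n <- min(weighted_n)

score_distribution
unused_intermediate
weighted_n
```

In this mock data category 2 is never used and rater R3 carries a weighted count of only 0.5, so both warnings would apply before fitting.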
Run describe_mfrm_data() on raw long-format data.
Inspect summary(ds) before model fitting.
Resolve sparse/missing issues, then run fit_mfrm().
describe_mfrm_data(), summary.mfrm_fit()
toy <- load_mfrmr_data("example_core")
ds <- describe_mfrm_data(toy, "Person", c("Rater", "Criterion"), "Score")
summary(ds)
Summarize a design-simulation study
## S3 method for class 'mfrm_design_evaluation'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used in the returned numeric summaries. |
... |
Reserved for generic compatibility. |
The summary emphasizes condition-level averages that are useful for practical design planning, especially:
convergence rate
separation and reliability by facet
severity recovery RMSE
mean ZSTD misfit rate and directional MnSq underfit/overfit rates
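As a rough guide to what two of these condition-level averages measure, the sketch below computes a convergence rate and a severity recovery RMSE over a few mock replications. The values are invented, both true and estimated severities are centered before comparison, and non-converged replications are excluded from the RMSE; all three choices are assumptions and may differ from what evaluate_mfrm_design() does.

```r
center <- function(x) x - mean(x)

# Generating rater severities (mock values).
true_sev <- c(R1 = -0.40, R2 = 0.10, R3 = 0.30)

# Mock replication results: convergence flag plus estimated severities.
reps <- list(
  list(converged = TRUE,  est = c(R1 = -0.52, R2 = 0.18, R3 = 0.34)),
  list(converged = TRUE,  est = c(R1 = -0.31, R2 = 0.02, R3 = 0.29)),
  list(converged = FALSE, est = c(R1 = NA_real_, R2 = NA_real_, R3 = NA_real_))
)

# Share of replications that converged.
convergence_rate <- mean(vapply(reps, function(r) r$converged, logical(1)))

# Root mean squared error between centered estimates and centered truth.
rmse_by_rep <- vapply(reps, function(r) {
  if (!r$converged) return(NA_real_)
  sqrt(mean((center(r$est) - center(true_sev))^2))
}, numeric(1))

mean_rmse <- mean(rmse_by_rep, na.rm = TRUE)

convergence_rate
round(rmse_by_rep, 4)
round(mean_rmse, 4)
```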
An object of class summary.mfrm_design_evaluation with components:
overview: run-level overview
design_summary: aggregated design-by-facet metrics, with design-variable
alias columns when applicable
ademp: simulation-study metadata carried forward from the original object
facet_names: public facet labels carried from the simulation specification
design_variable_aliases: accepted public aliases for design variables
design_descriptor: role-based design-variable metadata
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
future_branch_active_summary: compact deterministic summary of the
schema-only future arbitrary-facet planning branch embedded in the current
planning schema
notes: short interpretation notes
evaluate_mfrm_design(), plot.mfrm_design_evaluation
sim_eval <- suppressWarnings(evaluate_mfrm_design(
  n_person = c(8, 12),
  n_rater = 2,
  n_criterion = 2,
  raters_per_person = 1,
  reps = 1,
  maxit = 8,
  seed = 123
))
s <- summary(sim_eval)
s$overview
head(s$design_summary)
Summarize an mfrm_diagnostics object in a user-friendly format
## S3 method for class 'mfrm_diagnostics'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of highest-absolute-Z fit rows to keep. |
... |
Reserved for generic compatibility. |
This method returns a compact diagnostics summary designed for quick review:
design overview (observations, persons, facets, categories, subsets)
diagnostic-basis guide for legacy versus strict fit paths
global fit statistics
approximate reliability/separation by facet
top facet/person fit rows by absolute ZSTD
counts of flagged diagnostics (unexpected, displacement, interactions)
An object of class summary.mfrm_diagnostics with:
overview: design-level counts and residual-PCA mode
status: concise front-door status block for quick review
key_warnings: highest-priority warnings to review first
next_actions: recommended follow-up helpers
diagnostic_basis: guide to legacy versus strict diagnostic targets
overall_fit: global fit block
precision_profile: design-weighted precision summary across the
information curve at decile theta points
precision_audit: separation / reliability / strata audit for the
sample- and population-basis modes (paired with precision_profile)
reliability: facet-level separation/reliability summary
facets_chisq: facets-style fixed-effect chi-square heterogeneity
screen across non-person facets
interrater: inter-rater agreement / pairwise correlation / rater
separation overview when a Rater facet is present
misfit_flagged: rows flagged by the Infit / Outfit / ZSTD
misfit thresholds active for this fit
misfit_thresholds: named numeric vector with the misfit
lower / upper thresholds used to populate misfit_flagged
misfit_threshold_label / misfit_threshold_note: wording that
identifies whether the active band is the package default or a
configured/custom screening convention
category_usage: per-category response-frequency summary used
to flag empty / collapsed categories
top_fit: top |ZSTD| rows
marginal_fit: optional strict marginal-fit overview when requested
top_marginal_cells: largest strict marginal residual cells when requested
marginal_pairwise: optional strict pairwise local-dependence overview
top_marginal_pairs: largest strict pairwise residual summaries
marginal_guidance: interpretation labels for strict marginal diagnostics
reporting_map: manuscript-oriented guide to what is covered here versus
which companion outputs should be consulted
flags: compact flag counts for major diagnostics
notes: short interpretation notes
digits: numeric-print precision threaded through to
print.summary.mfrm_diagnostics()
overview: analysis scale, subset count, and residual-PCA mode.
diagnostic_basis: plain-language map of which fit path was computed and
what each path means statistically.
overall_fit: global fit indices.
reliability: facet separation/reliability block, including model and
real bounds when available.
top_fit: highest |ZSTD| elements for immediate inspection.
flags: compact counts for key warning domains.
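To make the top_fit ranking concrete, the sketch below orders rows by absolute ZSTD and attaches two-sided normal tail probabilities. It uses base R only; the data frame, its values, and the choice of AbsZ as the larger of the two absolute ZSTD values are assumptions for illustration, not the package's documented internals.

```r
# Hypothetical element-level fit rows.
fit_rows <- data.frame(
  Element    = c("R1", "R2", "C1", "C2"),
  InfitZSTD  = c(0.4, -2.3, 3.1, 1.2),
  OutfitZSTD = c(0.9, -1.8, 2.6, -0.3),
  stringsAsFactors = FALSE
)

# Assumed ranking key: the larger absolute ZSTD of the two statistics.
fit_rows$AbsZ <- pmax(abs(fit_rows$InfitZSTD), abs(fit_rows$OutfitZSTD))

# Two-sided tail probability under a standard normal reference.
fit_rows$TwoSidedP <- 2 * pnorm(-fit_rows$AbsZ)

# Keep the top_n rows with the largest |ZSTD|.
top_n   <- 3
top_fit <- head(fit_rows[order(-fit_rows$AbsZ), ], top_n)
top_fit
```

Under the normal reference, |ZSTD| = 2 corresponds to a two-sided probability of about 0.046 and |ZSTD| = 3 to about 0.003.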
Run diagnostics with diagnose_mfrm(), using diagnostic_mode = "both"
for RSM / PCM when you want legacy continuity plus strict marginal screening.
Review summary(diag) for major warnings and inspect diagnostic_basis
before comparing legacy and strict outputs.
Follow up with dedicated tables/plots for flagged domains.
diagnose_mfrm(), summary.mfrm_fit()
toy <- load_mfrmr_data("example_core")
toy <- toy[toy$Person %in% unique(toy$Person)[1:4], ]
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 50)
diag <- diagnose_mfrm(fit, residual_pca = "none")
s <- summary(diag, top_n = 3)
s$key_warnings
# Look for: lines beginning with "MnSq misfit:" name the worst
# element + Infit / Outfit values; "Unexpected responses flagged"
# counts how many cell-level surprises the screen returned.
s$top_fit
# Look for: rows with |InfitZSTD| or |OutfitZSTD| > 2 are misfitting
# at the 5% level; > 3 is misfitting at the 1% level. Investigate
# in order of the AbsZ column.
s$facets_chisq
# Look for: FixedProb < 0.05 in each non-Person facet means the
# facet contributes meaningful spread; FixedProb >= 0.05 means
# that facet is statistically indistinguishable.
Summarize a facet-quality dashboard
## S3 method for class 'mfrm_facet_dashboard'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of flagged levels to preview. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_facet_dashboard.
facet_quality_dashboard(), plot_facet_quality_dashboard()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
summary(facet_quality_dashboard(fit, diagnostics = diag))
Summarize a legacy-compatible workflow run
## S3 method for class 'mfrm_facets_run'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for numeric rounding in summaries. |
top_n |
Maximum rows shown in nested preview tables. |
... |
Passed through to nested summary methods. |
This method returns a compact cross-object summary that combines:
model overview (object$fit$summary)
resolved column mapping
run settings (run_info)
nested summaries of fit and diagnostics
An object of class summary.mfrm_facets_run.
overview: convergence, information criteria, and scale size.
mapping: sanity check for auto/explicit column mapping.
fit / diagnostics: drill-down summaries for reporting decisions.
Run run_mfrm_facets() to execute a one-shot pipeline.
Inspect with summary(out) for mapping and convergence checks.
Review nested objects (out$fit, out$diagnostics) as needed.
run_mfrm_facets(), summary.mfrm_fit(), mfrmr_workflow_methods,
summary()
toy <- load_mfrmr_data("example_core")
toy_small <- toy[toy$Person %in% unique(toy$Person)[1:8], , drop = FALSE]
out <- run_mfrm_facets(
  data = toy_small,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  maxit = 25
)
s <- summary(out)
s$overview[, c("Model", "Method", "Converged")]
s$mapping
Summarize an mfrm_fit object in a user-friendly format
## S3 method for class 'mfrm_fit'
summary(object, digits = 3, top_n = 5, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of extreme facet/person rows shown in summaries. |
... |
Reserved for generic compatibility. |
This method provides a compact, human-readable summary oriented to reporting. It returns a structured object and prints:
model fit overview (N, LogLik, AIC/BIC, convergence)
preprocessing row counts retained/dropped before estimation
estimation settings that affect identification/scoring interpretation
facet-level estimate distribution (mean/SD/range)
person measure distribution
step/threshold checks
a reporting map showing which companion summaries/tables should be used for manuscript-oriented data description, diagnostics, category checks, and draft reporting
high/low person measures and extreme facet levels
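For readers who want to sanity-check the information criteria in the fit overview, the sketch below applies the standard AIC and BIC formulas in base R. All three inputs are hypothetical, and whether the package counts observations or persons as the sample size for BIC is an assumption to verify against the printed overview.

```r
loglik <- -512.4   # hypothetical maximized log-likelihood
k      <- 14       # hypothetical number of free parameters
n      <- 384      # hypothetical sample size used for BIC

# Standard definitions: both penalize -2 * log-likelihood by model size.
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

c(AIC = aic, BIC = bic)
```

Lower values indicate a better trade-off between fit and complexity, and the comparison is only meaningful between models fitted to the same data.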
An object of class summary.mfrm_fit with:
overview: global model/fit indicators
status: concise front-door status block for quick review
key_warnings: highest-priority warnings to review first
next_actions: recommended follow-up helpers
population_overview: current population-model basis, residual variance,
and omission audit
population_coefficients: fitted latent-regression coefficients when a
population model is active
population_design: latent-regression design-matrix column audit when a
population model is active
population_coding: categorical covariate levels and contrast provenance
when a population model uses model-matrix coding
facet_overview: per-facet estimate distribution summary
person_overview: person-measure distribution summary
targeting: person-versus-non-person facet targeting overview
(Wright-map-style mean/SD comparison)
step_overview: threshold/step diagnostics
slope_overview: discrimination summary for GPCM fits
interaction_overview: model-estimated facet-interaction summary
when the fit was specified with facet_interactions
settings_overview: estimation-settings overview that pins the
configuration that affects identification/scoring
data_quality_overview: retained/dropped row counts from model
preprocessing
attached_diagnostics: logical flag indicating whether the
mfrm_fit was returned with diagnostics already attached
attached_diagnostics_cols: character vector of diagnostic
columns attached to fit$facets$person when
attached_diagnostics = TRUE
reporting_map: routing map showing which companion summaries
and tables should be used for the four manuscript-oriented
reporting sections (data description, diagnostics, category
checks, draft reporting)
person_high / person_low: highest and lowest person measures
facet_extremes: extreme facet-level estimates
caveats: structured warning/review rows for score-support and
latent-regression population-model issues
notes: short interpretation notes
digits: numeric-print precision threaded through to
print.summary.mfrm_fit()
overview: convergence and information criteria.
facet_overview: per-facet spread and range of estimates.
person_overview: distribution of person measures.
step_overview: threshold spread and monotonicity checks.
settings_overview: estimation settings that affect interpretation.
data_quality_overview: row counts retained or dropped before estimation;
use data_quality_report() with the original data for a full missingness,
unknown-element, and category-use audit.
population_coding: fitted categorical levels and contrasts that must be
reused when scoring new persons under the population-model posterior.
key_warnings / notes: short triage subset of retained zero-count score
categories and latent-regression population-model caveats such as
complete-case omissions, zero-variance design columns, missing
coefficients, or unstable residual variance when present. Incomplete or
non-finite covariates are normally handled before fitting as input errors
or complete-case omissions; they appear here only if retained in a
population-design audit row.
caveats: structured rows behind those warnings for appendix/export use;
print(summary(fit)) shows a compact Caveats block when rows are present.
reporting_map: where to get companion outputs for manuscript reporting.
person_high / person_low / facet_extremes: extreme estimates for quick triage.
Fit model with fit_mfrm().
Run summary(fit) for first-pass diagnostics.
For RSM / PCM, continue with diagnose_mfrm() for element-level fit
checks. For bounded GPCM, continue with compute_information() /
plot_information() or the fixed-calibration posterior scoring helpers.
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy, "Person", c("Rater", "Criterion"), "Score",
  method = "MML",
  quad_points = 15
)
s <- summary(fit)
s$overview[, c("Model", "Method", "Converged")]
# Look for: Converged = TRUE. If FALSE the fit is not safe to report;
# raise `maxit`, relax `reltol`, or rerun with `quad_points = 31`.
s$person_overview
# Look for: Mean ~ 0 (logits) and SD ~ 1 are typical when the sample
# is centred on the test difficulty. Min < -3 or Max > 3 with
# `Extreme = "min"/"max"` rows indicates ceiling / floor cases.
s$targeting
# Look for: |Targeting| < ~0.5 logits across non-person facets is
# comfortable. Larger absolute values mean the test is systematically
# easier or harder than the person sample. SpreadRatio > 2 means
# persons dominate facet variability; < 0.5 means facets dominate.
Summarize a future arbitrary-facet planning active branch
## S3 method for class 'mfrm_future_branch_active_branch'
summary(object, digits = 3, top_n = 8, ...)
object |
Output from the future-branch active planning scaffold stored
in |
digits |
Number of digits used in numeric summaries. |
top_n |
Maximum number of recommendation rows to print in the preview. |
... |
Reserved for generic compatibility. |
This summary is intentionally conservative. It aggregates only deterministic
branch-side quantities already validated in the schema-first arbitrary-facet
planning scaffold: observation bookkeeping, load/balance, coverage,
guardrails, structural readiness, and conservative recommendation ranking.
It also exposes the same manuscript-facing table/appendix metadata used by
build_summary_table_bundle() so the future branch can be reviewed directly
without first routing through planning summaries. In addition to bundle-level
appendix presets and section counts, it includes export-like appendix
selection summaries by preset, reporting role, manuscript section,
bundle-aware handoff summaries, preset-specific table surface, and a
table-level handoff crosswalk, plus direct role_summary / table_profile
surfaces for table-shape review.
It does not report psychometric recovery or Monte Carlo performance.
An object of class summary.mfrm_future_branch_active_branch.
summary.mfrm_design_evaluation(), plot.mfrm_future_branch_active_branch()
Summarize a linking-review object
## S3 method for class 'mfrm_linking_review'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of top linking-risk rows to keep in the compact summary. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_linking_review.
Summarize a misfit-casebook object
## S3 method for class 'mfrm_misfit_casebook'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of top case rows to keep in the compact summary. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_misfit_casebook.
Summarize approximate plausible values from posterior scoring
## S3 method for class 'mfrm_plausible_values'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used in numeric summaries. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_plausible_values with:
draw_summary: empirical summaries of the sampled values by person
estimates: companion posterior EAP summaries
audit: row-preparation audit
population_audit: optional person-level omission audit for
latent-regression scoring
settings: scoring settings
notes: interpretation notes
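The draw_summary component is described as empirical summaries of the sampled values by person. A minimal base-R sketch of that kind of by-person aggregation follows; the long-format layout and the column names Person, Draw, and Value are hypothetical and are not the package's documented structure.

```r
# Hypothetical long-format plausible-value draws (three draws per person).
draws <- data.frame(
  Person = rep(c("NEW01", "NEW02"), each = 3),
  Draw   = rep(1:3, times = 2),
  Value  = c(0.42, 0.61, 0.35, -0.90, -1.10, -0.70),
  stringsAsFactors = FALSE
)

# Summarize the draws per person: count, mean, and SD of sampled values.
draw_summary <- do.call(rbind, lapply(split(draws, draws$Person), function(d) {
  data.frame(
    Person = d$Person[1],
    N      = nrow(d),
    Mean   = mean(d$Value),
    SD     = sd(d$Value),
    stringsAsFactors = FALSE
  )
}))
rownames(draw_summary) <- NULL
draw_summary
```

With only a handful of draws, the per-person SD is a rough indication of posterior spread rather than a stable estimate.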
sample_mfrm_plausible_values()
toy <- load_mfrmr_data("example_core")
keep_people <- unique(toy$Person)[1:18]
toy_fit <- suppressWarnings(
  fit_mfrm(
    toy[toy$Person %in% keep_people, , drop = FALSE],
    "Person", c("Rater", "Criterion"), "Score",
    method = "MML",
    quad_points = 5,
    maxit = 15
  )
)
new_units <- data.frame(
  Person = c("NEW01", "NEW01"),
  Rater = unique(toy$Rater)[1],
  Criterion = unique(toy$Criterion)[1:2],
  Score = c(2, 3)
)
pv <- sample_mfrm_plausible_values(toy_fit, new_units, n_draws = 3, seed = 1)
summary(pv)
Summarize a population-level design forecast
## S3 method for class 'mfrm_population_prediction'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used in numeric summaries. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_population_prediction with:
design: requested future design
overview: run-level overview
forecast: facet-level forecast table
facet_names: public non-person facet names used in the forecast
design_variable_aliases: public aliases for design variables
design_descriptor: role-based description of design variables
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
future_branch_active_summary: compact deterministic summary of the
schema-only future arbitrary-facet planning branch embedded in the current
planning schema
ademp: simulation-study metadata
notes: interpretation notes
spec <- build_mfrm_sim_spec(
  n_person = 16,
  n_rater = 3,
  n_criterion = 2,
  raters_per_person = 2,
  assignment = "rotating"
)
pred <- predict_mfrm_population(
  sim_spec = spec,
  design = list(person = 18),
  reps = 1,
  maxit = 5,
  seed = 123
)
s <- summary(pred)
s$overview
s$forecast[, c("Facet", "MeanSeparation", "McseSeparation")]
Summarize a reporting-checklist bundle for manuscript work
## S3 method for class 'mfrm_reporting_checklist'
summary(object, top_n = 10, ...)
object |
Output from |
top_n |
Maximum number of draft-action rows shown in the compact action table. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_reporting_checklist with:
overview: run-level counts of available and draft-ready items
section_summary: section-level checklist coverage
software_scope: external-software relationship summary
visual_scope: plotting-route and 3D-ready payload summary, including
the main InterpretationCheck caveat for each visual family and any
model-specific SupportStatus / ModelCaveat columns
priority_summary: counts by priority/severity
action_items: highest-priority rows that still need draft work
settings: checklist settings rendered as a compact table
notes: interpretation notes
reporting_checklist(), summary.mfrm_apa_outputs
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "MML", maxit = 200)
diag <- diagnose_mfrm(fit, residual_pca = "both", diagnostic_mode = "both")
chk <- reporting_checklist(fit, diagnostics = diag)
summary(chk)
Summarize a DIF/bias screening simulation
## S3 method for class 'mfrm_signal_detection'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used in numeric summaries. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_signal_detection with:
overview: run-level overview
detection_summary: aggregated detection rates by design, with
design-variable alias columns when applicable
ademp: simulation-study metadata carried forward from the original object
facet_names: public facet labels carried from the simulation specification
design_variable_aliases: accepted public aliases for design variables
design_descriptor: role-based design-variable metadata
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
future_branch_active_summary: compact deterministic summary of the
schema-only future arbitrary-facet planning branch embedded in the current
planning schema
notes: short interpretation notes, including the bias-side screening caveat
evaluate_mfrm_signal_detection(), plot.mfrm_signal_detection
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  n_person = 8,
  n_rater = 2,
  n_criterion = 2,
  raters_per_person = 1,
  reps = 1,
  maxit = 5,
  bias_max_iter = 1,
  seed = 123
))
summary(sig_eval)
Summarize a summary-table bundle for manuscript QC
## S3 method for class 'mfrm_summary_table_bundle'
summary(object, digits = 3, top_n = 8, ...)
object |
Output from |
digits |
Number of digits used for numeric summaries. |
top_n |
Maximum number of table-profile rows to keep. |
... |
Reserved for generic compatibility. |
This summary is designed to answer a manuscript-facing question: which reporting tables are available, how large are they, which roles do they serve, and which of them contain numeric content suitable for quick plotting or appendix export.
An object of class summary.mfrm_summary_table_bundle.
overview: source class, returned-table count, note count, and whether a
numeric table is available for plotting.
role_summary: counts and total size by reporting role.
table_catalog: complete returned-table registry with plot/export bridges.
table_profile: table-level dimensions, numeric-column counts, and missing
values for the largest returned tables.
plot_index: which returned tables are plot-ready and which bundle-level
numeric QC routes they support.
appendix_presets: conservative all / recommended / compact
plus section-aware methods / results / diagnostics / reporting
appendix-export presets derived from table roles.
appendix_role_summary: counts of returned tables by reporting role under
the same conservative appendix routing used by the bundle catalog.
appendix_section_summary: counts of returned tables by manuscript-facing
appendix section.
selection_handoff_table_summary: workflow-only table-level appendix
handoff crosswalk when present in the bundle.
selection_handoff_preset_summary: workflow-only appendix handoff overview
aggregated at the preset level when present in the bundle.
selection_handoff_bundle_summary: workflow-only appendix handoff
overview aggregated at the bundle-by-section level when present in the
bundle.
selection_handoff_role_summary: workflow-only appendix handoff overview
aggregated at the reporting-role level when present in the bundle.
selection_handoff_role_section_summary: workflow-only appendix handoff
overview aggregated at the reporting-role by appendix-section level when
present in the bundle.
selection_summary, selection_table_summary,
selection_table_preset_summary, selection_role_summary,
selection_section_summary, and selection_catalog: preset-filtered
appendix selection surfaces when workflow-only handoff tables are embedded
in the bundle.
reporting_map: where to go next for plotting, APA formatting, and export.
notes: carried forward source-level caveats from the originating summary.
Build bundle <- build_summary_table_bundle(summary(...)).
Run summary(bundle) to see reporting coverage.
Use plot(bundle, type = "table_rows") or
plot(bundle, type = "numeric_profile", which = ...) for quick QC.
build_summary_table_bundle(), apa_table(), plot()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
bundle <- build_summary_table_bundle(fit)
summary(bundle)
Summarize threshold-profile presets for visual warning logic
## S3 method for class 'mfrm_threshold_profiles'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used for numeric summaries. |
... |
Reserved for generic compatibility. |
Summarizes available warning presets and their PCA reference bands used by
build_visual_summaries().
An object of class summary.mfrm_threshold_profiles.
thresholds: raw preset values by profile (strict, standard, lenient).
threshold_ranges: per-threshold span across profiles (sensitivity to profile choice).
pca_reference: literature bands used for PCA narrative labeling.
Larger Span in threshold_ranges indicates settings that most change
warning behavior between strict and lenient modes.
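The Span idea can be sketched in base R as the range of each threshold across the three profiles. The threshold names and values below are hypothetical placeholders, not the package's presets, and the Min / Max / Span column names are assumed for illustration.

```r
# Hypothetical preset values by profile (not the package's actual presets).
thresholds <- data.frame(
  Threshold = c("hypothetical_a", "hypothetical_b"),
  strict    = c(1.5, 0.10),
  standard  = c(2.0, 0.10),
  lenient   = c(3.0, 0.10),
  stringsAsFactors = FALSE
)

profile_cols <- c("strict", "standard", "lenient")
profile_vals <- unname(as.list(thresholds[profile_cols]))

# Row-wise minimum and maximum across profiles, then the span between them.
thresholds$Min  <- do.call(pmin, profile_vals)
thresholds$Max  <- do.call(pmax, profile_vals)
thresholds$Span <- thresholds$Max - thresholds$Min

# Thresholds with the largest span react most to the profile choice.
thresholds[order(-thresholds$Span), c("Threshold", "Min", "Max", "Span")]
```

A span of zero means the threshold is identical in every profile, so switching profiles cannot change that particular warning.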
Inspect summary(mfrm_threshold_profiles()).
Choose profile (strict / standard / lenient) for project policy.
Override selected thresholds in build_visual_summaries() only when justified.
mfrm_threshold_profiles(), build_visual_summaries()
profiles <- mfrm_threshold_profiles()
summary(profiles)
Summarize posterior unit scoring output
## S3 method for class 'mfrm_unit_prediction'
summary(object, digits = 3, ...)
object |
Output from |
digits |
Number of digits used in numeric summaries. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_unit_prediction with:
estimates: posterior summaries by person
audit: row-preparation audit
population_audit: optional person-level omission audit for
latent-regression scoring
settings: scoring settings
notes: interpretation notes
toy <- load_mfrmr_data("example_core")
keep_people <- unique(toy$Person)[1:18]
toy_fit <- suppressWarnings(
  fit_mfrm(
    toy[toy$Person %in% keep_people, , drop = FALSE],
    "Person", c("Rater", "Criterion"), "Score",
    method = "MML",
    quad_points = 5,
    maxit = 15
  )
)
new_units <- data.frame(
  Person = c("NEW01", "NEW01"),
  Rater = unique(toy$Rater)[1],
  Criterion = unique(toy$Criterion)[1:2],
  Score = c(2, 3)
)
pred_units <- predict_mfrm_units(toy_fit, new_units)
summary(pred_units)
Summarize a weighting-audit object
## S3 method for class 'mfrm_weighting_audit'
summary(object, digits = 3, top_n = 10, ...)
object |
Output from |
digits |
Number of digits for printed numeric values. |
top_n |
Number of top rows to retain in compact summary tables. |
... |
Reserved for generic compatibility. |
An object of class summary.mfrm_weighting_audit.
Build an unexpected-after-adjustment screening report
unexpected_after_bias_table(
  fit,
  bias_results,
  diagnostics = NULL,
  abs_z_min = 2,
  prob_max = 0.3,
  top_n = 100,
  rule = c("either", "both")
)
fit |
Output from |
bias_results |
Output from |
diagnostics |
Optional output from |
abs_z_min |
Absolute standardized-residual cutoff. |
prob_max |
Maximum observed-category probability cutoff. |
top_n |
Maximum number of rows to return. |
rule |
Flagging rule: |
This helper recomputes expected values and residuals after interaction
adjustments from estimate_bias() have been introduced.
summary(t10) is supported through summary().
plot(t10) is dispatched through plot() for class
mfrm_unexpected_after_bias (type = "scatter", "severity",
"comparison").
A named list with:
table: unexpected responses after bias adjustment
summary: one-row summary (includes baseline-vs-after counts)
thresholds: applied thresholds
facets: analyzed bias facet pair
summary: before/after unexpected counts and reduction metrics.
table: residual unexpected responses after bias adjustment.
thresholds: screening settings used in this comparison.
Large reductions indicate bias terms explain part of prior unexpectedness; persistent unexpected rows indicate remaining model-data mismatch.
Run unexpected_response_table() as baseline.
Estimate bias via estimate_bias().
Run unexpected_after_bias_table(...) and compare reductions.
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
The table data.frame has the same structure as
unexpected_response_table() output, with an additional
BiasAdjustment column showing the bias correction applied to each
observation's expected value.
The summary data.frame contains:
Total observations analyzed.
Unexpected count before bias adjustment.
Unexpected count after adjustment.
Reduction in unexpected count.
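The before/after comparison in the summary reduces to simple arithmetic, sketched here in base R. The counts are hypothetical, and the variable names are chosen for illustration rather than copied from the package's summary columns.

```r
# Hypothetical counts from a baseline screen and a bias-adjusted screen.
total_obs           <- 768
baseline_unexpected <- 42
after_unexpected    <- 29

# Absolute and relative reduction in flagged responses.
reduction     <- baseline_unexpected - after_unexpected
reduction_pct <- 100 * reduction / baseline_unexpected

# Share of all observations still flagged after adjustment.
after_share <- after_unexpected / total_obs

c(Reduction = reduction, ReductionPct = reduction_pct, AfterShare = after_share)
```

A large relative reduction with a small remaining share suggests the bias terms account for most of the earlier unexpectedness; a small reduction points to misfit that the interaction terms do not explain.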
estimate_bias(), unexpected_response_table(), bias_count_table(),
mfrmr_visual_diagnostics
toy <- load_mfrmr_data("example_bias")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
diag <- diagnose_mfrm(fit, residual_pca = "none")
bias <- estimate_bias(fit, diag, facet_a = "Rater", facet_b = "Criterion",
                      max_iter = 2)
t10 <- unexpected_after_bias_table(fit, bias, diagnostics = diag, top_n = 20)
summary(t10)
p_t10 <- plot(t10, draw = FALSE)
p_t10$data$plot
Build an unexpected-response screening report
unexpected_response_table(
  fit,
  diagnostics = NULL,
  abs_z_min = 2,
  prob_max = 0.3,
  top_n = 100,
  rule = c("either", "both")
)
fit |
Output from |
diagnostics |
Optional output from |
abs_z_min |
Absolute standardized-residual cutoff. |
prob_max |
Maximum observed-category probability cutoff. |
top_n |
Maximum number of rows to return. |
rule |
Flagging rule: |
A response is flagged as unexpected when:
rule = "either": |StdResidual| >= abs_z_min OR ObsProb <= prob_max
rule = "both": both conditions must be met.
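The flagging rule above can be sketched in base R without fitting a model. This is a minimal illustration only: the data values are made up, and flag_unexpected() is a hypothetical helper written for this sketch, not an mfrmr function. The column names StdResidual and ObsProb and the default cutoffs follow the rule and usage as documented here.

```r
# Hypothetical residual summaries for five responses (illustrative values only).
resp <- data.frame(
  StdResidual = c(0.4, -2.3, 1.1, 2.6, -0.8),
  ObsProb     = c(0.55, 0.35, 0.20, 0.10, 0.45)
)

# flag_unexpected() is a hypothetical helper, not part of mfrmr.
# It applies the two documented criteria and combines them by rule.
flag_unexpected <- function(std_residual, obs_prob,
                            abs_z_min = 2, prob_max = 0.3,
                            rule = c("either", "both")) {
  rule <- match.arg(rule)
  by_z <- abs(std_residual) >= abs_z_min  # large standardized residual
  by_p <- obs_prob <= prob_max            # improbable observed category
  if (rule == "either") by_z | by_p else by_z & by_p
}

flag_either <- flag_unexpected(resp$StdResidual, resp$ObsProb, rule = "either")
flag_both   <- flag_unexpected(resp$StdResidual, resp$ObsProb, rule = "both")

# Count of flagged rows under each rule.
n_either <- sum(flag_either)
n_both   <- sum(flag_both)
```

With these toy values, rows 2, 3, and 4 are flagged under "either" and only row 4 under "both", so the "both" result is always a subset of the "either" result.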
The table includes row-level observed/expected values, residuals, observed-category probability, most-likely category, and a composite severity score for sorting.
A named list with:
table: flagged response rows
summary: one-row overview
thresholds: applied thresholds
summary: prevalence of unexpected responses under current thresholds.
table: ranked row-level diagnostics for case review.
thresholds: active cutoffs and flagging rule.
Compare results across rule = "either" and rule = "both" to assess how
conservative your screening should be.
Start with rule = "either" for broad screening.
Re-run with rule = "both" to obtain a stricter subset.
Inspect top rows and visualize with plot_unexpected().
For a plot-selection guide and a longer walkthrough, see
mfrmr_visual_diagnostics and
vignette("mfrmr-visual-diagnostics", package = "mfrmr").
The table data.frame contains:
Original row index in the prepared data.
Person identifier (plus one column per facet).
Observed score category.
Observed and model-expected score values.
Raw and standardized residuals.
Probability of the observed category under the model.
Most probable category and its probability.
Composite severity index (higher = more unexpected).
"Higher than expected" or "Lower than expected".
Logical flags for each criterion.
The summary data.frame contains:
Total observations analyzed.
Count and share of flagged rows.
Applied cutoff values.
"either" or "both".
diagnose_mfrm(), displacement_table(), fair_average_table(),
mfrmr_visual_diagnostics
toy_full <- load_mfrmr_data("example_core")
toy_people <- unique(toy_full$Person)[1:12]
toy <- toy_full[toy_full$Person %in% toy_people, , drop = FALSE]
fit <- suppressWarnings(
  fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
           method = "JML", maxit = 10)
)
t4 <- unexpected_response_table(fit, abs_z_min = 1.5, prob_max = 0.4, top_n = 5)
summary(t4)
p_t4 <- plot(t4, draw = FALSE)
p_t4$data$plot
Return a compact, beginner-oriented template that explains where each
visual family normally belongs in a report, which helper to call, what to
say, and what not to claim. Use this static table together with the dynamic
reporting_checklist(fit, diagnostics)$visual_scope table: the template
answers "how should I use this figure?", while the checklist answers "is
this figure ready for the current run?".
visual_reporting_template(
  scope = c("all", "manuscript", "appendix", "diagnostic", "surface")
)
| scope | Which part of the template to return: |
This helper is intentionally conservative. It does not inspect a fitted
object and does not certify that a plot is available. Run
reporting_checklist() for run-specific readiness, then use this table to
decide how to describe the resulting figure.
A data.frame with columns:
FigureFamily: short visual family label.
Scope: broad reporting role used for filtering.
PrimaryHelper: public helper or plot route.
DefaultPlacement: recommended location in a report.
WhatToReport: wording focus for results sections or captions.
CaptionSkeleton: caption starter that must be tailored to the study.
ResultsWording: results-sentence starter that must be checked against
the fitted object and diagnostics.
WhatNotToClaim: common overclaim to avoid.
BeginnerCheck: first thing a new user should inspect.
ReadFirst: the first visual feature to inspect inside the figure.
NextLook: the next public helper, table, or checklist to consult.
ReportDecision: conservative rule for deciding main-text, appendix, or
exploratory-only placement.
GPCMBoundary: model-specific interpretation boundary for bounded
GPCM fits.
ThreeDPolicy: whether 3D is recommended, discouraged, or payload-only.
visual_reporting_template()
visual_reporting_template("manuscript")
visual_reporting_template("surface")