mfrmr Visual Diagnostics

This vignette is a compact map of the main base-R diagnostics in mfrmr. It is organized around four practical questions:

  • How well do persons, facet levels, and categories target each other?
  • Which observations or levels look locally unstable?
  • Is the design linked well enough across subsets or forms?
  • Where do residual structure and interaction screens point next?

All examples use packaged data and preset = "publication" so the same code is suitable for manuscript-oriented graphics.

If you are selecting figures for a report, use reporting_checklist() before or alongside this vignette. Its "Visual Displays" rows mirror the public plotting family shown here.

Minimal setup

library(mfrmr)

toy <- load_mfrmr_data("example_core")

fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "JML",
  model = "RSM",
  maxit = 20
)

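# Run diagnostics without residual PCA; PCA is covered in section 4.
diag <- diagnose_mfrm(fit, residual_pca = "none")
checklist <- reporting_checklist(fit, diagnostics = diag)
# Show only the checklist rows that cover visual displays.
subset(
  checklist$checklist,
  Section == "Visual Displays",
  c("Item", "Available", "NextAction")
)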

1. Targeting and scale structure

Use the Wright map first when you want one shared logit view of persons, facet levels, and step thresholds.

plot(fit, type = "wright", preset = "publication", show_ci = TRUE)

Interpretation:

  • Compare person density on the left to facet and step locations on the right.
  • Large gaps suggest weaker targeting in that logit region.
  • Wide overlap in confidence whiskers means neighboring levels are not cleanly separated.

Next, use the pathway map when you want to see how expected scores progress across theta.

plot(fit, type = "pathway", preset = "publication")

Interpretation:

  • Steeper rises mean the expected score changes faster per logit in that region.
  • Dominant-category strips show the theta range over which each category is the most probable response.
  • Flat or compressed regions suggest weaker category separation.
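
The pathway reading above can be made concrete with a small base-R sketch of rating scale model (RSM) category probabilities. This is a generic textbook formulation, not mfrmr's internal code; the helper rsm_probs() and the threshold values in tau are hypothetical and are not taken from fit.

```r
# Hypothetical helper: RSM category probabilities for one value of eta,
# where eta is theta minus the summed facet measures and tau holds the
# step thresholds (illustrative values, not estimates from a fitted model).
rsm_probs <- function(eta, tau) {
  # Log-numerators: 0 for the bottom category, then cumulative sums of (eta - tau_k).
  log_num <- c(0, cumsum(eta - tau))
  num <- exp(log_num - max(log_num))  # subtract the max for numerical stability
  num / sum(num)
}

tau <- c(-1.5, 0, 1.5)               # three thresholds give four categories (0..3)
theta_grid <- seq(-4, 4, by = 0.5)

# One row per theta value, one column per category.
prob_mat <- t(sapply(theta_grid, rsm_probs, tau = tau))

# Expected score curve and the most probable category at each theta.
expected_score <- as.vector(prob_mat %*% (0:3))
dominant_category <- apply(prob_mat, 1, which.max) - 1L

head(data.frame(
  Theta = theta_grid,
  Expected = round(expected_score, 2),
  Dominant = dominant_category
))
```

Plotting expected_score against theta_grid gives the kind of curve the pathway map summarizes; where thresholds sit close together, the curve rises quickly and the middle categories dominate only over a narrow range.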

2. Local response and level issues

Unexpected-response screening is useful for case-level review.

plot_unexpected(
  fit,
  diagnostics = diag,
  abs_z_min = 1.5,
  prob_max = 0.4,
  plot_type = "scatter",
  preset = "publication"
)

Interpretation:

  • Upper corners combine large residual mismatch with low model probability.
  • Repeated appearances of the same persons or levels are more informative than a single extreme point.
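
As a rough illustration of what the two thresholds screen for, the sketch below flags responses from a hypothetical table of observed scores, model expectations, model variances, and observed-category probabilities. The column names and the rule that combines the two cutoffs with a logical AND are assumptions for illustration; the package's own rule may differ.

```r
# Hypothetical per-response quantities; in practice they come from the fitted model.
resp <- data.frame(
  Observed     = c(0, 3, 2, 1, 3),
  Expected     = c(2.4, 2.6, 1.9, 1.2, 0.7),
  Variance     = c(0.60, 0.45, 0.80, 0.70, 0.50),
  ProbObserved = c(0.03, 0.62, 0.41, 0.48, 0.02)
)

# Standardized residual: (observed - expected) / model standard deviation.
resp$StdResidual <- (resp$Observed - resp$Expected) / sqrt(resp$Variance)

# Same cutoffs as in the plot_unexpected() call above.
abs_z_min <- 1.5
prob_max <- 0.4

# Assumed rule: large |z| AND low probability of the observed category.
resp$Flag <- abs(resp$StdResidual) >= abs_z_min & resp$ProbObserved <= prob_max

resp[resp$Flag, ]
```

Loosening either cutoff widens the screen, so the thresholds are best treated as a triage setting rather than a test criterion.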

Displacement focuses on level movement rather than individual responses.

plot_displacement(
  fit,
  diagnostics = diag,
  anchored_only = FALSE,
  plot_type = "lollipop",
  preset = "publication"
)

Interpretation:

  • Large absolute displacement indicates stronger tension between observed data and current calibration.
  • For anchored runs, this is especially useful as an anchor-robustness screen.
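
One common way to think about displacement is as a single Newton-Raphson step: the summed score residuals for a level divided by the summed model variances. The sketch below uses that approximation on hypothetical values. It is not confirmed to be how mfrmr computes displacement, and the sign convention depends on how the facet is oriented.

```r
# Hypothetical observed scores, expectations, and model variances for two rater levels.
level_obs <- data.frame(
  Level    = rep(c("R1", "R2"), each = 4),
  Observed = c(2, 3, 1, 2,  3, 3, 2, 3),
  Expected = c(2.1, 2.7, 1.2, 1.9,  2.2, 2.4, 1.6, 2.3),
  Variance = c(0.7, 0.5, 0.6, 0.7,  0.7, 0.6, 0.7, 0.6)
)
level_obs$Residual <- level_obs$Observed - level_obs$Expected

# Sum residuals and variances within each level.
disp <- aggregate(cbind(Residual, Variance) ~ Level, data = level_obs, FUN = sum)

# One-step approximation: how far the estimate would move, in logits,
# if it were re-estimated from these residuals alone.
disp$Displacement <- disp$Residual / disp$Variance

disp
```

In this toy table the second level shows a much larger displacement than the first, which is the pattern a lollipop display would make visible at a glance.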

Strict marginal follow-up

When you need the package’s latent-integrated follow-up path, switch to MML and request diagnostic_mode = "both" so the legacy and strict branches stay visible side by side. The chunk below uses a small quadrature grid (quad_points = 7) so that it runs quickly; for final reporting, refit with the package default or a higher quadrature setting.

fit_strict <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  model = "RSM",
  quad_points = 7,
  maxit = 40
)

diag_strict <- diagnose_mfrm(
  fit_strict,
  residual_pca = "none",
  diagnostic_mode = "both"
)

strict_checklist <- reporting_checklist(fit_strict, diagnostics = diag_strict)
subset(
  strict_checklist$checklist,
  Section == "Visual Displays" &
    Item %in% c("QC / facet dashboard", "Strict marginal visuals"),
  c("Item", "Available", "NextAction")
)

plot_marginal_fit(
  diag_strict,
  top_n = 12,
  preset = "publication"
)

Interpretation:

  • Treat strict marginal plots as exploratory corroboration screens, not as standalone inferential tests.
  • Use the checklist rows to confirm that the current run actually supports the strict branch before routing figures into a report.
  • When pairwise follow-up is needed, continue with plot_marginal_pairwise(diag_strict, preset = "publication").

3. Linking and coverage

When the design may be incomplete or spread across subsets, inspect the coverage matrix before interpreting cross-subset contrasts.

sc <- subset_connectivity_report(fit, diagnostics = diag)
plot(sc, type = "design_matrix", preset = "publication")

Interpretation:

  • Sparse rows or columns indicate weak subset coverage.
  • Facets with low overlap are weaker anchors for cross-subset comparisons.
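
The idea behind a coverage matrix can be sketched in base R with a person-by-rater count table and a simple pass that labels connected subsets. The data below are hypothetical and deliberately disconnected; this is an illustration of the concept, not the algorithm used by subset_connectivity_report().

```r
# Hypothetical rating design: P1/P2 are seen only by R1/R2, P3/P4 only by R3/R4.
ratings <- data.frame(
  Person = c("P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"),
  Rater  = c("R1", "R2", "R1", "R2", "R3", "R4", "R3", "R4")
)

# Coverage matrix: number of observations per person-by-rater cell.
design <- table(ratings$Person, ratings$Rater)
design

# Label connected subsets: persons who share a rater get the same label.
adj <- unclass(design) > 0
person_subset <- seq_len(nrow(adj))
repeat {
  previous <- person_subset
  for (j in seq_len(ncol(adj))) {
    linked <- which(adj[, j])
    if (length(linked) == 0) next
    # Merge every subset touched by this rater into the lowest label.
    hit <- person_subset %in% person_subset[linked]
    person_subset[hit] <- min(person_subset[linked])
  }
  if (identical(person_subset, previous)) break
}

split(rownames(adj), person_subset)
```

Two or more groups in the final split mean that measures from different groups are not linked through any shared rater, so cross-group contrasts rest on assumptions rather than data.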

If you are working across administrations, follow up with anchor-drift plots:

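# current_fit and baseline_anchors are placeholders for objects from your own
# administrations; they are not created in this vignette.
drift <- detect_anchor_drift(current_fit, baseline = baseline_anchors)
plot_anchor_drift(drift, type = "heatmap", preset = "publication")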

4. Residual structure and interaction screens

Residual PCA is a follow-up layer after the main fit screen.

diag_pca <- diagnose_mfrm(fit, residual_pca = "both", pca_max_factors = 4)
pca <- analyze_residual_pca(diag_pca, mode = "both")
plot_residual_pca(pca, mode = "overall", plot_type = "scree", preset = "publication")

Interpretation:

  • Early components with noticeably larger eigenvalues deserve follow-up.
  • Scree review should usually be paired with loading review for the component of interest.
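
For intuition about what a scree plot displays, the sketch below simulates a small matrix of standardized residuals in which two columns share an extra dimension, then extracts the eigenvalues of the residual correlation matrix. The simulated data and column names are hypothetical, and the layout of the residual matrix inside mfrmr is not assumed here.

```r
set.seed(1)
n <- 200
shared <- rnorm(n)  # a secondary dimension that affects only C1 and C2

# Hypothetical standardized residuals: rows are persons, columns are facet levels.
resid_mat <- cbind(
  C1 = rnorm(n) + 0.8 * shared,
  C2 = rnorm(n) + 0.8 * shared,
  C3 = rnorm(n),
  C4 = rnorm(n)
)

# Eigenvalues of the residual correlation matrix, largest first.
eig <- eigen(cor(resid_mat), symmetric = TRUE, only.values = TRUE)$values

# Share of residual variance carried by each component.
data.frame(
  Component  = seq_along(eig),
  Eigenvalue = round(eig, 2),
  Proportion = round(eig / sum(eig), 2)
)
```

If residuals were pure noise, the eigenvalues would all sit near 1; a first component that stands clearly above the rest is the signal that loading review should examine.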

For interaction screening, use the packaged bias example.

bias_df <- load_mfrmr_data("example_bias")

fit_bias <- fit_mfrm(
  bias_df,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  model = "RSM",
  quad_points = 7
)

diag_bias <- diagnose_mfrm(fit_bias, residual_pca = "none")
bias <- estimate_bias(fit_bias, diag_bias, facet_a = "Rater", facet_b = "Criterion")

plot_bias_interaction(
  bias,
  plot = "facet_profile",
  preset = "publication"
)

Interpretation:

  • Facet profiles are useful for seeing whether a small number of levels drives most flagged interaction cells.
  • Treat these plots as screening evidence; confirm with the corresponding tables and narrative reports.

5. Custom figures without losing the evidence boundary

The built-in plots are intended as safe defaults. Use preset = "monochrome" when a journal, accessibility review, or print workflow needs grayscale output. For journal figures, teaching material, dashboards, or lab-specific styles, use draw = FALSE and the plot-data accessors instead of editing screenshots.

plot(fit, type = "wright", preset = "monochrome")

wright_payload <- plot(fit, type = "wright", draw = FALSE, preset = "publication")
plot_data_components(wright_payload)

locations <- plot_data(wright_payload, component = "locations")
head(locations)

pathway_long <- plot_data(
  fit,
  type = "pathway",
  component = "pathway_long",
  preset = "publication"
)
head(pathway_long[, c("Layer", "CurveGroup", "Theta", "Value")])

When you build a custom figure, keep the helper’s guidance tables with the plot data:

names(wright_payload$data)
wright_payload$data$reference_lines

Those metadata are the guardrails for captions and interpretation. They let you change colors, labels, panels, or rendering technology while preserving the same measurement scale, reference lines, caveats, and reporting role used by the package-native plot.

6. Secondary visual layer

The package ships a secondary visual layer for teaching and diagnostic follow-up. These helpers are not default reporting figures; use them after the main screens above.

  • plot_guttman_scalogram(fit, diagnostics) renders a person x facet-level response matrix with an unexpected-response overlay, for teaching-oriented scalogram intuition and local triage.
  • plot_residual_qq(fit, diagnostics) draws a normal Q-Q plot of person-level standardized residual aggregates, as an exploratory follow-up on residual tail behavior.
  • plot_rater_trajectory(list(T1 = fit_a, T2 = fit_b)) tracks rater severity across named waves. The helper does not perform linking; supply waves that have already been placed on a common anchored scale (see vignette("mfrmr-linking-and-dff")) before interpreting movement as rater drift.
  • plot_rater_agreement_heatmap(fit, diagnostics) renders a compact pairwise rater x rater agreement matrix; pass metric = "correlation" to colour by the Pearson-style Corr column instead of exact agreement.
  • response_time_review(data, person, facets, time) summarizes response-time metadata by person, facet, and score category. Pair it with plot_response_time_review() for distribution and grouped timing plots. This is a descriptive QC layer, not a joint speed-accuracy model.
  • plot_shrinkage_funnel(fit_eb, show_ci = TRUE) draws raw and empirical-Bayes shrunken facet estimates on the same row, with optional confidence whiskers for both estimates. Use this only after apply_empirical_bayes_shrinkage() or fit_mfrm(..., facet_shrinkage = "empirical_bayes").

Response-time QC context

If your rating-event data include response times, review them separately from the MFRM likelihood. Rapid and slow response-time flags are descriptive quality-control prompts; they do not change measures and should not be treated as proof of disengagement, cheating, or speededness.

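# Synthetic response times for illustration only: a repeating 0-6 offset plus
# the score, added to a 12-unit baseline.
toy_rt <- toy
toy_rt$ResponseTime <- 12 + (seq_len(nrow(toy_rt)) %% 7) +
  as.numeric(toy_rt$Score)
# Plant one very fast and one very slow response so both flag types appear.
toy_rt$ResponseTime[1] <- 2
toy_rt$ResponseTime[2] <- 38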

rt <- response_time_review(
  toy_rt,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  time = "ResponseTime",
  rapid_quantile = 0.10,
  slow_quantile = 0.90
)

summary(rt)
plot_response_time_review(rt, type = "distribution", preset = "publication")
plot_response_time_review(rt, type = "person", preset = "publication")

Interpretation:

  • Start with the distribution plot to see whether the rapid/slow thresholds are sensible for this administration.
  • Inspect person and facet summaries for concentrated rapid or slow rates rather than isolated events.
  • Keep timing flags separate from fit, bias, and validity claims unless the study design explicitly supports stronger speed-accuracy modeling.
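
The quantile-based flagging described above can be sketched in base R as follows. The simulated times, the column names, and the use of inclusive cutoffs (<= and >=) are assumptions for illustration; response_time_review() may define its thresholds differently.

```r
set.seed(42)

# Hypothetical rating events: 10 persons with 6 timed responses each.
times <- data.frame(
  Person = rep(paste0("P", 1:10), each = 6),
  ResponseTime = round(rlnorm(60, meanlog = 2.5, sdlog = 0.4), 1)
)

# Cutoffs at the 10th and 90th percentiles of the pooled distribution.
cuts <- quantile(times$ResponseTime, probs = c(0.10, 0.90))

# Assumed rule: at or below the lower cutoff is rapid, at or above the upper is slow.
times$Rapid <- times$ResponseTime <= cuts[1]
times$Slow  <- times$ResponseTime >= cuts[2]

# Per-person flag rates; concentrated rates matter more than isolated events.
person_rates <- aggregate(cbind(Rapid, Slow) ~ Person, data = times, FUN = mean)
person_rates[order(-person_rates$Rapid), ]
```

Because the cutoffs are relative to the pooled distribution, roughly the same share of events is flagged in every administration; the informative part is how unevenly those flags fall across persons or facet levels.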

Small-N shrinkage with uncertainty

When a non-person facet has few levels or sparse observations, a large raw severity estimate may reflect estimation noise rather than a stable facet signal. The shrinkage funnel shows how far empirical-Bayes pooling moved each level toward the facet mean and whether the uncertainty remains wide after pooling.

fit_eb <- apply_empirical_bayes_shrinkage(fit)

shrink <- plot_shrinkage_funnel(
  fit_eb,
  show_ci = TRUE,
  ci_level = 0.95,
  preset = "publication",
  draw = FALSE
)

head(shrink$data$table[, c(
  "Facet", "Level", "RawEstimate", "RawCI_Lower", "RawCI_Upper",
  "ShrunkEstimate", "ShrunkCI_Lower", "ShrunkCI_Upper",
  "ShrinkageFactor"
)])

plot_shrinkage_funnel(
  fit_eb,
  show_ci = TRUE,
  ci_level = 0.95,
  preset = "publication"
)

Interpretation:

  • Long raw-to-shrunken segments identify levels most affected by the partial-pooling prior.
  • Wide raw whiskers that narrow after pooling indicate estimation instability, not automatic rater-quality failure.
  • Report the shrinkage method and keep this display separate from bias, fit, or validity claims.
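
To see why noisy levels move furthest, the sketch below applies a generic normal-normal empirical-Bayes shrinkage to hypothetical rater estimates. The method-of-moments variance estimate and the convention that a factor of 1 means no pooling are assumptions for illustration; they are not confirmed to match apply_empirical_bayes_shrinkage() or the package's ShrinkageFactor column.

```r
# Hypothetical raw severity estimates and standard errors for four raters.
raw <- c(R1 = 1.20, R2 = -0.40, R3 = 0.10, R4 = -0.90)
se  <- c(R1 = 0.60, R2 = 0.20, R3 = 0.25, R4 = 0.55)

facet_mean <- mean(raw)

# Method-of-moments estimate of the true between-level variance, floored at zero.
tau2 <- max(0, var(raw) - mean(se^2))

# Shrinkage weight on the raw estimate: 1 = no pooling, 0 = full pooling.
shrink_factor <- tau2 / (tau2 + se^2)

# Shrunken estimate and its approximate posterior standard deviation.
shrunk <- facet_mean + shrink_factor * (raw - facet_mean)
shrunk_sd <- sqrt(shrink_factor) * se

round(data.frame(raw, se, shrink_factor, shrunk, shrunk_sd), 3)
```

Levels with large standard errors get the smallest weights and therefore the longest raw-to-shrunken segments, which is the pattern the funnel display is designed to expose.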