General experiment-design desiderata

(Ozer) For a construct to be measurable, you need measures of it. Measures should make sense within a theory and be appropriate to the measurement context, hence bidirectionally valid, both toward the theory and toward the measurement model.

A measure, including its internal structure (sum/average, product, ..), should remain stable when generalized theoretically or practically.
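As a toy illustration (the item scores below are made up), the aggregation rule is itself part of the measure: a sum score and a product score can disagree even about which respondent scores higher, so the internal structure needs its own theoretical justification and should survive generalization.

```python
# Toy illustration: the aggregation rule (sum vs. product) is part of the
# measure's internal structure and can reverse the ordering of respondents.
# Item scores below are hypothetical (1-7 scale).
respondents = {
    "A": [6, 1],   # one high item, one low item
    "B": [3, 3],   # two moderate items
}

for name, items in respondents.items():
    total = sum(items)
    product = 1
    for score in items:
        product *= score
    print(f"{name}: sum={total}, product={product}")

# Output:
#   A: sum=7, product=6
#   B: sum=6, product=9
# The sum ranks A above B; the product ranks B above A. The scoring rule
# must therefore be justified by the construct's theory, not chosen by habit.
```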

A construct is valid if inferences from measures of it to predicted (real-world, criterion) consequences are valid (accurately predictive), using the logic of its theory. Its implications should be explored and known. Validating it means clarifying its logic and describing any problems and methodological solutions:

  • What do the construct and its measurements mean to us and to subjects? How did the idea come about? How general, or context- or individual-specific, is it?
  • What are the dimensions of the data (SLOT: Self-report, Life-outcomes, Other-report, Test/psychophysiological tests)?
  • Are they descriptions, capabilities, past or observed behavior, recordings, or records (school etc.)? Are there observer effects or a hidden observer, or are the data produced under a task?

So, justify your methods: describe their source process, explain why they are appropriate to the theory and the circumstances, and call out likely sources of variance. Have a theory of the instrument (including S- and O-data) and of how it measures the construct: how much information goes in, and how different the various items are. Variance sources may include artificiality, ego-presentation, incomprehension, hidden effects, and observer effects.
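One way to keep that "theory of the instrument" explicit is simple bookkeeping. The sketch below is a minimal, hypothetical example (all names and entries are invented): each measure is tagged with its SLOT data source, its scoring rule, and the variance sources you expect it to carry.

```python
from dataclasses import dataclass, field
from enum import Enum

class DataSource(Enum):
    """SLOT taxonomy of data sources."""
    SELF_REPORT = "S"
    LIFE_OUTCOME = "L"
    OTHER_REPORT = "O"
    TEST = "T"  # test / psychophysiological

@dataclass
class Instrument:
    """One measure plus the 'theory of the instrument' we owe for it."""
    name: str
    construct: str
    source: DataSource
    scoring_rule: str                                   # e.g. "mean of 12 Likert items"
    expected_variance_sources: list[str] = field(default_factory=list)

# Hypothetical example entries, not a real battery.
battery = [
    Instrument("Trait aggression scale", "aggression", DataSource.SELF_REPORT,
               "mean of 12 Likert items",
               ["ego-presentation", "incomprehension"]),
    Instrument("Peer rating", "aggression", DataSource.OTHER_REPORT,
               "mean of 3 informants",
               ["observer effects", "acquaintance artefacts"]),
]

for inst in battery:
    print(inst.name, inst.source.value, inst.expected_variance_sources)
```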

Preferably, provide predictions for phenotypically diverse 'criteria'. Explore widely, apply widely, then generalize appropriately, both for science and for applications.

Control for method effects that might falsely support convergent validity; look for trait-nested method effects.

Item response theory asks us to estimate the standard error of measurement separately for each subject (ability level) and each item, rather than a single reliability figure for the whole scale. A necessary idea?
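A small sketch of the idea, assuming a two-parameter logistic (2PL) model with invented item parameters: the test information function yields a conditional standard error of measurement that varies with the respondent's ability, which is what "per subject and per item" precision amounts to in practice.

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta.
    a: discrimination, b: difficulty."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Hypothetical item parameters (a, b) for a short scale.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]

for theta in np.linspace(-3, 3, 7):
    test_info = sum(item_information_2pl(theta, a, b) for a, b in items)
    sem = 1.0 / np.sqrt(test_info)   # conditional standard error of measurement
    print(f"theta={theta:+.1f}  info={test_info:.2f}  SEM={sem:.2f}")
```

The point of the printout is that precision is highest where the items are informative and degrades toward the extremes of the trait, something a single reliability coefficient hides.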

Chester & Lasko 2021, .. Validation of Experimental Manipulations (Ms) .. Experimental validity means the experimental manipulation (M) has its strongest effect on the target construct and theoretically appropriate effects on the nomological network (NN) surrounding it. An experiment run on the fly is unexplored as to the validity of its M. Preferably do pilot construct validation and a manipulation check (MC): did this indeed manipulate what we say it did? Have a protocol. Set N by pilot estimates; estimate the M's effects on the whole nomological network.
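A minimal sketch of setting N from pilot estimates, using the normal-approximation sample-size formula for a two-group comparison. The pilot effect size below is hypothetical, and pilot estimates are noisy and often optimistic, so it is worth also planning against a more conservative value.

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample comparison
    (normal approximation), given standardized effect size d."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

d_pilot = 0.45       # hypothetical pilot estimate of M's effect on the target construct
d_planning = 0.30    # deliberately conservative planning value

print("n per group at pilot d:    ", n_per_group(d_pilot))
print("n per group at planning d: ", n_per_group(d_planning))
```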

Prefer experimental realism, to avoid artefacts of bias and observer effects. Watch for group differences, fatigue, unreliability, and invalidity. Is the M-to-construct pathway causal?

Review related knowledge, theories, and NNs. Capture the full range of the construct, measure it and its NN neighbours accurately and differentially, and make suitable predictions about group differences and test them.

Watch the nomological shockwave of an experimental manipulation hitting direct and indirect latent targets, and watch for missed confounders. Minimize noise, instrument effects, and nonspecific spatter effects. Validity-test even pilots, to be sure you are on target: check that the M had its intended effect by measuring the target construct and showing it is influenced by M. Check comprehension, wakefulness, and attention. Use discriminant validity and the MTMM pattern to show specificity and to estimate the shockwave.
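A sketch of reading an MTMM pattern off data, using simulated scores for two traits crossed with two methods (all variable names are invented): convergent correlations (same trait, different method) should exceed same-method correlations between different traits; if they do not, shared method variance may be doing the work.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Simulate two latent traits and two method factors (hypothetical loadings).
trait1, trait2 = rng.normal(size=(2, n))
method_self, method_obs = rng.normal(size=(2, n))

df = pd.DataFrame({
    "t1_self": 0.7 * trait1 + 0.4 * method_self + 0.3 * rng.normal(size=n),
    "t1_obs":  0.7 * trait1 + 0.4 * method_obs  + 0.3 * rng.normal(size=n),
    "t2_self": 0.7 * trait2 + 0.4 * method_self + 0.3 * rng.normal(size=n),
    "t2_obs":  0.7 * trait2 + 0.4 * method_obs  + 0.3 * rng.normal(size=n),
})

corr = df.corr()
print(corr.round(2))

# Convergent validity: same trait, different method.
print("convergent t1 (self vs. obs):", round(float(corr.loc["t1_self", "t1_obs"]), 2))
# Discriminant check: different traits, same method (method effect).
print("same-method t1/t2 (self):    ", round(float(corr.loc["t1_self", "t2_self"]), 2))
```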

"Minimally intrusive validation assessments are preferable to overt self-report scales. " (Hauser et al 2018)

So: have multiple Ms with multiple conditions, use within-participants designs, randomly assign (this is core to experimental validation), describe pilot validity checks and manipulation checks, and reuse the benefits of previous Ms where they exist.
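A sketch of the mechanics on simulated pilot data (all numbers and names are hypothetical): counterbalance condition order at random within participants, then run the manipulation check as a paired comparison of the target-construct measure across conditions.

```python
import random
import numpy as np
from scipy.stats import ttest_rel

random.seed(1)
rng = np.random.default_rng(1)
n = 40  # hypothetical pilot size

# Counterbalanced random assignment of condition order (within-participants).
orders = ([("control", "manipulation")] * (n // 2) +
          [("manipulation", "control")] * (n // 2))
random.shuffle(orders)
print("first five participants:", orders[:5])

# Simulated manipulation-check data: each participant's score on the
# target-construct measure under each condition (effect ~0.5 SD, invented).
control_scores = rng.normal(loc=0.0, scale=1.0, size=n)
manip_scores = control_scores + rng.normal(loc=0.5, scale=0.5, size=n)

# Paired comparison: did M move the target construct in this pilot?
t, p = ttest_rel(manip_scores, control_scores)
print(f"manipulation check: t={t:.2f}, p={p:.4f}")
```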

Use M checks in the pilot to simplify the main study.
Face validity is not enough. Single-item self-report is not enough.
MCs themselves need validation.
Estimate time course of effects.
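A sketch of estimating the time course on simulated data (delays and effect sizes are invented): measure the target construct at several delays after M and fit a simple decay curve to see how long the manipulation's effect persists.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# Hypothetical: mean effect of M on the target construct measured at
# several delays (minutes) after the manipulation.
delays = np.array([0, 5, 10, 20, 40, 60], dtype=float)
true_effect = 0.8 * np.exp(-delays / 15.0)
observed = true_effect + rng.normal(scale=0.05, size=delays.size)

def decay(t, effect0, tau):
    """Exponential decay of the effect with time constant tau (minutes)."""
    return effect0 * np.exp(-t / tau)

params, _ = curve_fit(decay, delays, observed, p0=(1.0, 10.0))
effect0, tau = params
print(f"initial effect ~{effect0:.2f}, decaying with time constant ~{tau:.0f} min")
```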

Review and justify against all the checklist items (Chester & Lasko).