Chester & Lasko 2021, "Construct Validation of Experimental Manipulations in Social Psychology: Current Practices and Recommendations for the Future"

Experimental Manipulations = "Ms". Ms prove construct validity ("CV") by influencing their constructs. But how do we establish construct validity? "On-the-fly experimentation": most of 348 JPSP Ms are ad hoc, not validated before implementation. Some have pre-implementation pilot testing of CV. Some have M checks, but most of those are merely face-valid single-item self-reports that don't meet true validation criteria. So: use pilot CV studies and M checks, standardize protocols, set N by pilot effect estimates, and estimate M effects on the whole nomological network.

----

Ms must actually influence the psych process they are intended to affect: they must be valid. So validate your Ms. Internal validity means a debiased, observer-effect-minimized, experimentally realistic design that avoids unwanted artifacts: a clean M-to-construct causal pathway. Watch for group differences, fatigue, unreliability, construct invalidity; the M-to-construct link must be causal. External validity is criterion validity: the M captures real-world effects generalizably. Latent constructs live in nomological networks (NNs). If accurate w.r.t. strong theory, NNs support construct validity. CV means the uses and interpretations of scores from the measure are valid; CV means M-based results are accurate.

Construct the construct after a complete review of all related knowledge, theories, and NNs. Folks don't do this enough. Capture the full range of the construct; measure it and its neighbors in the NN accurately. Design measure contents in consultation with outside experts. Test it, clean it, show convergent and discriminant validity (MTMM), link it to theory-fitting real-world outcomes (criterion validity), and show group differences to be as predicted. Okay, so far so fastidious and great, but what about Ms?
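The "set N by pilot effect estimates" step can be made concrete. A minimal sketch (my own illustration, not from the paper): take the pilot's Cohen's d and compute the per-group N for a two-sided two-sample comparison at 80% power, using the standard normal approximation.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group N for a two-sided two-sample comparison, normal
    approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# A pilot d of 0.5 implies roughly 63 per group; a small d of 0.2
# implies nearly 400 per group -- which is why underpowered pilots
# that overestimate d are dangerous.
```

(The exact t-based answer is slightly larger, about 64 per group for d = 0.5; the approximation suffices for planning.)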
Ms exert a nomological shockwave: a good M hits its construct target in the right way, hits other latents in the NN appropriately, and does NOT hit extraneous confounder constructs. I.e., ripples weaken with inferential distance. Bad Ms confound with causes other than the intended M (noisy M, incidentals in the instrument) and with effects other than the target construct (spatter consequences: nonspecificity of influence).

Pilots help tune up an M and can be validity-tested themselves, even for previously studied Ms, to ensure you are on target. Manipulation checks measure whether the M had its intended effect: measure the construct the M is meant to influence and show that it moved. (They can also check comprehension, wakefulness, attention.) USE DISCRIMINANT VALIDITY: measure related constructs in the NN to show specificity and estimate the nomological shockwave. "Minimally intrusive validation assessments are thus preferable to overt self-report scales (Hauser et al 2018)". "Without manipulation checks, the validity of experimental manipulations would be asserted by weaker forms of validity (e.g., face validity), which provide deeply flawed footing when used as the sole basis for construct validity". Also: "widespread evidence for publication bias in the field of psychology"; "long-standing claims that pilot validity studies in social psychology are underpowered" (duh, it's a pilot); "p-curve analyses" ??; "random assignment is the core aspect of experimental manipulation".

2/3 of studies have an M; 90% of those have only 1 or 2. 90% of Ms have 2 or 3 conditions. 2x3x2-style factorial designs inflate Type I and Type II errors and undermine statistical power. 90% of Ms are between-participants (subjects assigned to different conditions), but within-participants designs maximize statistical power (effectively N *= 2, and self-normalizing). Random assignment was not even mentioned in 40%. Pilot validity and M checks were cursorily described, lacking methods & stats. Ad hoc Ms, without citations or validation history, are 80%.
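The within-participants power claim can be quantified. A sketch under my own assumptions (equal variances, normal approximation): with correlation r between a subject's two condition scores, the paired effect size is d_z = d / sqrt(2(1 - r)), so a within design at r = 0.5 needs roughly a quarter of the total N of a between design.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, power):
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)

def n_between(d, alpha=0.05, power=0.80):
    """Per-group N, two independent groups (double for total N)."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)

def n_within(d, r, alpha=0.05, power=0.80):
    """Total N when each subject serves in both conditions;
    r = correlation between a subject's two scores."""
    dz = d / math.sqrt(2 * (1 - r))  # paired (repeated-measures) effect size
    return math.ceil((_z_sum(alpha, power) / dz) ** 2)
```

For d = 0.5: between needs 2 x 63 = 126 subjects total, while within at r = 0.5 needs about 32.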
***Use previous Ms, though few exist. p15: "We therefore recommend that experimenters conduct well-powered pilot validity studies for each manipulation prior to implementation in hypothesis testing (Recommendation 3A)". This is idiotic. Pilots define your N, so you need a pilot for your pilot? Do manipulation checks in pilots so as not to contaminate the main study. Ok then.

"The purported validity evidence provided for the manipulation checks was often simple face validity and in some cases, a Cronbach’s α. Many were single-item self-report measures. These forms of purported validity evidence are insufficient to establish the construct validity of a measure." "Indeed, just because manipulations are exerting some effect on their manipulation checks, these findings do not tell us whether the intended aspect of the manipulation exerted the observed effect or whether the manipulation checks measured the target construct." (p18) So instead of MCs giving validation, MCs themselves need validation. Add timecourse estimates for your effects. Do discriminant validity checks along the nomological shockwave. Translate personality-questionnaire validation approaches to experimental manipulations. Validity means "the manipulation has its strongest effect on the target construct and theoretically appropriate effects on the nomological network surrounding it." (p20)

This paper is a prescription for Graduate-Student-osis. Intellectual sclerosis. Also an exercise in moral posturing. Somewhat repulsive. Rich, cheap data would advance the field, given such shrill demands for vast numbers of experiments and correlations and measures for every simple concept. Luckily we do have biotracking rings and watches and VR-glasses worlds, and many independent and dependent variables in the new compute, social media, and VR worlds. A/B testing for sales variables is typical. Can CS support Psych? What is the interaction? What limitations?
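The "discriminant validity checks along the nomological shockwave" idea is easy to simulate. A toy sketch (numbers invented, not from the paper): a valid manipulation should move the manipulation check on the target construct strongly, a neighboring NN construct weakly, and an extraneous construct not at all; estimating all three d's in one study traces the shockwave profile.

```python
import random
from statistics import mean, stdev

random.seed(1)
N = 2000  # large per-condition n so the seeded estimates sit near truth

def cohens_d(true_effect):
    """Simulate control vs. treated scores; return the estimated d."""
    control = [random.gauss(0.0, 1.0) for _ in range(N)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(N)]
    pooled = ((stdev(control) ** 2 + stdev(treated) ** 2) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled

d_target = cohens_d(0.8)    # manipulation check: target construct
d_neighbor = cohens_d(0.2)  # related construct in the NN: small ripple
d_confound = cohens_d(0.0)  # extraneous construct: should be ~0
```

A valid M shows d_target >> d_neighbor with d_confound near zero; a confounded M flattens or inverts that gradient.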
Comment: linguistic categories have been useful in communicating information of value to and between humans in communities for typically thousands of years. Hindi paad / Spanish pedo / Catalan pet / English fart / PIE *pard is 5000 years old, as evidenced across the Indo-European language family. It's not in Language if it's not useful in communicating something to others; hence "folk" categories are prevalidated in this sense. For a psychological construct to compete with a linguistic category in Jamesian pragmatic utility would be outrageous success, though semantic-featural decomposition may outperform; cf. Bliss Theory, N+V Humor Theory, my notes on Anger (tomveatch.com/anger.php), https://tomveatch.com/ai.php#Maslow, to enumerate some of my own. Much more powerful and insight-bearing than the tentative and timorous psychometric results (r < 0.8) of psych studies would likely be a logical derivation of psychologically relevant evolutionary generalizations from tautological necessities, using such well-established categories as Organism, Community, Environment, Reproduction, and Death. Thus the logical necessities of prioritizing urgency in the evolution of organismal motivation systems: flight before sleep, survival before reproduction, air before water before food, etc. The co-occurrences among Maslow, hormonal, chakra, and evolutionary-logical prioritization levels suggest evolution has responded to such selection pressures. When Psych Review rejected N+V with the comment "Not psychological enough," it failed to give significance to Cronbach and Meehl's (1955) original notion: "Numerous successful predictions dealing with phenotypically diverse 'criteria' give greater weight to the claim of construct validity than do fewer predictions, or predictions involving very similar behavior" (p. 295).
I would argue N+V revealed a new and true language for humor which has been experimentally validatable; finding encodings of N and V in common language around humor and in humor judgments further validates it. I wonder what the relative significance is of a room of psych students vs. a shared-assumption semantic interpretation of humor descriptors which we may reasonably infer populates our cognitive-affective universe. It's nice to have both, sure. I guess I'm just rationalizing my position as armchair theoretician.

--------------------------------------------------------------------------------

Experimental Manipulations ("M"s) and Validity Evidence ("VE") Checklist. Every paper should include:
- # of Ms per study; # of conditions per M.
- Definition of the construct for each M.
- Was each M between-participants with random assignment (& how), or within-participants with counterbalancing (& how)?
- Was each M new or from previous studies? Prior-study VE for prior-study Ms.
- Was a prior-study M modified as used here? VE for the modifications.
- Yes/No: was the M pilot tested prior to implementation? VE for pilot measures; VE from pilot Ms; detailed Method & Results for each pilot study.
- Was each M checked with an M check to quantify the M's target construct? VE for each M check.
- Was each M tested for discriminant validity to quantify confounds? VE for each discriminant validity check.
- Was deception used (a) by omission (hiding), for each M; (b) by commission (lying), for each M?
- Was deception probed for suspicion? Method details, VE, scoring and classifying methods for each suspicion probe; handling of suspects.

--------------------------------------------------------------------------------
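The checklist can be encoded as a machine-checkable reporting template. A sketch with hypothetical field names of my own invention (one field per checklist item, values left None when a paper does not report them):

```python
# Hypothetical encoding of the reporting checklist; field names are mine.
REQUIRED_FIELDS = [
    "n_manipulations", "conditions_per_manipulation", "construct_definitions",
    "assignment",               # "between+random" or "within+counterbalanced"
    "manipulation_provenance",  # "new" or citation to prior validated use
    "pilot_tested", "pilot_methods_reported",
    "manipulation_checks", "manipulation_check_validity_evidence",
    "discriminant_validity_checks",
    "deception_by_omission", "deception_by_commission", "suspicion_probe",
]

def missing_fields(report: dict) -> list:
    """Return the checklist items a study report leaves unanswered."""
    return [f for f in REQUIRED_FIELDS if report.get(f) is None]
```

A reviewer (or script) could then flag any submission where missing_fields() is non-empty.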