Chester & Lasko 2021, "Construct Validation of Experimental Manipulations in Social Psychology: Current Practices and Recommendations for the Future"

Experimental Manipulations = "Ms". Ms prove construct validity ("CV") by influencing their constructs. But how do we establish construct validity? "On-the-fly experimentation": most of 348 JPSP Ms are ad hoc, not validated before implementation. Some have pre-implementation pilot testing of CV. Some have M checks, but most of those are merely face-valid single-item self-reports that don't meet true validation criteria. So: use pilot CV studies and M checks, standardize protocols, set N by pilot effect estimates, and estimate M effects on the whole nomological network.

----

Ms must actually influence the psych process they are intended to affect: they must be valid. So validate your Ms. Internal validity means a debiased, observer-effect-minimized, experimentally realistic design that avoids unwanted artifacts: a clean M-to-construct causal pathway. Watch for group differences, fatigue, unreliability, construct invalidity; the M-to-construct link must be causal. External validity is criterion validity: the M captures real-world effects generalizably. Latent constructs live in nomological networks (NNs). If accurate w.r.t. strong theory, NNs support construct validity. CV means the uses and interpretations of scores from the measure are valid; CV means M-based results are accurate.

Construct the construct after a complete review of all related knowledge, theories, and NNs. Folks don't do this enough. Capture the full range of the construct; measure it and its neighbors in the NN accurately. Design measure contents in consultation with outside experts. Test it, clean it, show convergent and discriminant validity (MTMM), link it to theory-fitting real-world outcomes (criterion validity), and show group differences to be as predicted. Okay, so far so fastidious and great, but what about Ms?
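The "set N by pilot effect estimates" step can be made concrete. A minimal sketch (my own illustration, not from the paper): take the pilot's Cohen's d and compute the per-group N for a two-sided two-sample comparison at 80% power, using the standard normal approximation.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group N for a two-sided two-sample comparison, normal
    approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# A pilot d of 0.5 implies roughly 63 per group; a small d of 0.2
# implies nearly 400 per group -- which is why underpowered pilots
# that overestimate d are dangerous.
```

(The exact t-based answer is slightly larger, about 64 per group for d = 0.5; the approximation suffices for planning.)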
Ms exert a nomological shockwave: a good M hits its construct target in the right way, hits other latents in the NN appropriately, and does NOT hit extraneous confounder constructs. I.e., ripples weaken with inferential distance. Bad Ms confound with causes other than the intended M (noisy M, incidentals in the instrument) and with effects other than the target construct (spatter consequences: nonspecificity of influence).

Pilots help tune up an M and can be validity-tested themselves, even for previously studied Ms, to ensure you are on target. Manipulation checks measure whether the M had its intended effect: measure the construct the M is meant to influence and show that it moved. (They can also check comprehension, wakefulness, attention.) USE DISCRIMINANT VALIDITY: measure related constructs in the NN to show specificity and estimate the nomological shockwave. "Minimally intrusive validation assessments are thus preferable to overt self-report scales (Hauser et al 2018)". "Without manipulation checks, the validity of experimental manipulations would be asserted by weaker forms of validity (e.g., face validity), which provide deeply flawed footing when used as the sole basis for construct validity". Also: "widespread evidence for publication bias in the field of psychology"; "long-standing claims that pilot validity studies in social psychology are underpowered" (duh, it's a pilot); "p-curve analyses" ??; "random assignment is the core aspect of experimental manipulation".

2/3 of studies have an M; 90% of those have only 1 or 2. 90% of Ms have 2 or 3 conditions. 2x3x2-style factorial designs inflate Type I and Type II errors and undermine statistical power. 90% of Ms are between-participants (subjects assigned to different conditions), but within-participants designs maximize statistical power (effectively N *= 2, and self-normalizing). Random assignment was not even mentioned in 40%. Pilot validity and M checks were cursorily described, lacking methods & stats. Ad hoc Ms, without citations or validation history, are 80%.
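The within-participants power claim can be quantified. A sketch under my own assumptions (equal variances, normal approximation): with correlation r between a subject's two condition scores, the paired effect size is d_z = d / sqrt(2(1 - r)), so a within design at r = 0.5 needs roughly a quarter of the total N of a between design.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, power):
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)

def n_between(d, alpha=0.05, power=0.80):
    """Per-group N, two independent groups (double for total N)."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)

def n_within(d, r, alpha=0.05, power=0.80):
    """Total N when each subject serves in both conditions;
    r = correlation between a subject's two scores."""
    dz = d / math.sqrt(2 * (1 - r))  # paired (repeated-measures) effect size
    return math.ceil((_z_sum(alpha, power) / dz) ** 2)
```

For d = 0.5: between needs 2 x 63 = 126 subjects total, while within at r = 0.5 needs about 32.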
***Use previous Ms, though few exist. p15: "We therefore recommend that experimenters conduct well-powered pilot validity studies for each manipulation prior to implementation in hypothesis testing (Recommendation 3A)". This is idiotic. Pilots define your N, so you need a pilot for your pilot? Do manipulation checks in pilots so as not to contaminate the main study. Ok then.

"The purported validity evidence provided for the manipulation checks was often simple face validity and in some cases, a Cronbach’s α. Many were single-item self-report measures. These forms of purported validity evidence are insufficient to establish the construct validity of a measure." "Indeed, just because manipulations are exerting some effect on their manipulation checks, these findings do not tell us whether the intended aspect of the manipulation exerted the observed effect or whether the manipulation checks measured the target construct." (p18) So instead of MCs giving validation, MCs themselves need validation. Add timecourse estimates for your effects. Do discriminant validity checks along the nomological shockwave. Translate personality-questionnaire validation approaches to experimental manipulations. Validity means "the manipulation has its strongest effect on the target construct and theoretically appropriate effects on the nomological network surrounding it." (p20)

This paper is a prescription for Graduate-Student-osis. Intellectual sclerosis. Also an exercise in moral posturing. Somewhat repulsive. Rich, cheap data would advance the field, given such shrill demands for vast numbers of experiments and correlations and measures for every simple concept. Luckily we do have biotracking rings and watches and VR-glasses worlds, and many independent and dependent variables in the new compute, social media, and VR worlds. A/B testing for sales variables is typical. Can CS support Psych? What is the interaction? What limitations?
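The "discriminant validity checks along the nomological shockwave" idea is easy to simulate. A toy sketch (numbers invented, not from the paper): a valid manipulation should move the manipulation check on the target construct strongly, a neighboring NN construct weakly, and an extraneous construct not at all; estimating all three d's in one study traces the shockwave profile.

```python
import random
from statistics import mean, stdev

random.seed(1)
N = 2000  # large per-condition n so the seeded estimates sit near truth

def cohens_d(true_effect):
    """Simulate control vs. treated scores; return the estimated d."""
    control = [random.gauss(0.0, 1.0) for _ in range(N)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(N)]
    pooled = ((stdev(control) ** 2 + stdev(treated) ** 2) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled

d_target = cohens_d(0.8)    # manipulation check: target construct
d_neighbor = cohens_d(0.2)  # related construct in the NN: small ripple
d_confound = cohens_d(0.0)  # extraneous construct: should be ~0
```

A valid M shows d_target >> d_neighbor with d_confound near zero; a confounded M flattens or inverts that gradient.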
Comment: linguistic categories have been useful in communicating information of value to and between humans in communities for typically thousands of years. Hindi paad / Spanish pedo / Catalan pet / English fart / PIE *pard is 5000 years old, as evidenced across the Indo-European language family. It's not in Language if it's not useful in communicating something to others; hence "folk" categories are prevalidated in this sense. For a psychological construct to compete with a linguistic category in Jamesian pragmatic utility would be outrageous success, though semantic-featural decomposition may outperform; cf. Bliss Theory, N+V Humor Theory, my notes on Anger (tomveatch.com/anger.php), https://tomveatch.com/ai.php#Maslow, to enumerate some of my own. Much more powerful and insight-bearing than the tentative and timorous psychometric results (r < 0.8) of psych studies would likely be a logical derivation of psychologically relevant evolutionary generalizations from tautological necessities, using such well-established categories as Organism, Community, Environment, Reproduction, and Death. Thus the logical necessities of prioritizing urgency in the evolution of organismal motivation systems: flight before sleep, survival before reproduction, air before water before food, etc. The co-occurrences among Maslow, hormonal, chakra, and evolutionary-logical prioritization levels suggest evolution has responded to such selection pressures. When Psych Review rejected N+V with the comment "Not psychological enough," it failed to give significance to Cronbach and Meehl's (1955) original notion: "Numerous successful predictions dealing with phenotypically diverse 'criteria' give greater weight to the claim of construct validity than do fewer predictions, or predictions involving very similar behavior" (p. 295).
I would argue N+V revealed a new and true language for humor which has been experimentally validatable; finding encodings of N and V in common language around humor and in humor judgments further validates it. I wonder what the relative significance is of a room of psych students vs. a shared-assumption semantic interpretation of humor descriptors which we may reasonably infer populates our cognitive-affective universe. It's nice to have both, sure. I guess I'm just rationalizing my position as armchair theoretician.

--------------------------------------------------------------------------------

Experimental Manipulations ("M"s) and Validity Evidence ("VE") Checklist. Every paper should include:
- # of Ms per study; # of conditions per M.
- Definition of the construct for each M.
- Was each M between-participants with random assignment (& how), or within-participants with counterbalancing (& how)?
- Was each M new or from previous studies? Prior-study VE for prior-study Ms.
- Was a prior-study M modified as used here? VE for the modifications.
- Yes/No: was the M pilot tested prior to implementation? VE for pilot measures; VE from pilot Ms; detailed Method & Results for each pilot study.
- Was each M checked with an M check to quantify the M's target construct? VE for each M check.
- Was each M tested for discriminant validity to quantify confounds? VE for each discriminant validity check.
- Was deception used (a) by omission (hiding), for each M; (b) by commission (lying), for each M?
- Was deception probed for suspicion? Method details, VE, scoring and classifying methods for each suspicion probe; handling of suspects.

--------------------------------------------------------------------------------
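The checklist can be encoded as a machine-checkable reporting template. A sketch with hypothetical field names of my own invention (one field per checklist item, values left None when a paper does not report them):

```python
# Hypothetical encoding of the reporting checklist; field names are mine.
REQUIRED_FIELDS = [
    "n_manipulations", "conditions_per_manipulation", "construct_definitions",
    "assignment",               # "between+random" or "within+counterbalanced"
    "manipulation_provenance",  # "new" or citation to prior validated use
    "pilot_tested", "pilot_methods_reported",
    "manipulation_checks", "manipulation_check_validity_evidence",
    "discriminant_validity_checks",
    "deception_by_omission", "deception_by_commission", "suspicion_probe",
]

def missing_fields(report: dict) -> list:
    """Return the checklist items a study report leaves unanswered."""
    return [f for f in REQUIRED_FIELDS if report.get(f) is None]
```

A reviewer (or script) could then flag any submission where missing_fields() is non-empty.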