https://www.newyorker.com/magazine/2005/01/03/the-dictionary-of-disorder
Spitzer and the reliability problems of the DSM (forget validity; there is no validity without reliability).

Cohen 1960 defined kappa:
KAPPA = (p_o - p_e) / (1 - p_e)
p_o: observed frequency of agreement between observers
p_e: expected frequency of agreement by chance (computed from each observer's marginal label frequencies; depends on the number of categories)

Inventory of Depression and Anxiety Symptoms (IDAS), Watson et al. 2007: psychometrically stronger than the usual depression & anxiety scales.

Test-retest (i.e. a real reliability test) by separate interviewers using the same DSM and method is quite weakly reliable, KAPPA = .47: diagnoses changed despite no change in self-report, and diagnoses stayed the same despite changed self-report.

Audio/video-recording studies (the second rater cannot ask their own follow-up questions) show higher correlations but do not prove higher reliability.

Hence: don't trust a diagnosis right away.
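The kappa formula above can be sketched in code. A minimal Python implementation, with p_e computed from the two raters' marginal label frequencies (the rater labels and example data are hypothetical, not from any cited study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # p_o: observed agreement, the fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: expected chance agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical diagnosticians labeling 10 cases (made-up data).
a = ["dep", "dep", "anx", "anx", "dep", "none", "anx", "dep", "none", "anx"]
b = ["dep", "anx", "anx", "anx", "dep", "none", "dep", "dep", "none", "none"]
print(round(cohens_kappa(a, b), 2))  # → 0.55
```

Note that p_e rises as labels concentrate in few categories, so raw percent agreement (here 0.70) can look impressive while kappa stays modest.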