A comparative analysis between site-based and centralized ratings and patient self-ratings in a clinical trial of Major Depressive Disorder

Journal of Psychiatric Research 47(7), July 2013, Pages 944–954

Targum SD, Wedel PC, Robinson J, Daniel DG, Busner J, Bleicher LS, Rauh P, Barlow C.


We compared scores from three rating methods in a clinical trial of patients with Major Depressive Disorder (MDD). The patient-rated Quick Inventory of Depressive Symptomatology (QIDS-SR16) was compared with site-based clinician ratings and centralized (site-independent) ratings of the Inventory of Depressive Symptomatology (IDSc30). A QIDSc16 extracted from the IDSc30 was used for a matched comparison with the QIDS-SR16. Patient self-ratings indicated more severe depression at baseline than either site-based ratings (p = 0.131) or centralized ratings (p = 0.005), but significantly less severe depression at the end of double-blind treatment than either site-based (p = 0.006) or centralized ratings (p = 0.014), and after 12 weeks (site-based ratings: p = 0.048; centralized ratings: p = 0.004). The matched comparisons with patient self-ratings yielded intraclass correlation coefficients (ICC) of r = 0.55 (site-based raters) and r = 0.49 (centralized raters) at baseline. After baseline, the correlations between each of the two clinician ratings and patient self-ratings improved to r-values between 0.78 and 0.89. At the end of double-blind treatment, site-based ratings separated the combination treatment from placebo on the IDSc30 (p = 0.030), whereas neither centralized ratings nor patient self-ratings achieved statistical significance. Conversely, patient self-ratings separated the combination treatment from buspirone (p = 0.030), whereas neither clinician rating method achieved significance. Applying a “dual” scoring concordance range reduced the placebo response rate and increased the drug effect between the combination treatment and placebo. These findings reveal scoring variability among the three rating methods and call into question whether any single method can reliably assess symptom severity, particularly at baseline. The use of “dual” scoring criteria may help to confirm symptom severity scores and improve rating precision, particularly prior to enrolling subjects into CNS trials.