Impact of BPRS Interview Length on Ratings Reliability in a Schizophrenia Trial

Targum, S.D., Pendergrass, J.C., Toner, C., Zumpano, L., Rauh, P., and DeMartinis, N. Impact of BPRS Interview Length on Ratings Reliability in a Schizophrenia Trial. Eur J Neuropsychopharmacology, in press.

Abstract

Signal detection in clinical trials relies on ratings reliability. We conducted a reliability analysis of site-independent rater scores derived from audio-digital recordings of site-based rater interviews using the structured Brief Psychiatric Rating Scale (BPRS) in a schizophrenia study. "Dual" ratings assessments were conducted as part of a quality assurance program in a 12-week, double-blind, parallel-group study of PF-02545920 compared to placebo in patients with sub-optimally controlled symptoms of schizophrenia (ClinicalTrials.gov identifier NCT01939548). Blinded, site-independent raters scored the recorded site-based BPRS interviews, which were administered to relatively stable patients during two visits prior to the randomization visit. We analyzed the impact of BPRS interview length on "dual" scoring variance and on discordance between the trained, certified site-based raters and the paired scores of the independent raters.

Mean total BPRS scores for 392 interviews conducted at the screen and stabilization visits were 50.4 ± 7.2 (SD) for site-based raters and 49.2 ± 7.2 for site-independent raters (t = 2.34; p = 0.025). "Dual" rated total BPRS scores were highly correlated (r = 0.812). Mean BPRS interview length was 21:05 ± 7:47 minutes, ranging from 7 to 59 minutes. Eighty-nine interviews (23%) were conducted in less than 15 minutes. These shorter interviews had significantly greater "dual" scoring variability (p = 0.0016) and absolute discordance (p = 0.0037) between site-based and site-independent raters than longer interviews.
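The comparisons above (paired rater means, correlation of "dual" total scores, and absolute discordance for short versus longer interviews) can be sketched in code. This is an illustrative example only, not the study's actual analysis: the data below are synthetic, generated so that interviews under 15 minutes carry more rater noise, and all variable names and the 15-minute cutoff are assumptions taken from the text.

```python
# Illustrative sketch of the "dual" rating comparisons described above.
# Synthetic data only; means and spreads are loosely modeled on the
# reported values (site mean ~50.4, SD ~7.2, n = 392 interviews).
import math
import random

random.seed(0)

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Each record: (interview length in minutes, site-based total BPRS,
# site-independent total BPRS). Independent-rater noise is larger for
# interviews shorter than 15 minutes, mimicking greater discordance.
records = []
for _ in range(392):
    length_min = random.uniform(7, 59)
    site = random.gauss(50.4, 7.2)
    noise_sd = 6.0 if length_min < 15 else 3.0
    independent = site + random.gauss(-1.2, noise_sd)
    records.append((length_min, site, independent))

site_scores = [s for _, s, _ in records]
indep_scores = [i for _, _, i in records]
r = pearson_r(site_scores, indep_scores)

# Absolute "dual" discordance, split at the 15-minute cutoff.
short = [abs(s - i) for m, s, i in records if m < 15]
longer = [abs(s - i) for m, s, i in records if m >= 15]
mean_disc_short = sum(short) / len(short)
mean_disc_long = sum(longer) / len(longer)
```

In a real analysis one would use paired t-tests and variance comparisons (e.g., via scipy.stats) rather than raw means, but the structure is the same: pair the two raters' totals per interview, correlate them, and stratify the absolute score differences by interview length.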

In-study ratings reliability cannot be guaranteed by pre-study rater certification. Our findings reveal marked variability in BPRS interview length and suggest that shorter interviews are often incomplete, yielding greater "dual" scoring discordance that may affect ratings precision.