Impact of BPRS Interview Length on Ratings Reliability in a Schizophrenia Trial

Targum, S.D., Pendergrass, J.C., Toner, C., Zumpano, L., Rauh, P., and DeMartinis, N. Impact of BPRS Interview Length on Ratings Reliability in a Schizophrenia Trial. Eur J Neuropsychopharmacology. In press.

Abstract

Signal detection in clinical trials relies on ratings reliability. We conducted a reliability analysis of site-independent rater scores derived from audio-digital recordings of site-based rater interviews using the structured Brief Psychiatric Rating Scale (BPRS) in a schizophrenia study. “Dual” ratings assessments were conducted as part of a quality assurance program in a 12-week, double-blind, parallel-group study of PF-02545920 compared to placebo in patients with sub-optimally controlled symptoms of schizophrenia (ClinicalTrials.gov identifier NCT01939548). Blinded, site-independent raters scored the recorded site-based BPRS interviews that were administered to relatively stable patients during two visits prior to the randomization visit. We analyzed the impact of BPRS interview length on “dual” scoring variance and on discordance between the trained, certified site-based raters and the paired site-independent raters.

Mean total BPRS scores for 392 interviews conducted at the screen and stabilization visits were 50.4 ± 7.2 (SD) for site-based raters and 49.2 ± 7.2 for site-independent raters (t = 2.34; p = 0.025). “Dual” rated total BPRS scores were highly correlated (r = 0.812). Mean BPRS interview length was 21:05 ± 7:47 minutes, ranging from 7 to 59 minutes. Eighty-nine interviews (23%) were conducted in less than 15 minutes. These shorter interviews had significantly greater “dual” scoring variability (p = 0.0016) and absolute discordance (p = 0.0037) between site-based and site-independent raters than longer interviews.

In-study ratings reliability cannot be guaranteed by pre-study rater certification. Our findings reveal marked variability in BPRS interview length and show that shorter interviews are often incomplete, yielding greater “dual” scoring discordance that may affect ratings precision.

Site-independent confirmation of subject selection for CNS trials: ‘dual’ review using audio-digital recordings

Targum, S. and Pendergrass, C. (2014) Site-independent confirmation of subject selection for CNS trials: ‘dual’ review using audio-digital recordings. Annals of General Psychiatry 2014, 13:21

Background

Site-independent review of subject eligibility for central nervous system (CNS) trials has been used as a surveillance method to enhance the integrity and precision of the subject selection process. We evaluated the utility of a customized review strategy that employs site-independent review of audio-digital recordings of site-based screen interviews.

Methods: We applied a customized site-independent subject selection strategy in nine phase II double-blind, placebo-controlled clinical trials across the CNS spectrum. The Clinical Validation Inventory for Study Admission (C-VISA™, Boston, MA, USA) was developed as a site-independent review method that evaluates and confirms diagnoses, symptom severity, and subject validity prior to enrollment (randomization) into a clinical trial. The C-VISA™ method uses audio-digital recordings of actual site-based interviews conducted at the screening visit. The recordings of these interviews, accompanied by digital notes, are electronically submitted for independent review and ‘dual’ scoring of key rating instruments. A multi-tiered system of site-independent reviewers either affirms subject eligibility or identifies administrative and/or clinical issues that may preclude study eligibility (screen failure).

Results: In this meta-analysis, 404 of 2,515 submitted C-VISA™ eligibility reviews (16.1%) were challenged by tier 1 reviewers and escalated to a tier 2 reviewer. After telephone adjudication with the respective trial site investigator, 168 of these 404 tier 2 reviews (41.6%) were not approved, yielding an overall screen fail rate of 6.7% for all C-VISA™ submissions. The primary reasons for screen failure were insufficient documentation to support the intended diagnosis, symptom severity that did not meet protocol criteria, the presence of excluded comorbid conditions, and potential confounding factors that might obscure assessment during the trial.

Conclusion: The C-VISA™ review process coupled with dual independent scoring of key rating instruments is a quality assurance strategy that provides a systematic site-independent eligibility filter to enhance the precision of subject selection and the integrity of study data. The C-VISA™ strategy has broad applicability across the CNS spectrum because it achieves the objective of confirmatory site-independent review without producing excessive site or subject burden.
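The review-funnel percentages reported above can be checked directly from the stated counts; a minimal sketch:

```python
# Verify the C-VISA review-funnel percentages reported in the abstract.
submitted = 2515    # eligibility reviews submitted
escalated = 404     # challenged by tier 1 and escalated to tier 2
not_approved = 168  # tier 2 reviews not approved after adjudication

challenge_rate = escalated / submitted        # fraction escalated to tier 2
tier2_fail_rate = not_approved / escalated    # fraction of escalations failed
overall_fail_rate = not_approved / submitted  # overall screen-fail rate

print(f"{challenge_rate:.1%} {tier2_fail_rate:.1%} {overall_fail_rate:.1%}")
# → 16.1% 41.6% 6.7%
```

These reproduce the 16.1% challenge rate, 41.6% tier 2 disapproval rate, and 6.7% overall screen-fail rate cited in the abstract.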

Audio-digital recordings used for independent confirmation of site-based MADRS interview scores

Targum SD, Pendergrass JC, Toner C, Asgharnejad M, Burch DJ. (2014) Audio-digital recordings used for independent confirmation of site-based MADRS interview scores. Eur Neuropsychopharmacol.

Abstract

Signal detection requires ratings reliability throughout a clinical trial. The confirmation of site-based rater scores by a second, independent and blinded rater is a reasonable metric of ratings reliability. We used audio-digital pens to record site-based interviews of the Montgomery-Asberg Depression Rating Scale (MADRS) in a double-blind, placebo controlled trial of a novel antidepressant in treatment resistant depressed patients. Blinded, site-independent raters generated “dual” scores that revealed high correlations between site-based and site-independent raters (r=0.940 for all ratings) and high sensitivity, specificity, predictive values, and kappa coefficients for treatment response and non-response outcomes using the site-based rater scores as the standard. The blinded raters achieved an 89.4% overall accuracy and 0.786 kappa for matching the treatment response or non-response outcomes of the site-based raters. A limitation of this method is that independent ratings depend on the quality of site-based interviews and patient responses to the site-based interviewers. Nonetheless, this quality assurance strategy may have broad applicability for studies that use subjective measures and wherever ratings reliability is a concern. “Dual” scoring of recorded site-based ratings can be a relatively unobtrusive surveillance strategy to confirm scores and to identify and remediate rater “outliers” during a study.
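The agreement statistics reported here (overall accuracy and kappa for matched responder/non-responder calls) come from a 2x2 table of paired rater decisions. A self-contained sketch follows; the counts in the usage note are invented for illustration (chosen to land near the reported values), not study data:

```python
# Overall accuracy and Cohen's kappa from a 2x2 table of paired
# responder/non-responder calls made by two raters. Any counts fed to
# this function in the examples below are hypothetical, not study data.

def agreement_stats(both_resp, site_only, indep_only, both_nonresp):
    """Return (observed agreement, Cohen's kappa) for binary rater calls.

    both_resp / both_nonresp: both raters agree on responder / non-responder;
    site_only / indep_only: only one of the two raters calls 'responder'.
    """
    n = both_resp + site_only + indep_only + both_nonresp
    observed = (both_resp + both_nonresp) / n  # raw agreement
    # chance agreement from each rater's marginal proportions
    p_resp = ((both_resp + site_only) / n) * ((both_resp + indep_only) / n)
    p_nonresp = ((indep_only + both_nonresp) / n) * ((site_only + both_nonresp) / n)
    expected = p_resp + p_nonresp
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

For example, `agreement_stats(40, 5, 6, 53)` gives observed agreement ≈ 0.894 and kappa ≈ 0.785, close to the 89.4% accuracy and 0.786 kappa reported for the study.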

Site-Independent Confirmation of Subject Selection for an Alzheimer’s Disease Trial.

Lyketsos, C., Targum, S., Drake, K., Pendergrass, J.C., Munro, C., Smith, G., Lozano, A. “Site-Independent Confirmation of Subject Selection for an Alzheimer’s Disease Trial.” Poster session presented at: Alzheimer’s Association International Conference, July 11–17, 2014, Copenhagen, Denmark.

Background

Several studies have failed to detect a significant signal in clinical trials for Alzheimer’s disease (AD) due, in part, to inappropriate participant selection related to diagnosis or to insufficient evidence of progressive symptom severity. A “second” independent opinion regarding participant selection may improve selection of groups that are likely to be progressive. We created an enrollment review committee (ERC) to provide site-independent historical, medical, and diagnostic verification and confirmation of progressive symptom severity prior to randomization in an ongoing clinical trial.

Methods: ADvance (ClinicalTrials.gov identifier NCT01608061) is a double-blind, placebo-controlled evaluation of deep brain stimulation targeting the fornix (DBS-F) in patients with mild probable AD being conducted at 7 academic sites within the United States and Canada. Entry criteria include an ADAS-cog-11 score between 12 and 24 (inclusive). As part of this study, we introduced an ERC that was responsible for review and approval of all participants proposed by the trial sites. The ERC is composed of 3 clinicians who are not affiliated with the trial site proposing the participant: two AD clinicians (neurologists and/or psychiatrists) and a neurosurgeon. Challenged cases generate adjudication between the site investigator and a member of the ERC.

Results: To date, 40 of 80 potentially eligible subjects (50%) failed to meet eligibility criteria at screen (or baseline). Twenty-six of these eligibility failures related to ADAS-cog-11 scores that were either too low (<12) or too high (>24). Other sources of ineligibility included diagnostic, psychosocial, or surgical issues that would affect study participation or preclude accurate assessments.

Conclusions: In this survey of patients proposed as eligible for a mild probable AD study, symptom severity (either too high or too low) as measured by the ADAS-cog-11 was the most frequent reason for eligibility failure. The use of a site-independent review strategy for subject validation prior to randomization may improve the rigor of the assessment and facilitate better signal detection, in part by assuring the progressive nature of cognitive and functional change in the year prior to enrollment.

Impact of BPRS Interview Length on Ratings Precision in a Schizophrenia Trial

Poster session presented at: Annual American Society of Clinical Psychopharmacology Meeting, June 16–19, 2014, Hollywood, FL.

Steven Targum, Chelsea Toner, J. Cara Pendergrass, Laura Zumpano, Philip Rauh, and Nick deMartinis

Background: Ratings precision is necessary to optimize signal detection in clinical trials, and is particularly important in studies that rely on subjective ratings. The ability of a second independent rater to replicate the scores given by the primary rater is one metric of ratings precision. We assessed ratings precision in a schizophrenia study using site-independent raters to blindly score audio-digital recordings of site-based interviews.

Methods: We examined ratings precision in a 12-week, double-blind, parallel-group study of PF-02545920 compared to placebo in patients with sub-optimally controlled symptoms of schizophrenia (ClinicalTrials.gov identifier NCT01939548). The study is currently being conducted at 26 trial centers in the United States. All patients consented to audio-digital pen recording of site-based interviews using the Brief Psychiatric Rating Scale (BPRS). Recorded interviews were electronically transmitted to Clintara LLC (Boston MA) via a secure website and distributed to 5 site-independent reviewers. These reviewers were blinded to the study site and visit and scored the BPRS based upon the audio recording and corroborative digital information they received. We analyzed “dual” ratings of the total BPRS score and the impact of BPRS interview length on scoring variability and discordance. Statistical analysis included intra-class correlation and Student’s t test.

Results: 392 BPRS interviews conducted at the screen and stabilization visits were recorded for “dual” scoring. The mean total BPRS scores were 50.4 ± 7.6 (SD) for the site-based raters and 49.2 ± 7.2 for the site-independent reviewer/raters (t= 2.34; p= 0.025). The total BPRS scores of the paired “dual” site-independent raters were highly correlated with the site-based scores (r=0.812), and the discordance rate (total score difference between site-based and site-independent raters ≥ 8 points) was less than 10% for all BPRS interviews.
Interview length significantly affected scoring discordance between site-based and site-independent ratings. The mean BPRS interview length was 21:05 ± 7:47 minutes, ranging from 7 to 59 minutes. Eighty-nine interviews (22.7%) were conducted in less than 15 minutes. These “shorter” interviews yielded significantly greater “dual” scoring variability (p = 0.002) and absolute discordance (p = 0.004) between site-based and site-independent raters than all other interviews.
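The length-stratified discordance analysis described above can be sketched as follows. The 15-minute split and the ≥8-point discordance cutoff follow the text; the paired records in the usage example are invented, not study data:

```python
# Sketch of the "dual"-scoring summary: pair each site-based total BPRS
# score with the blinded site-independent score, split interviews at the
# 15-minute boundary, and summarize absolute score differences per group.

DISCORDANCE_CUTOFF = 8    # total-score difference treated as discordant
SHORT_INTERVIEW_MIN = 15  # boundary separating "shorter" interviews

def discordance_summary(records):
    """records: iterable of (site_score, independent_score, minutes)."""
    summary = {}
    groups = {
        "short": lambda m: m < SHORT_INTERVIEW_MIN,
        "long": lambda m: m >= SHORT_INTERVIEW_MIN,
    }
    for label, keep in groups.items():
        diffs = [abs(site - indep)
                 for site, indep, minutes in records if keep(minutes)]
        if not diffs:
            continue  # no interviews fell in this length band
        summary[label] = {
            "n": len(diffs),
            "mean_abs_diff": sum(diffs) / len(diffs),
            "discordant_rate": sum(d >= DISCORDANCE_CUTOFF for d in diffs) / len(diffs),
        }
    return summary
```

With invented records such as `[(50, 49, 20), (48, 47, 30), (52, 44, 10), (55, 46, 12)]`, the two sub-15-minute interviews show a mean absolute difference of 8.5 and a 100% discordance rate, while the longer interviews are fully concordant.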

Conclusion: Overall, “dual” scoring of the site-based BPRS interviews revealed a high correlation and minimal scoring discordance between site-based and blinded, site-independent raters. However, interview length had a significant impact on “dual” ratings concordance such that “shorter” interviews (< 15 minutes) were significantly more discordant than all other interviews in the sample. These data suggest that ratings precision may be compromised by short, incomplete interviews and may adversely affect the study outcome.

Use of band-pass filter analysis to evaluate Outcomes In an Antidepressant Trial for Treatment Resistant Patients

European Neuropsychopharmacology, June 2014

Steven D. Targum, Daniel J. Burch, Mahnaz Asgharnejad, Timothy Petersen, Roberto Gomeni, Maurizio Fava

Abstract
Band-pass filtering is a novel statistical methodology which proposes that filtering out data from trial sites generating implausibly high or low levels of placebo response can yield a more accurate effect size and greater separation of active drug (when efficacious) from placebo.

We applied band-pass filters to re-analyze data from a negative antidepressant trial (NCT00739908) evaluating CX157 (a reversible and selective monoamine oxidase inhibitor-A) versus placebo.

A total of 360 patients from 29 trial sites were randomized to either CX157 treatment (n = 182) or placebo (n = 178). We applied two filters of <3 or >7 points (filter #1) and <3 or >9 points (filter #2) mean change of the total MADRS placebo scores for each site. Trial sites whose mean placebo MADRS score changes exceeded the boundaries of these band-pass filter thresholds were considered non-informative, and all of the data from these sites were excluded from the post-hoc re-analysis.

The two band-pass filters reduced the sample of informative patients from 353 in the mITT population to 62 with filter #1 and 152 with filter #2. The placebo response was reduced from 31.1% in the mITT population to 9.4% with filter #1 and 20.8% with filter #2. MMRM analysis revealed non-statistically significant trends of p = 0.13 and p = 0.16, respectively, for the two filters, in contrast to the mITT population (p = 0.58).
Our findings support the band-pass filter hypothesis and highlight issues related to site-based scoring variability and inappropriate subject selection that may contribute to trial failure.
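A minimal sketch of the site-level band-pass filter, assuming per-site lists of placebo-arm MADRS change scores; the bounds mirror filter #1 (exclude sites with mean change <3 or >7 points) and the site data in the usage example are invented:

```python
# Band-pass filter sketch: a site is "informative" only if its mean
# placebo-arm MADRS change lies inside the band; all data from other
# sites are dropped from the re-analysis. Example site data are invented.

def informative_sites(site_placebo_changes, low=3.0, high=7.0):
    """Return IDs of sites whose mean placebo change lies within [low, high].

    site_placebo_changes: dict mapping site ID -> list of placebo-arm
    MADRS change scores for that site's patients.
    """
    kept = []
    for site, changes in site_placebo_changes.items():
        mean_change = sum(changes) / len(changes)
        if low <= mean_change <= high:
            kept.append(site)
    return kept
```

For example, with invented data `{"A": [2, 4, 6], "B": [9, 10, 11], "C": [0, 1, 2]}`, only site "A" (mean change 4) survives the filter; both arms of sites "B" and "C" would be excluded, which is how the re-analysis shrank the mITT sample.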

Identification and Treatment of Antidepressant Tachyphylaxis

Innov Clin Neurosci. 2014 Mar-Apr; 11(3-4): 24–28.

Steven D. Targum

Abstract
Antidepressant tachyphylaxis describes a condition in which a depressed patient loses a previously effective antidepressant treatment response despite staying on the same drug and dosage for maintenance treatment. It has been suggested that antidepressant tachyphylaxis is a form of relapse related to evolving drug tolerance, but it is also clear that there are other possible reasons for the loss of treatment response unrelated to tolerance, such as medication nonadherence. It has been reported that depressed patients with “true” antidepressant tachyphylaxis may be less responsive to new treatment interventions. Therefore, it is important to identify these patients as part of a comprehensive treatment planning process.

Assessment of Global Fatigue in Multiple Sclerosis: A Spanish Language Version of the CGI and PGI Fatigue Scales

Neuroscience & Medicine Vol.4 No.3, September 2013

Steven D. Targum, Pablo Richly, Vladimiro Sinay, Daniel Goldberg-Zimring, Facundo Manes

ABSTRACT
Background: Fatigue is often identified as weakness following muscular exertion in patients with multiple sclerosis (MS) but may be associated with other physical, cognitive and emotional symptoms. Objective: To develop Spanish language global impression of fatigue scales that evaluate symptoms of fatigue distinct from a particular disease. Methods: 50 ambulatory patients with MS attending a clinical institute in Argentina consented to participate in this reliability study. The Spanish language versions of the Clinical and Patient Global Impressions of Fatigue (CGI-S-F and PGI-S-F) instruments were administered together with the Massachusetts General Hospital cognitive and physical functioning questionnaire (MGH-CPFQ). Results: The CGI-S-F and PGI-S-F scores were well correlated with each other (p < 0.00005). The mean CGI-S-F score was 2.28 ± 1.07 (SD) and the mean PGI-S-F score was 2.30 ± 1.16 (p = ns), reflecting borderline to mild perception of fatigue. The mean total MGH-CPFQ score was 16.68 ± 4.32. Both CGI-S-F and PGI-S-F measures were correlated with the MGH-CPFQ: CGI-S-F (r = 0.632; p < 0.00005); PGI-S-F (r = 0.717; p < 0.00005). Conclusions: In this study, the Spanish language versions of the CGI-S-F and PGI-S-F were reliable measures in an MS population and can be useful, easily applied metrics in a busy clinical practice.
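The scale-to-scale correlations reported in these reliability analyses are plain Pearson coefficients; a self-contained sketch (any paired scores in an example are invented, not study data):

```python
# Pearson correlation between paired scale scores (e.g., clinician vs.
# patient global impression ratings). Pure Python, no dependencies.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Perfectly concordant paired ratings give r = 1.0; perfectly inverted ratings give r = -1.0.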

A comparative analysis between site-based and centralized ratings and patient self-ratings in a clinical trial of Major Depressive Disorder

Journal of Psychiatric Research 47(7), July 2013, Pages 944–954

Targum SD, Wedel PC, Robinson J, Daniel DG, Busner J, Bleicher LS, Rauh P, Barlow C.

Abstract

We compared scores from three different ratings methods in a clinical trial of patients with Major Depressive Disorder (MDD). The Quick Inventory of Depressive Symptoms (QIDS-SR16) was compared to site-based clinician and centralized (site-independent) ratings of the Inventory of Depressive Symptoms (IDSc30). An extracted QIDSc16 was used for a matched comparison with the QIDS-SR16. Patient self-ratings were more depressed at baseline than either site-based ratings (p = 0.131) or centralized ratings (p = 0.005), but significantly less depressed at the end of double-blind treatment than either site-based (p = 0.006) or centralized ratings (p = 0.014), and after 12 weeks (site-based ratings: p = 0.048; centralized ratings: p = 0.004). The matched comparisons with patient self-ratings revealed intra-class correlations (ICC) of r = 0.55 (site-based raters) and r = 0.49 (centralized raters) at baseline. After baseline, the correlations between the two different clinician ratings and patient self-ratings improved to r-values between 0.78 and 0.89. At the end of double-blind treatment, site-based raters separated the combination treatment from placebo on the IDSc30 (p = 0.030), whereas neither centralized ratings nor patient self-ratings achieved statistical significance. Alternatively, patient self-ratings separated the combination treatment from buspirone (p = 0.030), whereas neither clinician rating method achieved significance. A “dual” scoring concordance range reduced the placebo response rate and increased the drug effect between the combination treatment and placebo. These findings reveal scoring variability between each of the three ratings methods and challenge the reliability of any single method to accurately assess symptom severity scores, particularly at baseline. The use of “dual” scoring criteria may help to confirm symptom severity scores and improve ratings precision, particularly prior to enrolling subjects into CNS trials.