Centralised Statistical Monitoring – ‘It’s Just Data Cleaning, Right?’
Implementation and Challenges in Industry
Chris Wells
JMP Discovery, Amsterdam, 15th – 18th March

Why Statistical Monitoring
What is Statistical Monitoring
The Methods Being Used
Statistical Monitoring in Roche
Interactive Demonstration
Take Home Messages

What Statistical Monitoring is not !!!!
• Although this is part of Risk Based Monitoring
• It is NOT data cleaning
• It IS a statistical analysis of the data
  – Not comparing treatments but an analysis comparing sites with each other, patients with each other irrespective of treatment
• You should ignore anything you know about treatment
• Is it an interim analysis?
  – Not strictly speaking because nothing is broken down by treatment.
• This should be carried out by people with a statistics background

Why Statistical Monitoring?
• Much of this has background in fraud and falsification.
• FDA proposed rule for “Reporting Information Regarding Falsification of Data” defines falsification of data as
  – creating, altering, recording or omitting data in such a way that the data do not represent what actually occurred
• Examples of falsification of data include but are not limited to:
  – Creating data that were never obtained
  – Altering data that were obtained by substituting different data
  – Recording or obtaining data from a specimen, sample or test whose origin is not accurately described or in a way that does not accurately reflect the data
  – Omitting data that were obtained and ordinarily would be recorded

Why Statistical Monitoring?
• FDA estimates that it will see 73 reports of data falsification across all divisions.
• The impact is critical.
• A single investigator involved in data falsification could jeopardize the trial and potentially a drug submission.
• Regulators have the power to terminate a product’s development immediately or withdraw approval of an already marketed drug
• The U.S. Code on Crimes and Criminal Procedure makes submission of falsified information to the federal government a criminal act, and convictions can lead to substantial fines and even imprisonment of responsible parties.

Why are we using Statistical Monitoring??
• Develop an approach for identifying and managing issues affecting data integrity, such as non-random errors and GCP misconduct, including fraud at investigational sites.
• Ability to use statistical testing to pick up data issues that simple thresholds or the naked eye cannot

Recent Examples in the News

Impact of Fraud/Data Falsification
• Jeopardising a submission could:
  – Have huge cost implications. A delay in launching a product could mean the loss of $6 – 15 million per drug, per day to the company
  – Lead to ineffective or harmful treatment being available, or to patients being denied effective treatment.
  – Result in a Sponsor company missing getting to market first and hence the loss of a major patent and the associated rewards. Possible loss of exclusivity license.
  – Lead to loss of Company credibility

Motivation for Fraud
Motivation comes in many forms:
• Desire for Academic Prestige
• Money (Getting to market first provides high rewards; Fabrication of patients)
• Unintentional Fraud (possible mix-up of patient files when data are entered at sites); patients/parents fabricating data
• Sabotage (aggrieved employee)
• Professional patients

What is Statistical Monitoring?
• Statistical Monitoring comes under the Study Conduct arm of Risk Based Monitoring
• SAS JMP/Clinical can be used to apply statistical algorithms to the clinical datasets in order to identify outliers in data that could indicate a risk to the study.
• We need to use SAS JMP Clinical to identify and manage issues affecting data integrity and GCP misconduct

Data Integrity and GCP Misconduct
REMIT
Develop industry-wide guidelines and an approach for identifying and managing issues affecting data integrity, such as non-random errors and GCP misconduct, including fraud at investigational sites.
UNMET NEED
• No Industry approach currently exists for systematically and pro-actively detecting and handling Data Integrity and GCP Misconduct
• Non-random errors affecting statistical analysis tend to be detected after code breaking, without much opportunity to remediate the situation
• Very limited investment in detecting fraud (e.g. data falsification)
• Non-random errors, GCP misconduct/fraud can raise serious doubts on the integrity of clinical study data and jeopardize a submission

Data Integrity and GCP Misconduct
VALUE PROPOSITION
• Recommend best practices for detecting and handling data integrity risks and GCP misconduct
• Quality improvement with a focus on Data Integrity and GCP Compliance
• Prevent unexpected post code breaking data pattern discovery and allow mid-study corrective action
• Audit and Inspection findings minimized with focus on filing approval threats
• Increase Industry credibility by acting on rare but still harmful fraud cases affecting public perception of medical research

Methods Being Used - Standard Statistical Oversight Assessments: Already Available
TO BE RUN AT 25/50/75 AND 100% OF ENROLLMENT
1. Demographic Distributions
2. Birthdays – This test looks for patients who have duplicate dates of birth.
3. Cluster Subjects Across Sites – This test looks for a patient going to multiple sites or possible fabrication across sites
4. Weekdays and Holidays – This test looks for dosing at unexpected time points
5. Perfect Schedule Attendance – This test checks for sites with no variability in attendance
6. Constant Findings – This test looks for subjects that exhibit no variability in a measurement.
7. Duplicate Records – This test looks for records that are fully duplicated.
8. Digit Preference (leading and trailing) – This test compares digit distributions across clinical sites in order to identify quality issues.
9. Multivariate Inliers and Outliers – If patients are overall too close to or too far from the mean, this might indicate fabricated data
10. Cluster Subjects Within Study Sites – This test examines patients within a site to identify possible fabricated data

Interactive Demonstration
• Example of Reviewing for Duplicate Dates of Birth
• Clustering Subjects Across Sites
• Constant Findings
• Digit Preference (Trailing)
• Multivariate Outliers and Inliers
• C:\Users\wellsc2\AppData\Local\SAS\JMPClinical\1 1\JMPC\Output\Nicardipine

Where are we now?
• Challenges setting up licenses on server
• Training – Richard Zink’s help invaluable
• Work required to get ‘buy in’ from Study teams
• Completed 10 studies and delivered reports to Study Teams.
Have a further 18/20 studies between now and the end of the year (possibly more)
• Some interesting findings, but we are waiting for the study team to inform us as to the relevance of these findings
• Need to establish the team strategy for 2017

Future Strategy of who runs the tests
Currently have a small team of 6 people on varying resource

Separate SM Team
• Advantages
  – Keeps all the analyses in one place.
  – Maintains consistency.
  – Keeps up to date with new methods
  – Unbiased
  – Cost of tools cheaper
• Disadvantages
  – Resource could be an issue
  – The team may be swamped and it may not get done.
  – Never embedded in study teams

Study Statistician Supported by Expert Team
• Advantages
  – Embeds the methods within a study team.
  – Resource rests with study team.
  – The analysis can be run more times and more flexibly
  – Expert team are free to research new methods and keep up to date
• Disadvantages
  – Steep learning curve
  – Need more JMP/Clinical licences

Take Home Messages and Discussion
• Statistical monitoring is a reality. The FDA will be conducting Stats Monitoring on submitted studies. Do we want them to find problems before we do??
• Data misconduct is rare so let’s not panic too much, however, the impact is huge, so maybe we should. We MUST be proactive!
• The whole industry is moving in on this – we can be (are) leaders
• Think of Stats Monitoring like any other analysis of data. It uses statistical procedures.
• IT IS NOT DATA CLEANING !!!!!!!!!

Doing now what patients need next

BACKUP SLIDES

Example of Reviewing for Duplicate Dates of Birth
This test looks for patients who have duplicate dates of birth. We want to see if there is a chance that we may have 1 patient entering a study/project multiple times (IT DOES HAPPEN!!)
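
The duplicate date of birth check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the JMP Clinical implementation: the record layout, the subject and site identifiers, and the function name `duplicate_birth_dates` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records: (subject_id, site_id, date_of_birth as ISO string).
subjects = [
    ("S001", "Site A", "1961-04-12"),
    ("S002", "Site A", "1975-09-30"),
    ("S003", "Site B", "1961-04-12"),
    ("S004", "Site C", "1980-01-05"),
]


def duplicate_birth_dates(records):
    """Group subjects by date of birth and keep dates shared by 2+ subjects."""
    by_dob = defaultdict(list)
    for subject_id, site_id, dob in records:
        by_dob[dob].append((subject_id, site_id))
    # Only dates that occur more than once are of interest for review.
    return {dob: group for dob, group in by_dob.items() if len(group) > 1}


flagged = duplicate_birth_dates(subjects)
for dob, group in sorted(flagged.items()):
    print(dob, group)
```

In a large study some shared birth dates are expected by chance, so a flagged group is a candidate for review (for example, alongside sex, race and site), not evidence that one patient enrolled twice.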
Example of Clustering Subjects Across Sites
What we are assessing is ‘How similar is too similar?’
Principal Components Analysis and Euclidean Distance Matrices are used for the analyses.
Box plots are presented by Covariate subgroups (Gender and Race), then subset to pairs of subjects with very similar demographic characteristics.
Once we identify the most similar pairs of subjects from the box plot, we can go to the heat map, which can be useful for identifying the cluster membership for any selected pairs of subjects within the hierarchical clustering analysis, or for identifying groups of 3 or more that could indicate that the subject has enrolled more than twice.

Example of Constant Findings

Example of Digit Preference (Trailing)
We are now making comparisons across clinical sites in order to identify quality issues.
We want to identify anomalies through tests of the trailing or leading digit for all procedures that provide numeric outcomes.
CMH row mean scores are used to take advantage of the ordinality of the last/first digit. Further, we apply standardized midrank scores to account for the possibility that the observed last/first digits may not be equally spaced from one another.

Example of Multivariate Outliers and Inliers
An outlier is a statistical observation that is markedly different in value from the others of the sample.
An inlier is a value that lies close to the mean.
The relationship amongst the covariates needs to be considered. Hence this multivariate space is where we can utilize Mahalanobis distances (MD).
Mahalanobis distances can be used to calculate the distance between 2 vectors of data to assess similarity, or from a vector to a particular point in multivariate space (typically the multivariate mean or centroid).
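
The Mahalanobis distance from each subject to the multivariate mean, as used for the outlier and inlier assessment, can be sketched as follows. This is a minimal standard-library illustration restricted to two covariates so that the covariance matrix can be inverted in closed form; the data values, the covariate choice (systolic and diastolic blood pressure) and the function name are assumptions for the example and do not reproduce the JMP Clinical analysis.

```python
import math

# Hypothetical data: (subject_id, systolic BP, diastolic BP).
subjects = [
    ("S01", 120, 80),
    ("S02", 125, 82),
    ("S03", 118, 79),
    ("S04", 130, 85),
    ("S05", 122, 81),
    ("S06", 128, 84),
    ("S07", 121, 95),  # breaks the usual relationship between the covariates
]


def mahalanobis_distances(rows):
    """Distance of each subject from the centroid, scaled by the sample covariance."""
    n = len(rows)
    mean_x = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((r[1] - mean_x) ** 2 for r in rows) / (n - 1)
    syy = sum((r[2] - mean_y) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    result = {}
    for subject_id, x, y in rows:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        result[subject_id] = math.sqrt(d2)
    return result


distances = mahalanobis_distances(subjects)
# Smallest distances are candidate inliers, largest are candidate outliers.
for subject_id, d in sorted(distances.items(), key=lambda item: item[1]):
    print(subject_id, round(d, 3))
```

Unlike a plain Euclidean distance, this measure accounts for the correlation between the covariates, so a subject whose individual values are unremarkable can still stand out if the combination is unusual. With more than two covariates the same quadratic form applies, but a general matrix inverse would be needed.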