Addressing Missing Data in Clinical Trials – The Data Science Approach

Digital Health Technologies (DHTs) have revolutionized clinical trial data collection, while also promising to make research more efficient and more patient-centric. However, shifting the power to input data from clinicians to participants increases the risk of missed data points. This can compromise the ability to draw inferences or lead to incomplete submissions, threatening the success of otherwise highly promising clinical trials.

Digital Data Sources and Missed Datapoints

Recent years have seen the adoption of a wide range of digital health technologies in clinical trials, from wearables and sensors to electronic patient-reported outcomes (ePROs) and diaries. The potential benefits of this shift are well documented and range from expanded patient accessibility and inclusivity to increased real-world data.

However, there are also potential pitfalls. The shift moves control of data generation and entry from highly trained staff to clinical trial participants, who may not bring the same precision or attention to detail as their professional counterparts. This is compounded by the fact that decentralized trials (DCTs), where data may be collected daily or even hourly, vastly increase the number of data points being collected. In fact, phase III clinical trials currently generate an average of 3.6 million data points, three times the data collected by late-stage trials 10 years ago.1

Capturing data remotely and without supervision can result in missing data, or in missing values within records or time series. Efficient analysis requires the full sample size and variability specified in the study protocol, with complete data points. Missed data points can diminish statistical power and affect analysis, undermining a sponsor’s ability to demonstrate product efficacy. If participants with missing values are omitted from the analysis, the estimated treatment effect may be misleading, P values may be unreliable, and assessments of the importance of prognostic factors may be inaccurate.2

It follows that trial data should be as complete as possible.
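The loss of statistical power described above can be illustrated with a rough calculation. The sketch below is not taken from any particular trial: it assumes a two-arm study analyzed with a simple two-sided z-test, and the function names, planned sample size, effect size, and standard deviation are all hypothetical values chosen for illustration. It shows only the power lost when participants with missing values are dropped, not the bias that can arise when the reasons for missingness are related to the outcome.

```python
from math import sqrt
from statistics import NormalDist


def power_two_arm(n_per_arm, effect, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two arm means.

    Assumes equal arm sizes and a known common standard deviation;
    the negligible contribution of the opposite tail is ignored.
    """
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_arm)      # standard error of the mean difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value for the chosen alpha
    return z.cdf(effect / se - z_crit)


def complete_case_power(n_planned, missing_fraction, effect, sd, alpha=0.05):
    """Power left after omitting participants with missing values.

    Returns the remaining per-arm sample size and the resulting power.
    """
    n_remaining = round(n_planned * (1 - missing_fraction))
    return n_remaining, power_two_arm(n_remaining, effect, sd, alpha)


# Hypothetical design: 100 participants per arm, effect of 0.4 SD units.
for fraction in (0.0, 0.1, 0.2, 0.3):
    n, power = complete_case_power(100, fraction, effect=0.4, sd=1.0)
    print(f"missing {fraction:.0%}: n per arm = {n}, power = {power:.2f}")
```

Under these assumed inputs, the design starts at roughly 80% power with complete data, and the printed figures fall steadily as the share of omitted participants grows.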

Power of Data Science

Data science, which combines the power of statistics, advanced analytics, and artificial intelligence (AI) techniques such as machine learning (ML) to uncover actionable insights in large datasets, is at the forefront of many innovations in clinical research.

It is being used in the collection, management, and analysis of clinical data, automating the processes and reducing error rates. This is important because securing the overall quality of clinical data is paramount to ensuring quality care and appropriate decision-making in the medical and healthcare fields. Importantly, it also offers possible solutions to the missing data problem.