Saturday, June 15, 2024

New Insights into Handling Missing Data in EHR Analysis

In the realm of electronic health records (EHR), the challenge of partially observed confounder data often hampers the accuracy of statistical analysis. Traditional methods frequently overlook the underlying mechanisms of data missingness, leading to potential biases. A recent study provides a comprehensive approach to characterizing these missing data processes and evaluates the efficacy of various analytic methods in addressing this issue.

Methodology and Simulation Framework

Researchers focused on three empirical sub-cohorts of diabetic patients initiating either SGLT2 or DPP4 inhibitors. These sub-cohorts had complete data on HbA1c, BMI, and smoking, which served as the confounders of interest (COI). The study employed a plasmode framework for data simulation, incorporating a true null treatment effect and four missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and two missing not at random (MNAR) mechanisms. The two MNAR mechanisms differed in whether missingness depended on an unmeasured confounder or on the value of the COI itself.
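To make the four mechanisms concrete, here is a minimal sketch that generates one missingness indicator per mechanism on synthetic data. It is not the study's plasmode code: the variable names (age, u, hba1c), the logistic form, and every coefficient are assumptions chosen only for illustration.

```python
import math
import random


def expit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_missingness(n=5000, seed=0):
    """Draw synthetic patients and flag the COI as missing under four mechanisms.

    All coefficients below are illustrative assumptions, not study values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(0, 1)   # observed covariate (standardized)
        u = rng.gauss(0, 1)     # unmeasured confounder
        hba1c = 0.5 * age + 0.5 * u + rng.gauss(0, 1)  # confounder of interest
        prob_missing = {
            "MCAR": 0.3,                               # constant probability
            "MAR": expit(-1.0 + age),                  # depends on observed data only
            "MNAR_unmeasured": expit(-1.0 + u),        # depends on unmeasured confounder
            "MNAR_value": expit(-1.0 + hba1c),         # depends on the COI value itself
        }
        missing = {name: rng.random() < p for name, p in prob_missing.items()}
        rows.append({"age": age, "u": u, "hba1c": hba1c, "missing": missing})
    return rows


if __name__ == "__main__":
    data = simulate_missingness()
    for name in ("MCAR", "MAR", "MNAR_unmeasured", "MNAR_value"):
        share = sum(r["missing"][name] for r in data) / len(data)
        print(f"{name}: {share:.2f} of COI values flagged missing")
```

The only thing that changes between mechanisms is which variable enters the missingness probability, which is what the diagnostics described next try to detect from observed data alone.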

Diagnostic Evaluations and Results

Three groups of diagnostics were used to distinguish between the missingness mechanisms: differences in characteristics between patients with and without the observed COI (using averaged standardized mean differences [ASMD]), the ability of observed covariates to predict the missingness indicator, and the association of the missingness indicator with the outcome. Under MAR, patient characteristics differed more between the two groups than under MCAR (median ASMD 0.20 vs 0.05), and prediction models for missingness discriminated better (0.59 vs 0.50). Under MNAR, missingness remained associated with the outcome even after adjusting for observed covariates.
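The first two diagnostics can be sketched on synthetic data as follows. This is a simplified illustration, not the study's implementation: ASMD is assumed here to be the mean of absolute standardized mean differences across covariates, and a single covariate stands in for the fitted score of a missingness prediction model. The function names and the demo data are hypothetical.

```python
import math
import random
import statistics


def smd(values, missing):
    """Standardized mean difference between rows with and without a missing COI."""
    a = [v for v, m in zip(values, missing) if m]
    b = [v for v, m in zip(values, missing) if not m]
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def asmd(covariates, missing):
    """Average of absolute SMDs over a list of covariate columns (assumed definition)."""
    return statistics.mean([abs(smd(col, missing)) for col in covariates])


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney); assumes continuous scores without ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    n_pos = sum(1 for flag in labels if flag)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def demo(n=10000, seed=0):
    """Compare the diagnostics under MCAR and MAR on illustrative synthetic data."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    mcar = [rng.random() < 0.3 for _ in range(n)]
    # MAR: missingness depends on the observed covariate x1 (assumed coefficients)
    mar = [rng.random() < 1 / (1 + math.exp(1.0 - v)) for v in x1]
    return {
        "asmd_mcar": asmd([x1, x2], mcar),
        "asmd_mar": asmd([x1, x2], mar),
        "auc_mcar": auc(x1, mcar),  # x1 used as a stand-in for a model score
        "auc_mar": auc(x1, mar),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.2f}")
```

In this toy setup the MAR indicator yields a larger ASMD and an AUC above 0.5, while both stay near their null values under MCAR, which mirrors the direction of the pattern reported above. The third diagnostic, the adjusted association between missingness and outcome, would require an outcome model and is not shown.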

Analytic Method Comparison

Various analytic methods were compared for their ability to recover the true (null) treatment effect. Multiple imputation, particularly with a random forest algorithm, yielded the lowest root-mean-squared error, making it the best-performing approach among those compared.
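As a rough sketch of how such a comparison can be scored, the snippet below pools per-imputation estimates with Rubin's rules and computes the root-mean-squared error of estimates against a known true effect. Rubin's rules are the standard way to combine multiply imputed results, but their use here is an assumption, since the summary does not describe the study's pooling step; the function names and example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total


def rmse(estimates, true_value=0.0):
    """Root-mean-squared error of simulation estimates around the true effect.

    The default of 0.0 corresponds to a null effect on the log scale.
    """
    return math.sqrt(statistics.mean([(e - true_value) ** 2 for e in estimates]))


if __name__ == "__main__":
    # Hypothetical log hazard ratios from three imputed datasets
    estimate, variance = pool_rubin([0.1, 0.2, 0.3], [0.04, 0.04, 0.04])
    print(f"pooled estimate {estimate:.3f}, standard error {math.sqrt(variance):.3f}")
    # Hypothetical pooled estimates from three simulation replicates
    print(f"RMSE vs true null: {rmse([0.1, -0.1, 0.2]):.3f}")
```

In a simulation study, `rmse` would be applied to the pooled estimates from each replicate, once per analytic method, so that methods can be ranked on a single number that reflects both bias and variance.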

Practical Implications for EHR Analysis

– Under MAR, patients with and without the observed COI differ in their measured characteristics, so missingness can be partly predicted from observed covariates.
– For MNAR mechanisms, it is crucial to consider the association between missingness and outcomes, even after adjusting for other covariates.
– Multiple imputation using nonparametric models like random forest can effectively reduce bias and improve the accuracy of treatment effect estimation.

The study underscores the importance of principled diagnostics for gaining reliable insight into missing data mechanisms. When the required assumptions are met, multiple imputation with nonparametric models can substantially mitigate bias, offering a more robust approach to EHR data analysis.

Original Article: Clin Epidemiol. 2024 May 21;16:329-343. doi: 10.2147/CLEP.S436131. eCollection 2024.
