Research on an Optimal NIPT Timing Decision Model Driven by Multidimensional Factors Based on Machine Learning
DOI: https://doi.org/10.54097/mqcwhs29
Keywords: Machine learning, CatBoost regression, Cox regression, K-means clustering, decision optimization.
Abstract
The accuracy of non-invasive prenatal testing (NIPT) depends strongly on when the test is performed, and the best time varies with the individual characteristics of the pregnant woman. To address the limitations of the traditional “one-size-fits-all” testing schedule, this paper constructs a model framework that integrates prediction, stratification, and decision-making. First, a CatBoost regression model predicts Y chromosome concentration, a key indicator of fetal cell-free DNA (cffDNA), from multidimensional maternal physiological indicators, achieving a coefficient of determination (R²) of 0.717 on the test set. Next, K-means clustering provides a data-driven stratification of the study population, with principal component analysis (PCA) applied to handle the high-dimensional features. Building on this, the study introduces a Cox proportional hazards model to construct a risk function and establishes an optimization model that calculates a personalized optimal NIPT time for each stratified group. For example, in the comprehensive multi-factor model, the optimal testing times for the low-BMI and high-BMI groups were 11.997 weeks and 10.420 weeks, respectively. In addition, to detect chromosomal aneuploidy in female fetuses, a decision tree classification model was developed, achieving an average accuracy of 0.886 under five-fold cross-validation. The core contribution of this study is the integration of machine learning prediction models with optimization and decision models, which shifts the workflow from “passive prediction” to “active decision-making” and provides an efficient, individualized decision-support tool for clinical NIPT practice.
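The stratify-then-optimize step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the BMI values are made up, the stratification uses one feature instead of PCA-reduced features, the "event" is taken to be the test becoming informative, the baseline cumulative hazard is a Weibull curve, and the coefficient `beta`, the weights `w_fail` and `w_delay`, and the 10 to 25 week search grid are arbitrary. Only the general Cox form S(t | x) = exp(-H0(t) * exp(beta * x)) is standard.

```python
import math


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: evenly spaced order statistics.
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels


def risk(week, bmi, beta=-0.05, scale=11.0, shape=6.0, w_fail=1.0, w_delay=0.04):
    """Hypothetical risk: chance the test is not yet informative, plus a delay penalty."""
    baseline_cum_hazard = (week / scale) ** shape  # assumed Weibull H0(t)
    # Cox form: S(t | x) = exp(-H0(t) * exp(beta * x)), with BMI centered at 30.
    survival = math.exp(-baseline_cum_hazard * math.exp(beta * (bmi - 30.0)))
    return w_fail * survival + w_delay * (week - 10.0)


def optimal_week(bmi, weeks):
    """Grid search for the week that minimizes the hypothetical risk."""
    return min(weeks, key=lambda t: risk(t, bmi))


bmis = [27.5, 28.1, 29.0, 29.4, 33.8, 34.5, 35.2, 36.0]  # made-up values
centers, labels = kmeans_1d(bmis, k=2)
weeks = [10.0 + 0.01 * i for i in range(1501)]  # 10.00 to 25.00 weeks
plan = {round(c, 2): optimal_week(c, weeks) for c in centers}
print(plan)  # one recommended week per BMI group
```

The grid search keeps the sketch dependency-free; with a fitted Cox model, the same `optimal_week` loop could be driven by that model's predicted survival curve for each group.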
Copyright (c) 2026 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







