Research on an Optimal NIPT Timing Decision Model Driven by Multidimensional Factors Based on Machine Learning

Authors

  • Yangzhiping Chen

DOI:

https://doi.org/10.54097/mqcwhs29

Keywords:

Machine learning, CatBoost regression, Cox regression, K-means clustering, decision optimization.

Abstract

The accuracy of non-invasive prenatal testing (NIPT) depends heavily on when the test is performed, and the optimal time varies among pregnant women. To address the limitations of the traditional “one-size-fits-all” testing schedule, this paper constructs a model framework that integrates prediction, stratification, and decision-making. First, a CatBoost regression model predicts Y chromosome concentration, a key indicator of fetal cell-free DNA (cffDNA), from multidimensional physiological indicators of pregnant women, achieving a coefficient of determination (R²) of 0.717 on the test set. Next, K-means clustering provides a data-driven stratification of the study population, with principal component analysis (PCA) applied to reduce the dimensionality of the features. Building on this, the study introduces a Cox proportional hazards model to construct a risk function and establishes an optimization model that calculates a personalized optimal NIPT time for each stratified group of pregnant women. For example, in the comprehensive multi-factor model, the optimal testing times determined for the low-BMI and high-BMI groups were 11.997 weeks and 10.420 weeks, respectively. In addition, to address the detection of chromosomal aneuploidy in female fetuses, a decision tree classification model was developed, achieving an average accuracy of 0.886 under five-fold cross-validation. The core contribution of this study is the integration of machine learning prediction models with operations optimization and decision models, which shifts the approach from “passive prediction” to “active decision-making” and provides a scientific, efficient, and individualized decision support solution for clinical NIPT practice.

Published

12-03-2026

How to Cite

Chen, Y. (2026). Research on an Optimal NIPT Timing Decision Model Driven by Multidimensional Factors Based on Machine Learning. Highlights in Science, Engineering and Technology, 161, 212-220. https://doi.org/10.54097/mqcwhs29