METHOD FOR DETERMINING PREGNANCY STATUS OF PREGNANT WOMAN

20230115196 · 2023-04-13

Abstract

Provided is a method for determining a pregnancy status of a pregnant woman, including: (1) constructing a training set and a selective validation set, each of the training set and the selective validation set being composed of pregnant woman samples each having a known pregnancy status; (2) determining predetermined parameters of each pregnant woman sample in the training set, the predetermined parameters including a concentration of fetal cell-free nucleic acids in peripheral blood and a gestational age in week at which sampling for the peripheral blood is conducted; (3) constructing a prediction model based on the known pregnancy status and the predetermined parameters; (4) determining predetermined parameters of the pregnant woman; and (5) determining the pregnancy status of the pregnant woman based on the predetermined parameters and the constructed prediction model.

Claims

1. A method for constructing a prediction model for determining a pregnancy status of a pregnant woman, the method comprising: (i) constructing a training set and a selective validation set, each of the training set and the selective validation set being composed of a plurality of pregnant women samples each having a known pregnancy status; (ii) determining predetermined parameters of each pregnant woman sample in the training set, the predetermined parameters comprising a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman sample and a gestational age in week at which sampling for the peripheral blood of the pregnant woman sample is conducted; and (iii) constructing the prediction model based on the known pregnancy status and the predetermined parameters.

2. The method according to claim 1, wherein the pregnancy status comprises a delivery interval of the pregnant woman.

3. The method according to claim 1, wherein the gestational age in week at which the sampling is conducted is 13 to 25 weeks.

4. The method according to claim 1, wherein the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest.

5. The method according to claim 4, wherein the predetermined parameters further comprise a height, a body weight, and/or an age of the pregnant woman sample.

6. The method according to claim 1, wherein the step (iii) comprises: determining, by using the training set and the selective validation set, numerical values of β.sub.0, β.sub.icff, β.sub.isample, β.sub.iheight, β.sub.iweight, β.sub.iage, and ε.sub.i for the following formula: l.sub.i = β.sub.0 + β.sub.icffx.sub.icff + β.sub.isamplex.sub.isample + β.sub.iheightx.sub.iheight + β.sub.iweightx.sub.iweight + β.sub.iagex.sub.iage + ε.sub.i, where i = 1, ..., p, wherein i represents a serial number of the pregnant woman sample in the training set; l.sub.i is a value determined for the known pregnancy status of the pregnant woman sample No.i, wherein l.sub.i is 1 for the pregnant woman sample with premature delivery, and l.sub.i is 0 for the pregnant woman sample with full-term delivery; x.sub.icff represents the concentration of fetal cell-free nucleic acids for the pregnant woman sample No.i; x.sub.isample represents the gestational age in week at which the sampling for the peripheral blood of the pregnant woman sample No.i is conducted; x.sub.iheight represents a height of the pregnant woman sample No.i; x.sub.iweight represents a body weight of the pregnant woman sample No.i; x.sub.iage represents an age of the pregnant woman sample No.i; and ε.sub.i represents a sequencing error of the peripheral blood of the pregnant woman sample No.i.

7. A method for determining a pregnancy status of a pregnant woman, comprising: (1) determining predetermined parameters of the pregnant woman, the predetermined parameters comprising a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman and a gestational age in week at which sampling for the peripheral blood of the pregnant woman is conducted; and (2) determining the pregnancy status of the pregnant woman based on the predetermined parameters and the prediction model constructed by the method according to claim 1.

8. The method according to claim 7, wherein the pregnancy status comprises a delivery interval of the pregnant woman.

9. The method according to claim 8, wherein the gestational age in week at which the sampling is conducted is 13 to 25 weeks.

10. The method according to claim 8, wherein the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest.

11. The method according to claim 10, wherein the predetermined parameters further comprise a height, a body weight, and/or an age of the pregnant woman, and the prediction model is adapted to calculate the delivery interval of the pregnant woman based on the following formula: l = β.sub.0 + β.sub.cffx.sub.cff + β.sub.samplex.sub.sample + β.sub.heightx.sub.height + β.sub.weightx.sub.weight + β.sub.agex.sub.age + ε, wherein, l is a parameter determined based on a probability of premature delivery of the pregnant woman; β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, β.sub.weight, and ε are each independently a predetermined coefficient; x.sub.cff is the concentration of fetal cell-free nucleic acids of the pregnant woman; x.sub.sample is the gestational age in week at which the sampling for the peripheral blood of the pregnant woman is conducted; x.sub.height is the height of the pregnant woman; x.sub.weight is the body weight of the pregnant woman; x.sub.age is the age of the pregnant woman; and ε.sub.i is a sequencing error of a peripheral blood sample of the pregnant woman.

12. The method according to claim 11, wherein l is determined based on the following formula: $l = \log_b \frac{p}{1-p}$, wherein b is a base number of log and is generally a constant e; and p is the probability of premature delivery of the pregnant woman.

13. A computer-readable storage medium, having a computer program stored thereon, wherein the program, when executed by a processor, implements steps of the method according to claim 1.

14. The computer-readable storage medium according to claim 13, wherein the method further satisfies any one or more of the following conditions: the pregnancy status comprises a delivery interval of the pregnant woman; the gestational age in week at which the sampling is conducted is 13 to 25 weeks; or the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest.

15. The computer-readable storage medium according to claim 13, wherein the step (iii) of the method comprises: determining, by using the training set and the selective validation set, numerical values of β.sub.0, β.sub.cff, β.sub.isample, β.sub.iheight, β.sub.iweight, β.sub.iage, and ε.sub.i for the following formula: l.sub.i = β.sub.0 + β.sub.icffx.sub.icff + β.sub.isamplex.sub.isample + β.sub.iheightx.sub.iheight + β.sub.iweightx.sub.iweight + β.sub.iagex.sub.iage + ε.sub.i, where i = 1, ..., p, wherein i represents a serial number of the pregnant woman sample in the training set; l.sub.i is a value determined for the known pregnancy status of the pregnant woman sample No.i, wherein l.sub.i is 1 for the pregnant woman sample with premature delivery, and l.sub.i is 0 for the pregnant woman sample with full-term delivery; x.sub.icff represents the concentration of fetal cell-free nucleic acids for the pregnant woman sample No.i; x.sub.isample represents the gestational age in week at which the sampling for the peripheral blood of the pregnant woman sample No. i is conducted; x.sub.iheight represents a height of the pregnant woman sample No.i; x.sub.iweight represents a body weight of the pregnant woman sample No.i; x.sub.iage represents an age of the pregnant woman sample No.i; and ε.sub.i represents a sequencing error of the peripheral blood of the pregnant woman sample No.i.

16. A computer-readable storage medium, having a computer program stored thereon, wherein the program, when executed by a processor, implements steps of the method according to claim 7.

17. The computer-readable storage medium according to claim 16, wherein the method further satisfies any one or more of the following conditions: the pregnancy status comprises a delivery interval of the pregnant woman; the gestational age in week at which the sampling is conducted is 13 to 25 weeks; or the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest.

18. The computer-readable storage medium according to claim 16, wherein in the method, the prediction model is adapted to calculate the delivery interval of the pregnant woman based on the following formula: l = β.sub.0 + β.sub.cffx.sub.cff + β.sub.samplex.sub.sample + β.sub.heightx.sub.height + β.sub.weightx.sub.weight + β.sub.agex.sub.age + ε, wherein, l is a parameter determined based on a probability of premature delivery of the pregnant woman; β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, β.sub.weight, and ε are each independently a predetermined coefficient; x.sub.cff is the concentration of fetal cell-free nucleic acids of the pregnant woman; x.sub.sample is the gestational age in week at which the sampling for the peripheral blood of the pregnant woman is conducted; x.sub.height is the height of the pregnant woman; x.sub.weight is the body weight of the pregnant woman; x.sub.age is the age of the pregnant woman; and ε.sub.i is a sequencing error of a peripheral blood sample of the pregnant woman.

19. An electronic device, comprising: a computer-readable storage medium according to claim 13; and one or more processors configured to execute the program in the computer-readable storage medium.

20. An electronic device, comprising: a computer-readable storage medium according to claim 16; and one or more processors configured to execute the program in the computer-readable storage medium.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0042] The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

[0043] FIG. 1 is a graph showing the correlation of premature delivery and fetal cfDNA concentrations in different gestational ages in week at which blood sampling was conducted according to an embodiment of the present disclosure;

[0044] FIG. 2 is a graph showing changes in specificity, sensitivity, and accuracy under different premature delivery probability thresholds that were set when predicting premature delivery using a test data set according to an embodiment of the present disclosure;

[0045] FIG. 3 is a graph showing the distribution of predicted gestational ages in week at delivery and actual gestational ages in week at delivery according to an embodiment of the present disclosure;

[0046] FIG. 4 is a schematic flowchart of a method for constructing a prediction model according to an embodiment of the present disclosure;

[0047] FIG. 5 is a block diagram of a system for constructing a prediction model according to an embodiment of the present disclosure;

[0048] FIG. 6 is a schematic flowchart of a method for determining a pregnancy status of a pregnant woman according to an embodiment of the present disclosure; and

[0049] FIG. 7 is a block diagram of an apparatus for a method for determining a pregnancy status of a pregnant woman according to an embodiment of the present disclosure.

DESCRIPTION OF EMBODIMENTS

[0050] Embodiments of the present disclosure will be described in detail below, examples of which are illustrated in the accompanying drawings. The examples described below with reference to the accompanying drawings are illustrative, which are merely intended to explain the present disclosure, rather than to limit the present disclosure.

Explanation of Terms

[0051] As used herein, the terms “first”, “second”, “third”, and other similar terms, unless specifically stated otherwise, are used for descriptive purposes to distinguish one from another and are not intended to imply or express any differences in order or importance, and it is not intended to mean that a content defined by terms such as “first”, “second”, “third” and the like consists of only one element.

[0052] In the present disclosure, unless otherwise clearly specified and limited, the terms “installation”, “interconnection”, “connection” and “fixation” etc. are intended to be understood in a broad sense, for example, it may be a fixed connection, removable connection or integral connection; may be a mechanical connection or electrical connection; may be a direct connection or indirect connection using an intermediate; and may be a communication within two elements or an interaction relationship between the two elements, unless explicitly limited otherwise. A person of ordinary skill in the art can understand specific meanings of these terms in the present disclosure based on specific situations.

[0053] According to one aspect of the present disclosure, a method for constructing a prediction model is provided. According to an embodiment of the present disclosure, referring to FIG. 4, the prediction model is configured to determine a pregnancy status of a pregnant woman, and the method includes:

[0054] S1000, constructing a training set and a selective validation set, each of the training set and the validation set being composed of a plurality of pregnant woman samples each having a known pregnancy status;

[0055] S2000, determining predetermined parameters of each pregnant woman sample in the training set, the predetermined parameters including a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman and a gestational age in week at which sampling for the peripheral blood of the pregnant woman is conducted; and

[0056] S3000, constructing the prediction model based on the known pregnancy status and the predetermined parameters. The method according to the embodiment of the present disclosure constructs a prediction model for the pregnancy status of the pregnant woman based on the concentration of fetal cell-free nucleic acids obtained via one-time blood sampling for a plurality of pregnant woman samples, the gestational age in week at which the sampling is conducted, the physical signs (such as height, body weight, BMI, and age) of the pregnant woman when the sampling is conducted, and the known pregnancy status (such as premature delivery and gestational age in week at delivery) of the pregnant woman. The model uses two key factors, the concentration of fetal cell-free nucleic acids and the gestational age in week at which the sampling is conducted, so that the accuracy of the model is improved. According to an embodiment of the present disclosure, the concentration of fetal cell-free nucleic acids is obtained by data processing using sequencing data of the cell-free nucleic acids in the plasma of a pregnant woman as input data, and specifically includes: after the quality control of raw sequencing data (fq format) is finished, aligning the sequencing data to human reference chromosomes by using alignment software (such as the samse mode in BWA); using sequencing data quality control software (such as Picard) to remove the duplicate reads in the alignment results and calculate the duplication rate; completing the local correction of the alignment results by using a mutation detection algorithm (such as the Base Quality Score Recalibration (BQSR) function in GATK); and calculating the average depth of different chromosomes in each sample by using coverage depth calculation software (such as the Depth of Coverage function in GATK). For male fetus samples, the mean depth of coverage of the uniquely aligned reads matching the non-homologous region of the Y chromosome is calculated, and the ratio of this mean depth to the mean depth of the uniquely aligned reads matching the autosomes is the concentration of fetal cell-free nucleic acids. For female fetus samples, calculation can be performed using existing methods for calculating the concentration of fetal cell-free nucleic acids based on low-depth sequencing data of maternal plasma.
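The male-fetus ratio described above can be illustrated with a minimal Python sketch. It assumes that per-chromosome mean depths have already been produced by the upstream alignment and depth steps and are available as a dictionary; the function name, the dictionary keys (including `chrY_nonhomologous`), and the numbers are hypothetical and are not taken from the disclosure.

```python
from statistics import mean

# Hypothetical naming of the 22 autosomes in the depth table.
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def fetal_fraction_male(depth_by_chrom, y_key="chrY_nonhomologous"):
    """Ratio of the mean depth on the non-homologous Y region to the
    mean depth over the autosomes, as stated for male fetus samples."""
    autosome_depths = [depth_by_chrom[c] for c in AUTOSOMES if c in depth_by_chrom]
    if not autosome_depths:
        raise ValueError("no autosomal depths supplied")
    autosome_mean = mean(autosome_depths)
    if autosome_mean <= 0:
        raise ValueError("autosomal mean depth must be positive")
    return depth_by_chrom[y_key] / autosome_mean


# Illustrative numbers only, not data from the disclosure.
example = {c: 0.10 for c in AUTOSOMES}
example["chrY_nonhomologous"] = 0.008
fraction = fetal_fraction_male(example)
```

The sketch follows the ratio exactly as worded in the paragraph above; any further scaling or bias correction would be an additional assumption and is left out.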

[0057] According to a specific embodiment of the present disclosure, in the method of the present disclosure, pregnant woman samples are selected as a training set and a validation set, a prediction model is constructed based on the known pregnancy status, concentration of fetal cell-free nucleic acids, height, body weight, age, BMI, and gestational age in week at which blood sampling is conducted (13 to 25 weeks) in the training set, and the magnitude of each fixed coefficient in the prediction model formula is then determined, so as to predict the pregnancy status of the pregnant woman to be detected.
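As a hedged illustration of the sample selection described above, the following Python sketch keeps only samples whose blood was drawn at 13 to 25 weeks and splits them into a training set and a validation set. The record layout (`sample_week` key), the 20% validation share, and the fixed random seed are assumptions made for the example; the disclosure does not specify a split ratio.

```python
import random


def in_sampling_window(week, low=13, high=25):
    """True when the sampling gestational week lies in the 13 to 25 week window."""
    return low <= week <= high


def split_samples(samples, validation_fraction=0.2, seed=0):
    """Filter samples to the sampling window, then split them at random.

    Returns (training_set, validation_set). The validation share and the
    seed are illustrative choices, not values from the disclosure.
    """
    eligible = [s for s in samples if in_sampling_window(s["sample_week"])]
    rng = random.Random(seed)
    rng.shuffle(eligible)
    n_validation = int(len(eligible) * validation_fraction)
    return eligible[n_validation:], eligible[:n_validation]


# Hypothetical cohort with one sample per sampling week from 10 to 30.
cohort = [{"id": n, "sample_week": week} for n, week in enumerate(range(10, 31))]
training_set, validation_set = split_samples(cohort)
```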

[0058] According to an embodiment of the present disclosure, the pregnancy status includes a delivery interval of the pregnant woman. The method according to the embodiment of the present disclosure can be used to predict the probability of premature delivery, intrauterine growth retardation of a fetus at the gestational age in week at delivery, and other pregnancy complications associated with the concentration of fetal cell-free nucleic acids.

[0059] According to an embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is 13 to 25 weeks. The inventors found that there was a weak correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 12 weeks or less or between 26 weeks and 30 weeks, while there was a strong correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 13 to 25 weeks. Generally, there is a problem of weak correlation in the prediction of the pregnancy status of pregnant women using the concentration of fetal cell-free nucleic acids. According to the method of the embodiment of the present disclosure, the gestational age in week at which sampling is conducted is added as one of the parameters for constructing the prediction model, which improves the accuracy of prediction. Each pregnant woman sample can be used as a model construction sample with only one-time blood sampling within a gestational age of 13 to 25 weeks, avoiding the risk and cost of repeated blood samplings for pregnant woman samples in the process of sample collection.

[0060] According to an embodiment of the present disclosure, the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest. According to an embodiment of the present disclosure, the prediction model may be theoretically any statistical model that generalizes different difference distributions.

[0061] According to an embodiment of the present disclosure, the predetermined parameters further include a height, a body weight, and an age of the pregnant woman.

[0062] According to an embodiment of the present disclosure, the step (iii) includes determining, by using the training set and the validation set, numerical values of β.sub.0, β.sub.icff, β.sub.isample, β.sub.iheight, β.sub.iweight, β.sub.iage, and ε.sub.i for the following formula:

$$l_i = \beta_0 + \beta_{icff}x_{icff} + \beta_{isample}x_{isample} + \beta_{iheight}x_{iheight} + \beta_{iweight}x_{iweight} + \beta_{iage}x_{iage} + \varepsilon_i,$$

where i = 1, ..., p, wherein i represents a serial number of the pregnant woman sample in the training set; l.sub.i is a value determined for the known pregnancy status of the pregnant woman sample No.i, wherein l.sub.i is 1 for the pregnant woman sample with premature delivery and l.sub.i is 0 for the pregnant woman sample with full-term delivery; x.sub.icff represents the concentration of fetal cell-free nucleic acids of the pregnant woman sample No.i; x.sub.isample represents the gestational age in week at which the sampling for the peripheral blood of the pregnant woman sample No.i is conducted; x.sub.iheight represents the height of the pregnant woman sample No.i; x.sub.iweight represents the body weight of the pregnant woman sample No.i; x.sub.iage represents the age of the pregnant woman sample No.i; and ε.sub.i represents a sequencing error of the peripheral blood of the pregnant woman sample No.i. It should be noted that ε is the random error generated by the sequencer during the sequencing process; this value is associated with the sequencing batch but independent of the pregnant woman sample, and is generated directly by the sequencer when the sequencing data are downloaded from the sequencer.
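One possible way to determine the β values of the linear formula is ordinary least squares, sketched below in plain Python. This is an assumption-laden illustration rather than the procedure of the disclosure: the function name `fit_ols` is hypothetical, ε.sub.i is treated as an ordinary residual instead of a sequencer-reported value, the example uses only the two parameters required by the method (concentration and sampling week), and the numbers are invented for demonstration. Height, body weight, and age would be added as further columns in the same way.

```python
def fit_ols(rows, labels):
    """Least-squares estimate of [beta_0, beta_1, ...] for
    l = beta_0 + sum(beta_k * x_k), solved via the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Augmented matrix [X^T X | X^T l].
    aug = [
        [sum(r[a] * r[b] for r in design) for b in range(k)]
        + [sum(r[a] * y for r, y in zip(design, labels))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Illustrative values only: (fetal cfDNA concentration, sampling week).
rows = [(0.08, 13), (0.12, 16), (0.10, 20), (0.15, 24), (0.09, 18), (0.14, 14)]
labels = [0, 1, 0, 1, 0, 1]  # 1 = premature delivery, 0 = full-term delivery
beta = fit_ols(rows, labels)  # [beta_0, beta_cff, beta_sample]
```

For a logistic regression model or a random forest, a different fitting routine would be needed; the least-squares solver here covers only the linear regression case.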

[0063] According to a second aspect of the present disclosure, a system for constructing a prediction model is provided. According to an embodiment of the present disclosure, the prediction model is used to determine a pregnancy status of a pregnant woman, and with reference to FIG. 5, the system includes: a training set construction module 1000 configured to construct a training set composed of a plurality of pregnant woman samples each having a known pregnancy status; a predetermined parameter determination module 2000 connected to the training set construction module 1000 and configured to determine predetermined parameters of each pregnant woman sample in the training set, the predetermined parameters including a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman and a gestational age in week at which the sampling for the peripheral blood of the pregnant woman is conducted; and a prediction model construction module 3000 connected to the predetermined parameter determination module 2000 and configured to construct the prediction model based on the known pregnancy status and the predetermined parameters. The system according to the embodiment of the present disclosure constructs a prediction model for the pregnancy status of a pregnant woman based on the concentration of fetal cell-free nucleic acids obtained via one-time blood sampling for a plurality of pregnant woman samples, the gestational age in week at which the sampling is conducted, the physical signs (such as height, body weight, BMI, and age) of the pregnant woman when the sampling is conducted, and the known pregnancy status (such as premature delivery and gestational age in week at delivery) of the pregnant woman. The system uses two key factors, the concentration of fetal cell-free nucleic acids and the gestational age in week at which the sampling is conducted, as the key parameters for constructing the model, so that the accuracy of the constructed model is improved.

[0064] According to an embodiment of the present disclosure, the pregnancy status includes a delivery interval of the pregnant woman. The method according to the embodiment of the present disclosure can be used to predict the probability of premature delivery, intrauterine growth retardation of a fetus at the gestational age in week at delivery, and other pregnancy complications associated with the concentration of fetal cell-free nucleic acids.

[0065] According to an embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is 13 to 25 weeks. The inventors found that there was a weak correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 12 weeks or less or between 26 weeks and 30 weeks, while there was a strong correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 13 to 25 weeks. Generally, there is a problem of weak correlation in the prediction of the pregnancy status of pregnant women using the concentration of fetal cell-free nucleic acids. According to the system of the embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is added as one of the parameters for constructing the prediction model, which improves the accuracy of prediction. Each pregnant woman sample can be used as a model construction sample with only one-time blood sampling within the gestational age of 13 to 25 weeks, avoiding the risk and cost of repeated blood samplings for pregnant woman samples in the process of sample collection.

[0066] According to an embodiment of the present disclosure, the prediction model is at least one of a linear regression model, a logistic regression model, or a random forest. In the system according to an embodiment of the present disclosure, the prediction model may be theoretically any statistical model that generalizes different difference distributions.

[0067] According to an embodiment of the present disclosure, the predetermined parameters further include a height, a body weight, and an age of the pregnant woman.

[0068] According to an embodiment of the present disclosure, the prediction model construction module is configured to determine, by using the training set and a validation set, numerical values of β.sub.0, β.sub.icff, β.sub.isample, β.sub.iheight, β.sub.iweight, β.sub.iage, and ε.sub.i for the following formula:

$$l_i = \beta_0 + \beta_{icff}x_{icff} + \beta_{isample}x_{isample} + \beta_{iheight}x_{iheight} + \beta_{iweight}x_{iweight} + \beta_{iage}x_{iage} + \varepsilon_i, \quad \text{where } i = 1, \ldots, p,$$

wherein i represents a serial number of the pregnant woman sample in the training set; l.sub.i is a value determined for the known pregnancy status of the pregnant woman sample No. i, wherein l.sub.i is 1 for the pregnant woman sample with premature delivery and l.sub.i is 0 for the pregnant woman sample with full-term delivery; x.sub.icff represents the concentration of fetal cell-free nucleic acids of the pregnant woman sample No.i; x.sub.isample represents the gestational age in week at which the sampling for the peripheral blood of the pregnant woman sample No.i is conducted; x.sub.iheight represents the height of the pregnant woman sample No.i; x.sub.iweight represents the body weight for the pregnant woman sample No.i; x.sub.iage represents the age of the pregnant woman sample No.i; and ε.sub.i represents a sequencing error of the peripheral blood of the pregnant woman sample No.i.

[0069] In a third aspect, the present disclosure provides a method for determining a pregnancy status of a pregnant woman. According to an embodiment of the present disclosure, referring to FIG. 6, the method includes:

[0070] S100, determining predetermined parameters of the pregnant woman, the predetermined parameters including a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman and a gestational age in week at which sampling for the peripheral blood of the pregnant woman is conducted; and

[0071] S200, determining the pregnancy status of the pregnant woman based on the predetermined parameters and the prediction model. According to the method of an embodiment of the present disclosure, the concentration of fetal cell-free nucleic acids is obtained by data processing using sequencing data of the cell-free nucleic acids in the plasma of the pregnant woman as input data, specifically including: after the quality control of raw sequencing data (fq format) is finished, aligning the sequencing data to human reference chromosomes by using alignment software (such as the samse mode in BWA); using sequencing data quality control software (such as Picard) to remove the duplicate reads in the alignment results and calculate the duplication rate; completing the local correction of the alignment results by using a mutation detection algorithm (such as the Base Quality Score Recalibration (BQSR) function in GATK); and calculating the average depth of different chromosomes in each sample by using coverage depth calculation software (such as the Depth of Coverage function in GATK). For male fetus samples, the mean depth of coverage of the uniquely aligned reads matching the non-homologous region of the Y chromosome is calculated, and the ratio of this mean depth to the mean depth of the uniquely aligned reads matching the autosomes is the concentration of fetal cell-free nucleic acids. For female fetus samples, calculation can be performed using existing methods for calculating the concentration of fetal cell-free nucleic acids based on low-depth sequencing data of maternal plasma.

[0072] According to an embodiment of the present disclosure, the pregnancy status includes a delivery interval of the pregnant woman. The method according to the embodiment of the present disclosure can be used to predict the probability of premature delivery, intrauterine growth retardation of a fetus at the gestational age in week at delivery, and other pregnancy complications associated with the concentration of fetal cell-free nucleic acids.

[0073] According to an embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is 13 to 25 weeks. The inventors found that there was a weak correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 12 weeks or less or between 26 weeks and 30 weeks, while there was a strong correlation between the concentration of fetal cell-free nucleic acids and premature delivery when the blood sampling was conducted at a gestational age of 13 to 25 weeks. Generally, there is a problem of weak correlation in the prediction of the pregnancy status of pregnant women using the concentration of fetal cell-free nucleic acids. According to the method of the embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is added as one of the parameters for constructing the prediction model, which improves the accuracy of prediction, and blood sampling of the pregnant woman only needs to be conducted once within the gestational age of 13 to 25 weeks, which reduces the cost and risk of multiple blood samplings.

[0074] According to an embodiment of the present disclosure, the predetermined prediction model is at least one of a linear regression model, a logistic regression model, or a random forest. According to an embodiment of the present disclosure, the prediction model may be theoretically any statistical model that generalizes different difference distributions.

[0075] According to a specific embodiment of the present disclosure, the method of the present disclosure constructs a prediction model based on the known pregnancy status, concentration of fetal cell-free nucleic acids, height, body weight, age, BMI, and gestational age in week (13 to 25 weeks) at which blood sampling is conducted, and determines the magnitude of each fixed coefficient in the prediction model formula, so as to predict the pregnancy status of the pregnant woman to be detected. At the gestational age of 13 to 25 weeks, the peripheral blood of the pregnant woman to be tested is collected to detect the concentration of fetal cell-free nucleic acids, and the information about the concentration of fetal cell-free nucleic acids, height, body weight, age, BMI, and gestational age in week of the pregnant woman are input to the prediction model, so as to obtain prediction information of the pregnancy status of the pregnant woman to be tested.

[0076] According to a specific embodiment of the present disclosure, the predetermined parameters further include a height, a body weight, and an age of the pregnant woman, and the prediction model is adapted to calculate a delivery interval of the pregnant woman based on the following formula:

[00012]l=β0+βcffxcff+βsamplexsample+βheightxheight+

[00013]βweightxweight+βagexage+ε,

wherein l is a parameter determined based on the probability of premature delivery of the pregnant woman;

β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, β.sub.weight, and ε are each independently a predetermined coefficient; x.sub.cff is the concentration of fetal cell-free nucleic acids of the pregnant woman; x.sub.sample is the gestational age in week at which the sampling for the peripheral blood of the pregnant woman is conducted; x.sub.height is the height of the pregnant woman; x.sub.weight is the body weight of the pregnant woman; x.sub.age is the age of the pregnant woman, and ε is a sequencing error of a peripheral blood sample of the pregnant woman. According to the method of an embodiment of the present disclosure, the coefficients β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, and β.sub.weight may be freely selected as needed; for example, the BMI of the pregnant woman may additionally be included as a covariate with its own coefficient.

[0077] According to an embodiment of the present disclosure, l is determined based on the following formula:

$$l = \log_b \frac{p}{1 - p},$$

wherein b is a base number of log and is generally a constant e, and p is the probability of premature delivery of the pregnant woman.
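The two formulas above can be combined into a short calculation: the linear predictor gives l, and inverting the log-odds relation (with b = e) gives the probability p. The following is a minimal sketch only; the coefficient values, the dictionary keys, and the function names are hypothetical and are not taken from the present disclosure, and the error term ε is omitted when predicting.

```python
import math

def linear_predictor(coeffs, covariates):
    """Return l = beta_0 + sum(beta_k * x_k) over the supplied covariates."""
    value = coeffs["intercept"]
    for name, x in covariates.items():
        value += coeffs[name] * x
    return value

def probability_from_log_odds(l_value):
    """Invert l = ln(p / (1 - p)) to recover p (base b = e assumed)."""
    return 1.0 / (1.0 + math.exp(-l_value))

# Hypothetical coefficients, for illustration only (not from the disclosure).
coeffs = {"intercept": -1.0, "cff": -5.0, "sample": 0.02,
          "height": 0.0, "weight": 0.01, "age": 0.01}

# Hypothetical pregnant woman: fetal fraction 0.10, sampled at 16 weeks.
woman = {"cff": 0.10, "sample": 16, "height": 160.0, "weight": 55.0, "age": 30}

l_value = linear_predictor(coeffs, woman)
p_preterm = probability_from_log_odds(l_value)
print(f"l = {l_value:.3f}, p = {p_preterm:.3f}")
```

With real use, the coefficients would be the values estimated from the training set rather than the placeholders shown here.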

[0078] In a fourth aspect of the present disclosure, the present disclosure provides an apparatus for determining a pregnancy status of a pregnant woman, and according to an embodiment of the present disclosure, with reference to FIG. 7, the apparatus includes: a parameter determination module 100 configured to determine predetermined parameters of the pregnant woman, the predetermined parameters including a concentration of fetal cell-free nucleic acids in peripheral blood of the pregnant woman and a gestational age in week at which sampling for the peripheral blood of the pregnant woman is conducted; and a pregnancy status determination module 200 connected to the parameter determination module 100 and configured to determine the pregnancy status of the pregnant woman based on the predetermined parameters and the prediction model. The apparatus according to the embodiment of the present disclosure can quickly and accurately predict the pregnancy status of the pregnant woman based on information about the concentration of fetal cell-free nucleic acids obtained via one-time blood sampling of the pregnant woman at early pregnancy, the gestational age in week at which the blood sampling is conducted, and the physical sign data of the pregnant woman, the pregnancy status including the gestational age in week at delivery, the probability of premature delivery, the intrauterine growth retardation of the fetus, and other pregnancy complications associated with the concentration of fetal cell-free nucleic acids. 
According to the apparatus of an embodiment of the present disclosure, the concentration of fetal cell-free nucleic acids is obtained by data processing using sequencing data of the cell-free nucleic acids in the plasma of the pregnant woman as input data, specifically including: after the quality control of raw sequencing data (fq format) is finished, aligning the sequencing data to the human reference chromosomes by using alignment software (such as the samse mode in BWA); using sequencing data quality control software (such as Picard) to remove the duplicate reads in the alignment results and calculate the duplication rate; completing the local correction of the alignment results by using a mutation detection algorithm (such as the Base Quality Score Recalibration (BQSR) function in GATK); and calculating the average depth of different chromosomes in each sample by using coverage depth calculation software (such as the Depth of Coverage function in GATK). For male fetus samples, the mean depth of coverage of the uniquely aligned reads matching the non-homologous region of the Y chromosome is calculated, and the ratio of this mean depth to the mean depth of the uniquely aligned reads matching the autosomes is the concentration of fetal cell-free nucleic acids. For female fetus samples, calculation can be performed using existing methods for calculating the fetal concentration based on low-depth sequencing data of maternal plasma.
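The final step for male fetus samples, the ratio of mean Y-chromosome depth to mean autosomal depth, can be sketched as follows. This is an illustrative sketch only: it assumes the per-region depths have already been produced by the upstream alignment and depth-calculation tools, and the function names and the depth values are hypothetical.

```python
def mean_depth(depths):
    """Average coverage depth over a list of regions; 0.0 for an empty list."""
    return sum(depths) / len(depths) if depths else 0.0

def fetal_fraction_male(y_nonhomologous_depths, autosome_depths):
    """Ratio of the mean depth on the non-homologous region of chrY to the
    mean autosomal depth, as described in the text for male fetus samples."""
    autosomal = mean_depth(autosome_depths)
    if autosomal <= 0:
        raise ValueError("mean autosomal depth must be positive")
    return mean_depth(y_nonhomologous_depths) / autosomal

# Hypothetical low-depth values, for illustration only.
y_depths = [0.004, 0.006, 0.005]
autosome_depths = [0.10, 0.09, 0.11]

fetal_fraction = fetal_fraction_male(y_depths, autosome_depths)
print(f"fetal fraction = {fetal_fraction:.3f}")
```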

[0079] According to an embodiment of the present disclosure, the pregnancy status includes a delivery interval of the pregnant woman. The apparatus according to the embodiment of the present disclosure can be used to predict the gestational age in week at delivery, the probability of premature delivery, intrauterine growth retardation of a fetus, and other pregnancy complications associated with the concentration of fetal cell-free nucleic acids.

[0080] According to an embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is 13 to 25 weeks. The inventors found that there was a weak correlation between fetal concentration and premature delivery when the gestational age in week at which the blood sampling is conducted is 12 weeks or less or between 26 weeks and 30 weeks, while there was a strong correlation between fetal concentration and premature delivery when the gestational age in week at which the blood sampling is conducted is 13 to 25 weeks. Generally, there is a problem of weak correlation in the prediction of the pregnancy status of pregnant women using the concentration of fetal cell-free nucleic acids. According to the apparatus of the embodiment of the present disclosure, the gestational age in week at which the sampling is conducted is added as one of the parameters for constructing the prediction model, which improves the accuracy of prediction, and blood sampling of the pregnant women only needs to be conducted once within the gestational age of 13 to 25 weeks, which reduces the cost and risk of multiple blood samplings.

[0081] According to an embodiment of the present disclosure, the predetermined prediction model is at least one of a linear regression model, a logistic regression model, or a random forest. According to the apparatus of an embodiment of the present disclosure, the prediction model may be theoretically any statistical model that generalizes different difference distributions.

[0082] According to a specific embodiment of the present disclosure, the predetermined parameters further include a height, a body weight, and an age of the pregnant woman, and the prediction model is adapted to calculate a delivery interval of the pregnant woman based on the following formula:

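$$l = \beta_0 + \beta_{\mathrm{cff}}\,x_{\mathrm{cff}} + \beta_{\mathrm{sample}}\,x_{\mathrm{sample}} + \beta_{\mathrm{height}}\,x_{\mathrm{height}} + \beta_{\mathrm{weight}}\,x_{\mathrm{weight}} + \beta_{\mathrm{age}}\,x_{\mathrm{age}} + \varepsilon,$$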

wherein l is a parameter determined based on the probability of premature delivery of the pregnant woman;

β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, and ε are each independently a predetermined coefficient; x.sub.cff is the concentration of fetal cell-free nucleic acids of the pregnant woman; x.sub.sample is the gestational age in week at which the sampling for the peripheral blood of the pregnant woman is conducted; x.sub.height is the height of the pregnant woman; x.sub.weight is the body weight of the pregnant woman; x.sub.age is the age of the pregnant woman, and ε is a sequencing error of a peripheral blood sample of the pregnant woman. According to an embodiment of the present disclosure, the coefficients β.sub.0, β.sub.cff, β.sub.sample, β.sub.height, and β.sub.weight may be freely selected as needed; for example, the BMI of the pregnant woman may additionally be included as a covariate with its own coefficient.

[0083] According to an embodiment of the present disclosure, l is determined based on the following formula:

$$l = \log_b \frac{p}{1 - p},$$

where b is a base number of log and is generally a constant e, and p is the probability of premature delivery of the pregnant woman.

[0084] In a fifth aspect of the present disclosure, provided is a computer-readable storage medium having a computer program stored thereon. The program, when executed by a processor, implements the steps of the above-described method for constructing the prediction model. Thus, the above-described method for constructing the prediction model can be effectively implemented, so that the prediction model can be effectively constructed, and the prediction model can be then used to perform prediction on an unknown sample to determine the pregnancy status of the pregnant woman to be detected.

[0085] In a sixth aspect of the present disclosure, provided is an electronic device including: the computer-readable storage medium; and one or more processors configured to execute the program in the computer-readable storage medium.

[0086] The present disclosure will be further explained below with reference to specific examples. The experimental methods applied in the following examples are conventional methods, unless otherwise specified. The materials, reagents, etc. used in the following examples are all commercially available, unless otherwise specified.

[0087] The technical solutions of the present disclosure will be explained below with reference to examples. Those skilled in the art will understand that these examples are illustrative only, and should not be considered as limiting the scope of the present disclosure. Examples, where specific techniques or conditions are not specified, are implemented in accordance with techniques or conditions described in the literature in the art (for example, refer to J. Sambrook et al. “Molecular Cloning: A Laboratory Manual” translated by Huang Peitang et al., 3rd edition, Science Press) or according to the product specification. All of the used reagents or instruments which are not specified with the manufacturer are conventional commercially-available products, for example, purchased from Illumina.

Example 1 Construction and Application of Prediction Model for Premature Delivery and Gestational Age in Week at Delivery

[0088] 38964 samples were classified according to different gestational ages in week at which blood sampling was conducted, and the correlation between the concentration of fetal cfDNAs in plasma and the premature delivery was calculated respectively. With reference to FIG. 1, statistical analysis showed that the correlation between fetal concentration and premature delivery differed at different sampling gestational ages in week; there was a weak correlation between fetal concentration and premature delivery when the gestational age in week at which the blood sampling was conducted was 12 weeks or less or between 26 weeks and 30 weeks, while there was a strong correlation between fetal concentration and premature delivery when the gestational age in week at which the blood sampling was conducted was 13 to 25 weeks.
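The stratified analysis described above can be sketched as a per-week correlation between the fetal concentration and a premature-delivery indicator. The disclosure does not state which correlation statistic was used; the Pearson coefficient below is an assumption, and the sample tuples and function names are hypothetical toy values chosen only to show the mechanics.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def correlation_by_week(samples):
    """samples: iterable of (sampling_week, fetal_fraction, is_preterm)."""
    groups = defaultdict(lambda: ([], []))
    for week, fetal_fraction, is_preterm in samples:
        groups[week][0].append(fetal_fraction)
        groups[week][1].append(1.0 if is_preterm else 0.0)
    return {week: pearson(ffs, ys) for week, (ffs, ys) in sorted(groups.items())}

# Hypothetical toy samples, not data from the disclosure.
samples = [
    (12, 0.08, True), (12, 0.10, False), (12, 0.12, False), (12, 0.14, True),
    (16, 0.06, True), (16, 0.08, True), (16, 0.12, False), (16, 0.14, False),
]

by_week = correlation_by_week(samples)
for week, r in by_week.items():
    print(week, round(r, 3))
```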

[0089] Plasma cfDNA data of 38964 pregnant women in combination with the gestational age in week at which the blood sampling was conducted and the age, height, and body weight information of the pregnant woman served as a training set:

[0090] (1) A linear regression model was established with the gestational age in week at delivery as a continuous variable in the prediction of the gestational age in week at delivery.

[0091] Specifically, by taking the gestational age in week at delivery as Y value, and taking the fetal cfDNA concentration, the gestational age in week at which the blood sampling was conducted, and the height, body weight, age and BMI of pregnant women as covariates, a prediction model was established:

$$y_i = \beta_0 + \beta_{\mathrm{cff}}\,x_{i\,\mathrm{cff}} + \beta_{\mathrm{sample}}\,x_{i\,\mathrm{sample}} + \beta_{\mathrm{height}}\,x_{i\,\mathrm{height}} + \beta_{\mathrm{weight}}\,x_{i\,\mathrm{weight}} + \beta_{\mathrm{age}}\,x_{i\,\mathrm{age}} + \varepsilon_i, \quad i = 1, \ldots, p,$$

wherein y.sub.i is the gestational age in week at delivery corresponding to sample i, x.sub.icff is the fetal cfDNA concentration corresponding to sample i, x.sub.isample is the gestational age in week at which the blood sampling is conducted, corresponding to sample i, x.sub.iheight is the height of the pregnant woman corresponding to sample i, x.sub.iweight is the body weight of the pregnant woman corresponding to sample i, x.sub.iage is the age of the pregnant woman corresponding to sample i, x.sub.ibmi is the BMI of the pregnant woman corresponding to sample i, and p is the total number of samples in the training set, where p = 38964.

[0092] The estimated values of coefficient β for different variables in the finally obtained prediction model are shown in the column of gestational age in week at delivery in Table 1.
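Estimating the coefficients β of such a linear model amounts to an ordinary least-squares fit. The following minimal sketch solves the normal equations in pure Python on noise-free toy data; the disclosure does not say which fitting routine was used, and the covariate values, the "true" coefficients, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Return [beta_0, beta_1, ...] minimizing the squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical toy covariates: (fetal cfDNA concentration, sampling week).
rows = [(0.08, 13), (0.10, 15), (0.12, 18), (0.15, 20), (0.09, 24), (0.20, 16)]
true_beta = [39.0, 5.0, -0.05]  # made-up values, used only to generate y
y = [true_beta[0] + true_beta[1] * c + true_beta[2] * w for c, w in rows]

beta = fit_ols(rows, y)
print([round(b, 4) for b in beta])
```

Because the toy responses contain no noise, the fit recovers the generating coefficients; on real data the estimates would carry the standard errors reported alongside them.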

[0093] (2) A logistic regression model was established by defining premature delivery events as Y = 0 and defining full-term delivery events as Y = 1 in the prediction of premature delivery.

[0094] Specifically, the probability of full-term delivery of a sample was set as 1 − p = P (Y = 1), the probability of premature delivery of the sample was set as p = P (Y = 0), and this probability p was subjected to log-odds transformation, i.e.,

$$l = \log_b \frac{p}{1 - p},$$

where b is the base number of log and is generally a constant e.

[0095] The transformed l was put into the linear regression model, and the fetal cfDNA concentration, gestational age in week at which blood sampling was conducted, and height, body weight, and age of pregnant women were also taken as covariates to establish a prediction model.

[0096] Specifically, by taking the transformed value l as the Y value, and taking the fetal cfDNA concentration, the gestational age in week at which blood sampling was conducted, and the height, body weight, age, and BMI of the pregnant women as covariates, a prediction model was established:

$$l_i = \beta_0 + \beta_{\mathrm{cff}}\,x_{i\,\mathrm{cff}} + \beta_{\mathrm{sample}}\,x_{i\,\mathrm{sample}} + \beta_{\mathrm{height}}\,x_{i\,\mathrm{height}} + \beta_{\mathrm{weight}}\,x_{i\,\mathrm{weight}} + \beta_{\mathrm{age}}\,x_{i\,\mathrm{age}} + \varepsilon_i, \quad i = 1, \ldots, p,$$

wherein l.sub.i is the log-odds transformation result of the probability of premature delivery corresponding to sample i, x.sub.icff is the fetal cfDNA concentration corresponding to sample i, x.sub.isample is the gestational age in week at which blood sampling was conducted, corresponding to sample i, x.sub.iheight is the height of the pregnant woman corresponding to sample i, x.sub.iweight is the body weight of the pregnant woman corresponding to sample i, x.sub.iage is the age of the pregnant woman corresponding to sample i, x.sub.ibmi is the BMI of the pregnant woman corresponding to sample i, and p is the total number of samples in the training set, where p = 38964.

[0097] The estimated values of coefficient β for various variables in the finally obtained prediction model are shown in the column of premature delivery in Table 1.
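A logistic regression of this kind is usually fitted by maximizing the log-likelihood. The sketch below uses plain gradient ascent on a single standardized covariate; it is an illustration under stated assumptions, not the fitting procedure of the disclosure. For simplicity the label is coded 1 for premature delivery, so that the modeled probability is p directly (the complement of the Y coding in the text), and the toy values, their direction of effect, and the function names are hypothetical.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def log_likelihood(beta, rows, labels):
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logistic(rows, labels, steps=2000, lr=0.5):
    """Maximize the mean log-likelihood by gradient ascent from beta = 0."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            grad[0] += y - p
            for j, v in enumerate(x):
                grad[j + 1] += (y - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: fetal cfDNA concentration and preterm label (1 = premature).
cff = [0.05, 0.07, 0.09, 0.11, 0.13, 0.15]
preterm = [1, 1, 0, 1, 0, 0]
rows = [(z,) for z in standardize(cff)]

beta = fit_logistic(rows, preterm)
print([round(b, 3) for b in beta])
```

The same routine accepts several covariates per row; standardizing each one keeps a fixed step size stable.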

TABLE 1. Statistical results of phenotype-related data of pregnant women in regression model for gestational age in week at delivery and regression model for premature delivery

| Predicted Value | Covariate | Estimated Value | Standard Deviation | Z/T Value | p value |
|---|---|---|---|---|---|
| Premature Delivery | Age of Pregnant Woman | -0.0461 | 0.0032 | -14.3160 | <2e-16 |
| Premature Delivery | Height of Pregnant Woman | 0.0612 | 0.0225 | 2.7200 | 0.0065 |
| Premature Delivery | Body Weight of Pregnant Woman | -0.0551 | 0.0299 | -1.8400 | 0.0657 |
| Premature Delivery | BMI of Pregnant Woman | 0.1219 | 0.0774 | 1.5760 | 0.1151 |
| Gestational Age in Week at Delivery | Age of Pregnant Woman | -0.0407 | 0.0014 | -28.2810 | <2e-16 |
| Gestational Age in Week at Delivery | Height of Pregnant Woman | 0.0158 | 0.0100 | 1.5870 | 0.1120 |
| Gestational Age in Week at Delivery | Body Weight of Pregnant Woman | -0.0050 | 0.0134 | -0.3740 | 0.7080 |
| Gestational Age in Week at Delivery | BMI of Pregnant Woman | 0.0055 | 0.0349 | 0.1590 | 0.8740 |
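The Z/T and p value columns can be related to the first two numeric columns by a short worked calculation. The sketch assumes that the Z value is the estimate divided by its reported standard deviation (a Wald statistic) and that the p value is two-sided under a standard normal reference; the disclosure does not state this explicitly, and the function names are hypothetical. The height row of the premature-delivery model is used as the example.

```python
import math

def z_value(estimate, std_error):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Height of pregnant woman, premature-delivery model (values from Table 1).
z_height = z_value(0.0612, 0.0225)
p_height = two_sided_p(z_height)
print(f"z = {z_height:.4f}, p = {p_height:.4f}")
```

Other rows may differ slightly from the tabulated Z/T values when recomputed this way, which would be expected if the displayed estimates and standard deviations are rounded.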

[0098] After the prediction models for premature delivery and gestational age in week at delivery were obtained, an additional 32049 samples were used as a test set. The fetal concentration, the gestational age in week at which blood sampling was conducted, and the age, height, body weight, and BMI of the pregnant woman corresponding to each sample were put into the linear regression model to predict the gestational age in week at delivery and into the logistic regression model to predict premature delivery.

[0099] The accuracy of the finally obtained premature delivery prediction results is shown in FIG. 2, and the distribution of the predicted gestational ages in week at delivery and the actual gestational ages in week at delivery is shown in FIG. 3. The prediction results of premature delivery are significantly correlated with the actual results, with the correlation reaching -0.13, and the probability threshold for filtering can be determined according to the sensitivity and specificity requirements of the actual scenario. The correlation between the predicted gestational age in week at delivery and the actual gestational age in week at delivery reached 0.12.
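Choosing the probability threshold according to sensitivity and specificity can be sketched as follows. The predicted probabilities, the outcomes, and the candidate thresholds are hypothetical toy values, and the function name is illustrative; the sketch only shows how the two quantities trade off as the threshold moves.

```python
def sensitivity_specificity(probabilities, is_preterm, threshold):
    """Classify as premature when p >= threshold and score against outcomes."""
    tp = fn = tn = fp = 0
    for p, actual in zip(probabilities, is_preterm):
        predicted = p >= threshold
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical predicted probabilities and actual outcomes (True = premature).
probs = [0.02, 0.04, 0.05, 0.08, 0.10, 0.15]
actual = [False, False, True, False, True, True]

low = sensitivity_specificity(probs, actual, 0.05)
high = sensitivity_specificity(probs, actual, 0.10)
print("threshold 0.05:", low)
print("threshold 0.10:", high)
```

In this toy case the lower threshold catches every premature delivery at the cost of one false positive, while the higher threshold removes the false positive but misses one premature delivery.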

[0100] In addition, reference to the term “an embodiment”, “some embodiments”, “an example”, “a specific example” or “some examples” or the like means that a specific feature, structure, material, or characteristic described in combination with the embodiment(s) or example(s) is included in at least one embodiment or example of the present disclosure. In this specification, illustrative expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, without mutual contradiction, those skilled in the art may combine different embodiments or examples and features of the different embodiments or examples described in this specification.

[0101] Although the embodiments or examples of the present disclosure have been illustrated and described above, it should be understood that the embodiments or examples are illustrative and should not be construed as limiting the present disclosure, and persons of ordinary skill in the art may make various changes, modifications, replacements and variations to the above embodiments or examples within the scope of the present disclosure.