Decisions around pregnancy are personal, particularly when it comes to how a woman hopes to deliver her child. Some women prefer a vaginal birth, while others may prefer to have a C-section. In many cases, C-sections are conducted for medical reasons. Whatever the situation, Amino aims to give people the information they need to understand and make healthcare decisions with confidence.
We built our C-section predictor tool to help expectant mothers understand how pre-existing health factors and health conditions diagnosed during pregnancy influence the likelihood of delivering by C-section.
At the heart of our C-section predictor is a statistical model that draws from our secure, patient-deidentified database. We considered many factors when building the model, including the medical conditions of every patient who delivered a baby, patient age at the time of delivery, and the geographic location of those patients (down to the level of ZIP3, the first three digits of a ZIP code). In total, the data covers 4.4 million deliveries by 3.5 million distinct women between 2010 and 2015. The predictions come from a multivariate logistic regression model.
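To illustrate how a logistic regression turns a set of health factors into a probability, here is a minimal Python sketch. The intercept, the coefficients, the feature identifiers, and the function name `predict_probability` are all invented for illustration; they are not the fitted values or the code behind our tool.

```python
import math

# Hypothetical values for illustration only; NOT the fitted model.
INTERCEPT = -1.0
COEFFICIENTS = {
    "previous_c_section": 2.5,
    "malpositioned_baby": 2.0,
    "high_blood_pressure": 0.4,
}

def predict_probability(features):
    """Return the predicted probability of C-section.

    `features` maps a feature name to 1 (present) or 0 (absent);
    features that are left out count as absent.
    """
    # Weighted sum of the features, also called the log-odds.
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    # The logistic function maps the log-odds to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))

no_factors = predict_probability({})
with_prior = predict_probability({"previous_c_section": 1})
```

In this toy setup, a positive coefficient raises the predicted probability when its factor is present, so `with_prior` is larger than `no_factors`.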
We chose the features in our prediction model to maximize the usability of the C-section predictor for the end user, specifically by focusing on personal health factors, such as medical diagnoses, that a pregnant woman can know before her delivery day. We excluded factors that a mother would not know during the course of her pregnancy; leaving them out did not substantially compromise the accuracy of the model.
It’s important to note that many factors can influence a woman’s chance of delivering via C-section, and not all of these factors are included in our C-section predictor. Some factors we specifically excluded are diagnostic factors that a pregnant woman is unlikely to know about herself or her pregnancy in advance of delivery, in particular some factors that relate to emergency C-section. In addition, there are some factors like race, socioeconomic status, and prior successful vaginal deliveries that are not accounted for in our database but may be associated with increased or decreased probability of C-section.
The features about the patient that were included in the C-section predictor are:
- patient age
- patient ZIP code (at the ZIP3 level)
- previous C-section (if the patient had a C-section in a previous pregnancy)
- malpositioned baby (including breech babies)
- large baby
- more than one baby (twins, triplets, etc.)
- bleeding during pregnancy
- type 1 diabetes
- type 2 diabetes
- excessive amniotic fluid
- high blood pressure
- gestational diabetes
- Rh incompatibility
In the health insurance claims data that we use to train our model, at most 8 diagnostic factors can be recorded for any given delivery. This limit is an attribute of the source data. Because of it, the C-section predictor’s output may be somewhat difficult to interpret if the user selects more than 8 diagnoses while using the predictor. That said, it is extremely unlikely that a real pregnant patient would experience more than a handful of the listed diagnostic factors over the course of a single pregnancy, so we don’t recommend selecting more than 8 diagnoses at a time.
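As a rough sketch of how selected diagnoses could be turned into model inputs while respecting the 8-diagnosis limit, consider the following Python example. The diagnoses mirror the list above, but the identifiers, the function name `encode_diagnoses`, and the choice to flag (rather than reject) over-limit selections are assumptions made for illustration.

```python
# Identifiers are hypothetical; the diagnoses mirror the feature list above.
DIAGNOSES = [
    "previous_c_section", "malpositioned_baby", "large_baby",
    "multiple_babies", "bleeding_during_pregnancy", "type_1_diabetes",
    "type_2_diabetes", "excessive_amniotic_fluid", "high_blood_pressure",
    "gestational_diabetes", "rh_incompatibility",
]
MAX_DIAGNOSES = 8  # limit inherited from the claims data

def encode_diagnoses(selected):
    """Return (indicators, within_limit) for the selected diagnoses.

    `indicators` holds one 0/1 value per entry in DIAGNOSES, and
    `within_limit` is False when more than MAX_DIAGNOSES are selected.
    """
    selected = set(selected)
    unknown = selected - set(DIAGNOSES)
    if unknown:
        raise ValueError(f"Unknown diagnoses: {sorted(unknown)}")
    indicators = [1 if name in selected else 0 for name in DIAGNOSES]
    within_limit = len(selected) <= MAX_DIAGNOSES
    return indicators, within_limit

indicators, within_limit = encode_diagnoses(["large_baby", "type_1_diabetes"])
```

A selection of 9 or more diagnoses would come back with `within_limit` set to `False`, signalling that the resulting prediction falls outside what the training data can represent.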
Since the relationship between age and C-section is not perfectly linear, we transformed age using B-splines, which lets the model follow a smooth curve that more accurately reflects the varying influence age has on delivery method.
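For readers curious about what a B-spline transformation looks like, the Python sketch below expands a single age into several smooth basis values using the standard Cox-de Boor recursion. The age range, the knot positions, and the cubic degree are assumptions chosen for illustration; they are not the settings used in our model.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x
    (Cox-de Boor recursion). Valid for knots[0] <= x < knots[-1]."""
    # Degree 0: indicator of the half-open knot interval containing x.
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        new_basis = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width knot span contribute nothing.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (
                (knots[i + d + 1] - x) / right_den * basis[i + 1]
                if right_den > 0 else 0.0
            )
            new_basis.append(left + right)
        basis = new_basis
    return basis

# Hypothetical setup: cubic splines on ages 15 to 50, interior knots at 25 and 35.
DEGREE = 3
KNOTS = [15] * (DEGREE + 1) + [25, 35] + [50] * (DEGREE + 1)

age_features = bspline_basis(30, KNOTS, DEGREE)  # six values that sum to 1
```

Each age becomes a short vector of non-negative weights, and the regression learns one coefficient per basis function instead of a single slope for age.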
To test how accurate our C-section predictor is, we checked how it performed on a dataset that we had reserved for testing. This reserved testing dataset was separate from the training dataset used to decide the appropriate model for the tool. Because we know whether each delivery in our source data was a C-section, we can compare the predictions to what actually happened. We use the rate of C-section in the training set, 36%, as the threshold for classifying a delivery as a C-section based on the predicted probability returned by the model. On the testing set, our C-section predictor produced the correct prediction 78.7% of the time. With that in mind, we rebuilt the final model for the tool on the complete dataset.
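The evaluation described above can be sketched in a few lines of Python. The 36% threshold comes from the text; the toy probabilities and outcomes, the function names, and the choice to treat a probability exactly equal to the threshold as a C-section are assumptions for illustration.

```python
THRESHOLD = 0.36  # C-section rate in the training set

def classify(probabilities, threshold=THRESHOLD):
    """Label each delivery as a C-section (True) or not (False)."""
    # Assumption: a probability equal to the threshold counts as a C-section.
    return [p >= threshold for p in probabilities]

def accuracy(predicted, actual):
    """Fraction of deliveries where the prediction matches the outcome."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy data, invented for illustration.
predicted_probabilities = [0.10, 0.40, 0.80, 0.30, 0.55]
actual_c_section = [False, True, True, True, False]

labels = classify(predicted_probabilities)
score = accuracy(labels, actual_c_section)  # 3 of 5 correct
```

Using the observed C-section rate as the cutoff, rather than 50%, keeps the share of predicted C-sections closer to the share seen in the data.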