3.3.5 The Framingham Heart Study - Video 3: A Logistical Regression Model | Summary and Q&A

December 13, 2018
by
MIT OpenCourseWare

TL;DR

Logistic regression is used to predict the 10-year risk of coronary heart disease (CHD) based on various risk factors collected at the first examination of patients.

Q: How is logistic regression used to predict the 10-year risk of coronary heart disease?

A logistic regression model is fit with the 10-year CHD outcome as the dependent variable and all other variables in the dataset as independent variables. The model is built in R using the glm function with the family argument set to binomial.

Q: What are some significant variables in the logistic regression model?

The significant variables in the model include male, age, prevalent stroke, total cholesterol, systolic blood pressure, and glucose levels. These variables have positive coefficients, indicating that higher values contribute to a higher probability of 10-year CHD.
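To make the link between a positive coefficient and a higher predicted probability concrete, here is a minimal Python sketch of the logistic response function. The intercept and coefficient values are made up for illustration and are not the fitted values from the video; `predicted_probability` is a hypothetical helper name.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic response: P(y = 1) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Illustrative numbers only (NOT the coefficients fitted in the video).
intercept = -8.0
coefficients = [0.06, 0.01]  # hypothetical: age, systolic blood pressure

younger = predicted_probability(intercept, coefficients, [45, 130])
older = predicted_probability(intercept, coefficients, [65, 130])

# With a positive age coefficient, the older patient gets the higher probability.
print(f"age 45: {younger:.3f}, age 65: {older:.3f}")
```

Holding blood pressure fixed, raising age increases the linear term and therefore the predicted probability, which is what a positive coefficient means in this model.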

Q: What is the accuracy of the logistic regression model?

The accuracy of the model is approximately 84.8%, which is calculated by dividing the number of correct predictions (1069 true negatives + 11 true positives) by the total number of observations in the test set.

Q: How does the model compare to a baseline method in terms of accuracy?

The baseline method, which always predicts 0 (no CHD), would have an accuracy of approximately 84.4%. The logistic regression model therefore outperforms the baseline only slightly in terms of accuracy.
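The two accuracy figures can be reproduced with a short Python sketch. It treats the 1069 correct no-CHD predictions as true negatives and the 11 correct CHD predictions as true positives. The false-positive and false-negative counts are not given in this summary; the values 6 and 187 below are assumptions chosen to be consistent with the quoted 84.8% and 84.4%.

```python
def accuracy(tn, fp, fn, tp):
    """Share of correct predictions in a 2x2 confusion matrix."""
    return (tn + tp) / (tn + fp + fn + tp)

# Correct-prediction counts quoted in the summary.
tn, tp = 1069, 11
# ASSUMED off-diagonal counts (not stated in the summary), picked to be
# consistent with the quoted percentages.
fp, fn = 6, 187

model_acc = accuracy(tn, fp, fn, tp)

# The baseline always predicts "no CHD": every actual negative is counted
# as correct and every actual positive as a miss.
baseline_acc = accuracy(tn + fp, 0, fn + tp, 0)

print(f"model: {model_acc:.3f}, baseline: {baseline_acc:.3f}")
# prints: model: 0.848, baseline: 0.844
```

Under these assumed counts the model gains only five correct predictions over the baseline, which is why the accuracy gap is so small.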

Summary & Key Takeaways

• The video walks through using logistic regression to predict the 10-year risk of CHD from risk factors collected at the first examination.

• The data set used for analysis contains information on demographic, behavioral, medical history, and physical exam risk factors, as well as the outcome variable of whether or not the patient developed CHD in the next 10 years.

• The training and testing sets are created using sample.split, and a logistic regression model is built using the training set.

• The significant variables in the model include male, age, prevalent stroke, total cholesterol, systolic blood pressure, and glucose levels.
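The train/test split mentioned above is done with R's sample.split, which keeps the proportion of each outcome label about the same in both sets. The Python sketch below imitates that idea on a toy outcome vector; it is not the caTools implementation, and `stratified_split`, the 0.65 ratio, and the toy data are all assumptions for illustration.

```python
import random

def stratified_split(labels, split_ratio, seed=0):
    """Return booleans (True = training row), splitting each label separately
    so that both sets keep roughly the same share of each label."""
    rng = random.Random(seed)
    in_train = [False] * len(labels)
    for label in set(labels):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        n_train = round(split_ratio * len(idx))
        for i in idx[:n_train]:
            in_train[i] = True
    return in_train

# Toy outcome vector with 15% positives (hypothetical data).
labels = [1] * 15 + [0] * 85
mask = stratified_split(labels, split_ratio=0.65)

train = [y for y, m in zip(labels, mask) if m]
test = [y for y, m in zip(labels, mask) if not m]

# Both sets end up with a similar share of positives.
print(len(train), sum(train), len(test), sum(test))
```

Splitting within each label matters here because CHD cases are the minority class; a purely random split could leave the test set with too few positives to evaluate the model.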