In this study we used machine learning and data analysis techniques to identify patterns among students, find correlations between student attributes, and build and compare predictive models based on students’ academic performance and demographics. Our overall objective is to determine whether student dropout can be predicted ethically from this particular set of attributes, and to use our analysis to justify the choices we made in developing and evaluating our machine learning models.
We used two unsupervised learning methods, K-Means Clustering and Principal Component Analysis (PCA), to explore potential groupings of students and to identify the features that contribute most to the variation in the data. Using supervised learning, we developed multiple machine learning models (Logistic Regression, Decision Tree, Bootstrap Aggregation, Bootstrap Sampling, Random Forest, Gradient Boosting, and Multi-Layer Perceptron) that attempt to predict whether a student stays enrolled or graduates, or drops out, based on their demographic information, academic progress, and socioeconomic factors. We also carried out a comparative analysis of the attributes to identify any correlations between them.
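The unsupervised step can be illustrated with a minimal sketch. The section does not say which library was used, so the version below implements PCA (via a singular value decomposition of the centred data) and K-Means (Lloyd's algorithm) directly in NumPy, and runs them on a small hypothetical dataset of two synthetic student groups rather than on the study's data. Libraries such as scikit-learn provide equivalent `PCA` and `KMeans` classes.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # loadings: one row per component
    explained = (s ** 2) / np.sum(s ** 2)     # explained-variance ratio per component
    return Xc @ components.T, components, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical data: two well-separated groups of 50 "students" with 4 features each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 4)), rng.normal(5.0, 0.5, (50, 4))])

# Standardise so that no single feature dominates the distances
Z = (X - X.mean(axis=0)) / X.std(axis=0)

scores, components, explained = pca(Z, n_components=2)
labels, centroids = kmeans(scores, k=2)
```

Standardising first matters because both PCA and K-Means are driven by variances and Euclidean distances, so a feature on a larger numeric scale would otherwise dominate. The rows of `components` (the loadings) indicate which original features contribute most to each principal component.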
Both the Gradient Boosting model and the Multi-Layer Perceptron achieved high recall. If these models were used to identify students who could benefit from additional resources to avoid dropping out, high recall is desirable, because it means few at-risk students are missed. If the models were instead used for admissions decisions, we would want to focus more on precision, since each false positive is an applicant wrongly flagged as a likely dropout. Across many of our models, features such as the number of credits enrolled and whether a student was a debtor were the strongest indicators of whether a student dropped out. This raises an ethical concern: the most important predictors were not whether a student had high or low academic scores, but financial and planning factors. A prediction about a student’s success is therefore driven largely by circumstances outside their academic ability. This is a major reason why colleges have whole departments to help students with finances, planning, health, and other needs in addition to their academic resources. The existence of these resources, however, does not eliminate the ethical concerns of predicting a student’s success from non-academic factors, especially if such predictions are used to decide whether to admit a student to a program.
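The recall/precision trade-off described above can be made concrete with a small worked example. The labels, scores, and thresholds below are hypothetical and are not results from this study; the helper `precision_recall` is likewise illustrative. It treats dropout as the positive class and shows that a low decision threshold catches every dropout at the cost of false alarms (suited to targeting support), while a high threshold makes positive predictions more reliable but misses more dropouts (closer to what an admissions use would demand).

```python
def precision_recall(y_true, y_score, threshold):
    """Precision and recall for the positive class (1 = dropout) at a given threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true outcomes and model scores for 10 students (1 = dropped out)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.35, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

# Low threshold: flag more students, as a support programme might
support = precision_recall(y_true, y_score, 0.3)      # precision 4/7 ≈ 0.571, recall 1.0

# High threshold: flag only confident cases, favouring precision
admissions = precision_recall(y_true, y_score, 0.75)  # precision 1.0, recall 0.5
```

The same model yields both operating points; only the threshold changes. This is why the intended use of a dropout model, rather than the model alone, determines which metric should be prioritised.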