Happiness and Riches

Like a primadonna girl, all I ever wanted was the world.

It was 2017. I was in constant panic that sooner or later I was going to die of a sudden fatal disease. It dug me deeper into a spiral I could not get out of. I felt stuck, like nothing good was gonna happen anymore and I was just delaying the inevitable. I felt trapped. I felt miserable. I wanted to just disappear so I would not have to feel the pain.

SPOILER ALERT: I’m still here, but neither the depression nor the anxiety has left. Back at its peak (or should I say the lowest low), a coworker asked me, “Ot ma-de-depress ka e mas masalese ka biye karing aliwa?” [FIL: “Bakit ka na-de-depress e mas maayos ang buhay mo kaysa sa iba?” ENG: “Why are you feeling depressed when you live better than others?”] I was dumbfounded and infuriated at the same time. To everyone out there: this type of question invalidates how a person feels. Asking it does more harm than saying nothing at all. So anyway, fast forward to 2021: the question had stuck with me.

Does living well mean you are not prone to depression? Since I am enrolled in Machine Learning for Bioinformatics and need a topic for a research paper due at the end of the semester, I channeled all my rage into this one.

Kudos to the Centers for Disease Control and Prevention (CDC) for providing a dataset for everyone to play with. I downloaded the 2020 Behavioral Risk Factor Surveillance System (BRFSS) data from their website and imported it into a dataframe using the foreign library in R. The dataframe contained 401,958 observations of 279 variables.

Pre-processing

However, we are only interested in the quality-of-life variables, so we keep 18 of them (GENHLTH, PHYSHLTH, MENTHLTH, POORHLTH, HLTHPLN1, PERSDOC2, MEDCOST, CHECKUP1, HLTHCVR1, RENTHOM1, CPDEMO1B, EMPLOY1, INCOME2, EDUCA, CHILDREN, EXERANY2, SLEPTIM1, ADDEPEV3) and drop the rest. ADDEPEV3 is the dependent variable in the model, as it denotes whether the person has a depressive disorder.
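As a sketch of this selection step, here it is in Python rather than R (the post's own analysis used R), assuming each respondent is held as a plain dict keyed by BRFSS column name. `KEEP`, `TARGET`, `PREDICTORS`, and `select_columns` are hypothetical names for illustration:

```python
# The 18 BRFSS 2020 columns kept in the post; ADDEPEV3 is the outcome.
KEEP = [
    "GENHLTH", "PHYSHLTH", "MENTHLTH", "POORHLTH", "HLTHPLN1", "PERSDOC2",
    "MEDCOST", "CHECKUP1", "HLTHCVR1", "RENTHOM1", "CPDEMO1B", "EMPLOY1",
    "INCOME2", "EDUCA", "CHILDREN", "EXERANY2", "SLEPTIM1", "ADDEPEV3",
]
TARGET = "ADDEPEV3"
# Everything except the outcome is a predictor (17 columns).
PREDICTORS = [c for c in KEEP if c != TARGET]


def select_columns(records):
    """Keep only the quality-of-life columns from each respondent's record."""
    return [{col: row[col] for col in KEEP} for row in records]
```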

The data dictionary is on the BRFSS website, too. Looking at it, some variables needed to be imputed because they contained “Don’t Know” and “Refused” responses. Those values were replaced with the median if the variable is numerical and with the mode if the variable is categorical. This was done only for the predictors; for the dependent variable, rows with those responses were removed from the dataframe instead. The final dataset was a dataframe of 394,029 observations of 18 variables.
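A minimal Python sketch of this cleaning rule, under loud assumptions: the `MISSING_CODES` values are illustrative codes for “Don’t Know” and “Refused” that must be checked against the BRFSS 2020 data dictionary for every column, `NUMERIC` marks which predictors count as numerical, and dropping the missing-outcome rows before computing the fill values is my assumption about the order of operations:

```python
from statistics import median, mode

# ASSUMPTION: illustrative "Don't know" / "Refused" codes for a few columns.
# Verify each column against the BRFSS 2020 codebook before relying on these.
MISSING_CODES = {
    "GENHLTH": {7, 9},
    "MENTHLTH": {77, 99},
    "ADDEPEV3": {7, 9},
}
NUMERIC = {"MENTHLTH"}  # hypothetical: predictors treated as numerical
TARGET = "ADDEPEV3"


def clean(records):
    """Drop rows with a missing outcome, then impute missing predictor values."""
    bad_target = MISSING_CODES.get(TARGET, set())
    # Copy each kept row so the caller's records are not modified.
    rows = [dict(r) for r in records if r[TARGET] not in bad_target]
    for col, codes in MISSING_CODES.items():
        if col == TARGET:
            continue  # the outcome is never imputed
        observed = [r[col] for r in rows if r[col] not in codes]
        # Median for numerical predictors, most frequent value for categorical ones.
        fill = median(observed) if col in NUMERIC else mode(observed)
        for r in rows:
            if r[col] in codes:
                r[col] = fill
    return rows
```

Note that `statistics.median` and `statistics.mode` raise an error on an empty list, so a column with no valid responses would need separate handling.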

Distribution of participants and their depression status

Persons with no depression make up 76.6% of the participants, so the two classes are imbalanced. To fix this, I opted for downsampling. Besides, my computer can’t handle a huge amount of data. GONNA NEED SOME SUGAR DADDY. The dataset was then split into a 75% training set and a 25% testing set using the caret library.
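The post used caret for this; below is a stdlib-only Python stand-in, not caret’s implementation. `downsample` randomly keeps as many rows from each class as the smallest class has, and `train_test_split` is a hypothetical helper that splits 75/25 within each class. Stratifying the split by class is an assumption on my part:

```python
import random
from collections import defaultdict


def _group_by_class(records, target):
    """Bucket records by the value of the outcome column."""
    by_class = defaultdict(list)
    for r in records:
        by_class[r[target]].append(r)
    return by_class


def downsample(records, target, seed=42):
    """Randomly drop majority-class rows until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(records, target)
    n_min = min(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rng.sample(rows, n_min))  # sampling without replacement
    rng.shuffle(balanced)
    return balanced


def train_test_split(records, target, train_frac=0.75, seed=42):
    """Put about train_frac of each class in the training set, the rest in the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for rows in _group_by_class(records, target).values():
        rows = rows[:]
        rng.shuffle(rows)
        cut = round(len(rows) * train_frac)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test
```

Following the order in the text, downsampling happens before the split, so both the training and the test sets end up balanced.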

Machine Learning

The supervised machine learning models I used are Naive Bayes (caret library), K-Nearest Neighbors (caret, class, and gmodels libraries), Decision Tree (rpart and rpart.plot libraries), Support Vector Machine (e1071 library), Logistic Regression (e1071 library), and Random Forest (randomForest library). KNN, SVM, and Random Forest also underwent parameter tuning.
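To illustrate what tuning a parameter means for KNN, here is a from-scratch Python sketch, not the class or caret code the post used: `knn_predict` takes a majority vote among the `k` nearest training points by Euclidean distance, and the hypothetical `tune_k` picks the `k` with the best accuracy on a held-out validation set. caret’s own tuning relies on resampling, so treat this simpler loop as a stand-in:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7)):
    """Return (k, accuracy) for the k with the best validation accuracy.

    Ties keep the earlier (smaller) k.
    """
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        acc = sum(p == t for p, t in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```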

The training set was fed into each ML model, and each model was then evaluated on the test set. A confusion matrix was generated for each model, and from it the accuracy, sensitivity, specificity, precision, and F-score were computed.
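The five metrics follow directly from the four cells of a 2×2 confusion matrix. A short Python sketch, assuming “has a depressive disorder” is the positive class (the post does not say which class was treated as positive):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, precision and F-score from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # recall: share of actual positives caught
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    precision = tp / (tp + fp)    # share of predicted positives that are real
    # F-score: harmonic mean of precision and sensitivity
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }
```

With made-up counts, `metrics(tp=40, fp=10, fn=20, tn=30)` gives an accuracy of 0.70, a sensitivity of about 0.667, a specificity of 0.75, a precision of 0.80, and an F-score of about 0.727.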

Findings

Machine learning models with their performance metrics

KNN predicted with the best accuracy at 91.02%, while Naive Bayes was the lowest at 76.65%. Still, all the models predicted correctly more than 75% of the time.

Top 5 predictors

Interestingly, health and income are the most significant predictors of depression. If we interpret the results in terms of odds, holding all other variables constant, the odds of having a depressive disorder are 8.67% higher for every additional day of poor mental health (MENTHLTH), 24.73% higher for every level that general health worsens (GENHLTH), 2.45% higher for every additional day of poor health (POORHLTH), and 5.93% lower for every step up in income (INCOME2).
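In a logistic regression, a coefficient β translates into a percent change in the odds of (e^β − 1) × 100 per one-unit increase in the predictor. Assuming the percentages above were derived this way, this Python sketch converts in both directions; the function names are hypothetical:

```python
import math


def pct_change_in_odds(beta):
    """Percent change in the odds of the outcome per one-unit increase in a predictor."""
    return (math.exp(beta) - 1) * 100


def beta_from_pct(pct):
    """Inverse: the logistic-regression coefficient implied by a percent change in odds."""
    return math.log(1 + pct / 100)
```

For instance, the 8.67% figure for MENTHLTH implies β ≈ 0.0831, and the 5.93% decrease for INCOME2 implies β ≈ −0.0611.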

Remarks

Take this study with a grain of salt. Depression is a complex disease, and this study looks at only one aspect of our lives. You can be “normal” even if the odds are stacked against you or, conversely, you can suffer from depression even if you have all the riches in the world.

So does money make you happy? Somehow, it does. Let’s be real. We live under capitalism. All our basic needs have a price. For example, since health is a determining factor, how do you stay healthy? Food, exercise, sleep, and medicine. Food and medicine cost a lot. For someone barely making ends meet, how do they handle changes? It will definitely affect their mental health, and the cycle continues.

I wish the Philippines had a dataset like this. Most of us live in poverty, so does that mean most of us suffer from depression? It’s a stretch, but it is something to ponder.

The full paper can be viewed here. It was neither peer-reviewed nor published; I don’t know how to go about it, or whether I can even afford it. Plus, I don’t think it’s good enough for journal publication. A dumb bitch definitely won’t make the cut. Feel free to scrutinize the source code here.

PS. Still in need of a glucose guardian. LIKE DESPERATELY NEED.

Jose Marie Cordova

Mostly for Health Informatics and Bioinformatics assignments. But I’ll write whatever I feel writing.