March 2026 · NLP

Fine-Tuning BERT for Stress Detection: Notes from the Trenches

Why BERT outperformed RNNs and Transformers, what the confusion matrices revealed, and how transfer learning saved a small dataset.

Classifying stress in text is a deceptively tricky problem. The language people use when stressed on Reddit looks superficially similar to language used in many other emotional contexts. Getting a model to reliably distinguish stress from general negativity requires picking up on subtle semantic and contextual cues.

Why BERT won

The RNN showed a systematic bias toward false negatives, missing genuinely stressed posts. The vanilla Transformer, trained from scratch, had the opposite problem: a false-positive bias, flagging too many posts as stressed. BERT was the only architecture that classified both classes in a balanced way. The difference is transfer learning: BERT's pre-training gives it contextual embeddings that capture how words relate to each other, something smaller architectures trained only on a limited labeled set simply can't match.
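The two failure modes read straight off the confusion matrices. A minimal sketch of how I'd quantify them, using scikit-learn and invented toy labels (not the project's actual predictions):

```python
from sklearn.metrics import confusion_matrix

def bias_report(y_true, y_pred):
    """Summarize the two failure modes: label 1 = 'stressed'."""
    # Rows are true class, columns are predicted class.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "false_negative_rate": fn / (fn + tp),  # stressed posts the model missed
        "false_positive_rate": fp / (fp + tn),  # calm posts wrongly flagged
    }

# Illustrative labels only, chosen to mimic the two biases:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
rnn_like = [1, 0, 0, 1, 0, 0, 0, 0]          # misses stressed posts (FN bias)
transformer_like = [1, 1, 1, 1, 1, 1, 0, 0]  # over-flags as stressed (FP bias)

print(bias_report(y_true, rnn_like))          # FNR = 0.5, FPR = 0.0
print(bias_report(y_true, transformer_like))  # FNR = 0.0, FPR = 0.5
```

A balanced classifier like the fine-tuned BERT keeps both rates low at once, rather than trading one for the other.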

The numbers

Fine-tuned BERT: Precision = 0.88, Recall = 0.87, F1 = 0.88, ROC-AUC = 0.94. The AUC is the number I'm most proud of: 0.94 means that, handed a random stressed post and a random non-stressed post, the model ranks the stressed one higher 94% of the time. That's genuine discrimination, not luck from class balance.
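These are the standard scikit-learn metrics; a self-contained sketch with made-up probabilities (not the project's real outputs) showing how each number is computed:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Illustrative data only: 1 = stressed, 0 = not stressed.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.3, 0.1, 0.7, 0.6]  # model's P(stressed)
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]    # threshold at 0.5

print(f"Precision = {precision_score(y_true, y_pred):.2f}")  # 0.75
print(f"Recall    = {recall_score(y_true, y_pred):.2f}")     # 0.75
print(f"F1        = {f1_score(y_true, y_pred):.2f}")         # 0.75
# ROC-AUC uses the raw probabilities, not the thresholded labels:
print(f"ROC-AUC   = {roc_auc_score(y_true, y_prob):.2f}")    # 0.94
```

Note that ROC-AUC is the only one of the four that doesn't depend on the 0.5 threshold, which is why it's a fairer summary of ranking quality.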

What I'd do differently

More careful error analysis earlier. I spent time tuning hyperparameters before I looked closely at which posts were being misclassified. When I did, the pattern was obvious: sarcastic posts were the hard cases for every model. That insight should have driven the project from day one.
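The error analysis itself needs almost no machinery, which is exactly why I should have done it first. A minimal sketch with hypothetical posts (the texts and labels below are invented, not from the dataset):

```python
def misclassified(texts, y_true, y_pred):
    """Pair each wrongly labeled post with its gold and predicted labels."""
    return [(t, yt, yp) for t, yt, yp in zip(texts, y_true, y_pred) if yt != yp]

# Hypothetical examples; 1 = stressed, 0 = not stressed.
texts = [
    "deadline tomorrow and nothing works",
    "oh sure, everything is TOTALLY fine",  # sarcasm: the hard case
    "lovely walk in the park today",
]
errors = misclassified(texts, y_true=[1, 1, 0], y_pred=[1, 0, 0])
for text, gold, pred in errors:
    print(f"gold={gold} pred={pred}: {text}")
```

Ten minutes of reading a list like this surfaces the sarcasm pattern long before any hyperparameter sweep would.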