Sensitivity and Specificity: Mastering the Key Classification Metrics
Abdessamad Touzani


You've already mastered confusion matrices, but do you really know how to interpret their results? Sensitivity and specificity are two fundamental metrics that transform the raw numbers from your matrix into actionable insights. These concepts aren't just academic — they can literally make the difference between life and death in medicine, or between success and failure in your machine learning project.

This article follows my guide on confusion matrices. If you're not yet familiar with this concept, I recommend checking it out first.

Recap: Anatomy of a Confusion Matrix

Before diving into calculations, let's briefly recall the structure of a 2x2 confusion matrix:

                           REALITY
                      Diseased | Healthy
PREDICTION  Diseased |   TP    |   FP
            Healthy  |   FN    |   TN

Where:

  • TP (True Positives): Diseased patients correctly identified
  • TN (True Negatives): Healthy patients correctly identified
  • FN (False Negatives): Diseased patients missed by the algorithm
  • FP (False Positives): Healthy patients incorrectly identified as diseased
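
If you build your matrices with scikit-learn, note that its convention is transposed compared to the layout above: confusion_matrix puts reality on the rows and predictions on the columns. Here is a minimal sketch, assuming binary labels with 1 = diseased and 0 = healthy (the toy labels below are made up for illustration):

from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = diseased, 0 = healthy)
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# scikit-learn puts reality on the rows and predictions on the columns,
# so ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1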

Sensitivity: The Positive Detector

Definition and Formula

Sensitivity (or recall) measures the percentage of positive cases correctly identified by your model.

Formula: Sensitivity = TP / (TP + FN)

In other words: "Among all patients who are actually diseased, how many did my algorithm detect?"

Concrete Example

Let's revisit our medical example with logistic regression:

                           REALITY
                      Diseased | Healthy
PREDICTION  Diseased |   139   |    20
            Healthy  |    32   |   112

Sensitivity calculation:

  • TP = 139 (diseased patients correctly identified)
  • FN = 32 (diseased patients missed)
  • Sensitivity = 139 / (139 + 32) = 139 / 171 = 0.81

Interpretation: Our logistic regression model correctly identifies 81% of diseased patients.
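
The same arithmetic takes two lines of Python, using the counts above:

tp, fn = 139, 32
sensitivity = tp / (tp + fn)
print(round(sensitivity, 2))  # 0.81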

Specificity: The Negative Guardian

Definition and Formula

Specificity measures the percentage of negative cases correctly identified.

Formula: Specificity = TN / (TN + FP)

In other words: "Among all patients who are actually healthy, how many did my algorithm correctly classify?"

Calculation with Our Example

Specificity calculation:

  • TN = 112 (healthy patients correctly identified)
  • FP = 20 (false alarms)
  • Specificity = 112 / (112 + 20) = 112 / 132 = 0.85

Interpretation: Our model correctly identifies 85% of healthy patients.
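
And the matching check in Python:

tn, fp = 112, 20
specificity = tn / (tn + fp)
print(round(specificity, 2))  # 0.85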

Model Comparison: Logistic Regression vs Random Forest

Let's now analyze the performance of two different models:

Random Forest — Results

                           REALITY
                      Diseased | Healthy
PREDICTION  Diseased |   142   |    22
            Healthy  |    29   |   110

Calculations:

  • Sensitivity = 142 / (142 + 29) = 0.83 → 83%
  • Specificity = 110 / (110 + 22) = 0.83 → 83%

Direct Comparison

Model               | Sensitivity | Specificity
Logistic Regression |     81%     |     85%
Random Forest       |     83%     |     83%
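
The comparison is easy to reproduce with a small helper that takes the four counts from each matrix (a minimal sketch using the numbers above):

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

models = {
    "Logistic Regression": dict(tp=139, fn=32, tn=112, fp=20),
    "Random Forest":       dict(tp=142, fn=29, tn=110, fp=22),
}

for name, counts in models.items():
    sens, spec = sensitivity_specificity(**counts)
    print(f"{name}: sensitivity={sens:.0%}, specificity={spec:.0%}")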

Strategic Choice

Which model to choose? It depends on your priorities:

  • If identifying all diseased patients is crucial → Choose Random Forest (higher sensitivity)
  • If avoiding false alarms is the priority → Choose Logistic Regression (higher specificity)

In medicine, missing a diseased patient (false negative) is generally more serious than a false alarm (false positive). In this context, we would favor Random Forest.

Beyond Binary: Multi-Class Classification

Things get more complex with more than two classes. Unlike the 2x2 case, there is no single sensitivity or specificity value for the entire matrix. Instead, we calculate these metrics for each class individually.

Example: Favorite Movie Predictor

Let's revisit our amusing example with three terrible movies:

                          REALITY
                    Troll2 | Gore | Cool
PREDICTION  Troll2 |  12   | 102  |  93
            Gore   | 112   |  23  |  77
            Cool   |  83   |  92  |  17

Calculation for Troll 2

Sensitivity for Troll 2:

  • TP = 12 (Troll 2 fans correctly identified)
  • FN = 112 + 83 = 195 (Troll 2 fans missed)
  • Sensitivity = 12 / (12 + 195) = 0.06 → 6%

Only 6% of Troll 2 fans were correctly identified!

Specificity for Troll 2:

  • TN = 23 + 77 + 92 + 17 = 209 (people who don't like Troll 2 and weren't predicted as Troll 2 fans)
  • FP = 102 + 93 = 195 (false predictions for Troll 2)
  • Specificity = 209 / (209 + 195) = 0.52 → 52%

Calculation for Gore Police

Sensitivity:

  • TP = 23, FN = 102 + 92 = 194
  • Sensitivity = 23 / (23 + 194) = 0.11 → 11%

Specificity:

  • TN = 12 + 93 + 83 + 17 = 205
  • FP = 112 + 77 = 189
  • Specificity = 205 / (205 + 189) = 0.52 → 52%

General Pattern

For an n×n matrix, you need to calculate:

  • n sensitivities (one per class)
  • n specificities (one per class)

The more classes you have, the more complex the analysis becomes!
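
Rather than repeating the arithmetic by hand for every class, you can compute all of these in one loop. A short sketch assuming NumPy and the same layout as this article (rows = prediction, columns = reality):

import numpy as np

# Confusion matrix laid out as in this article: rows = prediction, columns = reality
cm = np.array([
    [12, 102,  93],   # predicted Troll 2
    [112, 23,  77],   # predicted Gore Police
    [83,  92,  17],   # predicted Cool
])

for i, label in enumerate(["Troll 2", "Gore Police", "Cool"]):
    tp = cm[i, i]
    fn = cm[:, i].sum() - tp       # actually class i, predicted as something else
    fp = cm[i, :].sum() - tp       # predicted as class i, actually something else
    tn = cm.sum() - tp - fn - fp   # everything involving neither row i nor column i
    print(f"{label}: sensitivity={tp / (tp + fn):.0%}, specificity={tn / (tn + fp):.0%}")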

Practical Applications and Strategies

In Medicine

  • High sensitivity required: Screening for serious diseases
  • High specificity required: Expensive confirmation tests

In Marketing

  • High sensitivity: Identify all potential customers
  • High specificity: Avoid spam and preserve reputation

In Security

  • High sensitivity: Fraud or threat detection
  • High specificity: Minimize false alerts

Trade-offs and Compromises

The Inevitable Dilemma

There's generally a trade-off between sensitivity and specificity:

  • Increasing sensitivity often decreases specificity
  • Increasing specificity may reduce sensitivity
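
You can see this trade-off concretely by moving the decision threshold of a probabilistic classifier: lowering the threshold flags more cases as positive, which raises sensitivity but lowers specificity. A minimal sketch on synthetic data (the dataset, model, and thresholds here are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy, imbalanced dataset and a simple probabilistic model
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    pred = (probs >= threshold).astype(int)
    tp = ((pred == 1) & (y_test == 1)).sum()
    fn = ((pred == 0) & (y_test == 1)).sum()
    tn = ((pred == 0) & (y_test == 0)).sum()
    fp = ((pred == 1) & (y_test == 0)).sum()
    print(f"threshold={threshold}: sensitivity={tp / (tp + fn):.0%}, specificity={tn / (tn + fp):.0%}")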

ROC Curves and AUC

To explore these trade-offs, data scientists use:

  • ROC curves (Receiver Operating Characteristic)
  • AUC (Area Under the Curve)

These topics deserve a dedicated article — stay tuned!

Complementary Metrics

Precision vs Sensitivity

  • Precision = TP / (TP + FP) → "Among my positive predictions, how many are correct?"
  • Sensitivity = TP / (TP + FN) → "Among all actual positives, how many did I detect?"

F1-Score

Combines precision and sensitivity: F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
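
With the logistic regression counts from earlier (TP = 139, FP = 20, FN = 32), a quick side-by-side sketch:

tp, fp, fn = 139, 20, 32

precision = tp / (tp + fp)      # 0.87 - how many of my positive predictions are correct
sensitivity = tp / (tp + fn)    # 0.81 - how many actual positives I detected
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(round(precision, 2), round(sensitivity, 2), round(f1, 2))  # 0.87 0.81 0.84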

Practical Decision Guide

Steps to Choose Your Model

  1. Define your business priorities

    • What type of error is most costly?
    • False positives vs false negatives?
  2. Calculate sensitivity and specificity for each candidate model

  3. Analyze the context:

    • Error costs
    • Available resources
    • Impact on users
  4. Make an informed decision based on your business constraints

Limitations and Precautions

Imbalanced Datasets

With highly imbalanced classes, overall accuracy can be misleading. Sensitivity and specificity provide a more nuanced view.
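
A quick illustration with made-up counts: 990 healthy patients, 10 diseased, and a model that predicts "healthy" for everyone.

# Hypothetical extreme case: the model never predicts "diseased"
tp, fn, tn, fp = 0, 10, 990, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.99 - looks excellent
sensitivity = tp / (tp + fn)                 # 0.0  - misses every diseased patient
specificity = tn / (tn + fp)                 # 1.0

print(accuracy, sensitivity, specificity)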

Multi-Class Interpretation

The more classes you have, the more complex the interpretation becomes. Consider grouping approaches or aggregated metrics.

Conclusion: Essential Metrics

Sensitivity and specificity aren't just mathematical calculations — they're the keys to making informed decisions in machine learning. By mastering these concepts, you evolve from "someone who trains models" to "a data scientist who solves business problems."

Key takeaways:

  • Sensitivity measures your ability to detect positives
  • Specificity measures your ability to identify negatives
  • The choice between models depends on your business priorities
  • For multi-class problems, calculate these metrics per class

The next time you compare models, don't just look at accuracy — dive into sensitivity and specificity. These metrics will reveal crucial insights about your algorithms' real behavior.

In our next article, we'll explore ROC curves and AUC, even more sophisticated tools for evaluating and comparing your classification models.
