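⚠️ Alpha preview warning: this is an early internal build. It is incomplete and may contain errors. Feel free to open an Issue with problems or suggestions.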

5.3 Classification and Detection

This mainline page answers the Chapter 5 question "How should we think about classification and detection?" For runnable demos, complete outputs, and experiment entry points, continue to 5.6 Code Labs / Practice Appendix and src/ch05/README_EN.md.

"Medical image classification and detection are moving from computer-aided detection (CADe) to computer-aided diagnosis (CADx), and gradually becoming an important assistant for clinicians." - Litjens et al., "A survey on deep learning in medical image analysis", Medical Image Analysis, 2017

In the previous sections, we learned in detail about preprocessing techniques and U-Net-based segmentation methods. Now, we enter another important area of medical image analysis: classification and detection. Unlike the pixel-level precision requirements of segmentation, classification and detection focus more on accurately identifying diseases and locating lesions.

Medical image classification and detection face unique challenges: extreme class imbalance (the ratio of positive to negative samples can reach 1:1000), tiny lesion sizes, image quality variations, and the need for high precision and recall. In this section, we will explore how to use deep learning technology to solve these problems.

🔍 Classification vs Detection: Core Concepts and Differences

Basic Task Definition

Image Classification

Image classification determines whether an image contains a specific disease or abnormality:

  • Binary classification: Normal vs Abnormal
  • Multi-class classification: Specific disease type identification
  • Multi-label classification: An image may contain multiple diseases
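
To make the difference between the three label types concrete, here is a minimal sketch of how targets might be encoded. The label names in `FINDINGS` and the helper `multi_hot` are hypothetical and chosen only for illustration; a real project would take its label set from the dataset documentation.

```python
# Hypothetical label set, for illustration only.
FINDINGS = ["pneumonia", "effusion", "nodule"]


def multi_hot(findings_present):
    """Encode a set of finding names as a 0/1 vector aligned with FINDINGS."""
    unknown = set(findings_present) - set(FINDINGS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return [1 if name in findings_present else 0 for name in FINDINGS]


# Binary classification: one 0/1 flag per image (normal vs abnormal).
binary_target = 1

# Multi-class classification: exactly one class index per image.
multi_class_target = FINDINGS.index("effusion")

# Multi-label classification: several findings can be present at once.
multi_label_target = multi_hot({"pneumonia", "nodule"})

print(binary_target, multi_class_target, multi_label_target)
```

The practical consequence is in the output layer: a multi-class target pairs naturally with softmax, while a multi-hot target is usually scored with one independent sigmoid per finding.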

Object Detection

Object detection not only identifies diseases but also determines their location:

  • Bounding box detection: Frame the lesion area
  • Lesion localization: Provide precise coordinates
  • Multi-lesion detection: Detect multiple lesions simultaneously

| Task Type | Input | Output | Clinical Application | Difficulty Level |
| --- | --- | --- | --- | --- |
| Classification | Complete medical image | Disease label/category | Initial screening, triage | |
| Detection | Complete medical image | Bounding box + category | Lesion localization, surgical planning | |
| Segmentation | Complete medical image | Pixel-level mask | Precise measurement, 3D reconstruction | |

Medical Particularities

Class Imbalance Problem in Medical Imaging

Class imbalance is a fundamental challenge in medical image classification. Unlike natural image datasets, medical datasets exhibit extreme imbalance due to inherent clinical characteristics.

Root Causes of Medical Class Imbalance
  1. Disease Prevalence Characteristics

    • Most diseases have very low prevalence in general populations
    • Example: In cancer screening, typically only about 0.5-2% of the screened population has the disease
    • Example: Tuberculosis affects fewer than 1% of people in low-prevalence regions
    • Clinical Reality: Screening data naturally reflects population disease prevalence
  2. Data Collection Bias

    • Medical imaging is expensive and resource-intensive
    • Positive (diseased) cases are actively collected for research
    • Negative (normal) cases are passively collected from routine screening
    • Result: The dataset's class distribution can differ substantially from the true population prevalence
  3. Annotation Cost Asymmetry

    • Normal cases can be easily labeled as "healthy"
    • Abnormal cases require expert radiologist review
    • Complex cases need multiple expert consensus
    • Economic Impact: High annotation cost makes large balanced datasets economically infeasible

Severe Impact of Class Imbalance on Model Training

| Impact Category | Problem Description | Consequence | Clinical Risk |
| --- | --- | --- | --- |
| Prediction Bias | Model biased toward majority class | Majority class prediction dominates, minority class ignored | High false negative rate (missing rare diseases) |
| Decision Threshold Mismatch | Decision thresholds optimized for majority class | Minority class confidence scores unreliable | Inappropriate clinical decisions |
| Feature Learning Distortion | Minority class features under-learned | Model learns superficial patterns instead of medical characteristics | Poor generalization to new data |
| Misleading Evaluation Metrics | Overall accuracy high even with poor minority class performance | 99% accuracy could mean missing 100% of disease cases when prevalence is 1% | False sense of model reliability |
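
The 99% accuracy case in the last row can be checked with a few lines of arithmetic. The sketch below uses a hypothetical cohort of 1000 screened patients at 1% prevalence and a degenerate "model" that always predicts normal. The helper name `accuracy_and_recall` is made up for this illustration.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels where 1 marks disease."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# Hypothetical screening cohort: 1000 patients, 10 with disease (1% prevalence).
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that always predicts normal.
y_pred = [0] * 1000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.99, recall=0.00
```

Accuracy looks excellent while every diseased patient is missed, which is why recall, precision, and AUC are usually reported alongside accuracy in screening tasks.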

Typical situations where a full segmentation is more than the workflow needs:

  • chest X-ray screening only needs a first abnormal/normal decision;
  • emergency triage cares more about high recall than fine delineation;
  • large-scale screening needs fast routing before detailed review.

So not every problem should jump straight to segmentation. In many medical AI workflows, classification and detection are the first gate.

Intuitive explanation

A simple way to frame this section is as three levels of questions:

  • classification answers “whether / what”;
  • detection answers “where”;
  • segmentation answers “where exactly is the boundary.”

Classification models focus on image-wide patterns related to diagnosis. Detection builds on classification and learns to provide rough location information as well.

The hard part in medicine is not only model architecture. It is also that:

  • positive cases are much rarer than negative ones;
  • tiny lesions may occupy only a tiny fraction of the image;
  • clinicians want more than a score—they want something they can inspect and question.

Figure: Chest X-ray classification. Classification emphasizes image-level diagnostic patterns, while detection adds explicit localization of suspicious regions.

Core method

This section keeps only 4 key ideas.

1. Decide whether you need a global label or candidate locations

If the goal is screening, triage, or first-pass warning, classification may be enough. If clinicians need quick review of suspicious regions, detection is often a better fit.

2. Put recall first when the task demands it

In medical screening especially, it is often better to flag extra suspicious cases than to miss a serious lesion.
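
One way to act on "recall first" is to pick the decision threshold from validation scores so that a target recall is met, and then look at how many cases that sends to review. The following is a minimal sketch under stated assumptions: the scores and labels are invented, and `threshold_for_recall` and `count_flagged` are hypothetical helper names.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Return the highest threshold that still reaches `target_recall`.

    A case counts as flagged when its score is >= the threshold.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("need at least one positive case")
    # Number of positives that must be flagged; the small epsilon guards
    # against floating-point round-off in the product.
    needed = max(1, math.ceil(target_recall * len(positive_scores) - 1e-9))
    return positive_scores[needed - 1]


def count_flagged(scores, threshold):
    """Number of cases a reviewer would have to look at."""
    return sum(s >= threshold for s in scores)


# Hypothetical validation scores (label 1 = disease present).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for target in (0.75, 1.0):
    t = threshold_for_recall(scores, labels, target)
    print(f"target recall {target:.2f}: threshold={t:.2f}, "
          f"flagged={count_flagged(scores, t)}")
```

In this toy data, raising the target recall from 0.75 to 1.0 lowers the threshold from 0.40 to 0.20 and increases the flagged cases from 4 to 6. That extra review load is the price of not missing a lesion.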

3. Handle class imbalance explicitly

Rare positives are the norm. Resampling, weighted losses, and threshold tuning often matter earlier than swapping the backbone.
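
As an illustration of a weighted loss, class weights are often derived from inverse class frequency. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count). The function name and the 950/50 split are assumptions made for this example, not values taken from the local scripts.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    weights = []
    for cls in range(num_classes):
        if counts[cls] == 0:
            raise ValueError(f"class {cls} has no samples")
        weights.append(n_samples / (num_classes * counts[cls]))
    return weights


# Hypothetical training set: 950 normal (class 0), 50 abnormal (class 1).
labels = [0] * 950 + [1] * 50
weights = inverse_frequency_weights(labels, num_classes=2)
print([round(w, 3) for w in weights])  # [0.526, 10.0]
```

The resulting list could be passed as `class_weights` to a weighted cross-entropy loss. Weighting and resampling address the same imbalance, so applying both at full strength may over-correct.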

4. Make outputs reviewable

Probabilities, heatmaps, confusion matrices, ROC/AUC curves, and error analysis all help clinicians judge whether the model is trustworthy.
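
Two of these review artifacts are simple enough to compute by hand. The dependency-free sketch below builds confusion counts at a fixed threshold and computes ROC AUC as the probability that a random positive case scores higher than a random negative one. The data and function names are hypothetical; an established metrics library is the more practical choice for real experiments.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP/FP/TN/FN when a case is positive if score >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1:
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            counts["fp" if predicted == 1 else "tn"] += 1
    return counts


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation results (label 1 = disease present).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

print(confusion_counts(labels, scores))
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

The pairwise AUC is quadratic in the number of cases, which is fine for a teaching example but slow for large validation sets.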

Typical case

Case 1: Binary or multi-label chest X-ray classification

  • Goal: predict normal/abnormal, pneumonia, effusion, nodules, and similar labels.
  • Difficulty: positive cases are sparse and many abnormalities occupy only a small region.
  • Local code: src/ch05/medical_image_classification/main.py.

Case 2: Lesion detection as a triage entry point

  • Goal: provide candidate boxes for a clinician or a later segmentation model to review.
  • Suitable for: lung nodules, breast calcifications, suspicious fractures, and similar findings.
  • Section focus: build the intuition for classification first, then see why detection must additionally learn location.

Case 3: Model interpretation and error analysis

  • Goal: understand not just the score, but where the model is looking.
  • Suggested outputs: prediction probabilities, confusion matrix, ROC/AUC, heatmaps, or attention maps.
  • Local result file: src/ch05/medical_image_classification/output/medical_classification_report.json.

Practice tips

The text only keeps short fragments for intuition; the full network, training loop, and visualizations are in the local scripts.

1. Minimal classification head

python
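import torch.nn as nn


def classification_head(in_features, num_classes):
    """Map pooled backbone features to per-class logits."""
    return nn.Sequential(
        nn.Linear(in_features, 256),   # project features to a small hidden layer
        nn.ReLU(inplace=True),
        nn.Dropout(0.5),               # regularization against overfitting
        nn.Linear(256, num_classes),   # raw logits; no softmax is applied here
    )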

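2. Weight the loss toward rare classes

python
import torch
import torch.nn.functional as F


def weighted_ce(logits, targets, class_weights):
    # Match the dtype and device of the logits; an integer weight tensor,
    # or one left on the CPU while the logits are on the GPU, raises an error.
    weights = torch.as_tensor(class_weights, dtype=logits.dtype, device=logits.device)
    return F.cross_entropy(logits, targets, weight=weights)

3. Convert logits into readable probabilities

python
import torch


def to_probabilities(logits):
    # Softmax over the class dimension suits single-label tasks;
    # multi-label outputs would use torch.sigmoid(logits) instead.
    return torch.softmax(logits, dim=1)

📊 Performance Comparison and Best Practices
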
Model Selection Guidelines

Task-driven Model Selection

Figure: Medical image analysis model selection guide. Appropriate deep learning models are selected based on medical image data type (2D X-ray, 3D CT/MRI, whole slide images (WSI)) and task type.


Performance Comparison

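| Model | Data Type | Task | mAP/Accuracy | Memory Usage | Training Time | Clinical Applicability |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | 2D X-ray | Classification | 0.85-0.92 | 2GB | Medium | |
| DenseNet121 | 2D X-ray | Classification | 0.87-0.94 | 2.5GB | Medium | |
| 3D ResNet | 3D CT/MRI | Classification | 0.82-0.89 | 8GB | High | |
| Faster R-CNN | 2D X-ray | Detection | 0.78-0.85 | 4GB | High | |
| YOLOv5 | 2D X-ray | Detection | 0.75-0.82 | 1.5GB | Low | |
| Attention MIL | WSI | Classification | 0.80-0.88 | 6GB | Very High | |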

1. Classification Techniques

  • 2D CNN for X-ray: ResNet, DenseNet-based transfer learning
  • 3D CNN for volumetric data: 3D ResNet, memory optimization strategies
  • Data imbalance handling: Focal Loss, balanced sampling
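
To show why Focal Loss helps with imbalance, here is a framework-free sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma=2 and alpha=0.25 follow the values commonly quoted from the Focal Loss paper. The function name and the example probabilities are assumptions for illustration, and a training pipeline would use a tensor implementation instead.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25):
    """Focal loss for one sample.

    prob   -- predicted probability of the positive class, in [0, 1]
    target -- ground-truth label, 0 or 1
    """
    # p_t is the probability the model assigned to the true class.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (confidently normal) vs. a hard positive (missed lesion).
easy_negative = binary_focal_loss(prob=0.1, target=0)
hard_positive = binary_focal_loss(prob=0.1, target=1)
print(f"easy negative: {easy_negative:.5f}")
print(f"hard positive: {hard_positive:.5f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the extra effect of gamma easy to isolate in experiments.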

2. Detection Strategies

  • Classic frameworks: Faster R-CNN, YOLO medical adaptation
  • Medical-specific: Hard negative mining, anchor adjustment
  • Evaluation metrics: mAP, IoU, clinical indicators
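
IoU underlies mAP, because a predicted box only counts as a hit when its overlap with a ground-truth box exceeds a chosen threshold. A minimal sketch follows, assuming boxes are given as (x_min, y_min, x_max, y_max) in pixels. The coordinates are invented for the example.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth nodule box and a shifted prediction.
ground_truth = (10, 10, 50, 50)
prediction = (30, 30, 70, 70)
print(f"IoU = {box_iou(ground_truth, prediction):.3f}")  # IoU = 0.143
```

For small lesions a shift of a few pixels can drop IoU sharply, so the IoU threshold used for medical detection is worth choosing deliberately and reporting alongside the results.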

3. Whole Slide Image Analysis

  • MIL framework: Attention mechanism, instance-level learning
  • Memory efficiency: Patch-based processing, caching strategy
  • Interpretability: Attention map visualization
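
The attention mechanism in a MIL model can be reduced to one pooling step: per-patch scores are normalized with softmax and used to average the patch embeddings into a single slide-level embedding. The sketch below assumes the raw attention scores are already available (in a real model they would come from a small learned network, not shown here). `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(patch_embeddings, attention_scores):
    """Return (bag_embedding, attention_weights) for one slide.

    patch_embeddings -- list of equal-length feature vectors, one per patch
    attention_scores -- one raw (unnormalized) score per patch
    """
    if not patch_embeddings or len(patch_embeddings) != len(attention_scores):
        raise ValueError("need one score per patch and at least one patch")
    # Numerically stable softmax over the patch scores.
    max_score = max(attention_scores)
    exps = [math.exp(s - max_score) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the patch embeddings, dimension by dimension.
    dim = len(patch_embeddings[0])
    bag = [
        sum(w * emb[d] for w, emb in zip(weights, patch_embeddings))
        for d in range(dim)
    ]
    return bag, weights


# Three hypothetical patches; the third gets a much higher attention score.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bag, weights = attention_pool(embeddings, [0.0, 0.0, 10.0])
print([round(w, 4) for w in weights])
print([round(v, 4) for v in bag])
```

The same weights double as the attention map: plotting them at each patch position shows which regions of the slide drove the slide-level prediction.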

4. Best Practices

  • Clinical requirements: Accuracy first, speed second
  • Data quality: High-quality annotation, multi-center validation
  • Regulatory compliance: Model interpretability, decision support
  • Multimodal fusion: Comprehensive analysis combining imaging and clinical data
  • Weakly supervised learning: Reducing annotation requirements
  • Federated learning: Multi-center collaboration, privacy protection

Datasets

| Dataset | Purpose | Official URL | License | Notes |
| --- | --- | --- | --- | --- |
| NIH ChestX-ray14 | Chest X-ray classification and detection | https://nihcc.app.box.com/v/ChestX-ray14 | Public | Contains 14 chest disease labels |
| CheXpert | Chest X-ray classification | https://stanfordmlgroup.github.io/competitions/chexpert/ | Stanford University research use agreement | Stanford dataset with 14 observation labels, 5 of which are scored in the competition |
| MIMIC-CXR | Chest X-ray multi-label classification | https://physionet.org/content/mimic-cxr-jpg/2.0.0/ | PhysioNet Credentialed Health Data License | Real clinical data from Beth Israel Deaconess Medical Center, Boston |
| PadChest | Chest X-ray + clinical data | https://bimcv.cipf.es/bimcv-projects/padchest/ | CC BY 4.0 | Contains more than 160,000 X-ray images with clinical reports |
| DeepLesion | Lesion detection | https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=53683303 | Public | Contains annotated lesions from various body regions |
| MedicalDecathlon | Multi-organ classification and segmentation | https://medicaldecathlon.com/ | CC BY-SA 4.0 | CT/MRI datasets covering 10 organs |
| Chest X-Ray Images (Pneumonia) | Chest disease classification | https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia | Public | Kaggle dataset of pneumonia and normal X-ray images (not the NIH ChestX-ray8 release) |
| ISIC Archive | Skin lesion classification | https://www.isic-archive.com/#!/topWithHeader/onlyHeaderTop/gallery | Public | Dermoscopy image classification benchmark |

Papers

| Paper Title | Keywords | Source | Notes |
| --- | --- | --- | --- |
| CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning | Chest X-ray, pneumonia detection | arXiv:1711.05225 | Stanford University, using 121-layer DenseNet |
| Focal Loss for Dense Object Detection | Focal Loss, loss function | arXiv:1708.02002 | Classic loss function paper addressing class imbalance |

Open Source Libraries

| Library | Function | GitHub/Website | Purpose |
| --- | --- | --- | --- |
| MONAI | Medical imaging deep learning framework | https://monai.io/ | PyTorch library designed specifically for medical imaging, including classification, detection, and segmentation tools |
| TorchIO | Medical image transformation library | https://torchio.readthedocs.io/ | Supports multiple medical image formats and augmentation transforms |
| deepmedic | 3D medical image segmentation (voxel-wise classification) | https://github.com/DeepMedic/deepmedic | Multi-scale 3D CNN framework for voxel-wise classification of 3D scans, especially suitable for brain images |
| Grad-CAM++ | Explainable visualization | https://github.com/jacobgil/grad-cam-plus-plus | Attention visualization tool for medical image classification |

The next section moves to augmentation and restoration because the first three sections quietly assume the input is already “usable.” In reality, we still need to ask: what do we do when data are scarce, image quality is poor, or contrast is not strong enough?

Released under the MIT License.