Introduction
Resampling is one of the most widely used ways to handle class imbalance in machine learning. These methods modify the training data so that the majority and minority classes are more evenly represented. This helps models learn meaningful patterns from all classes instead of being biased toward the majority class.
Why Use Resampling?
In an imbalanced dataset, a model trained without resampling may achieve high accuracy by simply predicting the majority class. With a 95:5 class split, for example, always predicting the majority class scores 95% accuracy while detecting no minority cases at all. Resampling helps the model learn the minority class by either increasing minority class examples (oversampling) or reducing majority class examples (undersampling).
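
The accuracy trap described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the 950/50 label split and the helper names `accuracy` and `recall` are assumptions chosen for illustration, not taken from any particular dataset or library.

```python
# Hypothetical labels: 950 majority (0) and 50 minority (1) examples.
y_true = [0] * 950 + [1] * 50

# A "model" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


baseline_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)
print(f"accuracy={baseline_accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.95, minority recall=0.00
```

Accuracy looks strong while minority recall is zero, which is why a per-class metric such as recall is usually reported alongside accuracy on imbalanced data.
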
Types of Resampling Techniques
1. Oversampling (Increasing Minority Class Samples)
This technique involves adding more samples of the minority class to balance the dataset.
- Random Oversampling
  - Duplicates randomly chosen instances from the minority class.
  - Improves class balance but may cause overfitting, since duplicated samples add no new information.
- SMOTE (Synthetic Minority Over-sampling Technique)
  - Generates synthetic samples instead of duplicating existing ones.
  - Creates each new sample by interpolating between a minority instance and one of its nearest minority class neighbors.
  - Reduces overfitting risk compared to random oversampling, though it can place synthetic samples in regions where the classes overlap.
- ADASYN (Adaptive Synthetic Sampling)
  - A variation of SMOTE that generates more synthetic samples for minority instances that are harder to learn, typically those with many majority class neighbors.
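
To make the difference between duplicating and synthesizing concrete, here is a rough standard-library sketch of random oversampling and a SMOTE-style interpolation step. The function names `random_oversample` and `smote_like`, the four 2-D points, and the choice of `k=2` are all hypothetical. The sketch does not implement ADASYN's adaptive weighting, and it is not a substitute for a tested library implementation such as the `RandomOverSampler`, `SMOTE`, and `ADASYN` classes in imbalanced-learn.

```python
import math
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority points until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


rng = random.Random(0)  # fixed seed so the run is repeatable
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]  # hypothetical points

duplicated = random_oversample(minority, target_size=10, rng=rng)
synthetic = smote_like(minority, n_new=6, k=2, rng=rng)

print(len(set(duplicated)))  # still only the 4 original distinct points
print(len(synthetic))        # 6 new points inside the minority region
```

Random oversampling never produces a point that was not already in the data, whereas the interpolated points are new but always lie between existing minority samples.
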
2. Undersampling (Reducing Majority Class Samples)
This technique involves removing samples from the majority class to balance the dataset.
- Random Undersampling
  - Randomly removes majority class samples, typically until the majority count matches the minority count.
  - Can discard informative data, potentially reducing model performance.
- Tomek Links
  - A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbors. The majority class sample in each pair is removed, which helps refine class boundaries.
  - Removes only a small number of samples, so it cleans the decision boundary without excessive data loss but usually does not balance the classes on its own.
- NearMiss
  - Keeps the majority class samples that are closest to the minority class, which tend to be the hardest to classify.
  - The intent is that the remaining majority class samples lie near the decision boundary and provide meaningful learning signals.
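
The two simplest undersampling ideas above, random removal and Tomek link removal, can be sketched with the standard library as follows. The helper names (`random_undersample`, `nearest_index`, `tomek_links`) and the seven toy points are assumptions made up for this example. NearMiss is not implemented here, and the brute-force neighbour search is only suitable for tiny datasets. For real work, imbalanced-learn provides `RandomUnderSampler`, `TomekLinks`, and `NearMiss`.

```python
import math
import random


def random_undersample(majority, target_size, rng):
    """Keep a random subset of majority points, sampled without replacement."""
    return rng.sample(majority, target_size)


def nearest_index(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(i, points)
        if i < j and labels[i] != labels[j] and nearest_index(j, points) == i:
            links.append((i, j))
    return links


# Hypothetical data: label 0 is the majority class, label 1 the minority class.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (0.5, 0.2),
          (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
labels = [0, 0, 0, 0, 0, 1, 1]

rng = random.Random(0)
majority = [p for p, lab in zip(points, labels) if lab == 0]
minority = [p for p, lab in zip(points, labels) if lab == 1]

# Random undersampling: shrink the majority class to the minority count.
kept_majority = random_undersample(majority, len(minority), rng)

# Tomek links: drop only the majority member of each cross-class pair.
links = tomek_links(points, labels)
drop = {i if labels[i] == 0 else j for i, j in links}
cleaned = [(p, lab) for idx, (p, lab) in enumerate(zip(points, labels))
           if idx not in drop]

print(links)  # [(4, 5)]
```

On this toy data, Tomek link removal discards a single borderline majority point and leaves the classes at 4 versus 2, while random undersampling reaches a 2 versus 2 balance by discarding three majority points chosen without regard to where they lie.
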
3. Hybrid Techniques (Combining Oversampling & Undersampling)
In some cases, combining oversampling and undersampling gives better results than either approach alone.
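- SMOTE + Tomek Links
  - SMOTE generates new minority class samples.
  - Tomek link removal then deletes majority class samples (or, in some implementations, both samples) that form Tomek links, cleaning up overlap between the classes.
  - Helps improve class balance while refining decision boundaries.
- SMOTE + Edited Nearest Neighbors (ENN)
  - SMOTE generates new minority class samples.
  - ENN removes samples, typically from the majority class, whose label disagrees with most of their nearest neighbors, reducing noise.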
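
A hybrid pipeline of the second kind can be sketched as an oversampling step followed by a cleaning step. The code below is a simplified illustration, not the reference SMOTE + ENN algorithm: `oversample_by_interpolation` interpolates toward the single nearest minority neighbour rather than a random one of several, `edited_nearest_neighbours` applies the neighbour vote to every class, and both names, the toy points, and `k=3` are assumptions. imbalanced-learn offers `SMOTETomek` and `SMOTEENN` as ready-made combinations.

```python
import math
import random
from collections import Counter


def oversample_by_interpolation(minority, n_new, rng):
    """SMOTE-style step: new points on the segment between a minority point
    and its nearest minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = min(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def edited_nearest_neighbours(points, labels, k=3):
    """ENN-style step: keep only samples whose label matches the majority
    vote of their k nearest neighbours."""
    kept = []
    for i, (p, label) in enumerate(zip(points, labels)):
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        vote = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        if vote == label:
            kept.append((p, label))
    return kept


rng = random.Random(0)
# Hypothetical data: the last majority point sits inside the minority cluster.
majority = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.5), (0.2, 0.2), (3.1, 3.0)]
minority = [(3.0, 3.0), (3.2, 3.1), (2.9, 3.2)]

# Step 1: oversample the minority class up to the majority count.
synthetic = oversample_by_interpolation(minority, len(majority) - len(minority), rng)

# Step 2: clean the combined data with the neighbour vote.
points = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
kept = edited_nearest_neighbours(points, labels, k=3)

print(Counter(label for _, label in kept))
```

In this example the only sample that fails the neighbour vote is the majority point at (3.1, 3.0), which lies among minority samples, so the cleaned data ends up with 5 majority and 6 minority points.
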
Choosing the Right Resampling Technique
| Scenario | Recommended Technique |
|---|---|
| Small dataset with severe imbalance | Random Oversampling or SMOTE |
| Large dataset with imbalance | Random Undersampling or Tomek Links |
| High risk of overfitting | SMOTE + Tomek Links |
| Dataset with noisy labels | SMOTE + ENN |
Conclusion
Resampling techniques are powerful tools for addressing class imbalance in machine learning. Oversampling increases minority class representation, while undersampling reduces the dominance of the majority class. The right method depends on dataset size, the severity of the imbalance, label noise, and the risk of overfitting.

