Pusat Penelitian, Pengabdian kepada Masyarakat dan Publikasi Internasional

Data Cleaning: The Unseen Hero of Every Great Data Project

Posted on June 17, 2025 (updated June 29, 2025) by Fachrur Rozi

In the world of data science, everyone loves talking about AI, big models, and fancy dashboards. But behind every successful project is something far less glamorous — yet absolutely essential: data cleaning.

Think of data cleaning as washing the vegetables before cooking. No matter how good your recipe is, if your ingredients are dirty, the final dish won’t taste right. The same goes for data: if it’s messy, your model will suffer — or worse, make bad decisions.


🤔 What is Data Cleaning?

Data cleaning is the process of fixing or removing incorrect, corrupted, duplicate, or incomplete data from a dataset. It’s the first (and arguably most important) step in any data analysis or machine learning workflow.

Some even say:

“80% of data science is cleaning the data. The other 20% is complaining about cleaning the data.”


🧹 Why Data Cleaning Matters

Clean data isn’t just nice to have — it’s critical. Here’s why:

  • 🧠 Better Models: Dirty data leads to wrong insights and poor predictions.
  • 📊 Accurate Reports: Ensures your dashboards tell the right story.
  • 🤝 Trust: Clean data builds confidence in your analysis and decisions.
  • 💡 Efficiency: Saves time in the long run by avoiding confusion later.

🛠️ Common Data Cleaning Tasks

Let’s look at some of the dirty work involved in cleaning data:

1. Remove Duplicates

df = df.drop_duplicates()

Two identical rows? Get rid of one.
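Here is a minimal sketch of what that can look like in practice. The `orders` table and its column names are made up for illustration; the point is the difference between dropping fully identical rows and dropping rows that share a key.

```python
import pandas as pd

# Hypothetical purchase log; the column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [101, 101, 102, 103],
    "customer": ["ana", "ana", "budi", "citra"],
    "amount": [25.0, 25.0, 40.0, 15.0],
})

# Count fully identical rows before removing anything.
n_exact_dupes = int(orders.duplicated().sum())

# Drop exact duplicates, keeping the first occurrence of each row.
deduped = orders.drop_duplicates().reset_index(drop=True)

# Treat rows as duplicates whenever the key column repeats,
# even if the other columns differ.
deduped_by_key = orders.drop_duplicates(subset=["order_id"], keep="first")
```

Counting duplicates first is a cheap sanity check: if the number is much larger than expected, the problem may be in how the data was joined or exported, not in the rows themselves.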

2. Handle Missing Values

  • Fill in with mean/median
  • Use forward/backward fill
  • Or just remove the rows/columns

df = df.fillna(df.mean(numeric_only=True))  # fills numeric columns only; text columns are left as-is
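The three options above can be sketched side by side. The `readings` table is invented for this example, and which option is right depends on the dataset, so treat this as a starting point and not a recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in a numeric and a text column.
readings = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0],
    "city": ["Medan", None, "Jakarta", "Medan"],
})

# Option 1: fill a numeric column with its median (less sensitive to outliers than the mean).
median_filled = readings["age"].fillna(readings["age"].median())

# Option 2: forward fill, copying the last seen value downward.
# This mostly makes sense for ordered data such as time series.
forward_filled = readings.ffill()

# Option 3: drop every row that has at least one missing value.
rows_dropped = readings.dropna()
```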

3. Fix Data Types

Dates stored as strings? Numbers as text? Fix that!

df['date'] = pd.to_datetime(df['date'])
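Real columns often contain a few values that cannot be converted at all. One way to handle that, sketched here on a made-up `raw` table, is to pass `errors="coerce"` so unparseable entries become missing values instead of raising an error.

```python
import pandas as pd

# Hypothetical raw export where everything arrived as text.
raw = pd.DataFrame({
    "date": ["2025-06-17", "2025-06-29", "not a date"],
    "price": ["19.99", "5", "n/a"],
})

# Unparseable dates become NaT, unparseable numbers become NaN.
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")
raw["price"] = pd.to_numeric(raw["price"], errors="coerce")

# Count how many values failed to convert so they can be reviewed.
failed = raw.isna().sum()
```

Coercing hides bad values inside the missing-value count, so it is worth checking `failed` before moving on.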

4. Standardize Formatting

“yes”, “Yes”, “YES”, “y”? Turn them all into one standard format.

df['response'] = df['response'].str.strip().str.lower().replace({'y': 'yes'})  # lowercasing alone leaves "y" unmapped
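When there are several variants per answer, an explicit lookup table tends to be easier to audit. The sketch below assumes a yes/no question; the `canonical` mapping is a hypothetical one you would adapt to your own labels.

```python
import pandas as pd

responses = pd.Series(["yes", "Yes ", "YES", "y", "No", "n", "maybe"])

# Hypothetical mapping from every accepted variant to one standard label.
canonical = {"yes": "yes", "y": "yes", "no": "no", "n": "no"}

# Trim whitespace, lowercase, then map; anything not in the table becomes NaN.
cleaned = responses.str.strip().str.lower().map(canonical)

# Keep the original values that could not be mapped, for manual review.
unmapped = responses[cleaned.isna()]
```

Using `map` instead of `replace` makes unexpected values such as "maybe" show up as missing, so they are noticed and not silently passed through.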

5. Remove Outliers

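Extreme values can throw off your analysis. Flag them with techniques like the IQR rule (values more than 1.5 × IQR beyond the quartiles) or a Z-score threshold, then check whether they are genuine errors before removing them.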
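Both techniques fit in a few lines. This is a sketch on a tiny invented series; the 1.5 multiplier is the usual IQR convention, while the Z-score cutoff of 2 is an assumption chosen because the sample is so small (3 is more common on larger datasets).

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 10, 95])

# IQR rule: flag anything more than 1.5 * IQR outside the quartiles.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = values[(values < lower) | (values > upper)]

# Z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std()
Z_CUTOFF = 2  # assumed cutoff for this small sample
z_outliers = values[z_scores.abs() > Z_CUTOFF]

# Keep only the rows inside the IQR bounds.
without_outliers = values[(values >= lower) & (values <= upper)]
```

Here both methods flag only the value 95. On skewed data they can disagree, because the mean and standard deviation are themselves pulled by the extreme values.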

6. Correct Typos & Inconsistencies

“Califorina” should be “California”, right? Techniques like fuzzy matching can help fix this by comparing each value against a list of known-good spellings.
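One lightweight way to try fuzzy matching is Python's built-in `difflib`. In this sketch, `KNOWN_STATES`, `correct_spelling`, and the 0.8 similarity cutoff are all illustrative choices, not a standard.

```python
from difflib import get_close_matches

# Hypothetical list of valid spellings to match against.
KNOWN_STATES = ["California", "Colorado", "Florida", "Texas"]

def correct_spelling(value, choices=KNOWN_STATES, cutoff=0.8):
    """Return the closest known spelling, or the value unchanged if nothing is close enough."""
    matches = get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value

corrected = [correct_spelling(s) for s in ["Califorina", "Texsas", "Oregon"]]
# "Oregon" is not in the list and resembles none of the entries, so it is left alone.
```

A lower cutoff catches more typos but also raises the risk of "correcting" a valid value into the wrong one, so it helps to review the changes before applying them.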


🧪 Real-World Examples

| Scenario | Cleaning Example |
| --- | --- |
| Sales reports | Fix date formats, merge regional codes |
| Customer feedback | Normalize sentiment labels (“happy”, “Happy”, etc.) |
| Medical records | Impute missing patient ages or blood pressure |
| E-commerce transactions | Remove duplicate purchase logs |

⚠️ Data Cleaning Challenges

  • Time-consuming: It’s slow, but worth it.
  • Decision-heavy: Should you drop or fix a missing value?
  • No one-size-fits-all: Every dataset is different.
  • Risk of over-cleaning: Be careful not to throw away valuable data.
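For the drop-or-fix question, it can help to measure how much is actually missing before deciding. The sketch below uses a made-up `survey` table and a hypothetical 50% threshold; the right threshold depends on the project.

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    "age": [34, np.nan, 51, 29],
    "fax_number": [np.nan, np.nan, np.nan, "555-0100"],
})

# Share of missing values per column, between 0 and 1.
missing_share = survey.isna().mean()

# Assumed rule of thumb: drop columns that are more than half empty.
MAX_MISSING = 0.5
columns_to_drop = missing_share[missing_share > MAX_MISSING].index.tolist()
trimmed = survey.drop(columns=columns_to_drop)
```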

✅ Tips for Smart Cleaning

  • Always make a backup of the raw data.
  • Visualize your data: graphs often reveal problems fast.
  • Automate repetitive tasks with scripts.
  • Document your changes — future-you will thank you.
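Several of these tips can be combined in one small script. The `clean` function below is a hypothetical sketch: it works on a copy so the raw data stays untouched, and it records each step in a plain list that can be saved next to the cleaned file.

```python
import pandas as pd

def clean(raw: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Apply the cleaning steps to a copy of the data and log what changed."""
    df = raw.copy()  # never modify the raw data in place
    log = []

    rows_before = len(df)
    df = df.drop_duplicates()
    log.append(f"drop_duplicates: removed {rows_before - len(df)} rows")

    df["response"] = df["response"].str.strip().str.lower()
    log.append("response: stripped whitespace and lowercased")

    return df.reset_index(drop=True), log

raw = pd.DataFrame({"response": ["Yes", "Yes", " no "]})
cleaned, change_log = clean(raw)
```

Because every step lives in one function, rerunning the cleaning on next month's export is a single call, and the log doubles as documentation.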

💬 Final Thoughts

Data cleaning may not sound exciting, but it’s the foundation of good data science. You don’t need a PhD to clean data, just attention to detail, a bit of logic, and patience. Once your data is clean, your analysis can truly shine.

So next time someone asks what the most underrated skill in data science is, you can confidently say:

“Cleaning data. It’s where the magic begins.”

