Harvard Business Review called data scientist "the sexiest job of the 21st century" in 2012. But what does the role actually involve day to day? The answer depends enormously on the organisation, and the work is often less glamorous than the title implies.
Industry surveys commonly report that data scientists spend a large share of their time, often cited as 60–80%, on collecting, cleaning and preparing data rather than on fancy machine learning. The unglamorous work is the real work.
The core responsibilities
A data scientist's work typically spans several activities:
- Data collection and cleaning — gathering data from databases, APIs, and other sources; fixing quality issues
- Exploratory analysis — understanding data shape, distributions, and relationships through statistics and visualisation
- Feature engineering — creating useful input variables from raw data for ML models
- Model building — selecting, training, and tuning machine learning models
- Evaluation — measuring model performance rigorously and honestly
- Communication — presenting findings to non-technical stakeholders through charts, reports, and presentations
- Deployment support — working with engineers to put models into production
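To make the "data collection and cleaning" step more concrete, here is a minimal sketch using only the Python standard library. The records, the field names (`customer_id`, `age`, `country`) and the cleaning rules are hypothetical. In practice this work is usually done with a library such as pandas, and the right rules depend entirely on the data and the question being asked.

```python
from statistics import median

# Hypothetical raw records, e.g. as loaded from a CSV export or an API.
RAW_RECORDS = [
    {"customer_id": "1", "age": "34", "country": " uk "},
    {"customer_id": "2", "age": "", "country": "UK"},
    {"customer_id": "2", "age": "", "country": "UK"},         # duplicate row
    {"customer_id": "3", "age": "-5", "country": "France"},   # impossible age
    {"customer_id": "4", "age": "41", "country": None},       # missing country
]


def parse_age(value):
    """Return a plausible age as an int, or None if missing or invalid."""
    try:
        age = int(value)
    except (TypeError, ValueError):
        return None
    return age if 0 <= age <= 120 else None


def clean_records(records):
    """Deduplicate by ID, normalise text, and impute missing ages."""
    seen_ids = set()
    rows = []
    for rec in records:
        cid = rec["customer_id"]
        if cid in seen_ids:  # keep only the first row per customer
            continue
        seen_ids.add(cid)
        # Trim whitespace and standardise case; fall back to a sentinel.
        country = (rec.get("country") or "").strip().upper() or "UNKNOWN"
        rows.append({
            "customer_id": int(cid),
            "age": parse_age(rec.get("age")),
            "country": country,
        })

    # Fill missing or invalid ages with the median of the valid ones.
    valid_ages = [r["age"] for r in rows if r["age"] is not None]
    fill_value = median(valid_ages) if valid_ages else None
    for r in rows:
        if r["age"] is None:
            r["age"] = fill_value
    return rows


cleaned = clean_records(RAW_RECORDS)
for row in cleaned:
    print(row)
```

Median imputation is only one simple choice; it keeps every row but shrinks the spread of the column, so whether it is appropriate is itself a judgement call that belongs in the exploratory analysis.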
Data Scientist vs Data Analyst vs ML Engineer vs Data Engineer
| Role | Focus | Key skills |
|---|---|---|
| Data Analyst | Answering business questions with existing data | SQL, Excel, visualisation, statistics |
| Data Scientist | Building predictive models and finding hidden patterns | Python, ML, statistics, communication |
| ML Engineer | Building and deploying ML systems at scale | Python, systems engineering, MLOps |
| Data Engineer | Building the infrastructure that data flows through | SQL, pipelines, databases, cloud |
The typical project lifecycle
- Problem definition — what question are we actually trying to answer?
- Data collection — where does the relevant data live? How do we access it?
- Exploratory data analysis (EDA) and cleaning — understand and prepare the data
- Modelling — build and evaluate candidate models
- Insights / deployment — share findings or push the model to production
- Monitoring — track model performance over time; retrain when needed
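The monitoring step can be sketched as a rolling comparison between live accuracy and the accuracy measured when the model was deployed. This is a minimal, hypothetical example: the `ModelMonitor` class, the window size and the tolerance are illustrative assumptions, and real monitoring setups usually also watch for shifts in the input data, not just in accuracy.

```python
from collections import deque


class ModelMonitor:
    """Track accuracy over a rolling window of recent labelled predictions."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy     # accuracy measured at deployment
        self.tolerance = tolerance            # allowed drop before flagging
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        """Store whether one prediction matched the true outcome."""
        self.outcomes.append(1 if prediction == actual else 0)

    def rolling_accuracy(self):
        """Accuracy over the window, or None if nothing has been recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag retraining once a full window falls below baseline - tolerance."""
        acc = self.rolling_accuracy()
        # Wait for a full window so a handful of cases cannot trigger a retrain.
        if acc is None or len(self.outcomes) < self.outcomes.maxlen:
            return False
        return acc < self.baseline - self.tolerance


monitor = ModelMonitor(baseline_accuracy=0.90, window=10)
for pred, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(pred, actual)
print(monitor.rolling_accuracy())
print(monitor.needs_retraining())
```

Here 7 of the last 10 predictions were correct, so the rolling accuracy of 0.7 falls below the 0.85 threshold and the monitor flags the model for retraining.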
The T-shaped data scientist
The best data scientists are often described as "T-shaped": they combine broad knowledge of statistics, coding, and their business domain with deep expertise in at least one area. A data scientist at a hospital who also understands clinical workflows is far more valuable than one who knows only the algorithms.
Key takeaways
- Data scientists spend most of their time on data cleaning, not modelling
- The role bridges statistics, programming, and domain expertise
- Data analysts answer questions; data scientists build predictive systems
- ML engineers focus on deploying and scaling models in production
- Communication is as important as technical skill: insights only create value when stakeholders understand and act on them