There are many different types of Data Science roles. The first step is to determine which role(s) are well suited to your interests and skill set. While all of these roles can be grouped under Data Scientist, some may appear under other job titles.
At many companies, Data Scientist is synonymous with Data Analyst. These Data Scientists perform ad-hoc analysis through SQL or Excel and use business intelligence tools to produce dashboards and visualizations for company reporting. These reporting duties can range from aggregating metrics on company performance to running A/B tests to determine product direction.
While not the flashiest job, these Data Scientists have a strong footing in the business-product world and the opportunity to inform important decisions for the company. These roles are particularly well suited for candidates transitioning from a less technical background.
Tip: Get comfortable communicating with non-technical stakeholders and drawing insights from data. A deeper understanding of mathematics, Machine Learning and programming is nice to have, but not essential for this role.
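To make the A/B testing duty mentioned above more concrete, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name and the conversion counts are made up for illustration, and real experiments usually need more care (sample size planning, multiple comparisons, and so on).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 200 of 4000 users converted on A, 260 of 4000 on B.
z, p = two_proportion_z_test(200, 4000, 260, 4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone. Explaining what that means for the product decision is the part of the job that matters most to stakeholders.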
On the other hand, there are Data Scientists whose duties are more aligned with Data Engineers and Software Engineers. These Data Scientists are responsible for building and maintaining the data infrastructure to support the rest of the company. Their responsibilities include monitoring the data pipelines, improving the data warehouse and maintaining API endpoints for serving model predictions.
Typically, these Data Scientists are the first hires for budding data teams. Strong programming skills are more important for this role than being an expert in mathematics and Machine Learning.
Tip: Get familiar with DevOps and DataOps practices. The hiring process for these roles is very similar to the one for Software Engineers: practice LeetCode problems and algorithms.
Companies with more established data teams, or companies that incorporate Machine Learning into their core product, will hire scientists for the sole purpose of maintaining, improving and building new models and AI systems. These positions typically require a graduate degree and/or prior research experience in Machine Learning.
At smaller companies whose core product is based on Machine Learning, these scientists are also expected to write production code.
Tip: It is very hard to get these roles without the required education and/or research background. To get your foot in the door, look for Machine Learning roles at smaller companies, which tend to be more lenient about these criteria.
In addition to these three types of Data Scientists, some companies look exclusively for Data Generalists: people who are familiar with all of the above. Data Generalists are typically hired to provide general support to other company functions by building dashboards, maintaining the data warehouse and occasionally building models to improve operational processes.
The required skills vary considerably depending on which of these three roles you pursue.
Analytics Data Scientist
Recommended Project: analyze a dataset by relating it to a tangible business problem, providing visualizations and thorough explanations of causal effects and how these findings can be leveraged.
Recommended Course(s): Applied Data Science with Python Specialization
This 5-course specialization covers inferential statistical analysis, practical data visualizations and how to use graphs and networks to visualize and analyze data.
The Developer Scientist
Recommended Project: make data more palatable by writing API wrappers, or set up your own workflow management platform to orchestrate simple jobs and processes.
Recommended Course(s): Specialization: Python for Everybody
This 5-course specialization introduces fundamental programming concepts, including data structures, networked APIs and databases, entirely in Python. The capstone project involves retrieving, processing and visualizing data using Python.
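As a rough idea of what an API wrapper project could look like, here is a minimal sketch using only the standard library. Everything specific in it is hypothetical: the `MetricsClient` class, the `/signups` endpoint, the `api.example.com` URL and the response shape are all assumptions made for illustration, not a real service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Default transport: fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


class MetricsClient:
    """Thin wrapper that hides URL building and response parsing."""

    def __init__(self, base_url, transport=http_get_json):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def daily_signups(self, start, end):
        """Return a list of (date, count) tuples between two ISO dates."""
        query = urlencode({"start": start, "end": end})
        payload = self.transport(f"{self.base_url}/signups?{query}")
        return [(row["date"], int(row["count"])) for row in payload["results"]]


# Stubbed transport so the sketch runs without a network call.
def fake_transport(url):
    return {"results": [{"date": "2024-01-01", "count": "12"}]}


client = MetricsClient("https://api.example.com/v1/", transport=fake_transport)
print(client.daily_signups("2024-01-01", "2024-01-01"))
```

Passing the transport in as a parameter is one possible design choice. It keeps the wrapper easy to test, which is worth showing off in a portfolio project.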
Machine Learning Scientist
Recommended Project: dissect and reproduce results from research papers in your field of interest.
Recommended Course(s): Machine Learning Specialization
In addition to these specialized skills, here's a common checklist for all aspiring Data Scientists, regardless of which role they want to pursue.
Familiarity with at least one programming language
Python is the de facto language for Data Science at the moment, although R remains a popular choice for more analytics- and statistics-heavy roles.
Experience working with Data
Regardless of the role, expect to get up close and personal with data. Data in the real world is messy and prone to errors, which makes data wrangling an important skill to have.
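To give a flavor of what data wrangling involves, here is a small sketch that cleans a messy CSV using only the standard library. The sample data and the `clean_rows` helper are invented for illustration. In practice you would more likely reach for a library such as Pandas, but the steps (normalize, validate, deduplicate) are the same.

```python
import csv
import io

# Made-up sample with stray whitespace, inconsistent casing,
# a duplicate row, and missing or malformed prices.
RAW = """name,country,price
 Chateau A ,france,12.50
Chateau A,France,12.50
Bodega B,SPAIN,
Cantina C,italy,n/a
Domaine D,France,"1,200.00"
"""


def clean_rows(text):
    """Parse CSV text, normalize fields, and drop unusable or duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip()
        country = row["country"].strip().title()
        raw_price = row["price"].strip().replace(",", "")
        try:
            price = float(raw_price)
        except ValueError:
            continue  # Skip rows with a missing or non-numeric price.
        key = (name, country, price)
        if key in seen:
            continue  # Skip exact duplicates after normalization.
        seen.add(key)
        cleaned.append({"name": name, "country": country, "price": price})
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Whether to drop or impute rows with missing values depends on the business question, so treat the choice made here as one option rather than a rule.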
Business Acumen and Communication Skills
These skills will come in handy regardless of how theoretical or research-heavy your job is. Understanding the revenue drivers of the company and the Key Performance Indicators (KPIs) of each stakeholder helps Data Scientists stand out in the company and deliver value.
Mathematics and Statistics
You don't need to be a linear algebra or multivariate calculus expert for most Data Science roles. You should, however, have at least a basic grasp of these areas to understand commonly used algorithms and how to interpret their outputs.
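As one example of the level of math involved, here is a sketch of simple linear regression fitted by ordinary least squares in plain Python. The data points are made up, and `fit_line` is an illustrative helper rather than a library function.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data: marketing spend (x) against sales (y).
spend = [1, 2, 3, 4, 5]
sales = [2.1, 4.0, 6.2, 7.9, 10.1]
slope, intercept, r_squared = fit_line(spend, sales)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

Interpreting the output is the skill that matters: the slope estimates how much y changes per unit of x, and R^2 indicates how well the line fits. Neither says anything about causation on its own.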
A Scientist's Curiosity for Uncovering the Truth
Never settle for assumptions. Don't let your own biases and the biases of others influence the results of your work. Build systems on ground truths and always set proper expectations on what these systems can accomplish.
If you have never worked in a data-related role before, prioritize side projects and contributions that highlight your skills, technical aptitude and passion for this field. Don't forget to emphasize any relevant skills that you demonstrated in previous roles, even if those roles are unrelated.
For each project description, don't forget to include:
For one of our practice guides, we used text reviews to predict wine sentiment. Here's a sample description of what this project might look like on a resume.
Predicting Wine Sentiment using Text Reviews [link to Jupyter Notebook]
Tech stack: Python, Keras, Pandas
The easiest way to host projects is on GitHub or GitLab. Make sure that there is a detailed README to inform readers about the contents of the project with instructions on how they can reproduce the results.
For Python projects, this should include:
GitHub also offers free web hosting through GitHub Pages.