What is data science?
- Data science is all about extracting insight from data using appropriate methodology to uncover hidden patterns and information.
- I like to think that data science roles can be broken into three categories.
- Translator: A business analyst with strong commercial understanding who knows enough about data scientist and data engineer work to perform simple coding in R and Python. This person is the bridge between the business and technical teams. A translator usually focuses on basic descriptive data rather than models, so the role is less technical.
- Data scientist: Uses a data-driven approach to solve a business problem by applying the right methodology. To a certain extent, the data scientist needs both a translator's capabilities and some data engineering skills.
- Data engineer: Provides readily available data for the data scientist during the development and implementation phases. This involves building data pipelines and ensuring proper collection of data for further analysis.
Could you tell us about some of your favourite data science projects you’ve done?
- Built an end-to-end credit scoring engine for a banking client to help achieve the following:
– Make the loan approval process 3x faster.
– Increase the predictive power of the model by leveraging multiple data sources.
– Digitize the loan application process and minimize manual intervention.
- Some of the statistical techniques I employed in this project were:
– Used various statistical and machine learning models to predict which customers will default.
– Leveraged multiple and unique data sources to enrich the information of a customer (e.g. demographic data)
What skill sets do you look for in a data scientist?
A data scientist is expected to be technically sound and also to have solid non-technical skills.
Technical Skills >>
- Programming: Above all, a data scientist should be proficient at programming. The most popular and sought-after languages are Python and R, as they are open source and well suited to data manipulation and model building. In addition, SQL is vital for querying databases, as most companies would have a SQL server to store data. Furthermore, in some cases you might need to wrangle data in a big data environment using PySpark and Hive.
With good programming capabilities, a data scientist should at a minimum be able to perform data manipulation and data wrangling. Hence, anyone looking to break into the field should be comfortable with the following:
– Massaging data in various formats and variable types.
– Transforming the data into an appropriate structure for analysis and modelling.
- Problem Solving: Applying the right methodology to the data to solve the problem at hand (e.g. statistics, machine learning, deep learning and optimisation). Understanding the nature of the problem and being capable of applying the right method is key. For example, if the objective is to identify churn, a data scientist should recognise the need to build a classification model (e.g. logistic regression or random forest).
- Data visualization: A picture is worth a thousand words. Or, in this case, a picture is worth a thousand lines of data. Data scientists need to be able to present their findings in a visual manner, as this keeps trends easy to follow as the data grows in volume and complexity.
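To make the wrangling and churn points above concrete, here is a minimal sketch in Python. It assumes a tiny, made-up set of customer records (the field names `tenure_months`, `support_calls` and `churned` are hypothetical), casts the raw strings into numeric features, and fits a logistic regression by plain gradient descent using only the standard library. It is an illustration of the idea, not a production approach; in practice one would more likely reach for a library such as scikit-learn.

```python
import math

# Hypothetical raw records: every value arrives as a string, with inconsistent casing.
RAW_CUSTOMERS = [
    {"tenure_months": "2", "support_calls": "5", "churned": "yes"},
    {"tenure_months": "3", "support_calls": "4", "churned": "yes"},
    {"tenure_months": "6", "support_calls": "6", "churned": "Yes"},
    {"tenure_months": "4", "support_calls": "3", "churned": "yes"},
    {"tenure_months": "36", "support_calls": "0", "churned": "no"},
    {"tenure_months": "48", "support_calls": "1", "churned": "no"},
    {"tenure_months": "24", "support_calls": "1", "churned": "No"},
    {"tenure_months": "60", "support_calls": "0", "churned": "no"},
]


def wrangle(rows):
    """Cast string fields to numbers and split features from 0/1 labels."""
    features, labels = [], []
    for row in rows:
        tenure_years = float(row["tenure_months"]) / 12.0  # rescale to years
        calls = float(row["support_calls"])
        features.append([tenure_years, calls])
        labels.append(1 if row["churned"].strip().lower() == "yes" else 0)
    return features, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_churn_probability(weights, bias, x):
    """Probability that a customer with feature vector x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_churn_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


features, labels = wrangle(RAW_CUSTOMERS)
weights, bias = fit_logistic_regression(features, labels)

# Score two hypothetical new customers: [tenure in years, support calls].
new_customer_at_risk = [1 / 12, 6.0]
new_customer_loyal = [50 / 12, 0.0]
print(predict_churn_probability(weights, bias, new_customer_at_risk))
print(predict_churn_probability(weights, bias, new_customer_loyal))
```

The point of the sketch is the shape of the workflow: clean and restructure the data first, then match the method to the question. Churn is a yes/no outcome, so a classifier that outputs a probability fits, whereas a regression on a continuous target would not.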
Non-Technical Skills >>
- Domain Knowledge: Having a background in sectors like financial services and marketing can help develop effective sector-specific analysis skills. Individuals with strong domain expertise are able to draw on their previous experience to identify use cases and ensure proper implementation of projects.
- Communication: A data scientist must be adept at explaining their analysis in an easy-to-comprehend manner, especially when communicating with non-technical stakeholders.
What kind of mindsets make a good data scientist?
Knowledgeable: High intellectual curiosity and a thirst for knowledge are must-have traits for a data scientist. This is important as the field is expanding rapidly. Hence, we data scientists are expected to keep up to date with the latest technology and methodology.
Sharp: Being a fast learner is a must, as we often need to understand new areas within a short time frame. This is particularly true when designing use cases where data science could potentially add value.
Good people skills: The days when technical coders sat in the corner of the room doing their own work are over. Data scientists nowadays need to be an integral part of the project to ensure timely delivery and project success. Setting the right expectations with non-client stakeholders is key, alongside the ability to work flexibly and deliver results incrementally.
Data scientists can come from very different career paths. What are some of the most common career transitions you’ve seen people make into this field?
- Science Background: Computer science, engineering and statistics degree holders typically make a natural transition into this field.
- Technically Savvy Business: Nowadays people from business backgrounds also become data scientists. They tend to have coding skills from their undergraduate studies.
- Consulting Transition: Ex-consultants who want to take on a more technical role.
The key thing to remember is that there is a lot of material available online to help people self-learn and get up to speed. In addition, there is an abundance of datasets to play around with (e.g. on Kaggle), allowing people from all walks of life to break into this field.
These points reflect the personal views of our interviewee. None of the discussion points reflects the views of any organisation he has worked at or is currently working at.