7 Steps to Prepare for a Data Scientist Job Interview in Silicon Valley in 2020: Analytics & Inference Track
Photo credit: https://unsplash.com
About the author: Leon is a data science and machine learning executive at a FAANG (Facebook, Apple, Amazon, Netflix, Google) company in Silicon Valley. Prior to his current role, he ran the machine learning org at the EdTech company Chegg and worked as a research scientist at Amazon.com, building large-scale machine learning systems.
He has interviewed thousands of data science and machine learning candidates in his career, as a hiring manager or as part of a hiring committee.
Generally speaking, there are two kinds of data scientists at top technology companies in Silicon Valley, such as the FAANG companies, and here is how I would separate them into two camps.
1. Data Scientists: Analytics/Inference track
Photo credit: https://unsplash.com
These are people whose daily work typically involves collecting data, running analyses and experiments, and sharing reports and scorecards on product health, customer churn, and marketing campaign performance.
After analyzing the data, they are often asked to come up with innovative ideas and proposals to improve product features, and to validate those ideas using techniques such as A/B testing.
In summary, these people use data to ‘tell a story’ and to drive business decisions.
Most of the ones I have met come from statistics, mathematics, economics, psychology, physics, or other quantitative but non-computer-science backgrounds.
Salaries on this track are usually lower than those of software engineers or machine learning engineers, typically by 15%–20%, but the advantage of this track is that it is much easier to break into and to find a job in.
It can also be a great stepping stone for those who are interested in moving into the machine learning track later on.
2. Data Scientists: Machine Learning Engineering/Algorithm Development track
Photo credit: https://unsplash.com/photos/fch6vkbouCc
People who have been very successful on this track are usually hardcore computer scientists or software engineers: they not only understand basic or even advanced machine learning theory, but can also implement ideas and make things happen.
The biggest advantage of these machine learning engineers or algorithm developers, thanks to their computer engineering background, is that they can quickly turn ideas into prototypes and write efficient, production-level code that puts machine learning models into production or into external customer-facing environments.
Salaries on the machine learning engineer track are at least on par with the software engineer track, if not much higher. The bar to get into this track is high: it usually requires a good understanding of machine learning theory and practice, as well as solid software engineering skills.
Table 1: Skills comparison for the two data scientist tracks: Analytics/Inference vs. Machine Learning/Algorithm
In this article, I will focus on the analytics/inference track and walk you through my 7 steps to prepare for a data scientist job interview.
1. SQL
SQL is a must-know programming language for any data analytics professional.
However, many college graduates and young professionals start their job search without a solid understanding of SQL, or struggle with SQL coding questions, which ultimately costs them their dream jobs.
The SQL interview can go by other names, such as Technical Analysis. During this interview at a FAANG(*) company, you will be asked to perform a series of SQL operations to extract data and insights, and to answer follow-up questions about the company's products.
(*) FAANG: Facebook, Amazon, Apple, Netflix, and Google
Sample question: Given a table of user sign-up dates and registered countries, write a query that returns the number of new users who joined each day over the last 30 days in our top 2 countries.
- user_id | BIGINT
- joined_at | DATE
- country | VARCHAR
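One way to answer the sample question, written in SQL and run through Python's built-in sqlite3 module so it can be tried end to end. The question leaves some details open, so this sketch assumes: the table is named `users` (a hypothetical name), "top 2 countries" means the two countries with the most sign-ups inside the same 30-day window (ties broken alphabetically), and the window ends on a fixed `as_of` date rather than today. The ROW_NUMBER window function needs SQLite 3.25 or newer, which recent Python builds bundle.

```python
import sqlite3

# Assumptions: hypothetical table name `users`; "top 2" = most sign-ups in the
# same 30-day window, ties broken by country name; window ends at :as_of.
DAILY_SIGNUPS_TOP2_SQL = """
WITH recent AS (
    SELECT user_id, joined_at, country
    FROM users
    WHERE joined_at > date(:as_of, '-30 days')
      AND joined_at <= :as_of
),
country_counts AS (
    SELECT country, COUNT(*) AS signups
    FROM recent
    GROUP BY country
),
top_countries AS (
    SELECT country
    FROM (
        SELECT country,
               ROW_NUMBER() OVER (ORDER BY signups DESC, country) AS rn
        FROM country_counts
    ) AS ranked
    WHERE rn <= 2
)
SELECT r.joined_at, r.country, COUNT(*) AS new_users
FROM recent AS r
JOIN top_countries AS t ON t.country = r.country
GROUP BY r.joined_at, r.country
ORDER BY r.joined_at, r.country
"""


def daily_signups_top2(conn, as_of):
    """Return (joined_at, country, new_users) rows for the 30 days ending at as_of."""
    return conn.execute(DAILY_SIGNUPS_TOP2_SQL, {"as_of": as_of}).fetchall()


def build_demo_db():
    """In-memory table matching the question's schema, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id BIGINT, joined_at DATE, country VARCHAR)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, "2020-03-30", "US"),
            (2, "2020-03-30", "US"),
            (3, "2020-03-31", "US"),
            (4, "2020-03-31", "IN"),
            (5, "2020-03-31", "IN"),
            (6, "2020-03-15", "BR"),
            (7, "2020-01-01", "BR"),  # rows 7-9 fall outside the window
            (8, "2020-01-02", "BR"),
            (9, "2020-01-03", "BR"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in daily_signups_top2(build_demo_db(), "2020-03-31"):
        print(row)
```

With these made-up rows, Brazil has the most sign-ups overall but only one inside the window, so it is excluded; ranking countries over all time instead would change the answer, which makes it a good clarifying question to ask the interviewer. Days with no sign-ups simply do not appear in the output; filling them with zeros would need a calendar table.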
How to prepare
a. If you are an absolute beginner:
Consider taking an online SQL course to get a basic understanding of SQL, then jump into coding practice.
A resource to consider: Cracking the SQL Interview for Data Scientists, which teaches SQL step by step, from basic SELECT statements to advanced WINDOW functions, with plenty of coding assignments to reinforce your learning.
b. If you are an experienced SQL user:
There is no better way to prepare for a SQL interview than practicing coding exercises.
A resource to consider: sqlpad.io, where you can practice and solve 80 SQL coding interview questions.
The 80 questions range from basic SELECT statements to advanced window functions, which will get you ready to answer a wide range of SQL interview topics.
(Full disclosure: I am the author of both the Cracking the SQL Interview for Data Scientists course and sqlpad.io.)
c. Pay special attention to WINDOW functions.
WINDOW functions are a family of SQL functions that come up quite often in data scientist job interviews.
Writing a bug-free WINDOW function query can be quite challenging for any candidate, especially for those who are just getting started with SQL. It takes time and practice to master these functions.
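A common source of WINDOW-function bugs is how ties are ranked. This small sketch uses Python's sqlite3 module and made-up country sign-up totals (IN and BR are tied at 80) to compare ROW_NUMBER, RANK, and DENSE_RANK side by side; it assumes SQLite 3.25 or newer for window-function support.

```python
import sqlite3


def compare_ranking_functions():
    """Rank made-up sign-up totals with three window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE country_signups (country TEXT, signups INTEGER)")
    conn.executemany(
        "INSERT INTO country_signups VALUES (?, ?)",
        [("US", 120), ("IN", 80), ("BR", 80), ("DE", 50)],  # IN and BR tie
    )
    return conn.execute(
        """
        SELECT country,
               signups,
               -- unique sequence; the extra sort key breaks the tie
               ROW_NUMBER() OVER (ORDER BY signups DESC, country) AS row_num,
               -- ties share a rank, and the next rank is skipped
               RANK()       OVER (ORDER BY signups DESC)          AS rnk,
               -- ties share a rank, and no rank is skipped
               DENSE_RANK() OVER (ORDER BY signups DESC)          AS dense_rnk
        FROM country_signups
        ORDER BY signups DESC, country
        """
    ).fetchall()


if __name__ == "__main__":
    for row in compare_ranking_functions():
        print(row)
```

Filtering on `rnk <= 2` or `dense_rnk <= 2` here returns three countries, while `row_num <= 2` returns exactly two. Which behavior is correct depends on how the question defines "top N", so it is worth stating your choice out loud.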
2. Product Sense
Photo credit: https://unsplash.com/photos/7OFnb7NOvjw
One of a data scientist's main responsibilities is to extract insights from data and work with product managers and engineering teams to deliver actionable plans that improve the product. Think about how you would measure the success of different parts of the product. Why do you think a text box is placed at a particular location, and what could you do to improve it?
The interviewer will try to evaluate your ability to apply data to real product problems: how you systematically approach and structure a problem, form hypotheses with reasonable assumptions, design A/B tests to test those hypotheses, and use data and facts to convince others to adopt your recommendations.
Sample questions:
- If revenue dropped in a given week, what metrics would you look at to understand why?
- How would you measure the health of our product search functionality?
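For the revenue question, one common way to structure the answer (not the only one) is to split revenue into multiplicative drivers and check which one moved. This sketch assumes revenue = visitors × conversion rate × average order value, and the weekly inputs and field names are hypothetical:

```python
def revenue_drivers(before, after):
    """Week-over-week ratio of revenue and of each multiplicative driver.

    Assumes revenue = visitors * conversion_rate * average_order_value, where
    conversion_rate = orders / visitors and average_order_value = revenue / orders.
    Each week is a dict with hypothetical keys: visitors, orders, revenue.
    """
    def drivers(week):
        visitors, orders, revenue = week["visitors"], week["orders"], week["revenue"]
        return {
            "visitors": visitors,
            "conversion_rate": orders / visitors,
            "average_order_value": revenue / orders,
            "revenue": revenue,
        }

    b, a = drivers(before), drivers(after)
    # A ratio below 1.0 marks a driver that shrank; the three driver ratios
    # multiply back to the revenue ratio.
    return {name: a[name] / b[name] for name in b}


if __name__ == "__main__":
    print(revenue_drivers(
        {"visitors": 10_000, "orders": 500, "revenue": 25_000.0},
        {"visitors": 10_000, "orders": 400, "revenue": 20_000.0},
    ))
```

In this made-up example, traffic and order value are flat and the conversion ratio is 0.8, so the drop is a conversion problem; the natural follow-up is to segment conversion by platform, country, or funnel step.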
How to prepare
- I highly recommend reading Lean Analytics: Use Data to Build a Better Startup Faster (Lean Series), which gives you a very good sense of how startups use analytics to drive product decisions. Top technology companies, especially those in Silicon Valley, tend to think of themselves as startups regardless of their size, or at least keep a startup mindset when it comes to growing the company.
- If you still have time, consider reading Cracking the PM Interview. If you are short on time, I would go through these three chapters: product, case studies, and behavioral questions.
3. Data processing with Python/R
The interviewer will evaluate your skills with basic operations in Python or R, two of the most popular programming languages among data science teams in Silicon Valley.
The bad news is that if you are not familiar with either language, you will most likely not even get a chance at a phone interview.
The good news is that you don’t actually need to know both: pick one and become very good at it, for example by building a project in R or Python.
A side note: from what I have observed, Python is highly likely to become the dominant player because of its great ecosystem. It is a general-purpose programming language, and compared to R, it is much easier to productionize a Python model and serve it on the internet.
If you are brand new to either R or Python and are going to choose a language to start with, I would pick Python.
I used to be a heavy R user and have presented at useR!, but I completely switched to Python 5 years ago and have never regretted it.
In addition to basic data processing, you will very likely be asked to perform a series of analyses, visualizations, or modeling tasks on the data sets, so the interviewer can confirm that you are hands-on with the tool and get a sense of your experience level.
Sample task: Read a CSV file into Python/R, handle missing data, build and train a classification model, evaluate its performance, then prepare a report and share the Jupyter notebook with the interviewer.
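A minimal Python version of this task, assuming pandas and scikit-learn are installed. The churn data set, its column names, and the choice of median imputation plus logistic regression are illustrative assumptions, not a required approach:

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_demo_csv(n_rows=400):
    """Hypothetical churn data: users with fewer than 3 weekly sessions churn.

    Every 17th age is left blank to simulate missing data.
    """
    lines = ["age,sessions_per_week,churned"]
    for i in range(n_rows):
        age = "" if i % 17 == 0 else str(20 + i % 40)
        sessions = i % 10
        churned = int(sessions < 3)
        lines.append(f"{age},{sessions},{churned}")
    return io.StringIO("\n".join(lines))


def train_and_evaluate(csv_source, target="churned"):
    """Read a CSV, impute missing values, fit a classifier, and score it."""
    df = pd.read_csv(csv_source)  # blank fields are read as NaN
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    # Median imputation and scaling are fitted on the training split only.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(),
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return accuracy_score(y_test, predictions), classification_report(y_test, predictions)


if __name__ == "__main__":
    accuracy, report = train_and_evaluate(make_demo_csv())
    print(f"accuracy: {accuracy:.3f}")
    print(report)
```

Wrapping the imputer, scaler, and model in one pipeline means the imputation median and scaling statistics are learned from the training split only, which avoids leaking information from the test set into the evaluation.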
How to prepare
a. For Python people
- For people new to Python: DataCamp has classes that cover pandas, matplotlib, and seaborn, which are good enough to get you started;
- After you have familiarized yourself with basic data processing, you can move on to the scikit-learn library, which has some very good tutorials covering data processing, feature selection, and modeling with real data: https://scikit-learn.org/stable/tutorial/index.html
b. For R people
- Coursera’s R programming class can help you brush up your skills in a couple of weeks. https://www.coursera.org/learn/r-programming
Finally, if you still have time, I also highly recommend creating a Kaggle account, joining a couple of competitions, and reading other top competitors’ R/Python code. This will significantly help you understand how to solve real-world problems (e.g., normalizing data, handling missing data, and using ensemble learning to boost model performance) and become a better data scientist.
4. A/B testing
Photo credit: https://unsplash.com/photos/E1eCQdiO_E4
A/B testing is a statistical framework that helps validate an idea or a theory through data.
For example, suppose a product manager wants to know whether changing the color of a buy button from green to blue encourages more purchases. As a data scientist, it is your job to work with the product manager, and quite often the engineering team (who can help implement the test setup), to come up with a testing plan.
At a minimum, you need to decide how many people will see each button color (sample size), how many days the test will run (usually a multiple of a week, i.e., 7 days), and where it should run (the US only, or a few smaller countries, so that if the test variant fails, the negative impact on revenue stays small).
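A back-of-the-envelope sketch of the sample-size and duration decisions, using only the Python standard library. It assumes the metric is a conversion rate, a two-sided two-proportion z-test, 5% significance, 80% power, and a 50/50 traffic split; the baseline and target rates and the daily traffic figures are hypothetical:

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Users needed in EACH group for a two-sided two-proportion z-test.

    Normal-approximation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def experiment_duration_days(n_per_group, daily_eligible_users, groups=2):
    """Days needed to reach the sample size, rounded up to whole weeks."""
    days = math.ceil(n_per_group * groups / daily_eligible_users)
    return 7 * math.ceil(days / 7)


if __name__ == "__main__":
    n = sample_size_per_group(0.10, 0.11)  # hypothetical 10% -> 11% lift
    print(n, experiment_duration_days(n, daily_eligible_users=5_000))
```

Detecting a lift from a 10% to an 11% conversion rate needs on the order of 15,000 users per group; with 5,000 eligible users per day, that takes about 6 days, which rounds up to one full week. Halving the detectable lift roughly quadruples the required sample.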
The key assumption of A/B testing is that the control group and the test group are independent; you will probably be asked several questions about this assumption.
You will also need to understand key concepts such as novelty effect, learning effect, A/A testing, Simpson’s paradox, etc.
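Simpson’s paradox is easiest to see with numbers. The sketch below uses made-up conversion counts in which the treatment wins in every device segment yet loses once the segments are pooled, because most treatment traffic happened to come from the lower-converting mobile segment. An uneven traffic mix like this is one reason to randomize at the user level and check that the groups are balanced.

```python
from fractions import Fraction

# Hypothetical (conversions, visitors) per device segment, for illustration only.
RESULTS = {
    "control": {"desktop": (234, 270), "mobile": (55, 80)},
    "treatment": {"desktop": (81, 87), "mobile": (192, 263)},
}


def segment_rates(results):
    """Conversion rate of each group within each segment."""
    return {
        group: {seg: Fraction(conv, visits) for seg, (conv, visits) in segments.items()}
        for group, segments in results.items()
    }


def pooled_rates(results):
    """Conversion rate of each group with all segments pooled together."""
    rates = {}
    for group, segments in results.items():
        conversions = sum(conv for conv, _ in segments.values())
        visitors = sum(visits for _, visits in segments.values())
        rates[group] = Fraction(conversions, visitors)
    return rates


if __name__ == "__main__":
    for group, rates in segment_rates(RESULTS).items():
        print(group, {seg: float(r) for seg, r in rates.items()})
    print({group: float(r) for group, r in pooled_rates(RESULTS).items()})
```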
Sample question: The engineering team has just built a people-you-may-know widget. If it launches, users will see their friends in the right-hand corner of their homepage. How would you design an experiment to decide whether we should launch this feature?
How to prepare
Udacity has a free introductory course taught by practitioners from Google, which I highly recommend. As long as you get through this course, feel comfortable with the key concepts, and finish the homework assignments, you should be able to handle most A/B testing questions. https://www.udacity.com/course/ab-testing--ud257
A side note: very often you will be asked to make recommendations for different scenarios, e.g., what the product marketing team should do if the results are significant, and what it should do if they are not.
To answer this kind of question, always use a framework. For example, if the result is significantly positive, double down on the approach: expand this success story to other markets and repeat the test.
If the results turn out to be not significant, or significant but negative, come up with new theories and start testing new ideas.
It’s a never-ending cycle of new ideas/proposals => A/B testing => recommendations 😃.
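The framework above can be sketched as a significance test plus a decision rule. This assumes a conversion-rate metric and a two-sided two-proportion z-test with pooled variance, implemented with the Python standard library; the `recommend` helper and its wording are hypothetical, and the counts in the usage example are made up.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def recommend(z, p_value, alpha=0.05):
    """Hypothetical decision rule mirroring the framework in the text."""
    if p_value < alpha and z > 0:
        return "significantly positive: expand to other markets and repeat the test"
    return "not significant or negative: form new hypotheses and test new ideas"


if __name__ == "__main__":
    z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
    print(round(z, 2), round(p, 3), recommend(z, p))
```

For example, 10% vs. 11% conversion on 10,000 users per group gives z ≈ 2.31 and p ≈ 0.02, which `recommend` treats as significantly positive.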
5. Statistics/Statistical Inference
Photo credit: https://unsplash.com/photos/WY302kitn7U
A data scientist is a statistician who lives in San Francisco.
Jokes aside, as a data scientist you will most likely encounter many messy real-world situations: missing data, imbalanced samples, deciding on a sample size, performing hypothesis tests, forming reasonable assumptions, and explaining to your business leaders what statistical significance and confidence intervals mean. Therefore, statistics skills are absolutely necessary to ace a data scientist interview.
Sample questions: What are Type I and Type II errors? How would you explain a p-value to a non-technical audience? What are the assumptions of a two-sample t-test?
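To make Type I and Type II errors concrete, this sketch simulates many experiments in pure Python. With no true difference (`shift=0`), the share of "significant" results estimates the Type I error rate and should land near alpha; with a real difference, the share that is not significant estimates the Type II error rate. It assumes normally distributed data and uses a large-sample z approximation to the two-sample t-test, so it is an illustration rather than an exact test.

```python
import random
from statistics import NormalDist


def _mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, variance


def rejection_rate(shift=0.0, n_trials=2_000, n_per_group=100, alpha=0.05, sd=2.0, seed=0):
    """Share of simulated two-group experiments declared significant.

    Group A ~ Normal(10, sd), group B ~ Normal(10 + shift, sd).
    shift == 0: the result estimates the Type I error rate (near alpha).
    shift > 0: 1 - result estimates the Type II error rate.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_trials):
        mean_a, var_a = _mean_and_variance([rng.gauss(10.0, sd) for _ in range(n_per_group)])
        mean_b, var_b = _mean_and_variance([rng.gauss(10.0 + shift, sd) for _ in range(n_per_group)])
        # Welch-style standard error; the normal critical value is a
        # large-sample stand-in for the t distribution.
        se = (var_a / n_per_group + var_b / n_per_group) ** 0.5
        if abs(mean_b - mean_a) / se > critical:
            significant += 1
    return significant / n_trials


if __name__ == "__main__":
    print("Type I error rate:", rejection_rate(shift=0.0))
    print("Type II error rate:", 1 - rejection_rate(shift=1.0))
```

With these settings, `rejection_rate()` should come out close to 0.05, and `rejection_rate(shift=1.0)` well above 0.9, i.e., a Type II error rate under 10%.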
How to prepare
You can practice statistics questions on brilliant.org, which I found to be a quick way to brush up my skills when preparing for statistics interview questions.
6. Probability
Side note: probability questions are not the same as statistics questions. You can think of probability questions as being more about math, while statistics questions are more about dealing with real data.
Sample question: For two fair six-sided dice, how many rolls does it take on average until the sum of the two dice is greater than 10?
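A sum greater than 10 means 11 or 12, i.e., the outcomes (5, 6), (6, 5), and (6, 6): 3 of the 36 equally likely outcomes, so p = 1/12 per roll. Because rolls are independent, the number of rolls until the first success follows a geometric distribution with mean 1/p = 12. The sketch below checks this by exact enumeration and by a seeded Monte Carlo simulation:

```python
import random
from fractions import Fraction
from itertools import product


def success_probability(threshold=10):
    """P(sum of two fair six-sided dice > threshold), by enumerating all 36 outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b > threshold)
    return Fraction(hits, len(outcomes))


def expected_rolls(threshold=10):
    """Geometric distribution: expected number of trials until success is 1/p."""
    return 1 / success_probability(threshold)


def simulate_expected_rolls(threshold=10, n_trials=20_000, seed=0):
    """Monte Carlo check: average number of rolls until the sum exceeds threshold."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        rolls = 1
        while rng.randint(1, 6) + rng.randint(1, 6) <= threshold:
            rolls += 1
        total += rolls
    return total / n_trials


if __name__ == "__main__":
    print(success_probability(), expected_rolls(), simulate_expected_rolls())
```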
How to prepare
brilliant.org is a good resource.
7. Behavioral questions
Photo credit: https://unsplash.com/photos/GoXNygZlftg
Behavioral questions are probably the easiest part to prepare for and generate the most ROI (return on investment), but many people spend very little time on them and get caught off guard by questions like: Tell me about a time when you disagreed with your boss.
Sample questions:
- Tell me about your biggest failure/success/favorite project.
- Describe an unpopular decision you made with the product team. How did you handle the situation and implement it?
How to prepare
List five of your past projects with interesting stories, using the SAR framework (Situation, Action, Result), that demonstrate your leadership, successes, failures/mistakes, and challenges (e.g., a disagreement with your manager or a coworker).
Find a partner, practice through a mock interview, and get their feedback. The important thing is that your stories have to be ‘meaty’, and you must be prepared when an interviewer dives into the details.
Another resource to consider is Amazon’s Leadership Principles.
These are the 7 areas I recommend focusing on when interviewing for analytics/inference track data scientist positions.
It is the same process I used to ace my interviews at some of the top tech companies.
I hope they are useful, and if you have any questions, please feel free to reach out to me.
Whether you are a first-time job seeker or a professional who wants to change careers, you can find me on Twitter or chat with me online at sqlpad.io.
About the author
Leon is a serial entrepreneur with 15 years of industry experience in data and software. He currently serves as a senior executive at a top Fortune 100 company in Silicon Valley.
Prior to that, he led the machine learning engineering org at Chegg and worked as a research scientist at Amazon, solving large-scale machine learning problems.
He has interviewed thousands of candidates as a hiring manager or as part of a hiring committee, and this article, based on lessons he learned from those interviews, is dedicated to anyone entering a data science career.