Fundamentals

What is the bias-variance trade-off?
What should you do when your model is suffering from high bias?
How do you determine if your model is overfitting?
How would you solve the problem of overfitting?
What are some common regularization techniques? Explain the effect of L1 versus L2 regularization on a model.
What is ensemble learning?
What is dimensionality reduction? When would you use it?
How do you detect anomalies or outliers in your data? What would you do with the anomalies?
What is transfer learning, and how does it work?
What is self-supervised learning, and when would you use it?
What is weak supervision, and how would you train a model using noisy labels?
How would you identify and prevent a model from going stale when it’s deployed?
What is your exploration process when tackling a new problem?

Data & Featurization

How would you train a model when your data is highly imbalanced?
How would you train a model when you only have a few labels (i.e., semi-supervised)?
What is one-hot encoding, and why is it important?
Why is it important to normalize/standardize your data?
What is the difference between normalizing and standardizing your data?
How would you tackle the problem of a shifting data distribution when deploying a model?
What types of challenges have you had to overcome when working with large-scale data?
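
For the two scaling questions above, here is a minimal sketch of the distinction. It assumes "normalizing" means min-max scaling to [0, 1] and "standardizing" means z-scoring, which is one common usage (terminology varies between sources). The function names and the sample data are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max scale values into the range [0, 1] (assumed meaning of 'normalize')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score values to zero mean and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
print(normalize(heights_cm))
print(standardize(heights_cm))
```

Min-max scaling bounds the output but is sensitive to extreme values, while z-scoring leaves the range unbounded but centers the feature and equalizes its spread.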

What is the difference between a Generative and Discriminative model?
What is an example of a Parametric and a Non-Parametric model?
What is hyperparameter optimization, and what are some techniques?
How does a Decision Tree work?
How does a Random Forest work?
What are the advantages of a Random Forest model over a Decision Tree?
How does the bias-variance trade-off apply in the selection of a Decision Tree versus a Random Forest model?
Both being tree-based algorithms, how is Random Forest different from Gradient Boosting?
How does Gradient Boosting work? Is it robust to overfitting?
How does K-Means clustering work?
How does KNN work?
What is the difference between K-Means and KNN?
What are support vectors in SVM?
What are the common categories of Neural Networks, and when would you prefer one architecture over another?
What techniques can you use to explain your model’s predictions? Can you give an example for Neural Networks and Random Forest models?

How would you explain False Negative, False Positive, True Negative, and True Positive?
What are precision and recall?
What metrics would you use to evaluate model performance on imbalanced datasets?
What is the F1 score, and when would you use it?
What is an ROC curve, and how would you use it to evaluate model performance?
How would you select an operating point on the ROC curve?
What is AUC?
What is cross validation, and when would you use it?
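
As a study aid for the confusion-matrix, precision, recall, and F1 questions above, here is a minimal sketch of how those quantities relate for binary labels. It assumes the positive class is encoded as 1 and the negative class as 0. The function names and the toy labels are hypothetical, not taken from any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives flagged as positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly ignored negatives
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, for illustration only: 3 positives out of 8 examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Note that the true-negative count never enters precision, recall, or F1, which is one reason these metrics come up when discussing imbalanced datasets.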

What programming languages do you usually work in?
What’s your database experience?
Have you deployed your work to a production environment before (e.g., from an internship)? What was the tech stack?
Have you worked with parallel processing or distributed systems?

Why are you excited to work for this team/company?
What makes you want to go into industry rather than academia?
What is your ideal split between research and engineering work?
Why are you interested in working on a product team?
Describe one of the biggest research challenges you’ve had to overcome.
Tell us about a scenario where you had to collaborate with others to solve a problem.
Tell us about a situation where someone on your team wasn’t pulling their weight. How did you resolve it?

Check out Section II, “The Art of Interviewing,” in my previous blog post on Landing Your ML PhD Dream Job for an idea of the types of programming questions to study.