The Data Science Application Cycle

Many people have asked me how to prepare for data science interviews.  Now that I have finished this grueling process, I would like to wrap up that experience by consolidating everything I used and including it here for others.  I know MANY other articles and blog posts have been written (as referenced later in this post), but I am writing this for me and for those who have asked.

Photo by Tim Gouw on Pexels.com

Every application cycle is like a funnel.  You must apply to a plethora of positions in order to get several HR interviews, in order to get some technical phone interviews, in order to get a few on-site interviews, in order to get one (or hopefully more than one) offer.  Your goal is to lose as few opportunities as possible at each stage of the funnel.  Once a company shows interest in you, do everything in your power to make it to the on-site interview.  Hopefully these suggestions at each step will help.
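
To make the funnel concrete, here is a minimal Python sketch that computes the conversion rate between consecutive stages and how many applications one offer "costs." The stage counts below are hypothetical placeholders, not real data; substitute the numbers from your own tracker.

```python
# Hypothetical counts for each stage of one application cycle.
funnel = [
    ("applications", 50),
    ("HR interviews", 10),
    ("technical phone interviews", 5),
    ("on-site interviews", 2),
    ("offers", 1),
]

def conversion_rates(stages):
    """Return (from_stage, to_stage, rate) for each pair of consecutive stages."""
    rates = []
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rates.append((name_a, name_b, count_b / count_a))
    return rates

def applications_per_offer(stages):
    """Applications needed per offer: the inverse of the overall conversion rate."""
    return stages[0][1] / stages[-1][1]

for name_a, name_b, rate in conversion_rates(funnel):
    print(f"{name_a} -> {name_b}: {rate:.0%}")
print("applications per offer:", applications_per_offer(funnel))
```

The overall conversion rate is the product of the stage rates, so improving any single stage (for example, turning more phone screens into on-sites) reduces the number of applications each offer requires.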

Steps to a new data science/analytics job:

Step 0: Prepare to apply

  • Prepare a baseline version of your resume. Ask several people to look for typos, inconsistencies, and awkward or unclear phrasings, and to provide formatting suggestions. (You can send it to me if you want. I’m good at finding typos.)  Ask at least one person with an analytics background.
  • Create a CV with all of the classes, jobs, and extracurricular/volunteering activities you have participated in since freshman year of undergrad, including one-day volunteer activities. This will serve you well later as you tweak your resume.
  • Update your LinkedIn. Check for consistency and typos.  Look at the LinkedIn profiles of those you worked closely with for inspiration.
  • Prepare a list of the projects you are most proud of. Indicate the goal of the project, your contribution, and the technologies you used.  If possible (keep NDA constraints in mind), put some of your work on an online portfolio or in a GitHub repository.
  • Make a list of companies to apply for. Sign up for job boards (e.g. Indeed) and create alerts for companies you are really interested in (if and only if you are willing to receive a lot of annoying emails).
  • Make a list of job non-negotiables. Are you only interested in jobs in Sunnyvale, CA? Are you only interested in jobs in which you can use R?  Those criteria will help narrow your search quite a bit, but hopefully not so much that you cannot find a single job meeting your criteria.

Note: These are nice-to-haves and not necessary to begin applying.

Step 1: Find positions to apply for

I know some people say you should never apply for a job online and should instead use your network to put your name in front of the hiring manager, even if the “hiring manager” is not looking to hire at present.  That may be true for some people, but so far it has not worked for me.  Perhaps you need a certain amount of experience or saturation within an industry or sheer guts to pull it off.

There is nothing wrong with applying for jobs online without a referral. Your interview-to-application ratio will be lower, but as long as you are willing to submit more applications to get the same number of interviews, that is not a problem.  I applied online without a referral for two of my recent positions, so it is certainly possible to get an offer via this route.

When searching for jobs on a company’s website, expand your search beyond “Data Scientist,” since each of the following roles could involve data science-type work: “Quantitative Analyst,” “Statistical Programmer,” “Research Analyst,” “Data Consultant,” “Product Analyst.”  I typically start with the keywords “data,” “statistic,” and “analyst,” and hope that captures most of the relevant positions.
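
If you copy a list of job titles from a careers page, the keyword approach above can be sketched in a few lines of Python. This is an illustrative assumption of how one might filter titles (case-insensitive substring matching on the stems “data,” “statistic,” and “analyst”); the function name and the sample titles are hypothetical.

```python
# Keyword stems from the paragraph above. Matching is a case-insensitive substring
# search, so "statistic" also matches "Statistical" and "Statistician".
KEYWORDS = ("data", "statistic", "analyst")

def relevant_titles(titles, keywords=KEYWORDS):
    """Return the titles that contain at least one keyword stem."""
    return [t for t in titles if any(k in t.lower() for k in keywords)]

# Hypothetical titles copied from a careers page.
postings = [
    "Quantitative Analyst",
    "Statistical Programmer",
    "Product Analyst",
    "Data Consultant",
    "Software Engineer",
    "Research Scientist",
]
print(relevant_titles(postings))
```

Substring matching is deliberately broad, but a stem list can still miss relevant roles (here, “Research Scientist”), so it is worth skimming whatever the filter drops.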

Even if you do apply online, if you have a connection at the company, ask for a referral.  The internal recommendation system is typically both easy and win-win.  You get a job, your friend gets a referral bonus, and HR gets a position filled relatively painlessly.  If your friend volunteers to pass along your resume directly to the hiring manager (bypassing HR), even better.  I only ask for referrals from people I actually know who can at least vouch for my character, even if they are unfamiliar with my work.  But if you are comfortable asking random people on LinkedIn to refer you to their company, by all means do that.

Before, during, and after your application cycle (i.e. always), meet as many people as possible doing what you want to do (or something you could see yourself doing).  If nothing else, meeting people who use data to do something interesting and/or impactful encourages me.  Sometimes after reading too many job descriptions for repetitive “data science” (more like “data entry and preliminary modeling”) roles, I get discouraged.  But meeting people with interesting jobs encourages me to stick with the application cycle and not settle for a boring job.  Go to meetups, volunteer at conferences, and take advantage of LinkedIn’s Career Advice system.

Note: In my experience, it is better to first find jobs I am interested in and then look for people to refer me than to ask my connections about roles within their companies.  Under the latter approach, I do get interviews but am often not actually interested in the position, sometimes because I have heard too many negative things about the company or the work from my connection, and sometimes because the position was not a good fit to begin with.

Step 2: Apply for Jobs

Resume:

I always tweak my resume for the position I am applying for to incorporate keywords.

Many companies, especially the larger ones, use a software program to identify certain keywords in each resume in order to find a short list of candidates.  Humans reading your resume will also benefit from seeing the same language in your resume as in the job description.  For example, if the hiring manager is looking for someone with experience in data manipulation, change all synonyms (e.g. data munging, data wrangling) to “data manipulation.”  Why make the reader think?

I always read the job description a couple of times and make a list of words or phrases that could appear as keywords, including both tools (e.g. Python, R Shiny, SQL) and actions (e.g. data manipulation, forecasting).  I also find a few words that summarize the type of person they are looking for to help with the cover letter.  I then change my resume as needed, typically by adjusting the classes listed under “Relevant Coursework,” reordering the languages listed and bolding the ones mentioned in the job description, and adjusting the language used in the bullet points for each former role.  There are a few roles I swap in and out, depending on their relevance to the position.  This is where your CV from Step 0 will serve you well.
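
As a self-check before submitting, you could script the keyword comparison: rewrite synonyms to the job description’s phrasing, then list the keywords that never appear in the resume. This is a minimal sketch under stated assumptions: the synonym map is hypothetical, matching is plain whole-word text search, and it makes no claim about how any real applicant-tracking software scores resumes.

```python
import re

# Hypothetical synonym map: each variant is rewritten to the job description's term.
SYNONYMS = {
    "data munging": "data manipulation",
    "data wrangling": "data manipulation",
}

def normalize(text, synonyms=SYNONYMS):
    """Lowercase the text and replace each synonym with its canonical phrase."""
    text = text.lower()
    for variant, canonical in synonyms.items():
        text = text.replace(variant, canonical)
    return text

def missing_keywords(keywords, resume_text):
    """Return the job-description keywords that never appear (as whole words) in the resume."""
    resume = normalize(resume_text)
    return [
        k for k in keywords
        if not re.search(r"\b" + re.escape(k.lower()) + r"\b", resume)
    ]

keywords = ["Python", "R Shiny", "SQL", "data manipulation", "forecasting"]
resume = "Built R Shiny dashboards; data wrangling in Python and SQL."
print(missing_keywords(keywords, resume))
```

Here “data wrangling” counts as “data manipulation” after normalization, so only “forecasting” is reported as missing.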

In practice, I typically take the latest resume I created with a similar job description, “Save As,” and start tweaking.  Late in the application cycle, I realized it would be better to have a resume for each type of job description: modeling-focused, coding-focused, or research-focused, and then make changes off of those.  (I did not actually implement this system.)

Cover Letter:

If there is an option to submit a cover letter, I always do it unless I am unqualified and applying “just in case” HR is willing to give me a shot.  I am not sure if it really matters, but I would hate for this to be the reason I am not passed on to the next round of interviews.

Organization:

Organization is key when applying for jobs, given the number of applications you may submit.  I use an Excel sheet to record the company, position, application status and date (e.g. “applied online 1/29/19”), hint to remember my password for the job portal, and link to the position.
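
The same tracker works as a plain CSV file if you prefer scripting to Excel. In this minimal sketch the column names mirror the spreadsheet described above, while the rows, the example.com links, and the choice to split status and date into separate columns are hypothetical. Storing dates as ISO strings (2019-01-29 rather than 1/29/19) keeps them sortable as text.

```python
import csv
import io
from collections import Counter

# Columns mirror the spreadsheet described above (status and date split apart).
FIELDS = ["company", "position", "status", "date", "password_hint", "link"]

# Hypothetical rows; in practice these would live in a file such as applications.csv.
rows = [
    {"company": "Acme", "position": "Data Scientist", "status": "applied online",
     "date": "2019-01-29", "password_hint": "usual + year", "link": "https://example.com/job/1"},
    {"company": "Globex", "position": "Product Analyst", "status": "HR interview",
     "date": "2019-02-04", "password_hint": "usual", "link": "https://example.com/job/2"},
    {"company": "Initech", "position": "Quantitative Analyst", "status": "applied online",
     "date": "2019-02-06", "password_hint": "usual", "link": "https://example.com/job/3"},
]

def to_csv(rows):
    """Serialize the tracker to CSV text (to save it, use open(path, "w", newline=""))."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def status_counts(rows):
    """Count applications at each status: a quick read on where the funnel narrows."""
    return Counter(row["status"] for row in rows)

print(to_csv(rows))
print(status_counts(rows))
```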

I recommend downloading the job description after applying; the job posting is frequently taken down before the final interview and it is slightly embarrassing to ask HR for a copy.

On my local computer, I have a folder for each company with the following documents per position: resume, cover letter, job description, and any other relevant documents.  I submit each file as a PDF to avoid formatting changes between computers.  If using a Mac, open the Word files in Pages and export them as PDFs to ensure hyperlinks are preserved.  I save my files with the following format: [First Name Last Name]_[resume/cover letter]_[Position Name].
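
If you generate these files from a script, a small helper keeps the folder-per-company layout and naming convention consistent. This is a hypothetical sketch: the `applications` root folder, the function name, and the rule of replacing slashes in position titles are assumptions added for illustration.

```python
from pathlib import Path

def document_path(company, first_name, last_name, doc_type, position, root="applications"):
    """Return <root>/<company>/<First Last>_<doc type>_<Position>.pdf.

    A slash in a position title (e.g. "Analytics/Product") would be read as a
    folder separator, so it is replaced with a hyphen.
    """
    safe_position = position.replace("/", "-")
    filename = f"{first_name} {last_name}_{doc_type}_{safe_position}.pdf"
    return Path(root) / company / filename

print(document_path("Acme", "Jane", "Doe", "resume", "Data Scientist"))
```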

Step 2b: Complete a Pre-Interview

Sometimes recruiters request more information before inviting you to interview.  There are several different formats this could take:

  • Short Answer Questions: Questions relevant for the role, such as “What are steps you undertake when beginning a data analysis?” These are typically questions HR or the hiring manager could ask you over the phone, but written responses are easier to review than spoken ones.
  • Take-home Data Challenge/Case Study: I was fortunate not to have to do any this last application cycle, but these can take many forms. You may be asked to do an analysis and create a PowerPoint presentation or write a report.  They may be looking for a specific answer (e.g. one particular gene is significantly associated with the dependent variable), may want to see if you are capable of conducting an analysis, or may want to measure your ability to present your findings.
    • Data Masked: For practice, scroll down for the questions under “Take-Home Challenge ebook Samples.”
  • Coding Test: Either timed and on an online platform like HackerRank or untimed and via Word (typically for SQL tests).
  • Coding Sample: You may have to create a coding sample from scratch if everything you have coded belongs to your employer or the scope of your homework exercises is too small. Once you have a coding sample, go through your code again to see where you can make it clearer.  Document every function/chunk.  Note: I am still not sure what a good analytics coding sample looks like, because my scripts often preprocess one particular dataset, so I don’t write functions when every step applies only to that dataset.
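
One way to make a one-off preprocessing script read like a coding sample is to pull each dataset-specific step into a small function, turn the dataset-specific details (column names, plausible ranges) into parameters, and document it. The sketch below is illustrative only: the function name, the range-check rule, and the survey rows are hypothetical.

```python
import math

def clean_numeric_column(rows, column, min_value=None, max_value=None):
    """Return copies of `rows` with `column` converted to float, or None if unusable.

    rows: list of dicts, e.g. records read with csv.DictReader.
    column: name of the column to clean.
    min_value, max_value: optional plausible range; values outside it become None.
    The input rows are not modified.
    """
    cleaned = []
    for row in rows:
        try:
            number = float(row.get(column))
        except (TypeError, ValueError):
            number = None  # missing key, None, blank string, or text such as "N/A"
        if number is not None and math.isnan(number):
            number = None  # treat an explicit "nan" the same as missing
        out_of_range = number is not None and (
            (min_value is not None and number < min_value)
            or (max_value is not None and number > max_value)
        )
        cleaned.append({**row, column: None if out_of_range else number})
    return cleaned

# Hypothetical survey extract: one valid, one blank, one impossible, one non-numeric age.
survey = [{"age": "34"}, {"age": ""}, {"age": "-2"}, {"age": "N/A"}]
print(clean_numeric_column(survey, "age", min_value=0, max_value=120))
# [{'age': 34.0}, {'age': None}, {'age': None}, {'age': None}]
```

Even if the function is only called once, the docstring and parameters show a reviewer what the step does and that you thought about edge cases.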

Take your time with these, but make sure you submit them on time if there is a deadline!  (I once submitted my pre-interview after the deadline and hoped the recruiter would not notice. I still made it to the HR interview, but don’t push your luck!)

Step 3: Survive an HR interview

Although much is uncertain in the application cycle, you can be fairly confident that someone will ask you a version of at least one of these questions at some point.  Thus, prepare at least vague answers to these questions.  Even if you are not explicitly asked any of these, your answers to other questions should be shaped by your answers to these:

  • What attracted you to this position?
  • What attracted you to this company?
  • Why are you qualified for this position?
  • What do you know about this company?
  • What are you looking for in a job?
  • What is your experience with [Insert tool, e.g. Excel, here]?

Have some questions in mind for the recruiter too.  Some of my favorites:

  • What are the non-negotiables for this position?
  • What other teams does this team work closely with? Who does this role work closely with?
  • What do you like best about this company? Where do you see this company headed in five years?

Ask whatever questions you need to gain a better understanding of what the team is looking for. And since this is typically the first interview, it is best to glean as much information as possible about the role as early as possible.

Organization Tips:

  • Create a cheat sheet for each interview with answers to the questions from above and your questions for the interviewer.
  • Print your resume, the job description, and your cheat sheet.  It is much easier to describe why you are attracted to this position if you have the description in front of you.  And your interviewer will almost certainly have your resume in front of him/her, so this way you are on the same page.
  • During all interviews, take notes in a notebook so all your notes are in one place.  You will want those notes later when prepping for the next interview.

Step 4: Execute the Technical/Hiring Manager phone interview

Step 5: Ace the on-site interview

Preparations for Steps 4 and 5 are fairly similar, so I will be describing both together.  The technical phone interview is like homework, and the on-site interview is like the exam, where sometimes homework is harder and sometimes the exam is harder, but you need to know the same material for both and need greater stamina for the exam.

A better way to subdivide these interviews is into two types: talking interviews (i.e. behavioral/HR questions) and coding/technical interviews.  I am always concerned if I am asked exclusively talking questions or exclusively technical questions.  If I am not asked enough technical questions, I can only assume the team’s technical skills are not particularly sharp.  If I am not asked enough talking questions, I can only assume the company doesn’t care about my goals or whether I work well with others.  Because you often will not know in advance whether your interview will be talking or technical, and because many interviews are a cross between the two, you need to prepare for both types.

For both talking and technical interviews:

  • Look at Glassdoor reviews.
    • Go to glassdoor.com.
    • In the search bar at the top, enter the company name, “Companies,” and the location (e.g. Mountain View, CA).
    • Click on “Interviews,” and then filter by clicking on “Data Scientist.”  Take all these responses with a grain of salt, especially if they are from several years ago, but make sure you know what you would do if asked those questions.
    [Screenshot: Complete the search boxes at the top, then click on Interviews (4.8k Interviews here).]
    [Screenshot: Filter to Data Scientist interviews only by clicking on the “Data Scientist” button.]
  • Leverage LinkedIn profiles. LinkedIn profiles may give you a broader view for what the position entails, the types of questions your interviewer may ask, and the educational/work backgrounds of members of this team.
    • If you have the name of the person interviewing you, look him/her up on LinkedIn.
    • If you only know the department or title for the position you are interviewing for, search LinkedIn using that information.

“Talking Interviews”:

“Talking interviews” will typically be with someone with a stronger technical background than HR.  This may also be someone you report to, so s/he is interested in asking you behavioral questions in order to know if s/he can work with you and what your goals are.  Or this may be someone assigned to test your technical capabilities who has no connection to the position.  Here are my suggestions:

  • Know each project on your resume well. If you mention a model on your resume, know what features and model evaluation criteria you used and why.  Know where the data came from and how large it was.  (Recruiters and hiring managers always want to know the size of the largest dataset you have worked with.)  Know why you started working on this project and what the impact was.  Know what next steps you intend to take with this project or what you would change if you did the project again.
  • Be able to answer standard behavioral questions, in addition to the questions you prepared for your HR interview:
    • Tell me about a collaborative experience you had.
    • Tell me about a challenge you faced.
    • Tell me your greatest strength and greatest weakness.
  • Always, always, always have some questions ready for the interviewer. Remember you are interviewing the manager/team/company as much as you are being interviewed, and this is your chance to ask a few questions.
  • Learn about the company: look at their website and look at their Wikipedia page (to read about the company’s history from a source other than the company itself). I have been asked on two or three occasions to detail everything I know about the company. Prepare to overload your interviewer with knowledge.
  • Read one or two articles about the company in order to know something about the company’s current issues. The first place I would look is TechCrunch.

Technical Interviews:

Technical interviews vary a lot.  These are the types of formats I have had:

  • Brainstorm how you would tackle a business question using analytics (e.g. talk through an analytics project your interviewer worked on)
  • Short answer stats/data analytics questions, either verbally or using a whiteboard (e.g. What are the assumptions of linear regression?)
  • Pseudocode/actual code either on a whiteboard or a computer

Sometimes you will know in advance which format your technical interview will be; oftentimes you will not. Regardless, there are seven sections to focus on, some of which overlap a bit.

However, before I describe those seven sections: there are some great resources that touch on all seven.  If you only read a few resources, I would read these.

General Resources:

The seven sections:

a. SQL: This is my favorite section to study for because a little studying can go a long way, and you either get the right answer or you don’t.

  • Practice makes perfect. Do these exercises, pretending someone is watching you and you have to explain your decisions (i.e. why you chose to use a left join instead of an inner join, what assumptions you are making).  These exercises are designed for data science interviews, and the nice thing about them is that you don’t know which commands to use.
    • Data Masked: Scroll down for the three free questions under “SQL Query Samples.” Unfortunately, there are no answers, but you can test your queries using R (although `sqldf` does not allow window functions) or Python.
    • The Ultimate Guide to Data Science Interviews (mentioned above under General Resources).
  • Learn window functions. It’s possible that you will not need to use a single window function during your interviews, but why wouldn’t you learn them if there is even a slight chance you will need them?  Interviewers will be impressed if you use them correctly.
  • If you have the time and need to actually learn SQL (as opposed to just practicing), go through sections of Querying Data with Transact-SQL, part of the Microsoft Professional Program in Data Science. The exercises are a little too basic, so try writing the entire query rather than just filling in the blanks. This class does not cover window functions but does cover subqueries and table expressions.
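
To practice explaining a join choice and to test window functions from Python, one option is the standard-library `sqlite3` module with an in-memory database. In this sketch the `users` and `orders` tables are hypothetical, and window functions require SQLite 3.25 or newer (recent Python builds generally bundle a newer version, but check `sqlite3.sqlite_version` if a query fails). Interview dialects such as Transact-SQL differ in places, though joins and these window functions are standard SQL.

```python
import sqlite3

# Hypothetical toy tables: every user, plus the orders only some users placed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 15.0);
""")

# INNER JOIN keeps only users with at least one order; LEFT JOIN keeps every user.
# COUNT(o.order_id) skips the NULLs the LEFT JOIN produces, so Cy gets 0 rather than 1.
join_query = """
    SELECT u.name, COUNT(o.order_id) AS n_orders
    FROM users u {join} orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id
"""
inner = conn.execute(join_query.format(join="INNER JOIN")).fetchall()
left = conn.execute(join_query.format(join="LEFT JOIN")).fetchall()

# Window functions rank each user's orders and keep a running total per user
# without collapsing rows the way GROUP BY would.
window_query = """
    SELECT user_id, order_id, amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rank_in_user,
           SUM(amount)  OVER (PARTITION BY user_id ORDER BY order_id)    AS running_total
    FROM orders
    ORDER BY user_id, order_id
"""
windowed = conn.execute(window_query).fetchall()

print(inner)  # [('Ana', 2), ('Ben', 1)]
print(left)   # [('Ana', 2), ('Ben', 1), ('Cy', 0)]
print(windowed)
```

Saying out loud why `COUNT(o.order_id)` rather than `COUNT(*)` is used with the LEFT JOIN is exactly the kind of assumption worth narrating to an interviewer.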

b. Machine Learning:

Know the following methods, ranked in order of decreasing importance:

  • Definitely know:
    • Linear regression, including ridge and lasso penalties
    • Logistic regression, including ridge and lasso penalties
    • Random forest
  • Nice-to-know:
    • K-NN
    • K-Means
    • PCA
    • Neural nets
  • Probably unnecessary:
    • GBM
    • SVM
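
For the two regression methods in the “definitely know” group, it helps to be able to write down what each one minimizes. Below is one common way to write the objectives, following the convention in Introduction to Statistical Learning of leaving the intercept unpenalized; scaling constants (such as 1/n or 1/2) differ between textbooks and libraries. Here λ ≥ 0 is the tuning parameter, usually chosen by cross-validation, and P(β) is the ridge penalty Σβ_j² or the lasso penalty Σ|β_j|.

```latex
\begin{aligned}
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert \\
p_i &= \frac{1}{1 + \exp\!\big(-(\beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j)\big)} \\
\hat{\beta}^{\text{logistic}} &= \arg\min_{\beta_0,\,\beta}\; -\sum_{i=1}^{n}\Big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Big] + \lambda\, P(\beta)
\end{aligned}
```

The first term of the logistic objective is the log loss (negative log-likelihood). The squared L2 penalty shrinks coefficients smoothly, while the L1 penalty can set some coefficients exactly to zero, which is why the lasso doubles as a variable-selection method.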

For each method, know:

  • When you would use each method
  • How each method works at a high level
  • Evaluation criteria to use and why (at a minimum, know MSE/SSE, accuracy, AUC, precision, recall, log loss)
  • Bonus: the loss functions in mathematical notation (for particularly difficult interviews)
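
Writing the evaluation criteria from scratch is a quick way to check that you can define each one precisely. This is a minimal standard-library sketch; in practice you would use a library (for example, scikit-learn’s `sklearn.metrics`). The labels, predicted probabilities, and the 0.5 classification threshold in the usage example are hypothetical.

```python
import math

def sse(y_true, y_pred):
    """Sum of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mse(y_true, y_pred):
    """Mean squared error: SSE divided by the number of observations."""
    return sse(y_true, y_pred) / len(y_true)

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    """Of the predicted positives, the fraction that are truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    """Of the actual positives, the fraction the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
probs = [0.9, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # the 0.5 threshold is an arbitrary choice
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
print(auc(y_true, probs), log_loss(y_true, probs))
```

Accuracy, precision, and recall depend on the threshold; AUC and log loss use the probabilities directly, which is one reason they are preferred for comparing models on imbalanced data.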

Other topics to know:

  • Dealing with an imbalanced dataset
  • Bias-variance tradeoff
  • Cross-validation
  • Training vs. validation vs. testing sets: relative sizes, what to do when they are dissimilar
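
Cross-validation is easy to describe and easy to get subtly wrong, so it can help to have implemented it once. Below is a minimal standard-library sketch of k-fold cross-validation. The `fit`/`score` interface and the mean-only baseline model are hypothetical conventions chosen for illustration. For an imbalanced dataset, a stratified variant (splitting each class separately into folds) keeps the class proportions similar across folds.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs that split range(n) into k disjoint validation folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so folds do not follow row order
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Average a validation score over k folds; fit(train_x, train_y) returns a predict function."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k, seed):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(xs[i]) for i in val_idx]
        scores.append(score([ys[i] for i in val_idx], preds))
    return sum(scores) / len(scores)

def fit_mean(train_x, train_y):
    """Baseline model: always predict the training-set mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data with a perfect linear relationship.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(cross_validate(xs, ys, fit_mean, mse, k=5))
```

Each observation is used for validation exactly once, and the model is refit on the remaining folds every time, so the averaged score estimates performance on unseen data rather than on the data used for fitting.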

Resources:

  • Interview Questions from Springboard Blog: Most of these questions are not great, but you should know the material.
  • Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani: Really intuitive approach to machine learning methods.
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman: The “big brother” version of the book listed above. The math is beyond the scope of data science interviews, except for the regression section.
  • How to Ace Data Science Interviews: Statistics (as mentioned above under General Resources).

c. Probability

Probability questions typically fall under one of two types:

  • Which probability distribution does this come from? Know the support (e.g. all real numbers) and use cases for each of the commonly used distributions (e.g. Gaussian, Poisson, uniform, exponential). I think companies like these questions because it’s easy to take something from their domain (e.g. number of reviews) and turn it into a probability question.
  • Combinatorics/Bayes Rule/Random variables: Do a few questions like those found in a Probability class and you should be set.
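
Both question types reduce to a few formulas you can check in code. The sketch below uses hypothetical numbers: a product that averages 3 reviews per day (modeled as Poisson), a fraud detector for a Bayes rule question, and a coin-flip count for combinatorics. `Fraction` keeps the Bayes and binomial answers exact.

```python
import math
from fractions import Fraction

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); the support is k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))

# Hypothetical: a product averages 3 reviews per day.
print(round(poisson_pmf(0, 3), 4))                  # 0.0498, P(no reviews today)
print(1 - sum(poisson_pmf(k, 3) for k in range(5)))  # P(at least 5 reviews today)

# Hypothetical: 1% of accounts are fraudulent; a detector flags 95% of fraudulent
# accounts and 5% of legitimate ones. P(fraud | flagged)?
print(bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100)))  # 19/118

# Exactly 2 heads in 5 fair coin flips.
print(binomial_pmf(2, 5, Fraction(1, 2)))  # 5/16
```

The Bayes example is the classic interview trap: even with a 95% detection rate, only about 16% of flagged accounts are fraudulent, because legitimate accounts vastly outnumber fraudulent ones.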

Resources:

d. Statistics (i.e. Statistical Inference)

These are the types of questions you should be able to answer:

  • Describe basic statistics terminology in layman’s terms (e.g. p-value, confidence interval, Type I error, Type II error, R-squared).
  • Be familiar with and able to use the Central Limit Theorem and Law of Large Numbers.
  • List assumptions of various tests/methods and know nonparametric alternatives (e.g. Wilcoxon signed-rank as an alternative to the paired t-test).
  • Know about testing for and correcting for multicollinearity (could also fall under the Data Analytics section).
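
A short simulation is a concrete way to explain the Law of Large Numbers, the Central Limit Theorem, and what “95% confidence” means. This minimal sketch (standard library only, Python 3.8+ for `statistics.fmean` and `NormalDist`) draws samples from a skewed exponential distribution with mean 1 and standard deviation 1. The sample size, repetition counts, and seeds are arbitrary illustrative choices, and the interval uses a normal approximation rather than the t distribution.

```python
import random
import statistics
from statistics import NormalDist

def sample_means(draw, n, reps, seed=0):
    """Return the means of `reps` independent samples, each of size n, from draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def mean_confidence_interval(data, level=0.95):
    """Normal-approximation CI for the mean: xbar +/- z * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    standard_error = statistics.stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return xbar - z * standard_error, xbar + z * standard_error

def coverage(draw, true_mean, n, reps, level=0.95, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        low, high = mean_confidence_interval([draw(rng) for _ in range(n)], level)
        hits += low <= true_mean <= high
    return hits / reps

def exponential(rng):
    """One draw from Exponential(rate=1): strongly right-skewed, mean 1, sd 1."""
    return rng.expovariate(1.0)

means = sample_means(exponential, n=50, reps=2000)
print(statistics.fmean(means))  # close to 1 (law of large numbers)
print(statistics.stdev(means))  # close to 1 / sqrt(50), about 0.141 (CLT)
print(coverage(exponential, true_mean=1.0, n=50, reps=1000))  # roughly 0.95
```

In layman’s terms, the “95%” describes the procedure rather than any single interval: if you repeated the study many times, about 95% of the intervals built this way would contain the true mean.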

Resources:

e. Data Analytics/Visualization

If you have been doing analytics for a decent amount of time, you should not actually need to prepare for this section. I am including it here for the sake of completeness.  The format will most likely be answering questions verbally, but may involve cleaning and analyzing an actual dataset on a computer, although the latter is better suited to Data Challenges (see Step 2b: Complete a Pre-Interview).

Data Analytics Things to Know:

  • Initial checks (e.g. checking range of each variable)
  • Preprocessing data (e.g. dealing with missing data, converting categorical variables to dummy variables, discretizing numeric variables)
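
The checks and preprocessing steps above can be sketched in plain Python. The records are hypothetical, and median imputation with a missing-value flag is only one of several reasonable choices. In practice, pandas offers `pd.get_dummies` (with a `drop_first` option) and `pd.cut` for the last two steps.

```python
import bisect
import statistics

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "pro"},
    {"age": 51, "plan": "basic"},
    {"age": 27, "plan": "team"},
]

def column_range(rows, col):
    """Initial check: (min, max, number missing) for a numeric column."""
    values = [r[col] for r in rows if r[col] is not None]
    return min(values), max(values), len(rows) - len(values)

def impute_median(rows, col):
    """Fill missing values with the median and add a 0/1 flag recording which were imputed."""
    median = statistics.median(r[col] for r in rows if r[col] is not None)
    return [
        {**r, col: median if r[col] is None else r[col], f"{col}_missing": int(r[col] is None)}
        for r in rows
    ]

def one_hot(rows, col, drop_first=True):
    """Replace a categorical column with 0/1 dummy columns, optionally dropping one level."""
    levels = sorted({r[col] for r in rows})
    keep = levels[1:] if drop_first else levels
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != col}
        for level in keep:
            new[f"{col}_{level}"] = int(r[col] == level)
        out.append(new)
    return out

def discretize(value, edges):
    """Bin index: 0 if value < edges[0], len(edges) if value >= edges[-1]."""
    return bisect.bisect_right(edges, value)

print(column_range(rows, "age"))
prepared = one_hot(impute_median(rows, "age"), "plan")
for row in prepared:
    row["age_bin"] = discretize(row["age"], [30, 50])
print(prepared)
```

Dropping one dummy level avoids perfect multicollinearity with a regression intercept, which connects this section to the Statistics section.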

Data Visualization Things to Know: Visualization questions come up only rarely (but should come up more often, given their importance).

  • Describe basic types of graphs and know when to use each.

Resources:

  • The Visual Display of Quantitative Information by Edward Tufte

f. A/B Tests and Product Sense

This section may or may not be applicable.  Certain companies value product sense (i.e. understanding which specific metrics indicate how a product is used) more than others, and only certain roles involve A/B tests.

These are the types of questions you should be able to answer:

  • What factors affect sample size for an A/B test?
  • Describe power in layman’s terms.
  • What are the components of the conversion funnel? What might each piece be affected by?
  • Suggestion: Play around with the product as much as possible and think about potential metrics. What is measurable?  Which metrics do you think are related to other metrics? Be as specific as possible.
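
The first two questions above can be answered with one formula. This sketch uses a common normal-approximation sample-size formula for a two-sided, two-proportion z-test with unpooled variance; other calculators use a pooled variance and give slightly different numbers. The conversion rates in the usage example are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided two-proportion z-test.

    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # stricter significance -> larger z
    z_power = NormalDist().inv_cdf(power)          # higher power -> larger z
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control               # minimum detectable effect
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical: 10% baseline conversion, and we want to detect a lift to 12%.
print(sample_size_per_group(0.10, 0.12))
print(sample_size_per_group(0.10, 0.11))               # smaller effect: many more users
print(sample_size_per_group(0.10, 0.12, power=0.90))   # more power: more users
print(sample_size_per_group(0.10, 0.12, alpha=0.01))   # stricter alpha: more users
```

With these inputs the first call gives roughly 3,800 users per arm, and halving the detectable effect roughly quadruples it. This shows the factors in one place: baseline rate, minimum detectable effect, significance level, and power. In layman’s terms, power is the chance the test will detect a real effect of at least that size if one exists.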

Resources:

g. R/Python

This is the most difficult section to prepare for.  I have been unsuccessful in finding good practice problems.  You can expect to write a function that accomplishes a simplified, self-contained task (either on a computer or on a whiteboard).

It may help to practice these while someone else watches you code.  I always get nervous when other people are watching me code and my thought process is slightly cloudy.

Resources/Suggestions:

  • PREP: Mnemonic for whiteboard coding steps.
  • Go back to your Intro to CS class, and be able to code the types of problems asked on exams or homework in the language of your choice (e.g. R or Python).
  • Implement a few basic functions (e.g. mean, left_join, percentile) from scratch without using the built-in versions.
  • Cracking the Coding Interview by Gayle Laakmann McDowell: Personally, I believe this is overkill and did not read it. This book is geared towards software engineers.
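
Here is one possible from-scratch version of the three functions mentioned above, as a sketch of what a whiteboard answer might look like. The percentile uses linear interpolation between closest ranks (the same convention as R’s default quantile type 7 and NumPy’s default method). The `left_join` works on lists of dicts; its handling of duplicate right-table keys and of column-name collisions are design choices worth stating aloud in an interview.

```python
def mean(values):
    """Arithmetic mean; raises ValueError on empty input instead of dividing by zero."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between closest ranks."""
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)  # floor, since position >= 0
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def left_join(left, right, key):
    """Left join two lists of dicts on `key`.

    Every left row is kept; each matching right row produces one output row.
    Left rows with no match get None for the right-only columns. If both sides
    share a non-key column name, the right value overwrites the left one.
    """
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    right_cols = sorted({c for row in right for c in row if c != key})
    joined = []
    for row in left:
        matches = right_by_key.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **{c: match.get(c) for c in right_cols}})
        else:
            joined.append({**row, **{c: None for c in right_cols}})
    return joined

# Hypothetical tables.
users = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "amount": 20}, {"id": 1, "amount": 35}, {"id": 2, "amount": 15}]
print(mean([1, 2, 3, 4]), percentile([1, 2, 3, 4], 25))
print(left_join(users, orders, "id"))
```

Building a dictionary index of the right table makes the join roughly linear in the total number of rows instead of comparing every left row with every right row.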

Step 6: Get an offer (a non-step)

You survived the long interview process. Congrats!  Hold your breath and hope there is no follow-up interview from someone who wants to talk to you one more time…

Do send thank you notes after on-site interviews (but not phone interviews).  I really don’t think they make a difference (I did not send thank you notes this application cycle because by the time I remembered, a week had passed), but I would hate for an absent thank you note to be the reason the job is offered to someone else.

Now wait… No news after about three days is bad news, unless you have been told the team is still interviewing the remaining candidates and will make a decision after everyone has been interviewed.

HR will typically either send an email or leave a voicemail asking for your availability for a “quick chat.”  Unless the HR representative says “Next Steps” (and sometimes not even then), you are not guaranteed to be receiving an offer.  S/he could be calling to obtain additional information from you, update you on when the team will reach a decision, or even reject you.

Step 7: Accept an offer

You received an offer.  Congrats! Be excited!  But then hold your horses and think.  Might you soon be receiving another offer you would rather take? Who do you need to consult before accepting an offer? What do you not like about the position?  Are any of those non-negotiables?  Do you need to speak to the hiring manager again?

HR likes to disseminate information over the phone as opposed to over email, so be prepared to have a couple calls with the recruiter.  The two of you could become best friends by the time you start your position.  If you are in a position to negotiate your offer, this is the time to do so, which would mean a few more calls with HR.

The unanticipated bonus step: Look for housing

After accepting a job offer, you might think you are done.  I did. And then I realized I would have to relocate, and thus needed a place to live.  I opted for the roommate option because Bay Area housing is among the priciest in the country.  I had six or seven appointments in two days for potential rooming situations, and it was exhausting.  It felt like being interviewed again; I needed the potential roommates to like me, and tried to evaluate the whole package (apartment, location, roommates) based on limited interactions.

Final advice/What helped me

  • Think of the interview process as a test. I don’t mind studying really hard for a test, or knowing far more than I need to.  I don’t “wing” tests, so why would I “wing” interviews?

If you don’t like tests, think of the interview process as a game.  The goal is to get the best possible job offer (which means something different for each person).  There are a lot of rules, for example: during each turn you can either keep going along a particular route that might lead to an offer or take a first step towards a new route, your routes depend on both your actions and the actions of other players, each offer is only valid for a certain amount of time, and when you end the game by accepting an offer you never know if any of the other routes would have led to a better offer.

  • Know that applying for jobs is really hard. There are very, very few people who only apply for one job and get it (although I know four insanely lucky people). The rest of us have to apply for fifty positions, to get interviews with five, to get a single offer.
  • Don’t take rejections personally. They are a normal and (large) part of the process.  There were three occasions on which I thought the interview went really well (which rarely happens for me; I almost always think the interview went terribly), but got rejected anyway.  The reality is: there are other factors besides my interview performance that lead to a rejection. Some companies like to interview lots of candidates and only make offers to a few, while others like to interview very selectively and make offers to almost all who interview on-site. Thus, what is “good enough” for one company that gives offers to almost everyone who interviews on-site might not be “good enough” at another that rejects almost everyone who interviews on-site. Sometimes the difference between candidates is very slight and so tiny details push the hiring manager towards one candidate over another. By analogy: when looking into housing, I was really sad to have to turn down roommate options where the “interview” went well and I liked the potential roommate. More often than not, the decision came down to fit, determined by something relatively insignificant: my commute would be a little too long or the price was a little too high or the pets were a little too energetic.
  • Be realistic. Finding a new job takes time. Interviews get rescheduled, HR can be slow, email exchanges are slightly delayed because I don’t check my personal email while at work.
  • You can find a job at any time of year. I will never understand the seasonality associated with the job application cycle. Recruiters do respond in waves. Sometimes I will hear nothing for weeks, and then hear from three companies in a week.  And companies oftentimes do try to fill a position by the end of the year, although I’m not sure how much the holidays slow down the application cycle. A recruiter once emailed me on Thanksgiving to invite me to an on-site interview, and I once applied for a position on Christmas and was rejected on December 26 (but I assume no human looked at my application). As a stats person, I believe if I don’t understand the underlying model of recruiter response, I should keep applying as normal, regardless of the season.  I would not recommend using the holidays as an excuse to slow down on sending out applications.

That’s all.  You made it to the end of a blog post of over five thousand words.  Best of luck with your application cycle!
