Crosspost: Some Hard Truths About Becoming a Data Scientist

I’ve spent a couple of years at a data and analytics startup that has a consulting focus. As I’ve said elsewhere on this blog, my background in engineering and quality data analysis led me (now it seems inexorably) to this interesting role as a consultant focused on data and analytics. While I’ve worked on several solution development activities, my primary mandate in the organization is to serve as a business consultant and a data scientist with experience in specific industries, such as manufacturing. Over the last couple of years, I’ve spent a significant amount of time doing data science, working with other data scientists, and leading and mentoring data science professionals on projects. I’ve also conducted numerous interviews (I’ve lost count) with engineers and non-engineers who are interested in breaking into the data science world. And with good reason – after all, data science careers have received a lot of (sometimes undeserved) hype recently. Some of the insights below on becoming an effective data scientist were originally published as a Quora answer, but in this blog post I hope to expand on that answer and provide a bit of a guide for those charting out data science careers. So, here we go.

What are some hard truths about becoming a (good) data scientist?

  1. Your higher degree matters, but much less than you think. If you have a PhD or a Master’s in a specific area such as machine learning or computer science, you will likely do better at data science than many others who don’t have such credentials. If you have a PhD or Master’s in a technical field that didn’t involve much data analysis, you’re unlikely to be a great fit without acquiring new skills. However, a degree can only take you so far: you have to be aware of the frameworks and technologies commonly used for data science, and keep learning them.
  2. Stay away from data analytics-specific master’s programs, or evaluate them very critically. In my experience, a number of these programs don’t teach what they claim to, and many are overpriced. The latter is in fact a big reason why I wouldn’t recommend a data science-specific higher degree to anyone right now, especially given that there is no dearth of MOOCs and similar resources. If you come in with significant experience, you may actually be diminishing your profile’s worth by studying in such a program (much like what certain MBA programs do to the careers of successful functional experts).
  3. Business experience counts a great deal in data science. So does domain knowledge. If you have neither, expect to spend a good deal of time learning about a specific domain, or to rely on a subject matter expert working with you.
  4. Hypothesis generation and validation are more important than you might imagine. Frameworks, tools and software are only one aspect of a data scientist’s work. You have to be able to come up with business-relevant hypotheses and ideas based on the data, and ask hard questions. If you’re unable to do this, you won’t be a successful data scientist, regardless of your degree or your knowledge of any particular tool or framework.
  5. Ignore the basics at your own peril. Many data scientists are hired without their knowledge of numerical analysis, linear algebra, optimization and machine learning ever being truly tested. Many are also guilty of not checking the underlying assumptions of algorithms, or of making unverified assumptions about their data in other ways. Few data scientists understand computational engineering and how optimization is used in machine learning well enough to build algorithms on their own. If you want to stand out, make sure your basics in these areas are solid. It may mean going back to the books often, but it is ultimately rewarding and worth it.
  6. Communication and presentation skills matter a great deal. Being a good data scientist also means having great communication and presentation skills. Without them, you’ll be a fish out of water: building models and systems, but unable to defend why they work or explain their benefits.
  7. The “ideal data scientist” unicorns are truly mythical. Data scientists who have the required domain experience along with sufficient mathematics/statistics, programming and communication skills are unicorns, and you’ll rarely find someone who checks all the boxes. So if you’re looking to become a unicorn, expect to put significant effort, time and energy into keeping yourself up to date.
  8. Prototyping is central to data science work. When you’re building models, many of them will be throwaway models and prototypes, and only a few will perform well enough on both training and new data. Oftentimes, the only way to make your model perform better is to know the domain well.
  9. Data platform understanding is more important than you might imagine. Without an understanding of the data platforms on which data science is done, there is little chance of being a successful data scientist. You need sufficient knowledge of databases, query languages, data storage, data management and governance, distributed databases, and so on.
  10. There is still a talent crunch in the data science world, but perhaps not for long. More management teams have calibrated their expectations of how data science is done and what to expect from data scientists. Additionally, data science frameworks and tools are being democratized, and many people are learning and skilling up on the job. This means that management teams have, to a large extent, addressed the data scientist skill shortage we heard so much about in 2016 through in-sourcing.
  11. AI and knowledge modeling are key and underrated areas of data science. Because many companies that use data in their systems are also looking for expert systems, knowledge-based systems are making a comeback. Many of these are newfangled versions of old rule-based systems that learn from data and use different kinds of knowledge representations. AI is more closely related to data science than many people care to acknowledge, and I think the two will merge into a single discipline.
  12. A static skill set will get you nowhere in data science. This may be a repetition, but it is worth repeating for any aspiring or current data scientist: unless data scientists continue to learn new skills, new methods of analysis and new frameworks, their chances of continued success and satisfaction at work are very low.
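To make point 5 a little more concrete, here is a minimal sketch (my own illustration, not part of the original Quora answer) of what “understanding how optimization is used in machine learning” can look like in practice: fitting a straight line by batch gradient descent on mean squared error, then checking the result against the closed-form least squares solution. The function names, learning rate and epoch count are hypothetical choices for illustration, and the sketch assumes a simple one-feature linear model with noise-free toy data.

```python
def fit_linear_gd(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are assumed values.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every data point
        residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(r^2) with respect to w and b
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Take a step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_linear_closed_form(xs, ys):
    """Ordinary least squares solution, used to sanity-check the optimizer."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


if __name__ == "__main__":
    # Toy data lying exactly on y = 2x + 1
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    print("gradient descent:", fit_linear_gd(xs, ys))
    print("closed form:     ", fit_linear_closed_form(xs, ys))
```

On this toy data the closed form returns (2.0, 1.0), and gradient descent should converge to essentially the same values. Comparing an iterative optimizer against a known analytical answer is one simple way of checking your basics rather than trusting a library call blindly; on real data you would also want to check the model’s assumptions (for example, by looking at the residuals).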

Note: This post was originally an answer to a Quora question on becoming a data scientist.
