Data scientists are new-age explorers. Their field of exploration is rife with data from various sources. Their methods are mathematics, linear algebra, computational sciences, statistics and data visualisation. Their tools are programming languages, frameworks, libraries and statistical analysis tools. And their rewards are stepping stones, better understanding and insights. The data science process for […] Read more "Hypothesis Generation: A Key Data Science Challenge"
I’m given to spurts of activity on Quora. Over the past year, I’ve had the opportunity to answer several questions there on the topics of data science, big data and data engineering. Some answers here are career-specific, while others are of a technical nature. Then there are interesting and nuanced questions that are always a […] Read more "Quora Data Science Answers Roundup"
As a data science consultant who routinely deals with large companies and their data analysis, data science and machine learning challenges, I have come to understand one key element of the data scientist’s skill set that isn’t often discussed in data science circles online. In this post I hope to elucidate the importance of domain […] Read more "Domain: The Missing Element in Data Science"
Being data-driven in organizations is a bigger challenge than it is made out to be. It is hard for managers to suspend judgement and make decisions that are informed by facts and data, even in this age of Big Data. A set of tweets I posted spurred me to think through this subject. Decision […] Read more "“Small Data” and Being Data-Driven"
In my earlier posts on hypothesis testing and confidence intervals, I covered how there are two hypotheses: the default or null hypothesis, and the alternative hypothesis (which is like a logical opposite of the null hypothesis). Hypothesis testing is fundamentally a decision-making activity, where you reject or fail to reject the default hypothesis. […] Read more "Power, Difference and Sample Sizes"
Businesses are increasingly using data to drive decision-making, and often rely on hypothesis tests to do so. Hypothesis tests are used to differentiate between a pair of potential solutions, or to understand the performance of systems before and after a certain change. We’ve already seen t-tests and how they’re used to ascribe a range to […] Read more "Hypothesis Tests: 2 Sample Tests (A/B Tests)"
If you were to walk into a restaurant and order a cup of coffee, you’d expect to get a standard cup of the stuff, and you’d expect to get a sufficient quantity of it too. How much coffee for a certain price is too much, and how much is too little? More realistically, when you […] Read more "Confidence Intervals and t-tests in R"