As a data science consultant that routinely deals with large companies and their data analysis, data science and machine learning challenges, I have come to understand one key element of the data scientist’s skill set that isn’t oft-discussed in data science circles online. In this post I hope to elucidate on the importance of domain […]Read more "Domain: The Missing Element in Data Science"
Data products are one inevitable result and culmination of the information age. With enough information to process, and with enough data to build massively validated mathematical models like never before, the natural urge is to take a shot at solving some of the world’s problems that depend on data. Data Product Maturity There are some […]Read more "Insights about Data Products"
This may sound weird, but one sure way to not have perspective about the business in an innovative and constantly changing industry is to bury yourself within regular work. This is the meaning of the title – which comes from a book of the same name. By regular work, I mean work in which you […]Read more "Data Perspectives: “Orbiting The Giant Hairball”"
Being data-driven in organizations is a bigger challenge than it is made out to be. For managers to suspend judgement and make decisions that are informed by facts and data is hard, even in this age of Big Data. I was spurred by a set of tweets I posted, to think through this subject. Decision […]Read more "“Small Data”and Being Data-Driven"
I recently came across this Hortonworks Data Flow presentation where the concept of the jagged edge of real time data analytics is discussed. The context that suits a discussion on this is to me, centred around prioritization of what comes in through the sensors (or other big data gathered from these “jagged edge” sources). This is a […]Read more "The “Jagged Edge” of Real Time Analytics"
The insights we get from data depend on the quality of the data itself, and as the saying goes, “Garbage In, Garbage Out”. The volumes of data don’t matter as much as the quality of the data itself. Data quality and data quality assurance are therefore of growing importance in today’s Big Data arena. With the […]Read more "Quality and the Data Lifecycle"
A number of inferential statistical tests (A/B tests and significance tests) assume that the underlying that we’re comparing come from a normal (Gaussian) distribution. However, this isn’t generally true for a number of data sets in practice. In order to use the tools that assume normality, we have to transform the data (and the limits […]Read more "Johnson Transformation for Non-Normal Data"