A decade ago, Microsoft looked very different from the Microsoft we see today; it has been a remarkable transformation. One of the areas where Microsoft has made a big push is machine learning and data analytics. Although the CRAN repository is going strong with more than 10,000 packages as of today, the MRAN repository (Microsoft’s Managed R Archive Network) is adding libraries and functionality that were missing from the R stack. Ever since they acquired Revolution Analytics, the makers of Revolution R, they’ve also integrated R into their data science and ML offerings in a big way. For instance, Power BI comes with the ability to write R scripts that produce visualizations for dashboards. They’ve come out with a number of products that add to or complement the Office suite, the bedrock of Microsoft’s software portfolio, and of late they have pushed Azure and their own machine learning algorithms in a big way. A year is a long time in the world of big data and machine learning, and now, on Azure ML Studio, most people interested in big data and data science can get started with data analysis in a pleasant, user-friendly interface.
I have had the chance to play around a little with Azure ML, and here is what I find to be some of its strong points. Above you can see a simple data processing step I set up within Azure ML Studio, which takes a simple data set and subjects it to some transformations.
It is possible to summarize and visualize this data pretty quickly, using some of the point-and-click summaries you get from the outputs of the boxes in the workflow.
What’s nice about this simple interface is the ability to bring multiple variables into one view, and to explore a given variable in different ways. Here, I’ve put both axes on a log scale to get a log-log plot, and am able to see variation in the MPG values for the sample data set in question. Very handy when you want to quickly test one or two hypotheses.
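For readers who prefer code to clicks, here is a minimal sketch of what a summary panel and a log-log view roughly compute, written in plain Python so it can be pasted into a notebook cell. The horsepower and MPG numbers are made up for illustration and are not the data set shown above; the function names `log_log` and `summarize` are my own and not part of Azure ML Studio.

```python
import math
import statistics

# Made-up (horsepower, mpg) pairs, for illustration only.
samples = [(60, 34.0), (90, 27.5), (110, 21.0), (150, 18.5), (200, 15.0), (260, 13.0)]


def log_log(points):
    """Return (log10 x, log10 y) pairs, i.e. the coordinates a log-log plot displays."""
    return [(math.log10(x), math.log10(y)) for x, y in points]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


transformed = log_log(samples)
mpg_summary = summarize([mpg for _, mpg in samples])
log_mpg_summary = summarize([y for _, y in transformed])

print(mpg_summary)
print(log_mpg_summary)
```

Comparing the two summaries gives a feel for how the log scale compresses the spread in MPG, which is the same effect the rescaled axes have in the visual view.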
What ML Studio seems adept at doing is bringing together R, Python and SQL in the same interface. This makes it particularly powerful for data analysis that spans several languages. True to this capability, you can bring in an R kernel for doing data analysis. Sure enough, you can use Python too (if you’re like me, you use Python and R almost equally).
Once you have a Jupyter notebook open, you can perform analysis of all kinds in it; everything available in Microsoft R Open is apparently supported within Azure ML Studio. The thing about Jupyter notebooks, of course, is that you can’t yet use multiple kernels in the same notebook. You can use R, Python, or Julia, for instance, but that language choice is fixed for a given notebook. There is a discussion around this, but I am unsure whether it has been resolved. Although R support in Jupyter notebooks is a little sketchy, seasoned R coders can use it well enough. The REPL interface of RStudio is a bit nicer (and harder to get away from, for me personally) than Jupyter for R programming, but Jupyter does work well, for the most part. Kernels are managed remotely and abstracted away from the user, so there is no need to SSH into a Jupyter server and so on. With those distractions gone, the data analysis can start right away.
One bug I seemed to run into is the inability to change graph sizes with the standard par() call and its mar setting. Other than that, graphs render well enough within Jupyter. Building models is easy enough in R as it is, since so many packages provide a very simple interface. Doing this in Jupyter is therefore no different: a breeze, as usual.
Overall, Azure ML Studio is a mature web app for machine learning and data science. It is user friendly, provides some amount of interactivity, and lets you integrate code right into the workflows, which is quite a coup, in my opinion. For prototyping and exploratory data analysis, this may produce a good repeatable workflow that can be easily understood by others and shared.
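Model building is about as compact on the Python side of a notebook. As a rough sketch, and not the model from my workflow, here is an ordinary least-squares line fit written in plain Python. The weight and MPG values are made up and deliberately lie on a straight line so the result is easy to check by hand; `fit_line` and `predict` are hypothetical helper names.

```python
# Made-up (weight in 1000 lb, mpg) values, for illustration only.
# They lie exactly on the line mpg = 51 - 10 * weight.
weights = [1.8, 2.2, 2.6, 3.0, 3.4, 3.8]
mpg = [33.0, 29.0, 25.0, 21.0, 17.0, 13.0]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


slope, intercept = fit_line(weights, mpg)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted mpg at weight 2.8: {predict(slope, intercept, 2.8):.2f}")
```

In practice you would reach for a modelling package rather than hand-rolled arithmetic; the point is only that a fit, a prediction and a printout fit comfortably in one notebook cell.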
- The interface is great: it brings notebooks, data sets, models pre-trained by Microsoft, and so on, together in one place.
- One value addition in the interface is the ability to separate out different contexts very clearly. You can clean data with a certain part of it, organize your dataframes with another, and so on.
- The drag and drop functionality is actually pretty good and works conveniently for those interested in mixing code with a visual interface.
- The Jupyter notebook integration is sketchy with R (more an issue with Jupyter than Azure ML Studio, in my experience), but it works well enough for most things to do with data frames and simple visualizations.
- In addition to what we saw in the notebook, there’s also the possibility of directly embedding R code into the ML Studio workflow as a cell.
Hope you liked this little tour of ML Studio. I enjoyed playing around with it!