How do you define data science? How different is it from analytics or Big Data?
In simple terms, data science is about applying the scientific method to solve business problems. Ten years ago, we had statistics or descriptive analytics, and we are using the same tools today for what we call data science. The big difference is the abundance of data, specifically labeled data. The other difference is that we now have the compute power to retrain these models: statistical models tend to be static, whereas a machine learning-based model retrains itself as the data changes over time and stays continuously refreshed. When you start applying ML, you are creating a self-fulfilling prophecy, because acting on the model's output means you do things differently, which in turn changes the data. Big Data plays into this because you now have a lot of labeled data, and it is more than Hadoop, which is just one part of the ecosystem. Now you have new methodologies leveraging Spark for these types of analysis, which are more efficient.
Why should enterprises embrace data science?
There have been a lot of business journal studies on this showing that companies that leverage statistical methodologies are much more efficient in achieving their business goals than their peers who don’t. They may not outperform their peers when they are at this descriptive analytics stage, but they are consistent in hitting their business goals. However, when you start applying AI and ML, the research shows that those companies significantly outperform their peers that don’t. When you leverage these tools to get new insights, it also helps to drive a change in business models.
Is data the new oil?
I don’t like to think of data as the new oil because that limits your thinking. When you consume oil or any other commodity, it is gone. But when you consume data, you make it more valuable. And when you bring different data types together and start applying analytics, you get insights, which are themselves new data derived from data.
Data is really the driver of the fourth industrial revolution (4IR), which is fundamentally changing the way we work. Unlike the previous three industrial revolutions, which happened region by region, 4IR is a global one, changing how we operate, the types of work we do and how efficiently we do it.
Data science is still like a dark science to IT leaders. What are you doing to demystify this?
There are many companies where the data science programme starts in the CIO office, and most CIOs understand it. However, I think the deeper issue is the underlying foundation. CIOs want to move towards containerisation. They want to build a cloud environment, and in some parts of the world they can’t use the traditional public cloud, so it is important for them to bring these container platforms to the private cloud. So, we built something called IBM Cloud Private for Data, which enables a cloud-like environment on-premise, built on containers and Kubernetes. That helps you to get your data foundation and infrastructure into a cloud environment, and it also has a governance layer, which is important because that makes it much easier for you to find your data.
Once you have these components in place, you can start talking about what business outcomes you want to drive with your data, without talking about data science or ML. I will give you an example from IBM, where I am responsible for internal transformation as well. When I joined IBM two years ago, one of my peers came to me and said, ‘we have a problem – we have subscriptions, and we don’t know when companies are going to renew.’ We never talked about data science, only about the outcome he wanted to drive, which was to reduce the churn on support and services subscriptions. We ended up building a predictive model which not only tells them which
Should data science be part of IT or a stand-alone department?
My opinion is that the ultimate goal should be a hub-and-spoke model, where you have an enterprise data science function and data science at the business unit level as well, because you need people who truly understand the needs of the business. Now, you might ask why you need an enterprise function. It is because there is a gap in how companies approach data science today. First, you need a central group to build a data science strategy, to understand what decisions your company is making and to assign a dollar value to those decisions. Secondly, as an enterprise, you need to have guardrails on what tools people can use. Everyone is using open source tools, and they should. But there are some open source packages that you don’t have a license to use, and some of them have worms and viruses. This is why you need a central group from both a strategy and a toolbox perspective. However, it shouldn’t be an ivory tower exercise.