In the ever-evolving technology landscape, data analytics and data strategy play an increasingly large role in economics and business models. Dr. Manjeet Rege, director of the Center for Applied Artificial Intelligence at the University of St. Thomas, co-hosts the "All Things Data" podcast with adjunct professor and Innovation Fellow Dan Yarmoluk. The podcast provides insight into the significance of data science as it relates to business models, business economics and delivery systems. Through informative conversations with leading data scientists, business model experts, technologists and futurists, Rege and Yarmoluk discuss how to harness and deploy data science and data-driven strategies, and how to enable digital transformations.
Rege and Yarmoluk spoke with Sean Owen about Cloudera solutions, big data, and data science skills. Owen is the current Director of Data Science at Cloudera, a hybrid data cloud company. Before Cloudera, Owen founded Myrrix Ltd (now the Oryx project) to commercialize large-scale real-time recommender systems on Apache Hadoop. He is an Apache Spark committer and co-author of Advanced Analytics with Spark. He was also a committer and VP for Apache Mahout, and is a co-author of Mahout in Action. Previously, Owen was a senior engineer at Google.
Here are some highlights from their conversation.
Q. How would you describe the difference between data science and big data?
A. These are big terms, and I think they mean different things to different people. I tend to think of big data as a movement that started right after the dot-com bust, in the early 2000s, when the availability of data increased dramatically with the rise of the web and then mobile. Suddenly there was a huge amount of data being generated that one could collect. It also began to get cheaper and cheaper to store data. So big data was a name for this phenomenon. We suddenly went from a data-scarce world to one where you could collect as much data as you cared to. Data science is a mixture of data and statistics, as well as engineering and computer science. And it's necessary these days because you can't really separate the software issues from the analytic issues. When you are doing analytics today, you are working with software. So those worlds have come together, and I think data helps to propel them together.
Q. How does Cloudera differentiate itself from other hybrid cloud systems?
A. We like to present Cloudera as an enterprise data hub. It is a big generic platform. It is a place to store data, process data, and secure it to do analytics and machine learning. It's a big Swiss Army knife. We are looking to help you solve your problems. I think Cloudera offers more scale on its platform than competitors do. Cloudera has made better decisions about what to put at the core of its platform and what packages to surround that core with.
Q. What does the market need to do to harness the power of big data?
A. Let's think about the ingredients there. We are going to need data, we are going to need software and some skills, and then we need to figure out what to do with it. Two of those elements are pretty easy: software is free and computers are cheap. I think data is one of the remaining differentiators in this new era of big data and data analytics. What differentiates Company A from Company B is who has better data and who is better organized about data collection. One thing that can't hurt anyone is investing in collecting data intelligently. You have to have a purpose too. Data by itself just sits there. It has to be mined and interpreted to have real value.
Listen to their conversation here: