Robin H. Smith recently interviewed with Kurt Wallace on NewEDI’s Podcast series to discuss Data Integration.
The discussion grew to cover a number of related topics, including:
- ‘Big Data’,
- How data is being used to generate new sources of cash,
- The ‘siloization’ of data, and
- Clean versus dirty data
See below for a full transcript and link to the podcast audio and video!
Kurt Wallace (KW) [0:11]: Welcome to Intersections with NewEDI. I’m your host Kurt Wallace, and our guest today is Robin H. Smith, General Manager of Sales and Marketing at Virtual Logistics. Robin, thanks for being with us today.
Robin Smith (RS) [0:24]: Well, thank you Kurt; that was a very formal introduction. I’m the co-founder and co-owner of VL. We started the company back in 1994 with a vision of providing integrated EDI solutions. As we well know, integration in today’s world goes way beyond just the pure EDI play: it gets into web commerce, marketing automation, analytics, and data. Today, obviously, we’ve evolved and we look at data integration across the entire supply chain.
KW [1:02]: Big Data has become a buzzword, but couldn’t it be argued that the role of data is important but very misunderstood?
RS [1:11]: Kurt, you’re absolutely right. I think you have to peel the onion here.
You’ve got the term “big data” which is being bandied about. Big data’s really a marketing term and is a catch all for a whole pile of technologies, both at the database level and at the analytics level. It’s even being used in open data scenarios where people are talking about open big data — which is to me kind of a weird combination.
[1:40] At the baseline of this whole discussion and this conversation is the explosion of data creation that started really around 2005. And a lot of it coincided with robust cloud technologies.
[1:53] But basically data has traditionally been looked at internally within most organizations from a transactional perspective. A lot of that has been driven by the accountants and the financial statements that people look at to run their organizations. More and more data is being used in enterprises to generate new sources of cash. There’s new product creation being done around data aggregation mashups. And I think this is where data needs to be looked at in the majority of organizations.
People need to step out of their comfort zone when it comes to data beyond the transactional stuff and start looking at how information is coming into their organizations and how it’s going out of their organizations. And then, what can they do to aggregate that information?
I’m going to leave you with a little stat — to me when I first heard this it was absolutely amazing: GE jet engines on a daily basis around the world produce more data points than Twitter does in a single year. Think of the kind of information that GE is collecting!
KW [3:00]: Based on your experience how do you see the role of data in business today?
RS [3:06]: I think you’re going to see a change in organizations. And you’re starting to see glimpses and sort of the inklings of this, and it’s primarily the Fortune 1000 and Fortune 500 tier currently. But people are starting to look at [data] holistically. They’re starting to look at data across their entire supply chain and the relationships both with their suppliers, [and] their customers. They’re looking at it in their marketing programs. They’re looking at it in the logistics. They’re looking at it in a whole variety of areas. And what they’re doing is they’re realizing that they can mash these data feeds up into analytics or data that could be analyzed to show patterns and to show trends.
[3:48] It goes beyond the two-dimensional kind of analytics that companies traditionally have done. So, I think that what you’re going to see is you’re going to start seeing a differentiation in the companies who are stuck in the old school traditional model that data is really just a transaction that goes through my organization, as opposed to those organizations that are using data now, and mashing data up [with] outside data sources to create revenue streams that didn’t exist.
[4:19] I think that the corollary here is what has happened in e-commerce [recently], and to me it’s a harbinger of where things are potentially going.
[4:31] [If] you look at where mobile commerce was five years ago, it didn’t even exist, and today people are using their smartphones to order and to peruse e-commerce sites. This wasn’t possible five years ago simply because we didn’t have the technology at the API level to be able to produce data in ways that could be rendered on smartphones.
KW [4:55]: So, what are some operational considerations that would give companies a better handle on the use of data then?
RS [5:02]: Well, I think the operational consideration is to do that 360˚ and look at how data comes into your organization, and look at how data goes out. Then look at how data moves between the various bits and pieces inside an organization.
[5:17] What you’re going to see is that, as an organization grows, you’re going to have siloization (if I can use that word, you know it’s not really a word). But in most organizations data sits in silos, and some of it is application-driven, and some of it’s driven by a line of business function.
[5:38] I think that starting to look at data from an operational perspective and looking at where the overlaps are in that information is going to start to give you some insight: not only in the complexities of how integration between all of these applications and how you move data is going to happen, but it gives you some insight into the weak links and where external information can be applied.
[6:00] The first step is really to identify and realize that data is an internal part of every single organization today and that it can be used.
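To make the idea of looking for overlaps between silos concrete, here is a minimal sketch in Python. The silo names and field names are hypothetical and chosen only for illustration; nothing here comes from the interview beyond the idea of comparing what each application or line of business stores.

```python
from itertools import combinations

# Hypothetical silos: each application (or line of business) and the fields it stores.
silos = {
    "erp": {"customer_id", "billing_address", "order_total"},
    "crm": {"customer_id", "billing_address", "email"},
    "wms": {"order_id", "shipping_address", "customer_id"},
}


def field_overlaps(silos):
    """Return {(silo_a, silo_b): shared_fields} for every pair sharing at least one field."""
    overlaps = {}
    for a, b in combinations(sorted(silos), 2):
        shared = silos[a] & silos[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


overlaps = field_overlaps(silos)
for (a, b), shared in overlaps.items():
    print(f"{a} <-> {b}: {sorted(shared)}")
```

Pairs that share many fields are candidates for integration, and fields that appear in only one silo hint at where outside information might be applied.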
KW [6:08]: What are the first steps you’d recommend companies take to be more effective in their approach to data?
RS [6:15]: I think it’s a two-step process.
The first step is to make that leap of faith that data is something that your organization collects; I mean every single organization out there collects information. If you start looking at that information as having value then you’ve made the first step in that process.
[6:34] The second step to me is that you need to have a data strategy. There needs to be a data strategy that is defined around the use of data and a cleanliness of that information.
[6:49] In many organizations data is not considered to be valuable, and so lip service is paid to [data] cleanliness. You talk to any integration company out there; one of the banes of our existence is the fact that data is dirty.
[7:06] There’s an amazing stat out there that dirty address data in the United States costs companies on the order of about $600 million annually. And that’s being super conservative — there’s lots of other numbers out there.
[7:24] Dirty data is probably the biggest single issue when you approach the whole notion of aggregating and generating revenue streams. The old adage ‘garbage in, garbage out’ still applies. But I think that’s the second step.
[7:41] Part of the realization [that must happen with companies is] that data’s got value to it. It’s a two-pronged thing. It’s making sure that data’s clean, but also having a data use strategy so that data is viewed as something of value.
[7:53] Which leads me to the point that I think that most companies of any size need to have a Chief Data Officer. If you think about it in today’s world, we trade on the value of our digital information. And if that digital information is not clean, it’s costing us lots of money. I’m a proponent of having a Chief Data Officer, because I think more and more [this role is] becoming a strategic asset.
KW [8:16]: What are some recommended practices or lessons to take from the examples you gave?
RS [8:24]: Well, I come back to it: I think that cleanliness of the information is probably the most important thing. It’s the piece that’s going to kill any large scale data integration project, but it’s also the piece that’s going to kill you when you start aggregating that information.
[8:43] For example, if your address data within your CRM is not clean, how are you going to do geospatial analytics around the information that is there, and the way that those customers operate, if you can’t plot the data correctly? That’s a very simple example, but that’s the kind of thing that, downstream, has huge impacts.
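The address example can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the abbreviation table, the sample records, and the function names are all hypothetical, and a real project would rely on a postal-authority reference or an address-validation service rather than a hand-written lookup.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def normalize_address(raw):
    """Lowercase, replace punctuation with spaces, collapse whitespace, abbreviate common words."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def group_duplicates(records):
    """Group record ids by normalized address, keeping only addresses seen more than once."""
    groups = {}
    for record_id, address in records:
        groups.setdefault(normalize_address(address), []).append(record_id)
    return {addr: ids for addr, ids in groups.items() if len(ids) > 1}


# Hypothetical CRM rows: (record id, free-text address).
crm_records = [
    (1, "123 Main Street, Suite 4"),
    (2, "123 main st. ste 4"),
    (3, "77 Lake Road"),
]
duplicates = group_duplicates(crm_records)
print(duplicates)
```

Records 1 and 2 describe the same location but would be plotted as two separate customers unless they are normalized first, which is the downstream impact described above.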
[9:07] I think the other element that I’ll leave the listeners with is not to try to do too much. In many projects I’ve seen where people are trying to mash data up, they’re trying to do way too much.
[9:19] Define the projects and the processes that you want to put in place in bite-size packages. Make the process as simple as you possibly can, largely because I think that when you bite off more than you can chew when it comes to complex data sets, you’re going to run into failures. And that’s going to, as you move up the food chain in an organization, create all sorts of push back and negativity.
[9:45] So, keep things as simple as you possibly can to begin with so that you can at least prove to the naysayers in your organization (and there will be many) that data has value.
KW [9:57]: Robin, on a final thought, why and how should companies invest in data technologies?
RS [10:04]: If you don’t invest in data technologies, you can’t take advantage of the data that’s in your organization. At the very least, companies need to have proper ERP and repeat packages.
[10:15] But from there, they need to make sure that those packages are not locking them into closed data sets. I’m astounded still that there are companies selling ERP systems where the underlying structure is locked. Some of [this structure is] in the cloud where I can’t get at the data without having the application provider try to custom code things. One of the things that I look at in any organization, any investment, and in any technologies is what I call the integratability.
[10:48] The integratability is how can I get data in and out of the application easily?
[10:52] In the old days, we used to look at the transaction touch points. Today, you need to look at things like APIs. How robust is an application’s API? Does it talk to the business objects? Can I pull data out that goes beyond the transactional information that is housed? Because then I can now use that data and I can use those hooks to integrate to other organizations and services.
[11:17] It could be, for example, that I want to consume a cloud-based analytics platform, but if I can’t get the data out of my application easily, then I can’t consume that cloud-based public offering. It’s [this] kind of stuff that people need to be aware of: proper investment is really, really critical.
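One way to picture the question of whether an application's API talks to the business objects is the Python sketch below. Everything in it is an assumption made for illustration: `BusinessObjectSource`, `InMemoryErp`, and `export_for_analytics` are hypothetical names, and the in-memory class merely stands in for a real ERP or cloud application.

```python
import json
from typing import Iterable, Protocol


class BusinessObjectSource(Protocol):
    """Hypothetical interface: what an application's API would need to offer."""

    def list_objects(self, kind: str) -> Iterable[dict]: ...


class InMemoryErp:
    """Stand-in for an application whose API exposes more than transactions."""

    def __init__(self, data):
        self._data = data

    def list_objects(self, kind):
        # Unknown kinds yield an empty list rather than raising an error.
        return list(self._data.get(kind, []))


def export_for_analytics(source: BusinessObjectSource, kinds):
    """Serialize the requested business objects as JSON lines, one object per line."""
    lines = []
    for kind in kinds:
        for obj in source.list_objects(kind):
            lines.append(json.dumps({"kind": kind, **obj}, sort_keys=True))
    return "\n".join(lines)


erp = InMemoryErp({
    "customer": [{"id": 1, "region": "ON"}, {"id": 2, "region": "BC"}],
    "invoice": [{"id": 10, "customer_id": 1, "total": 250.0}],
})
payload = export_for_analytics(erp, ["customer", "invoice"])
print(payload)
```

If an application cannot provide something like `list_objects` for anything beyond its transactions, an export step of this kind has nothing to work with, and an outside analytics service cannot be fed.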
KW [11:34]: Robin Smith, thanks for your time today; your website [is] www.VirtualLogistics.ca. We appreciate your time.
RS [11:41]: Kurt, thank you very much for having me on. It was a pleasure to talk to you; these are subjects obviously that we could go into in huge depth. I do encourage people to read; we have a great blog at virtuallogistics.ca. And there’s lots of [other] good stuff on there as well. Thank you very much Kurt.