The 7 Questions Interview Series: Data and Supply Chain Data Integration
“The 7 Question Series” is an investigative content series where we seek out key leaders in a specific industry and/or subject-matter area and ask them 7 key questions that “enquiring minds want to know”. There is a twist to these questions, however: we provide each interviewee with a hypothesis for each question, which helps frame their answer and set its context.
Data and Supply Chain Data Integration Series Objectives:
The objective of this series is to establish direct connections with data experts across the globe and ask them the same set of 7 questions regarding data and data integration in the business. We want to derive insights from their direct experiences and expertise that will help companies, both B2B and B2C, at all stages of their evolution. We are also curious to see whether their answers are similar or different. These interviews will be featured on this website as a series.
The Interview with Brad Anderson, Former Vice President, Big Data Informatics at Liaison Technologies
As former Vice President of Big Data Informatics (April 2014-September 2016), Brad Anderson was responsible for Liaison’s Big Data strategy, leveraging the company’s $250M investment in its world-class cloud infrastructure. Anderson brought a 20-year background in data management, first with enterprise data warehouses and more recently building and using non-relational Big Data tools. In addition to his technical expertise, Anderson founded or co-founded four technology companies across a variety of industries, all leveraging Big Data technologies. He has also worked with MapR Technologies, Ericsson, and Cloudant, where he built its hosted NoSQL offering based on CouchDB.
Robin Smith: Has the term Big Data been overhyped? There is a sense of “Big Data” fatigue, even backlash, that seems to be becoming more prevalent. Is Big Data relevant?
Brad: My feeling is that “Big Data” has, sadly, been over-hyped. But it is without question one of the biggest advancements in a generation. Thinking back to the ’93-’95 time frame, when the World Wide Web was coming to the forefront, there were the same backlash and fatigue signals in the market. I love to reference this Times of London article when talking about “Big Data” hype. As a more recent example, I like to point to “Cloud,” the buzzword that preceded “Big Data,” which has solidified its position in the IT stack and moved out of the trough of disillusionment.
So, while I see vendors rushing into this market and contributing to the hype, and therefore the backlash, I also see specific use cases being delivered advantageously and extremely efficiently through appropriate use of the new “Big Data” tools and systems. In those cases, costs are slashed by 90% and storage and compute capacities are increased 50-100x. As more successful use cases come to the forefront, the march to productivity will continue and the over-hyped feeling will die down.
Robin Smith: Do you really need data scientists as part of your big data strategy? What are the characteristics required of a data scientist? Does this have implications for our educational systems?
Brad: I’m not convinced that a data scientist is required for any big data strategy, except for the most mature and advanced initiatives. Data scientists can certainly provide immense value. But there’s a trap in engaging data scientists in these initiatives, one that can be avoided with some pragmatism. I’ve seen estimates that 70-80% of the work required to gain insights from big data initiatives is actually ‘data conditioning’ work, not ‘data science’ work. Data quality, cleansing, harmonization, and normalization efforts make up this data conditioning, or what we sometimes call ‘data janitor’ work. This exceedingly important process is a prerequisite for data scientists to glean insights from the raw data. While data scientists may get all the glory at the end of the process, I believe the preparation of the data is just as important.
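To make the ‘data janitor’ work concrete, here is a minimal Python sketch of data conditioning applied to two supplier feeds. Everything in it is a hypothetical illustration, not something Brad described: the source names (`erp`, `crm`), the field mappings, the accepted date formats, and the quality rule that a record without a supplier is dropped. It simply shows harmonization (renaming fields to one schema), cleansing (collapsing whitespace, discarding unusable records), normalization (one unit, one casing, one date format), and de-duplication, applied in sequence.

```python
from datetime import datetime

# Hypothetical mappings: each source system names the same fields differently.
FIELD_MAP = {
    "erp": {"SupplierName": "supplier", "Ctry": "country",
            "WeightLb": "weight_lb", "ShipDate": "ship_date"},
    "crm": {"vendor": "supplier", "country_code": "country",
            "weight_kg": "weight_kg", "shipped": "ship_date"},
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats
LB_TO_KG = 0.45359237


def parse_date(value):
    """Normalize a date string in any assumed format to ISO 8601, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def condition(source, raw):
    """Harmonize, cleanse, and normalize one raw record; None if unusable."""
    # Harmonization: rename source-specific fields to the shared schema.
    rec = {FIELD_MAP[source][k]: v for k, v in raw.items() if k in FIELD_MAP[source]}
    # Cleansing: collapse whitespace; drop records that fail the quality rule.
    supplier = " ".join((rec.get("supplier") or "").split())
    if not supplier:
        return None
    # Normalization: one unit (kg), one casing, one date format.
    if rec.get("weight_lb"):
        weight_kg = round(float(rec["weight_lb"]) * LB_TO_KG, 2)
    elif rec.get("weight_kg"):
        weight_kg = round(float(rec["weight_kg"]), 2)
    else:
        weight_kg = None
    return {
        "supplier": supplier.title(),
        "country": (rec.get("country") or "").strip().upper(),
        "weight_kg": weight_kg,
        "ship_date": parse_date(rec.get("ship_date") or ""),
    }


def condition_all(batches):
    """Condition (source, records) batches and drop exact duplicates."""
    seen, out = set(), []
    for source, records in batches:
        for raw in records:
            rec = condition(source, raw)
            if rec is None:
                continue
            key = tuple(sorted(rec.items()))
            if key not in seen:
                seen.add(key)
                out.append(rec)
    return out


erp_rows = [{"SupplierName": "  acme   parts ", "Ctry": "us",
             "WeightLb": "10", "ShipDate": "03/15/2016"}]
crm_rows = [{"vendor": "Acme Parts", "country_code": "US ",
             "weight_kg": "4.54", "shipped": "2016-03-15"},
            {"vendor": "   ", "country_code": "DE"}]
print(condition_all([("erp", erp_rows), ("crm", crm_rows)]))
# → [{'supplier': 'Acme Parts', 'country': 'US', 'weight_kg': 4.54, 'ship_date': '2016-03-15'}]
```

The two feeds describe the same shipment in different shapes; only after conditioning do they collapse into a single record that an analyst could trust, while the blank-supplier record is filtered out rather than silently polluting downstream results.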
Just as an entire generation of students was taught to code, leading to an explosion of technology-based breakthroughs, I believe teaching the process of data conditioning will fill a big void in a business world now replete with big data initiatives.
Robin Smith: Is the relational database, the foundation of the data warehouse in the small data world, still relevant in a big data age?
Brad: The relational database is a phenomenal tool, and the product of 40 years of optimizations, feature enhancements, and integrations. Non-relational technologies need to be deployed carefully, and only when your organization has exhausted all relational avenues for solving the business problem. While the big data components are maturing, many are still open-source projects with rough edges that might not be enterprise-grade, which means they bring their own pain points as you deploy them to solve your use cases. You must weigh the trade-offs of any system choice.
For CRM and other transactional applications, I see the relational database continuing to serve as the primary storage and computational backend. Business intelligence is less cut-and-dried, as I see more and more BI use cases being served by big data technology stacks. Polyglot persistence is the new normal, and matching the data shape and query shape of your use cases to the proper underlying backend systems is an important strategic capability. When big data technologies make storage and compute nearly unlimited, keeping multiple copies of the same data, each serving different storage and query needs, becomes feasible.
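The idea of polyglot persistence can be sketched in a few lines of Python. In this minimal, hypothetical example, every write fans out to two copies: a relational table (using the standard-library `sqlite3` module) shaped for transactional, key-based lookups, and a simple in-memory column layout standing in for an analytics store shaped for BI-style scans and aggregations. The class and method names are illustrative assumptions; a real deployment would use actual big data backends and a more robust synchronization mechanism.

```python
import sqlite3
from collections import defaultdict


class PolyglotOrderStore:
    """Hypothetical sketch: one write fans out to two differently shaped copies."""

    def __init__(self):
        # Relational copy: serves transactional, key-based lookups (CRM-style).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
            "region TEXT, amount REAL)"
        )
        # Columnar-style copy: a stand-in for an analytics store used for BI scans.
        self.columns = defaultdict(list)

    def write(self, order_id, customer, region, amount):
        # Commit to the relational copy first; if that raises, the
        # columnar copy is never touched, so the two stay aligned.
        with self.db:
            self.db.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?)",
                (order_id, customer, region, amount),
            )
        for name, value in (("id", order_id), ("region", region), ("amount", amount)):
            self.columns[name].append(value)

    def get_order(self, order_id):
        """Transactional query shape: fetch one row by primary key."""
        return self.db.execute(
            "SELECT id, customer, region, amount FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()

    def revenue_by_region(self):
        """Analytical query shape: scan two columns and aggregate."""
        totals = defaultdict(float)
        for region, amount in zip(self.columns["region"], self.columns["amount"]):
            totals[region] += amount
        return dict(totals)


store = PolyglotOrderStore()
store.write(1, "Acme", "EMEA", 100.0)
store.write(2, "Globex", "APAC", 50.0)
store.write(3, "Initech", "EMEA", 25.5)
print(store.get_order(2))           # → (2, 'Globex', 'APAC', 50.0)
print(store.revenue_by_region())    # → {'EMEA': 125.5, 'APAC': 50.0}
```

The design choice being illustrated is that neither copy has to be good at everything: the row store answers "give me order 2" directly by key, while the column layout answers "total revenue by region" by scanning only the two columns it needs.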
Robin Smith: Given the reputational and organizational risks involved in poorly governed data (privacy, breach, quality), should data governance be a corporate governance imperative? Where should ownership of this risk reside? Should companies have Chief Data Officers?
Brad: I believe this gets to the immaturity of the new big data technology components. Tools that fill the gaps between these systems and older, more enterprise-grade systems have either just been released or are still being built. The systems we are discussing were developed in environments, and for use cases, that did not require risk mitigation or compliance, because they handled non-sensitive data. Google, Facebook, Twitter, LinkedIn, and the other stewards of these big data systems are still not compelled to add governance features to all of the open-source projects they have given the world, and the open-source community has only scratched the surface of what is required. I think this governance process is as important as the technology advantages these new systems provide. Most of the activity on this front is coming from solution and service providers, who are making the modifications and integrations needed to deliver a complete governance and stewardship offering while preserving the scale and performance benefits of the new systems.
We can debate whether the role of Chief Data Officer is a fad or part of the new normal. The case can be made for financial and healthcare firms to dedicate a C-level resource and team to data governance and strengthen the firm’s overall risk management umbrella. Firms in less-regulated but reputation-based industries may also feel this need. In the near term, I see building this capability in-house as cost-prohibitive, and vendors must be rigorously vetted.
Robin Smith: Is big data only for big companies with deep pockets?
Brad: In my opinion, this is an absolute falsehood. The types of data processing innovation happening in smaller firms are the epitome of what Christensen describes in The Innovator’s Dilemma, as they disrupt markets and larger competitors from “under the radar.” Big data is not a sustaining or evolutionary innovation. It is revolutionary, and its use has already disrupted the enterprise storage market from a cost perspective. More advanced use cases involving larger data volumes or more computationally intensive algorithms (machine learning, deep learning) are being productized right now. If anything, small firms are less encumbered by legacy systems and can do more interesting things with larger volumes of data, often marrying sizable and disparate data sets together to gain competitive advantage.
The open-source nature and cost structure of big data technologies lend themselves to nearly free experimentation, and firms often license from vendors only once they have product/market fit and want to move into production. These technologies also mark the first time I can remember when hardware costs exceed software costs. This is clearly not the case with relational databases, as anyone renewing an Oracle contract can attest.
Robin Smith: How has data changed the way businesses should look at their systems?
Brad: I have seen fewer and fewer data strategy and architecture diagrams with the ERP system in the center. With the rise of best-of-breed cloud applications like Workday, Salesforce, and others, the traditional ERP system is being fragmented. I now see more data management systems and data integration systems at the center of these diagrams, with data becoming the currency of the business, as it flows in and out of different systems that add value and insight.
To reason about this new data economy and implement a strategy within it, I argue that system choice is not the most important concern; adopting standards-based APIs is. These standards will insulate your applications and allow the systems you implement to be swapped out at a later date, whether for more capacity, more functionality, or other compelling and, as of now, unforeseen reasons. This strategy has been proven time and again with standards such as TCP/IP and HTTP, and it can currently be seen playing out within the Hadoop community. It will be critical for the coming Internet of Things movement as well.
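The insulating effect of coding against a standard interface rather than a specific system can be shown with a small Python sketch. Here a hypothetical `ObjectStore` protocol (defined with `typing.Protocol`) plays the role of the standards-based API; it is not a real industry standard. Two interchangeable backends, an in-memory store and a local-filesystem store, both satisfy it, and the application function `archive_invoice` (also hypothetical) depends only on the interface, so either backend can be swapped in without touching application code.

```python
from __future__ import annotations

import os
import tempfile
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical stand-in for a standards-based API: apps code against this only."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """One implementation: keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FileSystemStore:
    """Another implementation: one file per object under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # Flatten "/" so keys map to files directly under the root directory.
        return os.path.join(self.root, key.replace("/", "_"))

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


def archive_invoice(store: ObjectStore, invoice_id: str, payload: bytes) -> bytes:
    """Application logic: unaware of which backend sits behind the interface."""
    key = f"invoices/{invoice_id}"
    store.put(key, payload)
    return store.get(key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for backend in (InMemoryStore(), FileSystemStore(root)):
            print(type(backend).__name__, archive_invoice(backend, "42", b"total=100"))
```

Replacing either backend with a third one (for more capacity or functionality, in Brad's terms) only requires another class with the same `put`/`get` contract; `archive_invoice` and any other caller stay unchanged.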
Robin Smith: Data ownership and value have become the latest discussion points in the data hype cycle. Has the accounting and legal paradigm changed enough for data to be defined as an asset on the balance sheet, and has ownership been clearly delineated from a legal perspective?
Brad: As mentioned earlier, I am viewing data as the new currency of the modern firm. That said, I don’t think that accounting and legal paradigms have shifted enough to account for this. It is still up to the firm maintaining ownership of the data to translate that data into market value and execute a strategy to monetize that value. Accountants and lawyers still recognize positive cash flow as a result of this execution just fine.
I would like to see more contractual (and possibly regulatory) treatment of de-identified data. Great strides could be made in the healthcare field, for example, if researchers could access more data relevant to their cause. Making longer, more complete longitudinal records of patients, encounters, providers, treatments, diagnoses, prescriptions, etc. accessible without violating patient privacy and confidentiality would be a boon to researchers asking more detailed questions and arriving at solutions that could make a material difference in modern healthcare.