So what is “Data Infrastructure”?
According to the Open Data Institute:
Data Infrastructure consists of Data Assets supported by People, Processes & Technology.
Data Assets encompass both structured and unstructured data: everything from mainframe-based legacy databases, to your unindexed library of office documents, to expensive data feeds from various data providers, to derivative data products that your data scientists build with proprietary models from disparate data sources.
And this unorganized data corpus is only getting bigger and bigger as Data Eats the World – Data being the primary raw material behind every Digital Enterprise.
But do you have a central catalog of these Data Assets?
A catalog that stores exhaustive metadata, that is, data about your data?
More often than not, the answer is No.
You need a “Living” Catalog, not just reference shelfware: a “Data Exchange” that is used daily as the central registry of all your Data Assets, that can be easily integrated into your business processes, and that automatically links related datasets based on their metadata.
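As a rough illustration of what "automatically links related datasets" could mean, here is a minimal Python sketch. It assumes each catalog entry carries a set of descriptive tags and treats two assets as related when they share at least one tag. The `DataAsset` class, its fields, the `link_related` helper and the sample entries are all hypothetical; a real catalog would likely use richer metadata and matching rules.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    # Hypothetical catalog entry; the field names are illustrative only.
    name: str
    owner: str
    tags: frozenset = field(default_factory=frozenset)


def link_related(assets):
    """Map each asset name to the names of assets sharing at least one tag."""
    # Index asset names by tag so shared tags can be found in one pass.
    by_tag = defaultdict(set)
    for asset in assets:
        for tag in asset.tags:
            by_tag[tag].add(asset.name)

    # Every asset gets an entry, even if nothing is related to it.
    related = {asset.name: set() for asset in assets}
    for names in by_tag.values():
        for name in names:
            related[name] |= names - {name}
    return related


# Made-up sample entries, purely for illustration.
catalog = [
    DataAsset("customer_db", "sales", frozenset({"customers", "legacy"})),
    DataAsset("churn_model_output", "data-science", frozenset({"customers"})),
    DataAsset("market_feed", "trading", frozenset({"prices"})),
]
links = link_related(catalog)
```

Here `links["customer_db"]` contains `"churn_model_output"` because both entries carry the `customers` tag, while `market_feed` has no related assets.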
This Data Exchange needs to transparently integrate with the authentication and authorization mechanisms of your Enterprise. It should reflect your organization chart, with granular permissioning so that people can securely and confidently collaborate and share data with their peers, inside and outside the organization.
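One way to picture permissioning that follows the organization chart is sketched below in Python. It assumes org units are written as slash-separated paths, so a grant to a unit also covers its sub-units, and that sharing outside the organization must be enabled explicitly per grant. The `User`, `Grant` and `can_read` names and the sample data are hypothetical; in practice these checks would be delegated to your enterprise identity and access systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    org_unit: str           # assumed path in the org chart, e.g. "finance/treasury"
    external: bool = False  # True for people outside the organization


@dataclass(frozen=True)
class Grant:
    dataset: str
    org_unit: str                # this unit and all of its sub-units may read
    allow_external: bool = False


def can_read(user, dataset, grants):
    """Return True if any grant lets the user read the dataset."""
    for grant in grants:
        if grant.dataset != dataset:
            continue
        # External users need a grant that explicitly allows them.
        if user.external and not grant.allow_external:
            continue
        same_unit = user.org_unit == grant.org_unit
        sub_unit = user.org_unit.startswith(grant.org_unit + "/")
        if same_unit or sub_unit:
            return True
    return False


# Made-up grants and users, purely for illustration.
grants = [
    Grant("budget", "finance"),
    Grant("open_stats", "partners", allow_external=True),
]
alice = User("alice", "finance/treasury")
bob = User("bob", "engineering")
carol = User("carol", "partners/acme", external=True)
```

With these sample grants, `alice` can read `budget` because her unit sits under `finance`, `bob` cannot, and the external user `carol` can read only `open_stats`.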
Beyond end-users, it also needs to support developers and multiple suppliers, both competitors and “coopetitors”, all co-creating and embracing an ecosystem of permissionless innovation that is only possible with an open source platform built on a modern microservices architecture.
This Data Exchange should also promote the secure yet near-frictionless exchange and update of all this data and metadata. It should implement pragmatic, practical data governance by making it easy to implement arbitrary workflows that reflect ever-changing business requirements and data sharing protocols.
This Data Exchange needs an API and a loosely coupled microservices architecture that can be easily integrated with the latest best-of-breed technologies.
Because the State of the Art in producing and processing Raw Data is ever changing, you need an open platform you can build on and optionally own, not just rent: one that leverages your existing IT investments and expertise, not some proprietary, black-box tool with an opaque development roadmap that locks you in and prevents you from switching it out as required.
Data Infrastructure consists of Data Assets supported by People, Processes and Technology.
This is how we see Data Infrastructure. With the right one, we believe you can Open Data Inside your organization and make your Data Useful, Usable and Used.