What is Data Infrastructure?
Our take on Data Infrastructure

27 Aug

Posted by: Joel Natividad and Sami Baig

Category: Business, Plans, Solutions

So what is “Data Infrastructure?”

Data Infrastructure consists of Data Assets supported by People, Processes & Technology.

Data Assets encompass both structured and unstructured data: everything from mainframe-based legacy databases, to your unindexed library of office documents, to expensive data feeds from various data providers, to derivative data products that your data scientists create with proprietary models, culled from disparate data sources.

And this unorganized data corpus is only getting bigger and bigger as Data Eats the World – Data being the primary raw material behind every Digital Enterprise.

But do you have a central catalog of these Data Assets?

A catalog that stores exhaustive metadata – data about your data, including:

Security classification (low, medium, high security)
Risks
Compliance requirements
Costs (how much does this Data cost to produce and store?)
Benefits (what reports and decisions depend on this data?)
Access class (who can access the data, both internal and external)
License and Terms of Usage
Provenance (how the Data was sourced and/or produced, by whom (person), or by what (machine))
The current version/revision, and the audit trail of the changes
Tags/Topics/Themes
Downstream processes/apps/reports dependent on the data
Data Quality issues
Descriptive statistics
Data Dictionary
and more…

More often than not, the answer is No.

You need a “Living” Catalog, not just reference shelfware: a “Data Exchange” that is used daily as a central registry of all your Data Assets, that can be easily integrated into your business processes, and that automatically links related datasets based on this metadata.
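As a rough illustration of what one entry in such a catalog might hold, and how related datasets could be linked from metadata alone, here is a minimal Python sketch. The `CatalogEntry` fields and the `link_related` helper are hypothetical names invented for this example; they cover only a handful of the metadata listed above, and linking on shared tags is just one simple assumption about what "related" could mean.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class CatalogEntry:
    """Hypothetical catalog record holding a small subset of the metadata above."""
    name: str
    security: str                      # "low", "medium" or "high"
    license: str
    provenance: str                    # who or what produced the data
    tags: set[str] = field(default_factory=set)
    downstream: list[str] = field(default_factory=list)  # dependent reports/apps


def link_related(entries, min_shared=1):
    """Return (name_a, name_b, shared_tags) for each pair sharing enough tags."""
    links = []
    for a, b in combinations(entries, 2):
        shared = a.tags & b.tags
        if len(shared) >= min_shared:
            links.append((a.name, b.name, sorted(shared)))
    return links


# Illustrative entries only; the names and values are made up.
catalog = [
    CatalogEntry("customer_master", "high", "internal", "CRM export",
                 {"customers", "sales"}),
    CatalogEntry("churn_scores", "medium", "internal", "data science model",
                 {"customers", "models"}),
    CatalogEntry("market_feed", "low", "vendor terms", "external data provider",
                 {"prices"}),
]

for a, b, shared in link_related(catalog):
    print(f"{a} <-> {b} via {shared}")
```

In practice the linking rule would likely draw on more than tags (provenance, downstream dependencies, data dictionary overlap), but the idea is the same: the links come from the metadata, not from manual curation.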

People

This Data Exchange needs to transparently integrate with the authentication and authorization mechanisms of your Enterprise. It should reflect your organization chart, with granular permissioning so people can securely and confidently collaborate and share data with their peers, inside and outside the organization.
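One way to picture granular permissioning is as a check of a user's groups against the access class recorded for each dataset. The sketch below is illustrative only: the `User` type, the `ACCESS_CLASSES` mapping, and the group names are invented for this example, and a real deployment would delegate these decisions to the Enterprise's own authentication and authorization systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user, as might be resolved from an enterprise directory."""
    name: str
    groups: frozenset
    internal: bool = True


# Hypothetical access classes: dataset name -> groups allowed to read it,
# plus whether people outside the organization may be granted access at all.
ACCESS_CLASSES = {
    "customer_master": {"groups": {"sales", "finance"}, "external_ok": False},
    "open_budget": {"groups": {"public"}, "external_ok": True},
}


def can_read(user, dataset):
    """Allow access only if the dataset is known and the user's groups match."""
    rule = ACCESS_CLASSES.get(dataset)
    if rule is None:
        return False                      # unknown dataset: deny by default
    if not user.internal and not rule["external_ok"]:
        return False                      # internal-only dataset
    return bool(user.groups & rule["groups"])


alice = User("alice", frozenset({"sales"}))
partner = User("partner", frozenset({"public", "sales"}), internal=False)

print(can_read(alice, "customer_master"))
print(can_read(partner, "customer_master"))
print(can_read(partner, "open_budget"))
```

Denying by default for unknown datasets is a deliberate choice in this sketch: sharing should be something a data owner grants, not something that happens because a rule is missing.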

Beyond end-users, it also needs to support developers and multiple suppliers, competitors and coopetitors alike, all co-creating and embracing an ecosystem of permissionless innovation that is only possible with an open source platform with a modern microservices architecture.

Processes

This Data Exchange should also promote the secure yet nearly frictionless exchange and update of all this data and metadata. It should implement pragmatic, practical data governance by making it easy to set up arbitrary workflows that reflect ever-changing business requirements and data sharing protocols.
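To make "arbitrary workflows" a little more concrete, a workflow can be thought of as a table of allowed state transitions that the business can edit as requirements change. This is a minimal sketch under that assumption; the state names and the `advance` helper are hypothetical and not taken from any particular product.

```python
# Hypothetical publication workflow: each state maps to the states it may move to.
WORKFLOW = {
    "draft": {"in_review"},
    "in_review": {"approved", "draft"},   # reviewers can send a dataset back
    "approved": {"published"},
    "published": set(),
}


def advance(current, target, workflow=WORKFLOW):
    """Return the new state, or raise if the workflow forbids the move."""
    if target not in workflow.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


state = "draft"
for step in ("in_review", "approved", "published"):
    state = advance(state, step)
print(state)
```

Because the rules live in data rather than in code, changing the governance process means editing the `WORKFLOW` table, not rewriting the application.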

Technology

This Data Exchange needs to have an API and a loosely coupled microservices architecture that can be easily integrated with the latest best-of-breed technologies.

Because the State of the Art of producing and processing Raw Data is ever changing, you need an open platform you can build on, one that you can optionally own, not just rent. It should leverage your existing IT investments and expertise, unlike a proprietary, blackbox tool with an opaque development roadmap that locks you in and prevents you from switching it out as required.

Data Infrastructure consists of Data Assets supported by People, Processes and Technology.

This is how we see Data Infrastructure. With the right one, we believe you can Open Data Inside your organization and make your Data Useful, Usable and Used.

Joel Natividad

Co-Founder at datHere, Inc.

Open Datanaut, Open Source contributor, SemTechie, Urbanist, Civic Hacker, Futurist, Humanist, Dad, Hubby & Social Entrepreneur on a 3BL mission.

Sami Baig

Co-Founder at datHere, Inc.

I oversee the design, development, and implementation of innovative data solutions for clients. My expertise in data management, data quality, and data integration has been integral to driving data-driven decision-making. I am passionate about creating a culture of data-driven innovation that enables organizations to stay ahead of the competition.

Tags: data exchange, data infrastructure, metadata, open data
