April 30, 2023

Commit CCO Nathan Cayzer and MD & CRO Max Nirenberg in a special interview about big data trends for "InsideBIGDATA"

2022 Trends in Big Data: The Data Marketplace Evolution

December 7, 2021

As 2022 beckons, the big data ecosystem finds itself in a transitional state of flux that may very well redefine everything you know—or thought you knew—about it. The cloud is still its unambiguous centerpiece, but is moving ever closer to the edge. Artificial Intelligence is still its media darling, but may soon yield that spot to quantum computing. Data fabrics are more prevalent than ever, but so is the rise of the data mesh concept.

The one constant in these collective and individual motions is the data themselves. Data’s value to the enterprise is greater than ever. According to Indico Data CEO Tom Wilde, “The reality is that every company in the world now is a data company. I don’t care if you’re a trucking business, or pharmaceutical, or insurer. You are a data company, whether you like it or not. And, the extent to which you get a handle on your data will play a huge part in your competitiveness in the future.”

Taming organizational data (and big data, in particular) for the coming year will involve firms helping themselves to the numerous opportunities the aforementioned approaches deliver for processing, analyzing, storing, and integrating big data. That much is clear.

What’s somewhat surprising is the end result of adroitly managing big data with these leading capabilities. The emergence of a data marketplace, facilitating a free-flowing exchange of big data within and across organizations, is swiftly becoming a reality, aided by composable data management and its technological underpinnings.

“In certain cases it is not so much buying and selling; it’s more about binding [data],” Saptarshi Sengupta, Denodo Director of Product Marketing, acknowledged about this trend. “But then, there are cases where it is buying and selling data.”

The Data Marketplace

The ascendance of a data marketplace, which typifies the consumerization of big data with parallels to other marketplaces such as Amazon’s or Reuters’ for financial companies, has been a longstanding ideal. It’s finally coming to fruition because of the subsequently described approaches for data fabrics, data meshes, data service layers, active metadata, and edge computing. At its finest, a data marketplace is an unexampled opportunity for monetizing data for like-minded consumers. “It’s the enterprises: the Fortune 100, the Fortune 500,” disclosed Purnima Kuchikulla, Privacera Director of Customer Success. “These guys are already leading the data marketplace. They want to sell data. They’re sellers; they’re buyers; they buy and then want to sell it again.”

Whether the exchange of data is for direct monetization purposes or for inter-departmental use cases between business units, the more data organizations have groomed for this purpose, the more advantageous it becomes. “On top of all this they’re selling datasets; they’re not selling one table of data,” Kuchikulla specified. “They’re selling it as a dataset that’s part of a domain.” As Sengupta posited, those dataset exchanges can also be between different domains in the same organization. He described a university system that’s implemented a “decision support system” in which the school “has a bunch of different campuses and from those campuses there’s faculty members, staff, students, and everybody’s looking at data. That data can be about books, libraries, course offerings, registration, enrollment, etc. It’s more like a data consumption model through a particular website or portal.”

The Data Mesh

Conceptually, a data mesh is an architectural approach that is both similar and assistive to an enterprise data fabric, which Gartner termed the top strategic trend for 2022. The latter is a holistic means of connecting all data throughout an organization, regardless of their location, so they’re accessible on demand. Despite the sundry implementation approaches, several competencies have emerged for defining a data fabric. “There’s a data catalog competence, an active metadata competence, the semantic layer, all the data integration materials, data preparation, etcetera,” Sengupta enumerated.

A data mesh builds on this distributed architectural approach by including domain-specific information about data’s creation, storage, and cataloging so it’s applicable to users across domains. “It gives you some level of persistency and storage of where your data can reside, but it’s not set in stone,” explained Calyptia Co-Founder Anurag Gupta. The domain-specific attributes of data meshes address semantic differences for cross-departmental use while provisioning governance measures for exposing data. Meshes are frequently overseen by centralized teams. According to Gupta, “A mesh is almost representative of your central nervous system where all your data is sitting in this actionable manner ready to be sent to various end destinations.”

The Data Service Layer

Having decentralized data assets uniformly connected and controlled for delivery to multiple locations (and users) is arguably the definition of a data marketplace. Nonetheless, neither this paradigm nor that of data fabrics and data meshes would work without what Commit Chief Customer Officer Nathan Cayzer called a “service layer.” With an obvious allusion to the cloud’s Service Oriented Architecture, real-time service layers are instrumental for delivering data to end users within and across organizations. “A real time serving layer allows you to materialize responses in real time or close to real time to the end user,” Cayzer mentioned. Such service layers either support, or are in turn supported by, the following data management constructs:

  • Data Delivery: The instantaneous visibility into data that serving layers provide can present the right data for the right action. “In finance or banking it lets you get a real-time snapshot of current activity in a trading house instead of having to wait for [batch jobs],” Cayzer pointed out.
  • Data Lakehouses: Primarily implemented in cloud environments, data lakehouses amalgamate the best facets of data warehouses and data lakes to incorporate formal mechanisms for data governance and semantics “to put all the different sources of data, whether it’s structured, semi-structured, or unstructured, together so you can run your ETL aggregation queries’ code and serving layer to customers all in one place,” noted Commit Chief Revenue Officer Max Nirenberg.
  • Super Databases: The cardinal advantage of this instrument is “we’re talking petabytes of data and this can consolidate multiple use cases into one single database: from OLTP, OLAP, analytics, search, and more,” Cayzer said. “It’s more efficient instead of being distributed across multiple databases and machines.”

Active Metadata

Gartner has embraced the notion of inverting metadata’s value from passive data lineage deployments to low-latency action in production settings. In some instances, this functionality entails organizations “using metadata to do some sort of AI or ML,” Sengupta remarked. “You basically look into your metadata and your log files and turn it into AI and machine learning so you can recommend what type of activities will come out of that.”

Sometimes doing so involves determining the best way to integrate data. On other occasions, this capability includes “dynamic tagging that represents metadata for how data flows from an edge device to, say, your data mesh,” Gupta denoted. “This metadata is vital because it can represent important factors like a team and what team owns what slice of data. With privacy concerns growing, you want to make sure that slice of data is under the proper compliance and governance.”

Edge Infrastructure

The ability to readily exchange low-latency data at the cloud’s edge (like weather data, traffic updates, or manufacturing developments) within a data marketplace broadens its enterprise worth. Doing so hinges on “bringing compute and storage infrastructure to the edge to enable the infrastructure for the post-cloud world,” Cloudian CTO Gary Ogasawara specified. Although edge deployments typically transmit some data to centralized clouds, growing use cases for this architectural model include:

  • Video Streaming: From security use cases to contactless shopping, video streaming is becoming pervasive. It typically relies on cognitive computing to filter out images of normal operations for security videos, for example.
  • Fraud Detection: Enhancing payment fraud detection in physical locations via edge processing “benefits the end user and the provider by doing this in real-time,” Ogasawara observed.
  • Personalization: In retail settings, edge processing creates opportunities for personalizing customer experiences in brick and mortar locations, “like how ecommerce is on Amazon,” Ogasawara divulged—which is lucrative in a data marketplace.


Developments in data meshes, data service layers, active metadata, and edge computing enhance big data management with granular controls for disseminating data, on request, in real time. Sometimes that delivery encompasses selling data within the data marketplace, a concept that’s expansive enough to include exchanging data between departments for timely action, too. As for how they interrelate, these developments all derive from the composability tenet at the foundation of adaptable business resilience, and capitalization, for the years to come.

Composability is a modular approach to designing the foregoing inputs because “organizations are realizing it is unrealistic to have a single enterprise standard for data and analytics,” reflected Franz CEO Jans Aasman. “In 2022 and beyond, companies will embrace a lego-like approach to analytics and AI solutions where… [they’re] used in multiple, different applications to connect data insights to business actions across the enterprise.”
