Linux Foundation: Why Open Data Matters

In the beginning, data was closed. Although we have been ‘transporting’ data between different locations, systems and resources since the time of the first databases and personal computers in the 1960s, our approach to data has been typified by its closedness. Whether we close off data for security or enterprise privacy reasons, our knee-jerk reaction is always to treat data as a boxed-up and zoned-in entity that isn’t open to the whole world and their dog or grandmother - depending upon which side of the Atlantic you live on.

But open data exists. This approach to information management and sharing provides a controlled means of making data accessible, editable and sharable by any user that wants to exploit its use. This happens when information is held under an open license.

Open data movement

The open data movement as a whole takes various forms. Among its proponents are the World Bank, the European Data Portal, Data.gov in the USA and the Open Data Institute (ODI) itself. Closely aligned to the interests of the Linux Foundation with its adherence to open source software, the Open Data Handbook states that open data is, “Data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike.”

Why do we want to use open data? Because it is one of the key information sources for open government, open education, open science, open source hardware design and of course open source software. It champions reuse and redistribution, availability and access, plus the drive to encourage universal (meaning worldwide, but let’s not stop at Earth) participation by all persons and groups. Initially founded in use cases that support non-commercial environments, open data has progressed to being used in enterprise environments, sometimes obfuscated and anonymized, but always open.

“Open data is data that the whole world can use - for free - and also expand upon and reuse, or redistribute subject to measures that preserve the data’s provenance and openness,” explained Mike Dolan, GM of projects at the Linux Foundation. “Once refined, with care to protect and secure personal information, open data can accelerate innovation, collaboration, and coordination of efforts. Also, data is expensive to gather, hard to verify and represents an objective truth; so putting this information out into the ‘commons’ [a term used to denote open access repositories such as Wikipedia or Wikimedia Commons for images] and allowing anyone to use it extends the data’s usefulness.”

Open data is proliferating

Efforts are underway at a foundational level to promote open data access. Dolan and the Linux Foundation team argue that its impact will be felt on everything from climate change to agriculture, to goods and services. The question now is, given where we want to get to with open data, where did we start off? Looking at our history of mapping our planet, our track record hasn’t been that good.

“In times gone by, the most accurate map data has long been proprietary and expensive to purchase. With many sources and several competing schemas, interoperability was always painful,” said Dolan. “The physical environment, including every human community in the world, which constantly changes, is too hard for any one organization to map. The Overture Maps Foundation addresses those challenges with open map data as a shared asset that can strengthen mapping services worldwide. All of the data will be open and easily accessible, which will drive innovation by allowing third-party services to be built on top of the data for everything from autonomous vehicles to gaming to the Metaverse.”

There’s more to discover here. Part of the Linux Foundation, the AgStack Foundation (ag as in agriculture) seeks to break open the world of farming and agrarian data, which is rarely public. AgStack contains an open dataset of carefully designed identification values that describe agricultural fields, forests, vineyards and other cultivation and growing zones anywhere around the globe. When combined with AI and computer science, these datasets have been built to help farmers, ‘ag producers’ (as they are sometimes now known) and governments manage crop production, assess productivity, monitor pests, track foodborne illness outbreaks etc.

openIDL: assurance for insurance

Looking wider for more examples, the insurance industry is now able to tap openIDL (open insurance data link), an open blockchain network for collecting and sharing statistical data to streamline regulatory reporting, yield insights for the insurer and enhance timeliness, accuracy and value for regulators. In 2021, the openIDL project was moved to the Linux Foundation to ensure a true member-permissioned open platform under the LF structured standards and governance model, free from proprietary solutions.

Let’s raise the view even higher and wider then. We all agree that climate change is a global problem with data coming in from everywhere, including factories, mines, power plants etc. But this story is about open data and, truthfully and inconveniently, private sector companies may be loath to release such sensitive information to the public or to researchers needing insight to craft mitigations.

“A new endeavour critical to the Linux Foundation’s OS-Climate project enables all of that [industrial] data to go into one source while companies can control which entities can access each data point. This way, privacy is protected but the data management platform still enables invaluable datasets to inform big picture results of what’s happening,” explained Dolan. “At the same time, a host of efforts are underway to build tools that create a foundation for trusted datasets and open source AI platforms that provide a neutral, trusted hub for developers to code, manage and scale open source tech projects.”

Open data: AI & LLMs

From Dolan’s perspective, open source has ‘thrived’ by attracting an ecosystem of users and contributors to a shared base layer of software. His naturally upbeat tones (he’s GM of projects at the Linux Foundation, after all) lead him to suggest that open source has become the ‘plumbing’ that everyone in the software industry and its associated data streams can build something from. It also prevents the unnecessary waste of everyone working to rebuild the base layer, which we might witness in proprietary software projects.

Digging down to open data specifically, the team say that open data will have a similar impact over time in the world of Large Language Models (LLMs) and Machine Learning (ML). This is in no small part due to the fact that although many AI tools have gone mainstream, the process of training tools such as LLMs can burn considerable time and energy. Add this reality to the fact that the costs of collecting and validating data remain high. We know that the data chosen to train language models is extremely important for building trust, removing bias and promoting fairness. A key driver of trust for open source was that anyone could build it themselves.

“Today, there are a growing number of high quality open data collections for training LLMs and other AI systems. Sharing well-trained and tested AI models openly will minimize waste in energy and human resources while advancing efforts to deploy AI in the battle against poverty, climate change, waste, and contribute to quality education, smart cities, electric grids and sustainable, economic growth etc,” said Dolan. “To achieve all that can be achieved, the use of open data must be done ethically. Private information needs to be protected. Data governance needs to be protected. Open data must be transparent top to bottom.”

The World Bank elaborated on data openness in August in an Open Data Toolkit, and in 2017 the United Nations recommended appropriate PII protections throughout the life cycles of data sets.

An open journey

As a working example, we can see the type of journey open source and open data go on if we open up (pun intended) to how the Linux Foundation's financial foundation group came to be. The FinTech Open Source Foundation (FINOS) is described as a community of software application developers, technologists and industry leaders from the world’s largest financial services and tech firms, collaborating on a wide range of open source projects.

"This part of the Linux Foundation was formed in 2018, but it can trace its interesting history back through a series of shifts and events that we have witnessed since the millennium, if not before," said Gabriele ‘Gab’ Columbro, executive director of FINOS & general manager at Linux Foundation Europe. "The initiative started as Symphony, an organization formed to facilitate secure messaging and collaboration for the world's largest investment banks i.e. if you will, some of the most closed-of-the-closed organizations possible."

Although these investment banks - and we all know the names here from Morgan Stanley to Goldman Sachs to UBS to JPMorgan and so on - could already perform a degree of secured chat and collaboration via their use of Bloomberg, the industry recognized that this system was proprietary, not inexpensive and essentially closed. Given the tightening of margins that banks of all kinds have experienced through the mid-2010s, a desire to investigate secured open source technologies was perhaps inevitable.

"The stars were aligned for this to happen, yes," enthused Columbro. "Where Symphony existed as a commercial tool, the Symphony Software Foundation was formed as an open initiative. From there the project developed to become FINOS in 2018 and it was subsequently brought under the umbrella of the Linux Foundation in 2020. Because I've been involved with the history of this work and seen the project evolve from Symphony days, I can say that we've seen a progression in financial industry open source in the last 10 years that is redolent of what happened with open source's adoption in general a decade before that."

But can we really convince financial investment institutions (and other banks) to adopt open technologies in a world where insider trading, ransomware and other pitfalls exist?

"If you want to bring these commercial entities into open source, you have to lead with business value first, that way they can see where their time, resources and other commitments will go," said Columbro. "In terms of whether we have now facilitated open data exchange at this level, it's a continuing labor of love. Financial institutions are obviously under huge regulatory pressures and every company or organization uses different data formats, so there is work to do. That work sees FINOS working to establish standards so that we can build fabrics, layers and channels to exchange data and all work more efficiently and sustainably in future."

Our open future?

There’s a lot of upbeat positivity here in the general push for open source, open data, open hardware, open repositories and open design in general. But open source isn’t perfect: many projects suffer from a lack of contributions, a lack of support for project ‘maintainers’ (the key proponents that often sit at the head of projects taking on more responsibility and work than they fairly should) and sometimes a lack of formalization where open code is used in mission-critical enterprise environments that really should have an extra dose of robustness applied to them.

Deliberate naysaying aside (well, we need balance and moderation in all things, right?) the world would arguably be a worse place without open technology, so let’s get involved and keep our minds, secured codebases and arms open.
