Software supply chain risk is growing, but mitigation solutions exist

In late 2021, a critical vulnerability was discovered in Apache Log4j, a widely used logging tool. The vulnerability became infamous because Log4j was used by millions of software packages across organizations that had no idea it existed within their software supply chain. Even organizations that develop their own software often rely on third-party commercial and open-source software to support their business services.

Software supply chain risk has emerged as a leading concern for private sector firms and government agencies of all sizes. There is even a legislative effort within the Senate Homeland Security and Governmental Affairs Committee to help secure open-source software. Unpacking this supply chain, and finding methods to estimate and reduce the risk, is a massive problem for a number of reasons.

First, the number of open-source packages and libraries is tremendous. GitHub, an online platform for hosting and managing software, hosts over 200 million software repositories. Each programming language also uses its own system for tracking software across its ecosystem. JavaScript and Python, two very popular programming languages, support over a million packages combined.

Second, very little is known about the extent to which organizations employ these packages. There is no authoritative directory describing which companies use which software components. In fact, companies themselves may not know the full breadth of software they rely on for critical business operations. One research collaboration between Harvard University and the Open Source Security Foundation has begun surveying companies to estimate the prevalence of software use across firms, but so far it accounts for only a small fraction of the software actually in use by companies within the United States.

Third, the tools for analyzing this risk have yet to be built. Software bills of materials (SBOMs) serve as an ingredient list for software applications. SBOMs are becoming increasingly popular and have even been mandated through a Presidential Executive Order. The intention is that an SBOM will enumerate all of the software components required for a given package to function, thereby helping users identify and manage their software risks. However, the actual practice of creating and disclosing them is still evolving. For example, it is unclear how many layers deep an SBOM should expose a software supply chain. Some packages (like Log4j) may have thousands upon thousands of dependencies, and it is unclear whether this much detail is useful or even necessary.

But there may be hope for a better understanding of this risk.

First, the data exist to document and map out this extensive network. They are incomplete and not easy to find, but they do exist. Libraries.io and deps.dev are two community efforts that offer dependency data across multiple programming languages, from which network maps can be built and analyzed. Similarly, the package managers of some programming languages provide information that could be used to map out their software ecosystems. Together, these data could fill a massive gap in our understanding of software dependencies. And with standard network analysis techniques, the software components most critical to each ecosystem could begin to be identified.
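As a rough illustration of the kind of network analysis described above, the sketch below ranks packages by how many other packages depend on them, directly or indirectly. The package names and dependency edges are hypothetical, invented for this example; real edges would have to be drawn from a source such as deps.dev, Libraries.io, or a package manager. Counting transitive dependents is only one possible measure of criticality, chosen here for simplicity.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (package, package it depends on).
# These names are made up for illustration, not real dependency data.
EDGES = [
    ("web-app", "http-client"),
    ("web-app", "logger"),
    ("http-client", "logger"),
    ("cli-tool", "arg-parser"),
    ("cli-tool", "logger"),
    ("logger", "string-utils"),
]


def transitive_dependents(edges):
    """Count, for each package, the packages that depend on it directly or indirectly."""
    reverse = defaultdict(set)  # dependency -> packages that use it directly
    packages = set()
    for pkg, dep in edges:
        reverse[dep].add(pkg)
        packages.update((pkg, dep))

    counts = {}
    for start in packages:
        seen = set()
        queue = deque([start])
        # Breadth-first walk "upward" from a package to everything built on it.
        while queue:
            node = queue.popleft()
            for parent in reverse.get(node, ()):
                if parent not in seen and parent != start:
                    seen.add(parent)
                    queue.append(parent)
        counts[start] = len(seen)
    return counts


def rank_by_criticality(edges):
    """Sort packages from most to fewest transitive dependents (ties by name)."""
    counts = transitive_dependents(edges)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, dependents in rank_by_criticality(EDGES):
        print(f"{name}: {dependents} dependent package(s)")
```

In this toy network, the small utility at the bottom of the chain ranks highest even though no application uses it directly, which is the kind of hidden dependence that made Log4j so disruptive.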

Second, as the practice of creating and using SBOMs matures, users may become more adept at ingesting the information, comparing SBOMs across applications, and identifying the riskiest components. For example, one approach to using SBOMs to visualize risk might be to go through all the software packages listed in a given SBOM and collect the known vulnerabilities for each, information that is readily available from the National Institute of Standards and Technology. Each vulnerability could then be plotted on a graph according to its impact, using the Common Vulnerability Scoring System (CVSS) standard, and its exploitability, using the Exploit Prediction Scoring System (EPSS) standard, so that risk can be more easily visualized.

From there, organizations could visually inspect, compare, and develop strategies for mitigating the risk of one or more software applications.
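The plotting approach described above might be sketched as follows. Everything in the data is hypothetical: the component names, the placeholder vulnerability identifiers, and the scores are invented for illustration, and a real workflow would parse an actual SBOM and look up published vulnerability records instead. The cutoffs used to label each point (a CVSS base score of 7.0 and an EPSS probability of 0.1) are assumptions made for this example, not part of either standard's guidance on prioritization. The plotting function assumes the third-party matplotlib library is installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    vuln_id: str     # placeholder identifier, not a real CVE
    component: str   # software component the vulnerability affects
    cvss: float      # CVSS base score, 0.0 to 10.0
    epss: float      # EPSS probability, 0.0 to 1.0


# Hypothetical SBOM contents and vulnerability records (illustrative only).
SBOM_COMPONENTS = ["logger", "http-client", "string-utils"]
KNOWN_VULNS = [
    Vulnerability("VULN-0001", "logger", 9.8, 0.90),
    Vulnerability("VULN-0002", "logger", 5.3, 0.02),
    Vulnerability("VULN-0003", "http-client", 7.5, 0.40),
    Vulnerability("VULN-0004", "unused-lib", 9.1, 0.75),
]


def vulns_for_sbom(components, known_vulns):
    """Keep only the vulnerabilities that affect a component listed in the SBOM."""
    wanted = set(components)
    return [v for v in known_vulns if v.component in wanted]


def quadrant(vuln, cvss_cut=7.0, epss_cut=0.1):
    """Label a vulnerability by impact and exploitability (cutoffs are assumptions)."""
    impact = "high impact" if vuln.cvss >= cvss_cut else "lower impact"
    likelihood = "more exploitable" if vuln.epss >= epss_cut else "less exploitable"
    return f"{impact}, {likelihood}"


def plot_risk(vulns, path="sbom_risk.png"):
    """Save a scatter plot of exploitability (x) against impact (y)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([v.epss for v in vulns], [v.cvss for v in vulns])
    for v in vulns:
        ax.annotate(f"{v.component}: {v.vuln_id}", (v.epss, v.cvss))
    ax.set_xlabel("Exploitability (EPSS probability)")
    ax.set_ylabel("Impact (CVSS base score)")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 10)
    fig.savefig(path)
    plt.close(fig)
    return path


if __name__ == "__main__":
    for v in vulns_for_sbom(SBOM_COMPONENTS, KNOWN_VULNS):
        print(f"{v.vuln_id} ({v.component}): {quadrant(v)}")
```

Points in the upper right of such a graph, with both high impact and high exploitability, would be natural candidates for attention first, while the record for a component that is not in the SBOM is filtered out before plotting.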

Software supply chain security has emerged as a leading risk because of the massively fragmented and decentralized nature of modern software development. Unlike other problems in cybersecurity, this is a discrete problem for which the data exist. The information required to map software packages and their dependencies is knowable because the number of nodes and dependencies is finite. And so, while we still have much to learn as a community about this risk, there are concrete steps we can take to better understand and mitigate it.

Sasha Romanosky is a senior policy researcher at the nonprofit, nonpartisan RAND Corporation, an appointed member of the Data Privacy and Integrity Advisory Committee at the Department of Homeland Security, and a former cyber policy advisor at the Pentagon in the Office of the Secretary of Defense for Policy.