At the end of each year, we see articles covering how many vulnerabilities were disclosed the prior year. Because the articles are written at about the same time of year, they give a fairly good initial comparison from year to year, at least on the surface. This is the foundation of statements such as “Security vulnerabilities in critical infrastructure up 600%”. My company, Risk Based Security, even includes that general type of observation in our vulnerability reports, with caveats. These sensationalized counts and figures are also often used to claim that one product is more or less secure than another, even though vulnerability counts typically cannot support such claims because they are built on severely incomplete data. In reality, we must remember that such numbers are only a snapshot in time and serve as a quick comparison between years, not much more.
Before we get to the “moving target” topic, we need to cover a bit of background on how all this happens.
First, consider that even with a large team doing vulnerability aggregation, there is a limit to the number of sources that can be monitored. While a team might monitor over 4,000 sources on a daily to weekly basis, we know there are a lot more out there. As new researchers create their blogs, older vendors finally create advisory pages, and new vendors pop up, the number of new sources is growing at an incredible rate. Additionally, consider that there are over a million results for “site:github.com changelog.md” (not to mention variations like “release_notes” or “changelog.txt” and similar) that could potentially host a trove of vague vulnerability mentions. Even more daunting, back in 2010 GitHub was hosting 1 million repositories, and now it hosts over 100 million. That means there are an overwhelming number of bug trackers, pull requests, and over a billion commits on a single site. Any company that claims to monitor all of that, or “millions” of sources? Do your due diligence and be ready to walk away from them.
Second, due to limited resources, vulnerability aggregation teams have to prioritize their activity. This is usually done by vendor, product, and time frame, with the more widely deployed vendors and products getting the most attention. With “time frame”, emphasis is placed on the more recent vulnerabilities, as they are most likely to be relevant to organizations. Moving past that, a vulnerability intelligence (VI) provider must work with clients to learn what they use in their organization, as that allows the provider to prioritize and ensure that it is covering exactly what is deployed, first and foremost. After all that, as time permits, the team has to come up with new ways to expand source coverage without compromising quality or speed.
With that in mind, consider a vendor that finally publishes a changelog or opens its bug tracker to everyone. While a VI team would love to go through such resources as far back as possible, they have to limit themselves to vulnerabilities for the current year, and then some amount of time farther back in case clients are using older versions (especially for third-party dependencies). Time permitting, the team should then go back even further to dig for older and more obscure vulnerabilities. While these may or may not immediately benefit clients based on the software they are running, they do contribute directly to the vulnerability history of a given product or vendor. That history is invaluable in determining the “cost of ownership” for a product and is vital to making a decision between multiple vendors offering the same type of solution. With all of that data, it is trivial for a VI provider to give a quick and easy-to-understand picture of that cost.
Even with very limited time to dig that far back into sources, the impact can still be seen clearly. In January of 2013, Risk Based Security’s VulnDB team had aggregated 8,822 vulnerabilities for the 2012 calendar year, and CVE covered only 4,765 of them (54%). Compared to the prior year (7,911 in 2011), we could say that disclosures increased around 11.5%. The next question we must ask is whether those numbers aged well and still hold true today.
Looking at VulnDB now, there were 10,856 vulnerabilities disclosed in 2012. So in the past eight years, the team has managed to find an additional 2,034 vulnerabilities disclosed that year. That means comparing 2012’s updated 10,856 count with the older 7,911 count for 2011, the percent increase was closer to 37%. But wait, we can no longer use the 7,911 count for 2011 either, since that too is a moving target! Ultimately, as previously stated, these disclosure numbers are only good as a snapshot in time. Depending on when you perform the count, you may find wildly varying results that could heavily bias any conclusions you try to draw. Do the people writing the statistics you read and cite disclaim that?
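To make the arithmetic behind these comparisons easy to re-check, here is a minimal Python sketch that uses only the counts quoted above. The function and variable names are my own for illustration, and it assumes the 7,911 figure for 2011 is the count as it stood at the time of the original comparison.

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from `old` to `new`, expressed as a percentage."""
    return (new - old) / old * 100


# Counts quoted in the text (names are illustrative, not VulnDB fields).
COUNT_2011_OLD = 7_911        # 2011 count used in the original comparison
COUNT_2012_JAN_2013 = 8_822   # 2012 count as aggregated in January 2013
COUNT_2012_NOW = 10_856       # 2012 count as aggregated today
CVE_2012_JAN_2013 = 4_765     # 2012 entries covered by CVE in January 2013

# Year-over-year change when both counts are similarly "old" snapshots.
same_era = percent_change(COUNT_2011_OLD, COUNT_2012_JAN_2013)

# The same comparison after mixing in the updated 2012 count.
mixed_era = percent_change(COUNT_2011_OLD, COUNT_2012_NOW)

# 2012 vulnerabilities found after the January 2013 snapshot.
found_later = COUNT_2012_NOW - COUNT_2012_JAN_2013

# Share of the January 2013 snapshot that CVE covered.
cve_coverage = CVE_2012_JAN_2013 / COUNT_2012_JAN_2013 * 100

print(f"2011 -> 2012, both older snapshots: {same_era:.1f}%")
print(f"2011 (old) -> 2012 (updated):       {mixed_era:.1f}%")
print(f"2012 entries found later:           {found_later:,}")
print(f"CVE coverage of 2012 snapshot:      {cve_coverage:.1f}%")
```

The point of the sketch is that the only thing changing between the first two results is when the 2012 count was taken, not anything about 2012 itself.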
In January of 2013, I started taking counts every month of how many vulnerabilities VulnDB had aggregated for the 2012 calendar year. Almost eight years later, this blog post and chart show just how much that number can change. As with all vulnerability statistics, please make sure you fully understand what they really mean and disclaim as needed!
With this visual we can see that in the years after 2012, the number of vulnerabilities aggregated for that year continued to rise considerably. Over time that growth tapered off, as the team simply didn’t have time to keep digging that far back into changelogs and bug trackers looking for more vulnerabilities that were less beneficial to organizations.
The tl;dr takeaways:
- The number of vulnerabilities disclosed in a given year is static, but the VI teams that aggregate the information won’t find them all that year.
- Vulnerabilities by year, as reported, will slowly climb over time as additional aggregation work is performed.
- While that newly aggregated data may be “old” as far as what software is being used within an organization, it still contributes to better metadata and product/vendor evaluation (“cost of ownership”).