The Rundown: CVE IDs, Meanings, & Assumptions

For almost two decades, CVE has been considered an industry standard for vulnerability tracking. A CVE ID can be affiliated with many vulnerabilities, in a format like CVE-2014-54321. Note my choice of ID, from 2014 with a consecutive set of numbers. That is because I specifically chose a 'sample' CVE that was set aside as an example of the CVE ID Syntax Change in 2014. This change occurred when it was determined that 9,999 IDs for a single year were not going to be sufficient. Technical guidance on the change is available, as well as more basic information and the original announcement. Starting out with this hopefully demonstrates that there may be more to an ID than meets the eye.

Fundamentally, the ID is simple: you have the CVE prefix, followed by a year identifier and a numeric identifier. The CVE used above would represent ID 54321 with a 2014 year identifier. Fairly simple! But you are reading an entire blog on these IDs, so here is the spoiler: it isn't so simple, unfortunately. I want to give a rundown of what a CVE ID really is and set the record straight. Why? Because I don't think MITRE has done a good job of that, and worse, actively works against what could be a clear and simple policy. We'll use CVE-YEAR-12345 as a representative example when discussing these IDs, to be clear about which part of an ID we're talking about.
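
To make the format concrete, here is a quick parsing sketch (my own illustration, not something from MITRE); it assumes the post-2014 syntax of a four-digit year followed by a sequence number of four or more digits:

    import re

    # Illustrative only: split a CVE ID into its year and numeric identifiers.
    # Post-2014 syntax allows sequence numbers of four or more digits.
    CVE_PATTERN = re.compile(r"^CVE-(?P<year>\d{4})-(?P<sequence>\d{4,})$")

    def parse_cve_id(cve_id):
        match = CVE_PATTERN.match(cve_id)
        if not match:
            raise ValueError("not a syntactically valid CVE ID: " + cve_id)
        return int(match.group("year")), int(match.group("sequence"))

    print(parse_cve_id("CVE-2014-54321"))  # (2014, 54321)

Keep in mind that neither number reliably tells you when the vulnerability was discovered or disclosed, which is the point of the rest of this post.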


When CVE was started in 1999, assignments were made based on a public disclosure. However, from the very beginning, the YEAR portion was not guaranteed to represent the year of disclosure. This was because MITRE's policy was to assign a CVE-1999 ID to any pre-1999 vulnerability. We can see this with CVE-1999-0145, which was assigned for the infamous Sendmail WIZ command, allowing remote root access. This feature was publicly disclosed as a vulnerability on November 26, 1983, as best I have determined (via the Sendmail changelog). While it was a known vulnerability and used before that, it was only privately shared. If there is a public reference to this vulnerability before that date, please leave a comment!

The takeaway is that a vulnerability from 1983 has a CVE-1999 identifier. So from the very first year, MITRE set a clear precedent that the YEAR portion of an ID does not represent the year of discovery or disclosure. You may think this only happened for vulnerabilities prior to 1999, but that isn't the case. In the big picture, meaning the 22 years CVE has been running, an ID typically does represent the disclosure year. However, per one of CVE's founders, "because of CVE reservation, sometimes it aligned with year of discovery". That is entirely logical and expected, as a CVE ID could be used to track a vulnerability internally at a company before it was disclosed. For example, BigVendor could use the CVE ID not only for their internal teams, such as communicating between security and engineering, but also when discussing a vulnerability with the researcher. If a researcher reported several vulnerabilities, using an ID to refer to one of them was much easier than using the file/function/vector.

For the early CVE Numbering Authorities (CNAs), companies that were authorized to assign a CVE without going through MITRE, this was a common side effect of assigning. If a researcher discovered a vulnerability on December 25 and immediately reported it to the vendor, it might be given e.g. a CVE-2020 ID. When the vendor fixed the vulnerability and the disclosure was coordinated, that might not happen until 2021. The founder of CVE I spoke to told me there "weren't any hard and fast rules for CNAs" even at the start. So one CNA might assign upon learning of the vulnerability while another might assign on public disclosure.

Not convinced for some reason? Let’s check the CVE FAQ about “year portion of a CVE ID”!

What is the significance and meaning of the YEAR portion of a CVE ID
CVE IDs have the format CVE-YYYY-NNNNN. The YYYY portion is the year that the CVE ID was assigned OR the year the vulnerability was made public (if before the CVE ID was assigned).

The year portion is not used to indicate when the vulnerability was discovered, but only when it was made public or assigned.

Examples:

A vulnerability is discovered in 2016, and a CVE ID is requested for that vulnerability in 2016. The CVE ID would be of the form “CVE-2016-NNNN”.
A vulnerability is discovered in 2015 and made public in 2016. If the CVE ID is requested in 2016, the CVE ID would be of the form “CVE-2016-NNNNN”.

All clear, no doubts, case closed!

That clear policy conflicts with, or at least muddies, guidance found elsewhere. Looking at MITRE's page on CVE Identifiers, we see that "The process of creating a CVE Record begins with the discovery of a potential cybersecurity vulnerability." My emphasis on 'discovery', as that would mean the ID reflects when the vulnerability was discovered, and not necessarily even when it was reported to the vendor. There are many cases where a researcher finds a vulnerability but may wait days, weeks, months, or even years before reporting it to the vendor for different reasons. In practice, for coordinated disclosure through a CNA, the ID is more likely assigned based on when the vendor learns of the vulnerability. Otherwise, the bulk of CVEs are assigned based on the disclosure year.

It gets messier. At the beginning of each year, each CNA gets a pool of CVE IDs assigned. The size of the pool varies by CNA and is roughly based on the prior year of assignments. A CNA that disclosed 10 vulnerabilities in the prior year is likely to get 10 – 15 IDs the subsequent year. Per section 5.1.4 of the CNA rules, any IDs from that pool that were not assigned to an issue during the calendar year should be REJECTed: "Those CVE IDs that were unused would be rejected." But then it stipulates that "CVE IDs for previous calendar years can always be requested if necessary." So per current rules, a CNA can request a new ID from a prior year despite having REJECTed IDs that were previously included in its pool. That means how IDs get assigned is effectively left up to each CNA.

[Update: Note that the pool of IDs a CNA gets one year may not be the same the next. Not only may the size of the pool differ, the first ID may also be in an entirely different range, e.g. 2019-1000 vs 2020-8000.]

The takeaway from all this is that we now have many reasons why the YEAR component of a CVE ID does not necessarily tie to when the vulnerability was disclosed. The more important takeaway? If you are generating statistics based on the YEAR component, you are doing it wrong. Any statistics you generate are immediately inaccurate and cannot be trusted. So please don't do it!
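
To illustrate the pitfall, here is a toy sketch using mostly hypothetical records (the CVE-1999-0145 / 1983 pairing is real, per the Sendmail example above; the 2020/2021 entries are made up). Grouping by the ID's YEAR component and grouping by actual disclosure year produce different numbers:

    from collections import Counter

    # Hypothetical sample, except CVE-1999-0145 (disclosed in 1983, per above).
    records = [
        {"cve": "CVE-2020-XXXX1", "disclosed": 2021},  # reserved from a 2020 pool, disclosed in 2021
        {"cve": "CVE-2020-XXXX2", "disclosed": 2020},
        {"cve": "CVE-1999-0145", "disclosed": 1983},   # Sendmail WIZ
    ]

    by_id_year = Counter(int(r["cve"].split("-")[1]) for r in records)
    by_disclosure_year = Counter(r["disclosed"] for r in records)

    print(by_id_year)          # Counter({2020: 2, 1999: 1})
    print(by_disclosure_year)  # Counter({2021: 1, 2020: 1, 1983: 1})

Any "vulnerabilities per year" chart built from the first grouping is counting ID bookkeeping, not disclosures.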


Finally, a brief overview of the numeric string used after the YEAR. Going back to our example, CVE-YEAR-12345, it is easy to start making assumptions about 12345. The most prevalent assumption, and a completely incorrect one, is that IDs are issued in sequential order. This is not true! As covered above, CNAs are given pools of IDs at the beginning of each year. Oracle and IBM assign over 700 vulnerabilities a year, so the pools of IDs they receive are substantial. There are over 160 participating CNAs currently, and if each only received 100 IDs, that is over 16,000 IDs allocated before January 1st.

In 2021, the effect of this can be seen very clearly. Halfway through April and we’re already seeing public IDs in the 30k range. For example, CVE-2021-30030 is open and represents a vulnerability first disclosed on March 28th. According to VulnDB, there are only 7,074 total vulnerabilities disclosed this year so far. That means we can clearly see that CVE IDs are not assigned in order.

Redscan’s Curious Comments About Vulnerabilities

As a connoisseur of vulnerability disclosures and an avid vulnerability collector, I am always interested in analysis of the disclosure landscape. That typically comes in the form of reports that analyze a data set (e.g. CVE/NVD) and draw conclusions. This seems straightforward, but it isn't. I have written about the varied problems with such analysis many times in the past and yet, companies that don't operate in the world of vulnerability databases still decide to play in our mud puddle. This time it is the company Redscan, which I don't think I had heard of before, doing analysis of NVD data for 2020. Risk Based Security wrote a commentary on their analysis, to which I contributed, but I wanted to keep the party going over here with a few more personal comments. These are just my opinions, as a more outspoken critic on the topic, and where I break from the day job.

I am going to focus on one of my favorite topics: vulnerability tourists. People who may be in the realm of Information Security, but don't specifically operate day-to-day in the world of vulnerability disclosures or, more specifically to me, vulnerability databases (VDBs). For this blog, I am just going to focus on a few select quotes that made me double-take. Read on after waving to Tourist Lazlo!

“The NVD tracks CVEs logged by NIST since 1988, although different iterations of the NVD account for some variation when comparing like-for-like results over time.”

There's a lot to unpack here, most of it wrong. First, the NVD doesn't track anything; they are spoon-fed that data from MITRE, who manages the CVE project. Second, NIST didn't even create NVD until over five years after CVE started. Third, CVE didn't track vulnerabilities "since 1988"; they cherry-picked some disclosures from before 1999, which is when they started and why CVE IDs start with '1999'. Fourth, there was only one prior iteration of NVD: their ICAT "CVE Metabase", which basically ran during the first year of CVE. According to Peter Mell, who created it, after starting as its own vulnerability website, "ICAT had become an archival tool for CVE standard vulnerabilities and was only updated every three or four weeks". Then in 2005 the site relaunched with a new focus and timely updates from CVE. Despite this quote, later in their report they produce a chart that tries to show an even comparison from 1988 to 2020, even after saying the database went through iterations and while not understanding CVSS.

“The growth is also likely attributable to an increase in the number of CVE Numbering Authorities (CNAs) – of which there are now more than 150 worldwide with the power to create and publish CVEs.”

The growth in disclosures aggregated by CVE is a lot more complicated than that, and I doubt the increase in CNAs is a big factor. Of course, they say this and don't cite any evidence, despite CVE now showing who the assigning CNA was (e.g. CVE-2020-2000 is Palo Alto Networks). The data is there if you want to make that analysis, but it isn't easy since the assigner isn't included in the NVD exports; it requires some real work scraping the CVE website, since they don't include it in their exports either. Making claims without backing them up, when the data is public and might prove your argument, is not good.

“Again, this is a number that will concern security teams, since zero interaction vulnerabilities are famously difficult to detect and have the potential to cause significant damage.”

This makes me think that Redscan should invent a wall, perhaps made of fire, that could detect and prevent these attacks! Or maybe a system that is designed to detect intrusions! Or even one that can prevent intrusions! This quote is one that is truly baffling because it doesn’t really come with an explanation as to what they mean, and I hope they mean something far different than what this sounds like. I hope this isn’t a fear tactic to make readers think that their managed detection service is needed. Quite the opposite; anyone who says the above probably should not be trusted to do your attack detection.

This chart heading is one of many signs that Redscan doesn't understand CVSS at all. For a "worst of the worst" vulnerability they got several attributes right but end up with only "Confidentiality [High]" on the impact side. The vulnerability they describe would only be CVSSv2 7.8 (AV:N/AC:L/Au:N/C:C/I:N/A:N) and CVSSv3.1 7.5 (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N). That is not the worst. If it 'highly' impacts confidentiality, integrity, and availability, then it becomes the worst of the worst: CVSSv2 10.0 and CVSSv3.1 9.8 or 10.0, depending on scope. It's hard to understand how a security company gets this wrong until you read a bit further, where they say they "selected the very 'worst' option for every available metric." My gut tells me they didn't realize you could toggle 'High' for more than one impact, and confidentiality is the first one listed.
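
If you want to sanity-check those numbers yourself, here is a rough CVSSv3.1 base score sketch for the scope-unchanged, network/no-privileges/no-interaction vectors discussed above (constants taken from the public v3.1 specification; treat this as an illustration, not a scoring tool):

    # Simplified CVSSv3.1 base score, Scope:Unchanged only, AV:N/AC:L/PR:N/UI:N fixed.
    CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

    def roundup(x):
        # Spec-defined rounding to one decimal place.
        i = round(x * 100000)
        return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

    def base_score(c, i, a):
        iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
        impact = 6.42 * iss
        exploitability = 8.22 * 0.85 * 0.77 * 0.85 * 0.85  # AV:N, AC:L, PR:N, UI:N
        return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

    print(base_score("H", "N", "N"))  # 7.5 -- only confidentiality set to High
    print(base_score("H", "H", "H"))  # 9.8 -- all three impacts High

The jump from 7.5 to 9.8 is exactly the difference between the vector Redscan built and an actual worst-of-the-worst vector.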

“It is also important to note that these numbers may have been artificially reduced. Tech giants such as Google and Microsoft have to do a lot to maintain their products and services day-to-day. It is common for them to discover vulnerabilities that are not being exploited in the wild and release a quick patch instead of assigning a CVE. This may account for fewer CVEs with a network attack vector in recent years.”

This is where general vulnerability tourism comes in, as there is a lot wrong here. Even if you don't run a VDB, you should be passingly familiar with Microsoft advisories, as an example. Ever notice how they don't have an advisory with a low severity rating? That's because they don't publish them. Their advisories only cover vulnerabilities above a certain threshold of risk. So the statement above is partially right, but for the wrong reason. It isn't about assigning a CVE; it is about not even publishing the vulnerability in the first place. Because they only release advisories for more serious issues, it actually skews their numbers to include more remote vulnerabilities, not fewer, primarily on the back of "remote" issues that require user interaction, such as browser issues or file parsing vulnerabilities in Office.

This quote also suggests that exploitation in the wild is a bar for assigning a CVE, when it absolutely is not. It might also be a surprise to a company like Redscan, but there are vulnerabilities that are disclosed that never receive a CVE ID.

“Smart devices designed for the mass market often contain a worrying number of vulnerabilities due to manufacturer oversight. Firmware within devices is often used by multiple vendors, meaning that any vulnerabilities in this software has the potential to result in lots of CVEs.”

Wrong again, sometimes. If it is known to be the same firmware used in multiple devices, it gets one CVE ID. The only time additional IDs are assigned is when multiple disclosures don't positively identify the root cause. When three disclosures attribute the same vulnerability to three different products, it stands to reason there will be three IDs. But that isn't how CVE is designed to work, because it artificially inflates numbers, and that is the game of others.

“The prevalence of low complexity vulnerabilities in recent years means that sophisticated adversaries do not need to ‘burn’ their high complexity zero days on their targets and have the luxury of saving them for future attacks instead.”
-vs-
“It is also encouraging that the proportion of vulnerabilities requiring high-level privileges has been on the increase since 2016. This trend means that cybercriminals need to work harder to conduct their attacks.”

So which is it? When providing buzz-quote conclusions such as these, that are designed to support the data analysis, they shouldn’t contradict each other. This goes back to what I have been saying for a long time; vulnerability statistics need qualifications, caveats, and explanations.

“Just because a vulnerability is listed in the NVD as hard to exploit doesn’t mean that attackers aren’t developing PoC code to exploit it. The key is to keep up with what’s happening in the threat landscape and respond accordingly.”

I'll end here, since this is a glowing endorsement for why vulnerability intelligence has to be more evolved than what CVE and NVD are offering. Part of the CVSS specification includes Temporal scoring, and one of those attributes is Exploit Code Maturity. This is designed to specifically address the problem above: knowing the capability of potential attackers matters. With over 21,000 vulnerabilities disclosed last year, organizations are finding that just patching based on the CVSSv3 base score isn't enough. Sure, you patch the 10.0 / 9.8 since those are truly the worst-of-the-worst, and you patch the 9.3 / 8.8 since any random email might carry a payload. Then what? If all things are equal between vulnerabilities that impact your organization, you should look to see if a patch is available (also covered by the Temporal score) and if an exploit is available.
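
To show what that looks like in practice, here is a small sketch using the CVSSv3.1 temporal multipliers (values taken from the public v3.1 specification; verify them there before relying on this):

    # CVSSv3.1 temporal multipliers.
    E  = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}   # Exploit Code Maturity
    RL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}   # Remediation Level
    RC = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}              # Report Confidence

    def roundup(x):
        # Spec-defined rounding to one decimal place.
        i = round(x * 100000)
        return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

    def temporal_score(base, e, rl, rc):
        return roundup(base * E[e] * RL[rl] * RC[rc])

    # Two vulnerabilities with the same 9.8 base score, very different urgency:
    print(temporal_score(9.8, "H", "U", "C"))  # 9.8 -- weaponized exploit, no fix available
    print(temporal_score(9.8, "U", "O", "C"))  # 8.5 -- unproven exploit, official fix released

Two 9.8s on paper; only one of them should be keeping you up at night.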

Numeric scores are not enough; you have to understand the context behind them. That CVSSv2 remote information disclosure that partially affects confidentiality by disclosing an admin password is only a 5.0. Score it under CVSSv3 and you are looking at a 9.8, because it immediately leads to privilege escalation, which is factored in under that system. Heartbleed was a CVSSv2 5.0 with a functional exploit and an available patch; look what hell that brought upon us. If you aren't getting that type of metadata, reconsider your choice of vulnerability intelligence.

A critique of the summary of “Latent Feature Vulnerability Rankings of CVSS Vectors”

"What do you think of this?" It always starts out simple. A friend asked this question of an article titled Summary of "Latent Feature Vulnerability Rankings of CVSS Vectors". This study is math heavy and that is not my jam. But vulnerability databases are, and that includes the CVE ecosystem, which encompasses NVD. I am also pretty familiar with the limitations of the CVSS scoring system, and colleagues at RBS have written extensively on them.

I really don't have the time or desire to dig into this too heavily, but my response to the friend was "immediately problematic". I'll give the cliff-notes version of some of the things that stand out to me, starting with the first graphic included, which she specifically asked me about.

  • The header graphic displays the metrics for the CVSSv3 scoring system, but is just labeled "CVSS". Not only is this sloppy, it belies an important point of this summary: the paper's work is based on CVSSv2 scores, not CVSSv3. They even qualify that just below: "We should note the analysis conducted by Ross et al. is based upon the CVSS Version 2 scoring system…"
  • "Ross et al. note that many exploits exist without associated CVE-IDs. For example, only 9% of the Symantec data is associated with a CVE-ID. The authors offered additional caveats related to their probability calculation." That sounds odd, but it is readily explained above when they summarize what that data is: "Symantec's Threat Database (SYM): A database extracted from Symantec by Allodi and Massacci that contains references to over 1000 vulnerabilities." First, that data set contains a lot more than vulnerabilities. Second, if Symantec is really sitting on over 900 vulnerabilities that don't have a CVE ID, then as a CNA they should either assign them an ID or work with MITRE to get an ID assigned. Isn't that the purpose of CVE?
  • "Ross et al. use four datasets reporting data on vulnerabilities and CVSS scores…" and then we see one dataset is "Exploit Database (Exploit-DB): A robust database containing a large collection of vulnerabilities and their corresponding public exploit(s)." Sorry, EDB doesn't assign CVSS scores, so the only ones that would be present are those given by the people disclosing the vulnerabilities via EDB, some of whom are notoriously unreliable. While EDB is valuable in the disclosure landscape, serving as a dataset of CVSS scores is not one of its roles.
  • "About 2.7% of the CVE entries in the dataset have an associated exploit, regardless of the CVSS V2 score." This single sentence is either very poorly written, or it is all the evidence you need that the authors of the paper simply don't understand vulnerabilities and disclosures. With a simple search of VulnDB, I can tell you at least 55,280 vulnerabilities have a CVE and a public exploit. There were 147,490 live CVE IDs as of last night, meaning almost 38% have a public exploit. Not sure how they arrived at 2.7%, but that number should have been immediately suspect.
  • "In other words, less than half of the available CVSS V2 vector space had been explored despite thousands of entries…" Well sure, but this statement doesn't qualify one major reason for that. Enumerate all the possible CVSSv2 metric combinations and derive their scores, then look at which numbers don't show up on that list (see the enumeration sketch after this list). A score of 0.1 through 0.7 is not possible, for example. Then weed out the combinations that are extremely unlikely to appear in the wild, which is most that have "Au:M" as an example, and that eliminates a lot of possible values.
  • "Only 17 unique CVSS vectors described 80% of the NVD." Congrats on figuring out a serious flaw in CVSSv2! Based on the 2.7% figure above, I would immediately question the 80% here too. That said, there is a serious weighting of scores, primarily in web application vulnerabilities, where e.g. an XSS, SQLi, RFI, LFI, and limited code execution could all overlap heavily.
  • "Input: Vulnerabilities (e.g., NVD), exploit existence, (e.g., Exploit-DB), the number of clusters k" This is yet another point where they are introducing a dataset they don't understand and making serious assumptions about. Just because something is posted to EDB does not mean it is a public exploit. Another quick search of VulnDB tells us there are at least 733 EDB entries that are actually not a vulnerability. This goes back to the reliability of the people submitting content to the site.
  • "The authors note their approach outperforms CVSS scoring when compared to Exploit-DB." What does this even mean? Exploit-DB does not do CVSS scoring! How can you compare their approach to a site that doesn't do it in the first place?
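
As promised in the vector-space bullet above, here is a quick enumeration sketch using the published CVSSv2 base equation (constants from the v2 guide; double-check them before reusing this) showing that large chunks of the 0–10 range simply cannot occur:

    from itertools import product

    # CVSSv2 base metric values, per the v2 specification.
    AV  = {"L": 0.395, "A": 0.646, "N": 1.0}
    AC  = {"H": 0.35, "M": 0.61, "L": 0.71}
    AU  = {"M": 0.45, "S": 0.56, "N": 0.704}
    CIA = {"N": 0.0, "P": 0.275, "C": 0.660}

    def cvss2_base(av, ac, au, c, i, a):
        impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
        if impact == 0:
            return 0.0
        exploitability = 20 * AV[av] * AC[ac] * AU[au]
        return round((0.6 * impact + 0.4 * exploitability - 1.5) * 1.176, 1)

    scores = {cvss2_base(*combo) for combo in product(AV, AC, AU, CIA, CIA, CIA)}
    print(len(scores))                           # far fewer distinct values than the 101 possible tenths
    print(sorted(s for s in scores if s < 1.0))  # 0.1 through 0.7 never appear

Half the "vector space" being unexplored is partly just arithmetic: many score values are unreachable, and many reachable vectors describe combinations that rarely occur in real disclosures.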

Perhaps this summary is not well written and the paper actually has more merit? I doubt it; the summary seems comprehensive and captures the key points, but I don't think the summary author works with this content either. Stats and math, yes. Vulnerabilities, no.

WhiteSource on ‘Open Source Vulnerability Databases’ – Errata

[This was originally published on the OSVDB blog.]

On September 8, 2016, Jason Levy of WhiteSource Software published a blog titled "Open Source Vulnerability Database". Almost two years later it came across my radar and I asked via Twitter if WhiteSource was interested in getting feedback on the blog, since it contained errata. They never replied and it fell off my radar. Last week, someone asked me about WhiteSource's claim of maintaining the largest vulnerability database. While I have not seen their database, I expressed doubt and dug into my notes. Only then did I remember that I had created notes to write a blog listing the errata. Since their name came up again, I figured it was worth revisiting, especially since they never replied to me.

First, I was a long-term contributor to OSVDB, an officer in OSF (the 501(c)(3) that managed the project), a founder at Risk Based Security, and one of the maintainers of VulnDB since day one. Don’t worry, all of that is explained in more detail below. Here are quotes from their blog and my thoughts on the errata:

The open source community has been living up to this statement recently, with the accelerated rate of discoveries of open source vulnerabilities reported by such sources as the NVD open source vulnerability database.

CVE/NVD have dismal coverage of open source libraries in the big picture. Their coverage is primarily through a couple dozen projects that seek out CVE IDs, researchers that request CVE IDs, and the OSS-Sec mail list. To date, there are still many developers, projects, and even security researchers that don’t know what CVE is.

The NVD is by far the main source for researching vulnerabilities.

This sentence does not make sense. As you note below, NVD does not research vulnerabilities; they don't even aggregate vulnerabilities themselves. While many consume NVD's data over CVE's, due to the format in which they make the data available, more "analysis" is done via CVEDetails.com than NVD in my experience.

Their analysis contains information like how the vulnerability operates, its impact rating, CVSS score, and links to any available patches/fixes.

The main purpose of NVD, aside from being a burden on taxpayers' dollars, is to provide CPE and CVSS scores for vulnerabilities via CVE. They rarely, if ever, do additional analysis of the vulnerability unless required to determine a CVSS score. If they do, they do not "show their work" so to speak, instead using the same CVE description. As best I am aware, they don't dig up additional references either, so the only solution links provided are via CVE. That said, I haven't audited NVD in several years to see if that still holds true, but I would be surprised if anything has changed.

The OSVDB (open source vulnerability database) was launched in 2004 by Jake Kouhns, the founder and current CISO of Risk Based Security – the company which now operates OSVDB’s commercial version, the VulnDB.

Jake Kouns, one of the founders of Risk Based Security (RBS), did not launch OSVDB. And the correct name is the Open Sourced Vulnerability Database. OSVDB was created and launched by H.D. Moore. More history is available via Wikipedia. While OSVDB was a basis for the historical data in VulnDB, Risk Based Security funded OSVDB entirely for over two years and it was their data that was shared publicly via OSVDB before the project shut down. When OSVDB shut down, RBS continued to maintain VulnDB, which contains over eight years of vulnerability information not found in OSVDB. With considerable resources and a team maintaining VulnDB, it has become something OSVDB wished it could be, but never came close to.

This commercial version continues to be maintained by Risk Based Security, but without the strong community that maintained the OSVDB.

There was no "strong community" that maintained OSVDB. For years, it limped along on the backs of a handful of volunteers, with very little support from the community, financial or labor. At times, there were only two of us maintaining the project, while most volunteers signed up, worked on the database for 20 minutes, and never returned to help again.

The OSVDB/VulnDB is perceived by many in the community as redundant, for the majority of vulnerabilities are also reported in the NVD.

This is WhiteSource's ignorance or bias at play. OSVDB was known for having tens of thousands of vulnerabilities not found in CVE/NVD. One of VulnDB's strong points is that it contains almost 69,000 vulnerabilities not found in CVE/NVD. Since WhiteSource claims to have a comprehensive database, you should have disclaimed this while spreading false information about VulnDB. Since VulnDB is not public, the notion that it is perceived by "many" in such a light is just your marketing. Anyone reading VulnDB's quarterly or year-end reports can see the incredible gap between it and CVE/NVD. Also curious is that the link off "monitoring of multiple vulnerability databases including the CVE/NVD" just links to your blog on the topic. Other than CVE/NVD, which vulnerability databases do you monitor exactly?

I’d also happily sign a non-disclosure agreement to do a quick audit of your database, to verify or refute your claim that you have the “the largest coverage of vulnerability listings“. I’d only ask that in return, I can publish a simple ‘yes’ or ‘no’ answer to that claim.

Security advisories are usually the first place that security professionals and developers look when they have security issues within a specific scope.

If this is WhiteSource's methodology for maintaining a vulnerability database, it does not bode well for your customers. Formal security advisories are quickly becoming, or have already become, a minority in the disclosure game. OSVDB and VulnDB didn't stay far ahead of the curve by relying on this antiquated mindset; we knew to look elsewhere all along.

WhiteSource tracks almost a dozen security advisories (like RubyOnRails, RubySec and Node Security), to ensure our customers are alerted on new vulnerabilities the minute they’re reported.

A whole dozen? The VulnDB team currently monitors 3,100 sources every week. Some are monitored ~ hourly, some a few times a day, some once a week depending on various criteria such as publication frequency. If WhiteSource is only looking at those sources, and that type of source, it really doesn’t bode well for your claims of the largest vulnerability listing.

The majority of vulnerabilities are published with a remediation suggestion, like a new patch, new version, system configuration suggestion etc.

While technically true per VulnDB data, it is better to qualify that. For example, in Q1 2019, 60.8% of the disclosed vulnerabilities they aggregated had a documented solution. But, a couple caveats. First, that was down 13.5% from Q1 2018! Second, not all solutions are created equal. Just because there is a patch committed against the 'master' branch doesn't mean an organization can actually use it. Their policy or available resources may not allow for one-off code patching like that.

And here at WhiteSource, this is exactly what we do.

Unfortunately, based on the way you describe what you do, you are not much better than CVE/NVD I’m afraid.

The Duality of Expertise: Microsoft

[This was originally published on the OSVDB blog.]

The notion of expertise in any field is fascinating. It crosses so many aspects of humans and our perception. For example, two people in the same discipline, each with the highest honors academia can grant, can still have very different expertise within that field. Society and science have advanced so we don't just have "science" experts; medical doctors can specialize to extreme degrees. Within Information Security, we see the same, where there are experts in penetration testing, malware analysis, reverse engineering, security administration, and more.

In the context of a software company, especially one that does not specifically specialize in security (and that, it is trivial to argue, was late to the security game), you cannot shoehorn them into any specific discipline or expertise. We can all agree there is an incredible level of expertise across a variety of disciplines within Microsoft. So when Microsoft releases yet another report that speaks to vulnerability disclosures, the only thing I can think of is duality. Especially in the context of a report that puts forth some expertise they are uniquely qualified to speak on, mixed with a topic that pre-dates Microsoft and that, to some degree, they certainly aren't qualified to speak on.

A Tweet from Carsten Eiram pointed me to the latest report, and brought up the obvious fact that it seemed to be way off when it comes to vulnerability disclosures.

[image: carsten-ms-sir]

The “MS SIR” he refers to is the Microsoft Security Intelligence Report, Volume 21 which covers “January through June, 2016” (direct PDF download).

It’s always amusing to me that you get legal disclaimers in such analysis papers before you even get a summary of the paper:

[image: ms-disclaimer]

Basically, the takeaway is that they don't stand behind their data. Honestly, the fact that I am blogging about this suggests that is a good move and that they should not. The next fascinating thing is that it was written by 33 authors and 14 contributors. Since you don't know which of them worked on the vulnerability section, it means we get to hold them all accountable. Either they can break it down by author and section, or they all signed off on the entire paper. Sorry, the joys of academic papers!

After the legal disclaimers, you start to get the analysis disclaimers, which are more telling to me. Companies love to blindly throw legal disclaimers on anything and everything (e.g. I bet you still get legal disclaimers in the footers of emails you receive, where they have no merit). When they start to explain their results via disclaimers while not actually including their methodology, anyone reading the report should be concerned. From their "About this report" section:

This volume of the Microsoft Security Intelligence Report focuses on the first and second quarters of 2016, with trend data for the last several quarters presented on a quarterly basis. Because vulnerability disclosures can be highly inconsistent from quarter to quarter and often occur disproportionately at certain times of the year, statistics about vulnerability disclosures are presented on a half-yearly basis.

This is a fairly specific statement that speaks as if it is fact that vulnerability trends vary by quarter (they do!), but it potentially ignores the fact that they can also vary by half-year or year. We have seen that impact not only a year, but the comparison to every year prior (e.g. Will Dormann in 2014 and his Tapioca project). Simply declaring that a 'quarter' or a 'half-year' is the right window does not demonstrate experience in aggregating vulnerabilities; it is a rather arbitrary and short time-frame. Focusing on a quarter can easily ignore some of the biases that impact vulnerability aggregation, as outlined in Steve Christey's and my talk titled "Buying Into the Bias: Why Vulnerability Statistics Suck" (PPT).

Jumping down to the “Ten years of exploits: A long-term study of exploitation of vulnerabilities in Microsoft software” section, Microsoft states:

However, despite the increasing number of disclosures, the number of remote code execution (RCE) and elevation of privilege (EOP) vulnerabilities in Microsoft software has declined significantly.

Doing a title search of Risk Based Security’s VulnDB for “microsoft windows local privilege escalation” tells a potentially different story. While 2015 is technically lower than 2011 and 2013, it is significantly higher than 2012 and 2014. I can’t say for certain why these dips occur, but they are very interesting.

[image: 5year-win-lpe]

Thousands of vulnerabilities are publicly disclosed across the industry every year. The 4,512 vulnerabilities disclosed during the second half of 2014 (2H14) is the largest number of vulnerabilities disclosed in any half-year period since the Common Vulnerabilities and Exposures system was launched in 1999.

This quote from the report explicitly shows serious bias in their source data, and further shows that they do not consider their wording. It would be a bit more accurate to say "The 4,512 vulnerabilities aggregated by MITRE during the second half of 2014…" The simple fact is, a lot more than 4,512 vulnerabilities were disclosed during that time. VulnDB shows 8,836 vulnerabilities aggregated in that same period, which is still less than the 9,016 vulnerabilities aggregated in the second half of 2015. Microsoft also doesn't disclaim that the second half of 2014 is when the aforementioned Will Dormann released the results of his 'Tapioca' project, totaling over 20,500 vulnerabilities, only 1,384 of which received CVE IDs. Why? Because CVE basically said "it isn't worth it", and they weren't the only vulnerability database to do so. With all of this in mind, Microsoft's comment about the second half of 2014 becomes a lot more complicated.

The information in this section is compiled from vulnerability disclosure data that is published in the National Vulnerability Database (NVD), the US government’s repository of standards-based vulnerability management data at nvd.nist.gov. The NVD represents all disclosures that have a published CVE (Common Vulnerabilities and Exposures) identifier.

This is a curious statement, since CVE is run by MITRE under a contract from the Department of Homeland Security (DHS), making it a “US government repository” too. More importantly, NVD is essentially a clone of CVE that just wraps additional meta-data around each entry (e.g. CPE, CWE, and CVSS scoring). This also reminds us that they opted to use a limited data set, one that is well known in the Information Security field as being woefully incomplete. So even a company as large as Microsoft, with expected expertise in vulnerabilities, opts to use a sub-par data set which drastically influences statistics.

Figure 23. Remote code executable (RCE) and elevation of privilege (EOP) vulnerability disclosures in Microsoft software known to be exploited before the corresponding security update release or within 30 days afterward, 2006–2015

The explanation for Figure 23 is problematic in several ways. Does it cherry pick RCE and EOP while ignoring context-dependent (aka user-assisted) issues? Or does this represent all Microsoft vulnerabilities? This is important to ask as most web browser exploits are considered to be context-dependent and coveted by the bad guys. This could be Microsoft conveniently leaving out a subset of vulnerabilities that would make the stats look worse. Next, looking at 2015 as an example from their chart, they say 18 vulnerabilities were exploited and 397 were not. Of the 560 Microsoft vulnerabilities aggregated by VulnDB in 2015, 48 have a known public exploit. Rather than check each one to determine the time from disclosure to exploit publication, I’ll ask a more important question. What is the provenance of Microsoft’s exploitation data? That isn’t something CVE or NVD track.

Figure 25 illustrates the number of vulnerability disclosures across the software industry for each half-year period since 2H13

Once again, Microsoft fails to use the correct wording. This is not the number of vulnerability disclosures; it is the number of disclosures aggregated by MITRE/CVE. Here is their chart from the report:

[image: ms-2016-1h16-chart]

Under the chart they claim:

Vulnerability disclosures across the industry decreased 9.8 percent between 2H15 and 1H16, to just above 3,000.

As mentioned earlier, since Microsoft is using a sub-par data set, I feel it is important to see what this chart would look like using more complete data. More importantly, watch how it invalidates their claim about an industry decrease of 9.8 percent between 2H15 and 1H16, since RBS shows the drop is closer to 18%.

[image: vulndb-vs-nvd]

I have blogged about vulnerability statistics, focusing on these types of reports, for quite a while now. And yet, every year we see the exact same mistakes made by just about every company publishing statistics on vulnerabilities. Remember, unless they are aggregating vulnerabilities every day, they are losing a serious understanding of the data they work with.

March 19, 2017 Update – Carsten Eiram (@carsteneiram) pointed out that the pattern of local privilege escalation numbers actually follows an expected pattern with knowledge of researcher activity and trends:

In 2011, Tarjei Mandt while he was at Norman found a metric ton of LPEs in win32k.sys as part of a project.

In 2013, it was Mateusz Jurczyk's turn to also hit win32k.sys by focusing on a bug class he dubbed "double-fetch" (he's currently starting that project up again to see if, with tweaks, he can find more vulns).

And in 2015, Nils Sommer (reported via Google P0) was hitting win32k.sys again along with a few other drivers and churned out a respectable amount of LPE vulnerabilities.

So 2012 and 2014 represent “standard” years while 2011, 2013, and 2015 had specific high-profile researchers focus on Windows LPE flaws via various fuzzing projects.

So the explanation is the same explanation we almost always see: vulnerability disclosures and statistics are so incredibly researcher driven based on which product / component / vulnerability type a researcher or group of researchers decides to focus on.

Rebuttal: Dark Reading’s “9” Sources for Tracking New Vulnerabilities

[This was originally published on the OSVDB blog.]

Earlier today, Sean Martin published an article on Dark Reading titled “9 Sources For Tracking New Vulnerabilities“. Spanning 10 pages, likely for extra ad revenue, the sub-title reads:

Keeping up with the latest vulnerabilities — especially in the context of the latest threats — can be a real challenge.

One would hope this article would help with that challenge, and it most certainly is one. First, a disclaimer: I was involved with OSVDB for roughly 10 years and was the primary curator over that time. Further, I am now involved with Risk Based Security's VulnDB commercial vulnerability database offering. Both of these are mentioned in the article, so my comments below will most certainly have some level of bias.

To help readers, Sean Martin writes “In no particular order, here are nine key vulnerability data sources for your consideration.” With that, flip to the next page of the article.

It’s important to understand the source — and backing for your source — to avoid getting left without a solid vulnerability database. A good example is the case where many had to say goodbye to their vulnerability feed when minority-player Open Source Vulnerability Database (OSVDB) was shut down.

"Not having OSVDB any longer, while sad for those that relied on it, may actually reduce the complexity in making sure there is integration across all products, MSSPs, services, and SIEMs," says Fred Wilmot, chief technology officer at PacketSled.

I am not sure how OSVDB constituted a "minority-player" in any sense of the term, given its broad coverage for a decade. While historical entries were often incomplete, the database was commercially maintained from just before January 2012, and the information was still given away for free, despite competing with the company providing the support and updates. Since the quote specifically mentions that OSVDB shut down, and it did on April 5, 2016, it's nice to hear people give belated appreciation to the project. OSVDB shutting down, I would argue, does not reduce the complexity of anything for those knowledgeable about vulnerability disclosure. On the surface, sure! One less set of IDs to integrate across products sounds like a good thing. However, you have to also remember that OSVDB was cataloging thousands of vulnerabilities a year that were not found in the other sources listed in this article. Losing that coverage introduces a level of complexity that is horrible for companies trying to keep up with vulnerabilities.

Page 3 tells readers about NIST’s National Vulnerability Database (NVD):

NVD is the US government repository of standards-based vulnerability management data. This data enables automation of vulnerability management, security measurement, and compliance. NVD is based on and synchronized with the CVE List (see next slide).

First, since NVD is synchronized with CVE, it is curious that they are listed as separate sources. For those not aware, NVD is a sort of 'value add' to CVE in that they generate CVSS and CPE data for the vulnerabilities cataloged by MITRE for the CVE project. Monitoring NVD means you are already monitoring all of CVE and getting the additional meta-data. It is also important to note that the meta-data is outsourced to a contractor who employs 'junior analysts' to do the work. This becomes apparent if you consume their data and actually look at their CVSS scores over the last ~8 years. Personally, I stopped emailing them corrections many years back due to the volume involved. To this day, you can still often see them scoring Adobe Flash vulnerabilities as CVSSv2 10.0, missing the 'context-dependent' (a.k.a. 'user-assisted') aspect, which moves the access complexity from 'L'ow to 'M'edium per the CVSSv2 scoring guide on FIRST and results in a 9.3 score. Seems minor, but that reclassifies a vulnerability from 'Critical' to 'High' for many organizations, and it should make you question their scoring on more complex issues.
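
As a quick check on that claim, the CVSSv2 base equation makes the AC:L versus AC:M difference easy to reproduce (a small sketch using the standard v2 constants; not an official calculator):

    # CVSSv2 base score for a Complete/Complete/Complete impact, AV:N, Au:N.
    def cvss2_base(access_complexity):
        impact = 10.41 * (1 - (1 - 0.660) ** 3)                 # C:C/I:C/A:C
        exploitability = 20 * 1.0 * access_complexity * 0.704   # AV:N, Au:N
        return round((0.6 * impact + 0.4 * exploitability - 1.5) * 1.176, 1)

    print(cvss2_base(0.71))  # 10.0 -- AC:L
    print(cvss2_base(0.61))  # 9.3  -- AC:M (user-assisted / context-dependent)

That one metric is the entire difference between NVD's 10.0 and the 9.3 the same flaw should score.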

Page 4 tells us more about CVE and offers some “insight” into it that is horribly wrong:

CVE is a dictionary of publicly known information security vulnerabilities and exposures. CVE’s common identifiers enable data exchange between security products and provide a baseline index point for evaluating coverage of tools and services.

Morey Haber, VP of technology at BeyondTrust, offers these examples:
Scanning tools most commonly use CVEs for classification
SIEM technologies understand their applicability in reporting
Risk frameworks use them as a calculation vehicle for applied risk to the business

First, I cannot over-share Steve Ragan’s recent article titled “Over 6,000 vulnerabilities went unassigned by MITRE’s CVE project in 2015“. Consider just the headline, and then think about the fact that CVE does not catalog at least 47,267 vulnerabilities historically. Now, re-read Haber’s examples of how CVE is used and what kind of Achilles’ heel that is for any organization using security software based on CVE.

Fred Wilmot’s quote about CVE is what prompted me to write this entire blog. This is so incredibly wrong and misleading:

“Now that you have a common calculator for interoperability among vendors, the fact that CVE is maintained completely transparently to the community is a HUGE pro,” says Fred Wilmot, chief technology officer at PacketSled. “There is no holdout of exploits for vulnerabilities based on financial gain or intent. It’s altruism at its best. The weakness in the CVE comes in the weaponization of that information and the lack of disclosure for profit and activism, as two examples.”

Where to start…

  1. There is no common calculator for “interoperability among vendors” in the context of CVE. That isn’t what CVE is or does.
  2. CVE is most certainly not maintained transparently to the community. It is not maintained transparently to the volunteer Editorial Board (now known simply as the ‘CVE Board’) either. The backroom workings and decisions MITRE makes on behalf of CVE without Board or public input have been documented before. The last decision that lacked any transparency was their recent catastrophic decision to change the CVE format to a new ‘federated’ scheme. If you have any doubt about this being a backroom decision, look at the first reply from CVE Board member Kurt Seifried.
  3. Wilmot’s characterization that CVE is “altruism at its best” also speaks to a lack of knowledge of CVE. While MITRE, the organization that maintains CVE, is technically a not-for-profit organization, they only take non-compete contracts at incredible expense to the U.S. taxpayer. CVE, and a handful of other ‘C’ projects related to information security, bring in considerable money to the company. In 2015, they enjoyed over $1.4 billion in revenue and maintained $788 million in assets. The fact that the contract to maintain CVE is non-compete, and cannot be bid on by companies more qualified to run the project, speaks to where the real interest lies and it isn’t altruistic.
  4. The weakness in CVE is certainly not the “weaponization” of that information. A significant majority of weaponized exploits that lead to the thousands of data breaches and organizations being compromised are typically done with functional exploits that enjoy little technical information being made public. For example, phishing attacks that rely on Adobe Reader or Adobe Flash are usually patched by Adobe eventually, and the subsequent disclosure has no technical details. Even if researchers post more details down the road, the entries in CVE are rarely updated to include the additional details.
  5. The last bit of Wilmot’s quote, I will need someone to explain to me. “The weakness in the CVE [..] comes in the lack of disclosure for profit and activism.” I don’t know what that means.

Page 5 tells readers about the CERT Vulnerability Notes Database:

The Vulnerability Notes Database provides information about software vulnerabilities. Vulnerability notes include summaries, technical details, remediation information, and lists of affected vendors. Most Vulnerability notes are the result of private coordination and disclosure efforts. For more comprehensive coverage of public vulnerability reports, consider the National Vulnerability Database (NVD).

“This is nice to have, but it still uses CVEs as reference,” says Fred Wilmot, chief technology officer at PacketSled. “NVD is not nearly as practical to consume directly as CVE — the disclosure form is fine, but why would I go there and not directly to MITRE for CVE establishment first? However, it’s probably a good place to spend time during an investigation.”

The CERT VNDB is not a comprehensive vulnerability database, and it does not aim to be one. As mentioned, their information comes primarily from assisting researchers in coordinating a disclosure with a vendor. Since CERT is a CNA, meaning they can assign CVE IDs to vulnerabilities they coordinate, over 99% of their entries are covered by CVE and thus NVD. Monitoring NVD will get you all of CVE and almost all of CERT VNDB. The CERT VU entries that do not get CVE IDs assigned before disclosure are rare, and I believe they get assignments from MITRE shortly after.

Once again, Wilmot speaks about these sources without appearing to have real working knowledge of them, which exemplifies my term 'vulnerability tourist'. CERT VNDB disclosures appear on their site before they appear in CVE or NVD; in fact it may be 24 – 72 hours before they show up there, meaning that while CERT still uses CVEs as a reference, for timely monitoring of vulnerabilities it may be important to keep an eye on CERT directly. Next, Wilmot goes on to say "NVD is not nearly as practical to consume directly as CVE", apparently not realizing that NVD makes its data available in XML. While MITRE makes the CVE data available in several formats, that doesn't mean NVD is not easy to consume. The most important distinction here is that NVD comes with CPE data where CVE does not. For any medium to large organization, this is basically mandatory meta-data for actually putting the information to use.

Page 6 tells readers about Risk Based Security’s VulnDB offering. The curious bit to me is the quote from Morey Haber:

“VulnDB does not contain audit information, but it is a good source for solutions that need to reference vulnerability information in their products such as firewalls or IDS/IPS and do not want to rely on open source or to build/maintain a library,” said Morey Haber, VP of technology at BeyondTrust.

First, BeyondTrust is not a user of VulnDB, which is a commercial offering unlike CVE/NVD/CERT. They did a short-term trial in 2012, during the beginning of the offering, and opted not to pursue it as a source of vulnerability intelligence. Second, what does "audit information" even mean in the context of a VDB? Audit information about your own environment, maybe? Something that a vulnerability intelligence provider can't possibly deliver. An audit trail is maintained for each vulnerability entry and is available to customers, but I doubt that is what he means, since calling VulnDB out on this doesn't make sense and the other sources of vulnerabilities listed in this article don't maintain such a trail.

While VulnDB can certainly be used to reference vulnerabilities in security products as he says, that is the tip of the iceberg. With over 47,000 vulnerabilities not found in CVE or NVD, the breadth of information is incredible. Further, VulnDB has made a concerted effort for years to track vulnerabilities in third-party libraries, and builds on top of the robust meta-data that has been generated for over a decade. Haber’s comments do not reflect actual knowledge of the VulnDB offering.

Page 7 tells readers about the DISA IAVA Database And STIGS. Haber gives commentary on this as well:

“IAVA, the DISA-based vulnerability mapping database, is based on existing SCAP sources, and once in a while it contains details for government systems that are not a part of the commercial world,” says Morey Haber, VP of technology at BeyondTrust. “For any vendor doing .gov or .mil work, this reference is a must.”

While some of the IAVA advisories may contain additional detail, it is important to note that they will not provide any vulnerabilities above and beyond CVE/NVD, and their advisories lag well behind the issues being published in CVE/NVD. Haber is right that this is a vital resource for .gov and .mil contractors, for several reasons.

Page 8 tells readers about SecurityTracker.com, which is a long-running site that aggregates vulnerabilities to a degree. Haber once again provides commentary:

“The website tends to focus on non-OS vulnerabilities, but they are certainly included in the feed,” says Morey Haber, VP of technology at BeyondTrust. “Infrastructure and IoT tend to make the front page the most, and this site is a good third-party reference for new flaws.”

Actually, they do focus on OS vulnerabilities, and that can routinely be seen on their site. As I write this, 2 of the 5 vulnerabilities listed on the front page are in operating systems. The biggest thing to note about this offering is that, publicly, they just don't aggregate a significant volume of vulnerabilities for public consumption. Their most recent ID is 1037091, meaning they have ~37,000 entries in their database. Note that they operate like CVE though, where multiple vulnerabilities can be associated with a single ID. Regardless, reading their Weekly Vulnerability Summary emails for the past five weeks shows their volume: Sep 26 2016 (32 alerts), Oct 3 2016 (17 alerts), Oct 10 2016 (26 alerts), Oct 17 2016 (31 alerts), and Oct 24 2016 (32 alerts). To put that into perspective, VulnDB averages 46 new entries a day in 2016, with the most being 224 in a single day. For the most part, SecurityTracker may beat CVE on adding entries to the database, but they are almost entirely covering entries that have CVE IDs.

Page 9 tells readers about the Open Vulnerability And Assessment Language (OVAL) Interpreter And Repository. This is a curious addition to the article because it is not a source of vulnerability intelligence. Instead, it is a standard for reporting about systems. From the OVAL page:

International in scope and free for public use, OVAL® is an information security community effort to standardize how to assess and report upon the machine state of computer systems. OVAL includes a language to encode system details, and an assortment of content repositories held throughout the community.

While OVAL is certainly useful to some organizations, it does not belong in a list of vulnerability sources.

Page 10 tells readers about Information Sharing And Analysis Centers (ISACs). This is another curious addition as ISACs typically trade information on active attacks, threat actors, and which vulnerabilities may be targeted more heavily. They are generally not a source of vulnerabilities in the same context as most of the resources above.

In summary, Sean Martin's article says it will share "9 Sources For Tracking New Vulnerabilities". In reality, based on that quote and context, the article only tells readers about CVE/NVD, CERT VNDB, RBS VulnDB, and SecurityTracker. Several of the sources listed are not really for tracking new vulnerabilities; rather, they augment vulnerability and threat intelligence in various ways.

SQLi Disclosures and the Last Five Years (Transparent Statistics)

[This was originally published on the OSVDB blog.]

Nothing like waking up to a new article purporting to show vulnerability statistics and having someone ask us for comment. But hey, we love giving additional perspective on such statistics, since they are often presented without proper context and disclaimers. This morning, the new article comes from Help Net Security and is titled "SQL injection vulnerabilities surge to highest levels in three years". It cites "DB Networks' research", which did the usual: parsed NVD data. As we all know, that data comes from CVE, which is a frequent topic of rants on Twitter and the occasional blog. Cliff notes: CVE does not promise comprehensive coverage of public disclosures. They openly admit this, and Steve Christey has repeatedly said that CVE may not be the best for statistics, even as far back as 2006. Early in 2013, Christey again publicly stated that CVE "can no longer guarantee full coverage of all public vulnerabilities." I bring this up to remind everyone, again, that it is important to add such disclaimers about your data set, and ultimately your published research.

Getting back to the statistics and DB Networks, the second thing I wondered, after their data source, was who they were and why they were doing this analysis (SQLi specifically). I probably shouldn't be surprised to learn this comes over a year after they announced an appliance that blocks SQL injection attacks. This too should be disclaimed in any article covering their analysis, as it adds perspective on why they are choosing a single vulnerability type, and it may also explain why they chose NVD over other data sources (since it fits more neatly into their narrative). Following that, they paid Ponemon to help them conduct a study on the threat of SQL injection. Or wait, was it two of them, the second with a bent towards retail breaches? Here are the results of the first one it seems, which again shows that they have a pretty specific bias toward SQL injection.

Now that we know they have a vested interest in the results of their analysis coming out a specific way, let’s look at the results. First, the Help Net Security article does not describe their methodology at all. Reading their self-written, paid-for news announcement (PR Newswire) about the analysis, it becomes very clear this is a gimmick for advertising, not actual vulnerability statistics research. It’s 2015, years after Steve Christey and I both ranted about such statistics, and they still don’t explain their methodology. That omission matters, because “CVE abstraction bias” is possibly the biggest factor here. I have blogged about using CVE/NVD as a dataset before, because it carries one of the biggest pitfalls in this kind of statistic generation.

Rather than debunk these stats directly, since it has been done many times in the past, I will simply say that they are basically meaningless at this point. Even without their methodology, I am sure someone could trivially reproduce their results and figure out whether they abstracted per CVE, or per actual SQLi mentioned. As a recent example, CVE-2014-7137 is a single entry that actually covers 54 distinct SQL injection vulnerabilities. If you count just the CVE versus the vulnerabilities that may be listed within it, your numbers will vary greatly. That said, I will assume that their results can be reproduced, since we know their data source and their bias in desired results.
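
To make that abstraction pitfall concrete, here is a minimal sketch of how the two counting methods diverge. Other than CVE-2014-7137, the records and per-entry vulnerability counts below are made up for illustration; a real comparison would parse the NVD feed and a per-vulnerability database.

```python
# Minimal sketch: counting per CVE ID versus per distinct vulnerability.
# The list below is hypothetical sample data, not a real dataset.
entries = [
    ("CVE-2014-7137", 54),  # one CVE covering 54 distinct SQLi vulnerabilities
    ("CVE-2014-AAAA", 1),   # hypothetical placeholder entries
    ("CVE-2014-BBBB", 3),
]

per_cve = len(entries)                  # counts identifiers only
per_vuln = sum(n for _, n in entries)   # counts distinct vulnerabilities

print(f"Per-CVE count:           {per_cve}")   # 3
print(f"Per-vulnerability count: {per_vuln}")  # 58
```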

With that in mind, let’s first look at what the numbers look like when using a database that clearly abstracts those issues and covers a couple thousand more sources than CVE officially does:

[Figure: SQLi disclosures by year, five-year view]

“SQL Injection Vulnerabilities Surge to Highest Levels in Three Years” is the title of DB Networks’ press release, and is summarized by Help Net Security as “last year produced the most SQL vulnerabilities identified since 2011 and 104% more than were identified in 2013.” In reality (or at least, using a more comprehensive data set), we see that isn’t the case: the available statistics a) don’t show 2014 being higher than 2011, and b) show 2011 not holding a candle to 2010. No big surprise really, given that we actually do vulnerability aggregation while NVD and DB Networks do not. But really, I digress. These ‘statistics’ are nothing more than a thinly veiled excuse to further advertise themselves. Notice in the press release they go from the statistics that help justify their products right into the third paragraph quoting the CEO being “truly honored to be selected as a finalist for SC Magazine’s Best Database Security Solution.” He follows that up with another line calling out their specific product that helps “database security in some of the world’s largest mission critical datacenters.”

Yep, these statistics are very transparent. They are based on a convenient data source, maintained by an agency that doesn’t actually aggregate the information and doesn’t have the experience of its data benefactor (CVE). They are advertised with a single goal in mind: selling their product. The fact that they use the word “research” in the context of generating these statistics is a joke.

As always, I encourage companies and individuals to keep publishing vulnerability statistics. But I stress that it should be done responsibly. Disclaim your data source, explain your methodology, be clear if you have an interest in the results coming out one way (bias is fine, just disclaim it), and realize that different data sources will produce different results. Dare to use multiple sources and compare the results, even if they don’t fully back your desired conclusion. Why? Because if you disclaim your data sources and results, the logical and simple conclusion is that you may still be right. We just don’t have the perfect vulnerability disclosure data source yet. Fortunately, some of us are working harder than others to find that unicorn.

CVE Vulnerabilities: How Your Dataset Influences Statistics

[This was originally published on the OSVDB blog.]

Readers may recall that I blogged about a similar topic just over a month ago, in an article titled Advisories != Vulnerabilities, and How It Affects Statistics. In this installment, instead of “advisories”, we have “CVEs” and the inherent problems when using CVE identifiers in the place of “vulnerabilities”. Doing so is technically inaccurate, and it negatively influences statistics, ultimately leading to bad conclusions.

NSS Labs just released an extensive report titled “Vulnerability Threat Trends; A Decade in Review, Transition on the Way”, by Stefan Frei. While the report is interesting, and the fundamental methodology is sound, Frei uses a dataset that is not designed for true vulnerability statistics. Additionally, I believe that some factors that Frei attributes to trends are incorrect. I offer this blog as open feedback to bring additional perspective to the realm of vulnerability stats, which is still a long way from maturity.

Vulnerabilities versus CVE

In the NSS Labs paper, they define a vulnerability as “a weakness in software that enables an attacker to compromise the integrity, availability, or confidentiality of the software or the data that it processes.” This is as good a definition as any. The key point here is a weakness, singular. What Frei fails to point out is that the CVE dictionary is not a vulnerability database in the same sense as many others. It is a specialty database designed primarily to assign a unique identifier to a vulnerability, or a group of vulnerabilities, to coordinate tracking and discussion. While CVE says “CVE Identifiers are unique, common identifiers for publicly known information security vulnerabilities”, it is more important to note the way CVE abstracts, which they document in great detail. From the CVE page on abstraction:

CVE Abstraction Content Decisions (CDs) provide guidelines about when to combine multiple reports, bugs, and/or attack vectors into a single CVE name (“MERGE”), and when to create separate CVE names (“SPLIT”).

This clearly denotes that a single CVE may represent multiple vulnerabilities. With that in mind, every statistic generated by NSS Labs for this report is inaccurate, and their numbers are not reproducible using any other vulnerability dataset (unless it too is based only on CVE data and does not abstract differently, e.g. NVD). This distinction puts the report’s statements and conclusions in a different light:

As of January 2013 the NVD listed 53,489 vulnerabilities ..
In the last ten years on average 4,660 vulnerabilities were disclosed per year ..
.. with an all-time high of 6,462 vulnerabilities counted in 2006 ..

The abstraction distinction means that these numbers aren’t just technically inaccurate (i.e. terminology), they are factually inaccurate (i.e. the actual stats change when abstracting on a per-vulnerability basis). In each case where Frei uses the term “vulnerability”, he really means “CVE”. When you consider that a single CVE may cover as many as 66 or more distinct vulnerabilities, it really invalidates any statistic generated from this dataset the way he did. For example:

However, in 2012 alone the number of vulnerabilities increased again to a considerable 5,225 (80% of the all-time high), which is 12% above the ten-year average. This is the largest increase observed in the past six years and ends the trend of moderate declines since 2006.

Based on my explanation, what does 5,225 really mean? If we agree, for the sake of argument, that CVE averages two distinct vulnerabilities per assignment, that 5,225 becomes over 10,000 vulnerabilities (5,225 x 2 = 10,450). How does that in turn change any observations on trending?

The report’s key findings offer 7 high-level conclusions based on the CVE data. To put all of the above in more perspective, I will examine a few of them and use an alternate dataset, OSVDB, that abstracts entries on a per-vulnerability basis. With those numbers, we can see how the findings stand. NSS Labs report text is quoted below.

The five year long trend in decreasing vulnerability disclosures ended abruptly in 2012 with a +12% increase

Based on OSVDB data, this is incorrect. Both 2009 (7,879) -> 2010 (8,835) and 2011 (7,565) -> 2012 (8,919) showed an upward trend.

More than 90 percent of the vulnerabilities disclosed are moderately or highly critical – and therefore relevant

If we assume “moderately” refers to the “Medium” criticality later defined in the report as CVSSv2 4.0 – 6.9, then OSVDB shows 57,373 entries scored CVSSv2 4.0 – 10.0 out of 82,123 total, roughly 70%. That means 90% is considerably higher than what we show. Note: we do not have complete CVSSv2 data for 100% of our entries, but we do have scores for all entries affiliated with the ones Frei examined, and more. If “moderately critical” and “highly critical” refer to different ranges, then they should be more clearly defined.

It is also important to note that this finding is a red herring, due to the way CVSS scoring works. A remote path disclosure in a web application scores a 5.0 base score (CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N). This skews the scoring data considerably higher than many in the industry would agree with, as 5.0 is the same score you get for many XSS vulnerabilities that can have more serious impact.

9 percent of vulnerabilities disclosed in 2012 are extremely critical (with CVSS score>9.9) paired with low attack/exploitation complexity

This is another red herring, because any CVSS 10.0 score means that “low complexity” was factored in. The wording in the report implies that a > 9.9 score could be paired with higher complexity, which isn’t possible. Further, CVSS is scored for the worst case scenario when details are not available (e.g. CVE-2012-5895). Given the number of “unspecified” issues, this may seriously skew the number of CVSSv2 10.0 scores.
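
For readers who want to check those two points themselves, here is a minimal sketch of the CVSSv2 base score equations, with constants and formula taken from the public CVSSv2 specification. It reproduces the 5.0 score for the path disclosure vector above and shows that a 10.0 necessarily includes AC:L; the function name and structure are my own.

```python
# Sketch of the CVSSv2 base score calculation using the published equations.
AV = {"L": 0.395, "A": 0.646, "N": 1.0}    # Access Vector
AC = {"H": 0.35, "M": 0.61, "L": 0.71}     # Access Complexity
AU = {"M": 0.45, "S": 0.56, "N": 0.704}    # Authentication
CIA = {"N": 0.0, "P": 0.275, "C": 0.660}   # C/I/A impact values

def base_score(av, ac, au, c, i, a):
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f_impact = 0 if impact == 0 else 1.176
    return round(((0.6 * impact) + (0.4 * exploitability) - 1.5) * f_impact, 1)

# Remote path disclosure in a web app: AV:N/AC:L/Au:N/C:P/I:N/A:N
print(base_score("N", "L", "N", "P", "N", "N"))  # 5.0

# The only way to reach 10.0 is full impact plus the easiest
# exploitability metrics, so AC:L is baked into every 10.0 score.
print(base_score("N", "L", "N", "C", "C", "C"))  # 10.0
```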

Finally, there is one other element of this report, used in the overview and again later in the document, to attribute a shift in disclosure trends. From the overview:

The parallel and massive drop of vulnerability disclosures by the two long established purchase programs iDefense VCP and TippingPoint ZDI indicate a transition in the way vulnerability and exploit information is handled in the industry.

I believe this is a case of “correlation does not mean causation”. While these are the two most recognized third-party bug bounty programs around, there are many variables at play here. In the bigger picture, shifts in these programs do not necessarily mean anything. Some of the factors that may have influenced disclosure numbers for those two programs include:

  • There are more bug bounty programs available. Some may offer better prices or incentives for disclosing through them, stealing business from iDefense/ZDI.
  • Both companies have enjoyed their share of internal politics that affected at least one program. In 2012, several people involved in the ZDI program left the company to form their own startup. It has been theorized that since their departure, ZDI has not built the team back up and that disclosures were affected as a result.
  • ZDI had a small bout of external politics, in which one of their most prevalent bounty collectors (Luigi Auriemma) had a serious disagreement about ZDI’s handling of a vulnerability, as it relates to Portnoy and Exodus. Auriemma’s shift to disclosing via his own company would dramatically affect ZDI’s disclosure totals on its own.
  • Both of these companies have a moving list of software that they offer a bounty on. As it changes, it may result in spikes of disclosures via their programs.

Regardless, iDefense and ZDI represent a small percentage of overall disclosures, and it is curious that Frei opted to focus on them so prominently as a reason for changing vulnerability trends without considering other influencing factors. Even during a good year, 2011 for example, iDefense (42) and ZDI (297) together accounted for 339 out of 7,565 vulnerabilities, only ~ 4.5% of the overall disclosures. There are many other trends that could just as easily explain relatively small shifts in disclosure totals. Making statements about trends in vulnerability disclosure, and how they affect statistics, isn’t something that should be done by casual observers; they simply miss a lot of the low-level details you glean from day-to-day vulnerability handling and cataloging.

To be clear, I am not against using CVE/NVD data to generate statistics. However, when doing so, it is important that the dataset be explained and qualified before going into analysis. The perception and definition of what “a vulnerability” is changes based on the person or VDB. In vulnerability statistics, not all vulnerabilities are created equal.

Adobe, Qualys, CVE, and Math

[This was originally published on the OSVDB blog.]

Elinor Mills wrote an article titled Firefox, Adobe top buggiest-software list. In it, she quotes Qualys as providing vulnerability statistics for Mozilla, Adobe and others. Qualys states:

The number of vulnerabilities in Adobe programs rose from 14 last year to 45 this year, while those in Microsoft software dropped from 44 to 41, according to Qualys. Internet Explorer, Windows Media Player and Microsoft Office together had 30 vulnerabilities.

This caught my attention immediately, as I know I have mangled more than 45 Adobe entries this year.

First, the “number of vulnerabilities” game will always have wiggle room, which has been discussed before. A big factor in statistic discrepancies when using public databases is the level of abstraction. CVE tends to bunch up multiple vulnerabilities into a single CVE, whereas OSVDB tends to break them out. Over the past year, X-Force and BID have started abstracting more and more as well.

Either way, Qualys cited their source, NVD, which is entirely based on CVE. How they got 45 vulns in “Adobe programs” baffles me. My count says 97 Adobe vulns, 95 of which have CVEs assigned (covered by a total of 93 CVEs). OSVDB abstracted the entries like CVE did for the most part, but split out CVE-2009-1872 as distinct XSS vulnerabilities. OSVDB also has two entries that do not have a CVE, 55820 and 56281.

Where did Qualys get 45 if they are using the same CVE data set OSVDB does? This discrepancy has nothing to do with abstraction, so something else appears to be going on. Doing a few more searches, I believe I figured it out. Searching OSVDB for “Adobe Reader” in 2009 yields 44 entries, one off from their cited 45. That could easily be explained, as OSVDB also has 9 “Adobe Multiple Products” entries that could cover Reader as well. This may in turn be a breakdown where Qualys or Mills did not distinguish “Adobe software” (cumulative, all software they release) from “Adobe Reader” or some other specific product they release.

Qualys tallied 102 vulnerabilities that were found in Firefox this year, up from 90 last year.

In what is certainly a discrepancy due to abstraction, OSVDB has 74 vulnerabilities specific to Mozilla Firefox (two without CVE), 11 for “Mozilla Multiple Browsers” (Firefox, SeaMonkey, etc.), and 81 for “Mozilla Multiple Products” (Firefox, Thunderbird, etc.). While my numbers are somewhat anecdotal, because I cannot remember every single entry, I can say that most of the ‘multiple’ vulnerabilities include Firefox. That means OSVDB tracked as many as, but possibly fewer than, 166 vulnerabilities in Firefox.

Microsoft software dropped from 44 to 41, according to Qualys. Internet Explorer, Windows Media Player and Microsoft Office together had 30 vulnerabilities.

According to my searches on OSVDB, we get the following numbers:

  • 234 vulnerabilities in Microsoft, only 4 without CVE
  • 50 vulnerabilities in MSIE, all with CVE
  • 4 vulnerabilities in Windows Media Player, 1 without CVE
  • 52 vulnerabilities in Office, all with CVE (based on “Office” being Excel, PowerPoint, Word, and Outlook)
  • 92 vulnerabilities in Windows, only 2 without CVE

When dealing with vulnerability numbers and statistics, like anything else, it’s all about qualifying your numbers. Saying “Adobe software” is different than saying “Adobe Acrobat” or “Adobe Reader”, as the installation bases are drastically different. Given the different levels of abstraction in VDBs, it is equally important to qualify what “a vulnerability” (singular) is. Where CVE/NVD will group several vulnerabilities under one identifier, other databases may abstract further and assign a unique identifier to each distinct vulnerability.
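
As a rough illustration of why that qualification matters, here is a minimal sketch of how the headline number changes depending on whether you filter by vendor or by a specific product, and whether you count entries or distinct vulnerabilities. The records, identifiers, and counts below are invented for the example and are not real data.

```python
# Hypothetical records: (vendor, product, identifier, distinct vulnerabilities)
records = [
    ("Adobe", "Adobe Reader",       "ID-0001", 1),
    ("Adobe", "Adobe Flash Player", "ID-0002", 1),
    ("Adobe", "Multiple Products",  "ID-0003", 2),  # covers Reader and other products
    ("Adobe", "Adobe Reader",       "ID-0004", 1),  # entry without a CVE assignment
]

def count(records, product=None, per_vuln=False):
    """Count entries (or distinct vulnerabilities) matching an optional product filter."""
    hits = [r for r in records if product is None or r[1] == product]
    return sum(r[3] for r in hits) if per_vuln else len(hits)

print(count(records))                          # all Adobe software, per entry: 4
print(count(records, product="Adobe Reader"))  # Adobe Reader only, per entry: 2
print(count(records, per_vuln=True))           # all Adobe software, per vulnerability: 5
```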

Qualys, since you provided the stats to CNet, could you clarify?

OSVDB Now Supports CVSSv2 Scoring

[This was originally published on the OSVDB blog.]

OSVDB now displays CVSSv2 scores, mostly as calculated by the National Vulnerability Database (NVD):

[Screenshot: CVSSv2 score as displayed on an OSVDB entry]

Along with the score, we display the date that NVD generated it and give users a method for recommending updates if they feel the score is inaccurate. While this is long overdue, it is one of many CVSS-related features we will be adding in the near future. For those wondering about the delay in adding CVSS support, the cliff notes answer is “we had reservations about the scoring system”. Back in 2005, Jake and I had a long chat with a couple of the creators of CVSS and brought up our concerns. Our goal was to create our own scoring system, but internal debate (and procrastination) led to neither being implemented. Rather than creating our own system, we finally opted to use what has become an industry standard. Some of our planned CVSS score enhancements on the to-do list, in no particular order:

  1. Method for adding our own CVSS score. There are thousands of entries in OSVDB that do not have a CVE assignment, and as a result, no NVD-based CVSS score.
  2. A more robust moderation queue to handle proposed changes. This may optionally have a one-click method for us to notify NVD of our change so they may consider revising their score.
  3. Ensure the CVSSv2 score is part of the database dumps, available for download.
  4. Method for tracking CVSS score historically. As NVD revises their score, or we do, a user should be able to see the history of changes.
  5. Compare our/NVD scores with other public tools, and display the discrepancy if they differ (see the sketch after this list). For example, if a Nessus plugin scores an issue differently than NVD, show both scores so users may consider which is more accurate.
  6. Track researcher generated CVSS score. While infrequent, some advisories provide scoring. If different than NVD, display the discrepancy.
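
To show what item 5 might look like in practice, here is a minimal, hypothetical sketch of a discrepancy check; the function name, sources, and threshold are assumptions on my part and not part of any existing OSVDB code.

```python
# Hypothetical sketch: flag a discrepancy between an NVD CVSSv2 score
# and a score from another public tool (e.g., a Nessus plugin).
def cvss_discrepancy(nvd_score, other_score, source="Nessus", threshold=0.0):
    """Return a display note if the two scores differ by more than the threshold."""
    delta = abs(nvd_score - other_score)
    if delta > threshold:
        return f"NVD scores this {nvd_score}, {source} scores it {other_score} (delta {delta:.1f})"
    return None

print(cvss_discrepancy(5.0, 6.8))  # NVD scores this 5.0, Nessus scores it 6.8 (delta 1.8)
```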

As always, if you have ideas on how we could better handle CVSS scoring, or have additional ideas for features, please contact us!