Monday, 8 December 2008

Report: Manual vs. Automated Vulnerability Assessment

Here is a very interesting research paper, published on October 26, 2008, called Manual vs. Automated Vulnerability Assessment (by James A. Kupsch and Barton P. Miller, who are part of the University of Wisconsin Vulnerability Assessment project).

In this paper they take the Condor application (version 6.7.12, which doesn't seem to be available for download) and run the commercial Source Code scanners Fortify and Coverity against it, to see how the tools' results compare with the security vulnerabilities that they had already discovered in Condor (they provide a list of the 14 vulnerabilities they discovered and a detailed description of one).

I really like this concept and I'm very happy that they were able to publish their results. Basically what they did was to say: "Ok, here are a number of security vulnerabilities that we discovered (probably manually), so let's see what the Source Code Scanning tools can find."

Of course, I am biased in my interest in this research, since one of my current contracts is with Ounce Labs, which is a direct competitor of Fortify and Coverity, and I've developed a number of Open Source tools (called O2) that augment the capabilities of Ounce Labs' technology (more comments about that in a bit).

So here is what I think is relevant about this paper:
  1. This is a public release of Source Code Scanning analysis data, mapped to real vulnerabilities. This will allow us to perform public benchmarking of different tools and understand how those tools can be used (of course, I will bring Ounce Labs into the mix here).
  2. It is pretty obvious from the results that: a) out of the box the tools perform very poorly, and b) a security consultant needs to be involved to triage those findings.
  3. In fact, even Fortify, which had the best results, did not discover its 6 issues immediately (it only had 3 issues identified as Critical, with 2301 marked as Hot, 8101 marked as Warning and 5061 marked as Info).
  4. What I would like to know (and this is where the discussion becomes relevant to O2) is how much work and triage was needed to find those 6 (Fortify) and 1 (Coverity) issues. And how much manual work was needed to find the original set of vulnerabilities?
  5. I would also like to know what criteria were used to mark a scanning result as a successful discovery (do you just need to point to a dangerous function, or do you need to have a complete trace?).
  6. Other interesting questions are "How many of the findings (15466 on Fortify and 2686 on Coverity) were False Positives?" and "Were there any new vulnerabilities discovered by the tools?"
  7. One annoying thing is that the authors of this paper did not publish a link to the changes they made to Condor: "...with 13 small patches to compile with newer gcc; ... built as a "clipped" version, i.e., no standard universe, and no Kerberos and Quill as these would not build without extensive work on the new platform and tool chain ...". So I will contact them to see if we can replicate their test/scanning environment.
  8. Just for the record, I don't know (yet) what Ounce Labs' 'out of the box' results are in this case (I will publish them once I can replicate their scans). But I think they would be (before further analysis) similar to Fortify's results.
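To put the numbers quoted above side by side, here is a minimal Python sketch. It uses only the figures mentioned in this post (14 known vulnerabilities, the Fortify severity breakdown, and the 6 and 1 matched issues). The "findings per matched issue" ratio is my own crude stand-in for triage effort, not a metric from the paper, and it assumes every finding has to be looked at, which says nothing about how many of them are False Positives.

```python
# Figures quoted in this post; nothing here is measured independently.
KNOWN_VULNERABILITIES = 14

# Fortify's findings, broken down by the severity labels quoted above.
fortify_by_severity = {"Critical": 3, "Hot": 2301, "Warning": 8101, "Info": 5061}

total_findings = {
    "Fortify": sum(fortify_by_severity.values()),  # adds up to the 15466 quoted above
    "Coverity": 2686,
}
# How many of the 14 known vulnerabilities each tool matched.
matched_known = {"Fortify": 6, "Coverity": 1}


def summarize(tool):
    """Return the share of known issues found and a rough triage ratio.

    'findings_per_match' is an illustrative ratio (an assumption of this
    sketch), not something reported by the paper's authors.
    """
    found = matched_known[tool]
    total = total_findings[tool]
    return {
        "share_of_known": found / KNOWN_VULNERABILITIES,
        "findings_per_match": total / found,
    }


for tool in total_findings:
    s = summarize(tool)
    print(
        f"{tool}: {matched_known[tool]}/{KNOWN_VULNERABILITIES} known issues "
        f"({s['share_of_known']:.0%}), roughly "
        f"{s['findings_per_match']:.0f} findings per matched issue"
    )
```

The severity breakdown does add up to the 15466 total, which is a useful sanity check on the quoted figures. Only 3 of those findings were rated Critical, so most of the matched issues must have been sitting in the much larger Hot, Warning or Info buckets.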
Next I will write up an O2 Challenge for this [UPDATE: here it is: O2 Challenge #2) Find the 14 Vulns in Condor 6.7.12]. The plan (over the next couple months) is to use this application as a C++ case-study for how O2 can be used in the discovery of these types of issues (remember that ultimately, O2 is designed to "automate the security consultant's brain", so if a security consultant can find it, so should O2 :) ). The only 'issue' should be how many custom rules/scripts will need to be added/created.

In fact, my view is that ultimately ALL tools should be able to find these issues. The competitive advantage of Tool A vs Tool B (commercial or Open Source) should be:
  • the amount of time and customization that were required to find those issues (including set-up of scanning environment)
  • the ability to replicate those 'insecure patterns' on the entire code base
  • the reporting (namely the consolidation of findings and the automatic creation of Unit Tests)
  • the risk rating and threat model capabilities
  • the integration with the SDL
  • the ability to model 'positive security' (i.e. map the security controls and 'prove' that they were correctly implemented)

new O2 content, Hacmebank and 1st challenge

Here is an update of the latest content added to the O2 website at
One question I had is on the file format for the videos. Which one should I use: Mp4 or WMF?

The above is just a small sample of the content that I am planning to upload over the next couple weeks. So if there is an area that you really want me to cover, let me know and I will write a post about it.