How can you ensure secure coding with Test Driven Development?

I've been coming up to speed on the latest trend that is Test Driven Development (TDD). Most of the development I do is in C or C++. It strikes me that there is a very obvious conflict between common TDD practices and common secure coding practices. At its heart, TDD tells you that you shouldn't write new code for something that you don't have a failing test for. To me, that means that I shouldn't write secure code unless I have unit tests to see if my code is secure.

That brings up two issues:

How can I effectively write unit tests to test for Buffer Overflows, Stack Overruns, Heap Overruns, Array Index Errors, Format String Bugs, ANSI vs Unicode vs MBCS string size mismatches, and Safe String Handling (from Howard and LeBlanc's "Writing Secure Code")?
At what point in the standard TDD practice should these tests be included, since much of security is non-functional?

Surprisingly, I have found very little research discussing TDD and security. Most of what I come across are TDD papers that mention at a very high level that TDD will "make your code more secure."

I'm looking for any direct answers to the issues above, any research that pertains to this (I looked already and didn't find much), or any place that TDD gurus live so I can go knock on their door (virtually) and see if they have any good answers.

Thanks!

EDIT:

The topic of Fuzzing has come up, which I think is a great approach to this problem (in general). This raises the questions: Does Fuzzing fit into TDD? Where in the TDD process does fuzzing fit?

Parameterized Unit Testing (possibly automated) has also crossed my mind. This might be a way to get fuzzing-like results earlier into the testing process. I'm not sure exactly where that fits into TDD either.

EDIT 2:

Thank you all for your answers thus far. At this point, I am extremely interested in how we can leverage parameterized tests to serve as pseudo fuzzers for our functions. But, how do we determine what tests to write for testing security? And how can we be sure that we adequately cover the attack space?

It is a well-known problem in software security that if you protect against 5 attack scenarios, the attacker will just look for, and use, a 6th attack. It is a very difficult cat-and-mouse game. Does TDD give us any advantage against this?

Yes, TDD is a tool/technique that can help to ensure secure coding.

But as with all things in this industry: assume it's a silver bullet, and you'll shoot yourself in the foot.

Unknown Threats

As you indicated in Edit 2: "you protect against 5 attack scenarios, the attacker will just look for, and use, a 6th attack". TDD is not going to protect you from unknown threats. By its very nature, you have to know what you want to test in order to write the test in the first place.

So suppose threat number 6 is discovered (hopefully not due to a breach, but rather internally due to another tool/technique that attempts to find potential attack vectors).

TDD will help as follows:

Tests can be written to verify the threat.
A solution can be implemented to block the threat, and quickly be confirmed to be working.
More importantly, provided all other tests still pass, you can quickly verify that:
- All other security measures still behave correctly.
- All other functionality still behaves correctly.
Basically TDD assists in allowing a quick turnaround time from when a threat is discovered to when a solution becomes available.
TDD also provides a high degree of confidence that the new version behaves correctly.

Testable Code

I have read that TDD is often misinterpreted as a Testing Methodology, when in fact it is more of a Design Methodology. TDD improves the design of your code, making it more testable.

Specialised Testing

An important feature of test cases is their ability to run without side-effects. This means you can run tests in any order, any number of times, and they should never fail because of it. As a result, a number of other aspects of a system become easier to test purely as a result of the testability. For example: Performance, Memory Utilisation.

This testing is usually implemented by way of running special checks of an entire test suite - without directly impacting the suite itself.

A similar security testing module could overlay a test suite and look for known security concerns such as secure data left in memory, buffer overruns or any new attack vector that becomes known. Such an overlay would have a degree of confidence, because it has been checked for all known functionality of the system.

Improved Design

One of the key design improvements arising as a side-effect of TDD is explicit dependencies. Many systems suffer under the weight of implicit or derived dependencies, and these would make testing virtually impossible. As a result, TDD designs tend to be more modular in the right places. From a security perspective this allows you to do things like:

Test components that receive network data without having to actually send it over the network.
One can easily mock-out objects to behave in unexpected / 'unrealistic' ways as might occur in attack scenarios.
Test components in isolation.
Or with any desired mix of production components.
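
To illustrate the points above, here is a minimal C++ sketch. The ByteSource interface, the FakeSource test double, and the read_message function are hypothetical names invented for this example, not part of any real library. Because the parser depends only on the interface, the tests can feed it a length prefix that lies about the payload, much as an attacker might, without opening a socket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical abstraction over "where bytes come from" (socket, file, ...).
struct ByteSource {
    virtual ~ByteSource() {}
    // Copies up to max bytes into out; returns the number actually copied.
    virtual std::size_t read(std::uint8_t* out, std::size_t max) = 0;
};

// Test double: serves a canned byte sequence instead of real network traffic.
class FakeSource : public ByteSource {
public:
    explicit FakeSource(const std::vector<std::uint8_t>& bytes)
        : data_(bytes), pos_(0) {}
    std::size_t read(std::uint8_t* out, std::size_t max) override {
        std::size_t n = data_.size() - pos_;
        if (n > max) n = max;
        if (n > 0) std::memcpy(out, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_;
};

const std::size_t kMaxPayload = 16;

// Hypothetical code under test: reads a 1-byte length prefix, then the payload.
// Returns false (and leaves payload empty) if the message is malformed.
bool read_message(ByteSource& src, std::string& payload) {
    payload.clear();
    std::uint8_t len = 0;
    if (src.read(&len, 1) != 1) return false;
    if (len > kMaxPayload) return false;          // reject oversized length claims
    std::uint8_t buf[kMaxPayload];
    if (src.read(buf, len) != len) return false;  // reject truncated payloads
    payload.assign(reinterpret_cast<const char*>(buf), len);
    return true;
}

void test_well_formed_message_is_accepted() {
    FakeSource src({3, 'a', 'b', 'c'});
    std::string payload;
    assert(read_message(src, payload));
    assert(payload == "abc");
}

void test_oversized_length_is_rejected() {
    FakeSource src({200, 'a'});   // claims 200 bytes, buffer holds 16
    std::string payload;
    assert(!read_message(src, payload));
}

void test_truncated_payload_is_rejected() {
    FakeSource src({5, 'a', 'b'}); // claims 5 bytes, delivers 2
    std::string payload;
    assert(!read_message(src, payload));
}
```

The three test functions can be called from whatever test runner you already use. In TDD order, the two rejection tests would be written first and seen to fail against a parser that trusts the length byte.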

Unit Testing

One thing that should be noted is that TDD favours highly localised testing (unit testing). As a result you could easily test that:

SecureZeroMemory() would correctly erase a password from RAM.
Or that GetSafeSQLParam() would correctly guard against SQL injection.
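
A localised test of the first kind might look like the sketch below. SecureZeroMemory() is a Windows API, so this example assumes a portable stand-in called secure_zero, and use_and_scrub_password is a hypothetical function under test; both are invented here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Portable stand-in for SecureZeroMemory() (an assumption for this sketch):
// writing through a volatile pointer discourages the compiler from
// optimizing the stores away.
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    while (n--) *bytes++ = 0;
}

// Hypothetical code under test: uses a password, then must scrub it.
void use_and_scrub_password(char* password, std::size_t capacity) {
    // ... authenticate with the password here ...
    secure_zero(password, capacity);
}

// Returns true if every byte in [p, p + n) is zero.
bool all_zero(const char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] != 0) return false;
    return true;
}

// Unit test: the whole buffer, not just the used part, must be cleared.
void test_password_is_scrubbed() {
    char password[32];
    std::memset(password, 'x', sizeof password);  // simulate stale data
    std::strcpy(password, "hunter2");
    use_and_scrub_password(password, sizeof password);
    assert(all_zero(password, sizeof password));
}
```

Note that this only checks the buffer the caller can see. It says nothing about copies the code may have made elsewhere, which is one more reason such tests complement other techniques rather than replace them.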

However, it becomes more difficult to verify that all developers have used the correct method in every place that it's required.
A test to verify a new SQL related feature would confirm that the feature works - it would work just as well with both the 'safe' and 'unsafe' versions of GetSQLParam.

It is for this reason you should not neglect other tools/techniques that can be used to "ensure secure coding".

Coding Standards
Code Reviews
Testing

I'll take your second question first. Yes, TDD can be used for non-functional requirements. In fact, it is often used as such. The most common benefit is an improved modular design, which is non-functional but seen by everyone who practices TDD. Other examples that I've used TDD to verify: cross-platform support, cross-database support, and performance.

For all your tests, you may need to restructure the code so that it is testable. This is one of the biggest effects of TDD-- it really changes how you structure your code. At first it seems like this is perturbing the design, but you soon come to realize that the testable design is better. Anyway...

String interpretation bugs (Unicode vs. ANSI) are particularly nice to test with TDD. It's usually straightforward to enumerate the bad and good inputs, and assert about their interpretation. You may find that you need to restructure your code a bit to "make it testable"; by this I mean extract methods that isolate the string-specific code.
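
As a rough sketch of "enumerate the bad and good inputs", here is a table of byte strings run against a small UTF-8 validity check. The is_valid_utf8 function is a hypothetical stand-in for whatever string-specific code you extracted; the original question is about ANSI vs Unicode vs MBCS, so treat UTF-8 here as an assumption chosen to keep the example portable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical code under test: true if s is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp, min_cp;
        if (c < 0x80)                { len = 1; cp = c;        min_cp = 0; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;                        // stray continuation or bad lead byte
        if (len > n - i) return false;            // sequence runs off the end
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp) return false;            // overlong encoding
        if (cp > 0x10FFFF) return false;          // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate code point
        i += len;
    }
    return true;
}

struct Utf8Case { const char* bytes; bool expected; const char* why; };

const Utf8Case kUtf8Cases[] = {
    {"hello",            true,  "plain ASCII"},
    {"caf\xC3\xA9",      true,  "two-byte sequence"},
    {"\xE2\x82\xAC",     true,  "three-byte sequence"},
    {"\xF0\x9F\x98\x80", true,  "four-byte sequence"},
    {"\xC0\xAF",         false, "overlong encoding of '/'"},
    {"\xE2\x82",         false, "truncated sequence"},
    {"\x80",             false, "stray continuation byte"},
    {"\xED\xA0\x80",     false, "encoded surrogate"},
    {"\xF4\x90\x80\x80", false, "above U+10FFFF"},
};

// Returns how many table entries the validator gets wrong.
std::size_t count_utf8_failures() {
    std::size_t failures = 0;
    for (std::size_t i = 0; i < sizeof kUtf8Cases / sizeof kUtf8Cases[0]; ++i)
        if (is_valid_utf8(kUtf8Cases[i].bytes) != kUtf8Cases[i].expected)
            ++failures;
    return failures;
}
```

The overlong case is the kind of input that matters for security: a naive decoder would turn it into '/' after a path check has already passed.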

For buffer overruns, making sure routines respond properly if given too much data is pretty straightforward to test as well. Just write a test and send them too much data. Assert that they did what you expected. But some buffer overflows and stack overflows are a bit trickier. You need to be able to cause these to happen, but you also need to figure out how to detect whether they happened. This may be as simple as allocating buffers with extra bytes in them and verifying that those bytes don't change during tests... Or it may take some other creative techniques.
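
The "extra bytes" idea can be sketched as follows. The copy_name function is a hypothetical routine under test, and the guard size and canary value are arbitrary choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical code under test: copies src into dst, truncating to fit
// and always NUL-terminating.
void copy_name(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return;
    std::size_t n = std::strlen(src);
    if (n >= dst_size) n = dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
}

const std::size_t kGuard = 8;          // guard bytes on each side
const unsigned char kCanary = 0xA5;    // pattern we expect to survive

// Runs copy_name on a 16-byte buffer surrounded by canary bytes and
// returns true if every canary byte is still intact afterwards.
bool guards_intact_after_copy(const char* input) {
    const std::size_t kBuf = 16;
    unsigned char block[kGuard + kBuf + kGuard];
    std::memset(block, kCanary, sizeof block);

    char* dst = reinterpret_cast<char*>(block + kGuard);
    copy_name(dst, kBuf, input);

    for (std::size_t i = 0; i < kGuard; ++i) {
        if (block[i] != kCanary) return false;                  // underrun
        if (block[kGuard + kBuf + i] != kCanary) return false;  // overrun
    }
    return true;
}

void test_oversized_input_does_not_overrun() {
    assert(guards_intact_after_copy("short"));
    assert(guards_intact_after_copy(
        "this input is far longer than sixteen bytes"));
}
```

Guard bytes only catch writes that land inside the guard region, so this is a cheap first check rather than a proof. Running the same tests under a memory-checking tool is a reasonable complement.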

I'm not sure there's a simple answer, though. Testing takes creativity, discipline, and commitment, but is usually worth it.

isolate the behavior you need to test
make sure you can detect the problem
know what you want to happen for the error case
write the test and see it fail

Hope this helps

TDD is the best way to build a secure system. All software developed by Microsoft is fuzzed, and this is arguably the number one reason for the dramatic reduction in vulnerabilities found. I highly recommend using the Peach Framework for this purpose. I have personally used Peach with great success in finding Buffer Overflows.

Peach pit files provide a way of describing the data used by your application. You can choose what interface you want to test. Does your application read files? Does it have an open port? After you tell Peach what the input looks like and how to communicate with your application, you can turn it loose, and it knows all of the nasty input to make your application puke all over itself.

To make everything run, Peach has a great testing harness. If your application crashes, Peach will know because it has a debugger attached. When your application crashes, Peach will restart it and keep testing. Peach can categorize all of the crashes and match up the core dumps with the input it used to crash the application.

Parameterized Tests

While we aren't doing buffer overrun tests at my work, we do have the notion of template tests. These tests are parameterized to require the specific data for the case we want to test. We then use metaprogramming to dynamically create the real tests by applying the parameters for each case to the template. This has the benefit of being deterministic, and it runs as part of our automated test suite.
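
The answer does not show its metaprogramming, so the following is only a plain table-driven approximation of the same idea in C++: one test template, many parameter sets. The parse_port function and the case table are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical function under test: parses a decimal TCP port (1-65535).
// Returns -1 for anything that is not a valid port.
int parse_port(const std::string& text) {
    if (text.empty() || text.size() > 5) return -1;
    long value = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c < '0' || c > '9') return -1;   // digits only, no sign or spaces
        value = value * 10 + (c - '0');
    }
    if (value < 1 || value > 65535) return -1;
    return static_cast<int>(value);
}

// One row per case: the parameters applied to the test template.
struct PortCase { const char* input; int expected; };

const PortCase kPortCases[] = {
    {"80",                   80},
    {"65535",                65535},
    {"65536",                -1},   // just past the upper bound
    {"0",                    -1},
    {"",                     -1},
    {"-1",                   -1},
    {" 80",                  -1},   // leading whitespace
    {"80;x",                 -1},   // trailing junk
    {"99999999999999999999", -1},   // would overflow a naive parser
};

// The test template: the same check applied to whichever case it is given.
bool run_port_case(const PortCase& c) {
    return parse_port(c.input) == c.expected;
}

// Expands the template over the table; returns the number of failing cases.
std::size_t count_port_failures() {
    std::size_t failures = 0;
    for (std::size_t i = 0; i < sizeof kPortCases / sizeof kPortCases[0]; ++i)
        if (!run_port_case(kPortCases[i])) ++failures;
    return failures;
}
```

Adding a newly discovered hostile input then costs one extra row in the table, and the run stays deterministic.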

My TDD Practice

We do Acceptance Test Driven Development at my work. Most of our tests happen to be close to full stack functional tests. The reason is we found it was more valuable to test and assure the behavior of user driven actions. We use techniques like dynamic test generation from parameterized tests to provide us more coverage with a minimum of work. We do this for ASCII vs UTF8, API conventions, and well known variant tests.

The topic of Fuzzing has come up, which I think is a great approach to this problem (in general). This raises the questions: Does Fuzzing fit into TDD? Where in the TDD process does fuzzing fit?

I believe that it might fit quite well! There are fuzzers like american fuzzy lop that can be scripted and adapt themselves to modifications in the I/O format on their own. In this particular case, you could integrate it with Travis CI, store the input test cases you used and run regression testing against those.
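
One way the "store the inputs and run regression testing" part could look is sketched below. It assumes a libFuzzer-style entry point, LLVMFuzzerTestOneInput, which is a different convention from classic american fuzzy lop (that tool traditionally feeds the target through stdin or a file). The percent_decode function and the replay helper are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the value of a hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical code under test: decodes "%41"-style escapes and copies
// malformed escapes through unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Fuzz entry point: called by the fuzzer with arbitrary bytes.
// Aborts if a property we rely on is violated, so the fuzzer records a crash.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    std::string input(size ? reinterpret_cast<const char*>(data) : "", size);
    std::string output = percent_decode(input);
    if (output.size() > input.size()) std::abort();  // decoding must never grow
    return 0;
}

// Regression replay: runs previously saved inputs through the same entry
// point as ordinary, deterministic test cases.
bool replay_saved_inputs(const std::vector<std::string>& corpus) {
    for (std::size_t i = 0; i < corpus.size(); ++i) {
        const std::string& s = corpus[i];
        if (LLVMFuzzerTestOneInput(
                reinterpret_cast<const std::uint8_t*>(s.data()), s.size()) != 0)
            return false;
    }
    return true;
}
```

In this arrangement the fuzzer explores new inputs outside the red/green cycle, while every input it has ever flagged is kept and replayed in the regular test suite on each build.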

I might extend this answer if you come up with any questions for details in the comments.