Unit Testing with Complex Directory Structure_问答_开发者

I am trying to use test-driven development for an application that has to read a lot of data from disk. The problem is that the data is organized on the filesystem in a somewhat complex directory structure (not my fault). The methods I'm testing will need to see that a large number of files exist in several different directories in order for the methods to complete.

The solution I'm trying to avoid is just having a known folder on the hard drive with all the data in it. This approach sucks for several reasons, one reason being that if we wanted to run the unit tests on another computer, we'd have to copy a large amount of data to it.

I could also generate dummy files in the setup method and clean them up in the teardown method. The problem with this is that it would be a pain to write the code to replicate the existing directory structure and dump lots of dummy files into those directories.

I understand how to unit test file I/O operations, but how do I unit test this kind of scenario?

Edit: I will not need to actually read the files. The application will need to analyze开发者_C百科 a directory structure and determine what files exist in it. And this is a large number of subdirectories with a large number of files.

I would define a set of interfaces that mimick the file system, such as IDirectory and IFile, and then use Test Doubles to create a representation of the directory structure in memory.

This will allow you to unit test (and vary) that structure to your heart's content.

You will also need concrete implementations that implement those interfaces using the real BCL classes for that purpose.

This lets you vary data structure and data access independently of each other.

This has a Python perspective. You may not be working in Python, but the answer more-or-less applies to most languages.

With unit testing with any external resource (e.g. the os module) you have to mock out the external resource.

The question is "how do mock out os.walk?" (or os.listdir or whatever you're using.)

Write a mock version of the function. os.walk for example. Each mocked-out version returns a list of directories and files so that you can exercise your application.

How to build this?

Write a "data grabber" that does os.walk on real data and creates a big-old flat list of responses you can use for testing.
Create a mock directory structure. "it would be a pain to write the code to replicate the existing directory structure" isn't usually true. The mocked directory structure is simply a flat list of names. There's no pain at all.

Consider this

def setUp( self ):
    structure= [ 
        "/path/to/file/file.x", 
        "/path/to/another/file/file.y", 
        "/some/other/path/file.z",...
    ]
    for p in structure:
        path, file = os.path.split( p )
        try:
            os.makedirs( path )
        except OSError:
            pass
        with open( p, "w" ) as f:
            f.write( "Dummy Data" )

That's all that's required for setUp. tearDown is similar.

Whew, that sounds like a beast. I've been dabbling in testing myself.

It sounds like the main focus of your question is "How do I set up a large number of files so that I can test methods that check that said files exist?"

You mention several possible solutions. You said that you don't want to simply have a folder on the hard drive full of test data because you wouldn't want to have to go through the process of copying the data to another computer, which is understandable.

You also mention that you could write methods to generate dummy files, but it would be a pain to replicate the data structure.

Roy Osherove says in The Art of Unit Testing that it's a great idea to maintain and version your test code as your project is maintained and versioned.

I think that for the sake of consistency, it would make sense to create some dummy data and place it in some kind of source control repository with your test code. That way, you could streamline the process of copying the dummy data onto another computer and not have to worry about keeping track of which dummy data is on which machine. That would be a pain!

My solution: place dummy data is source control.

A possible solution would be to create the dummy file&directory structure from a tar file that your setup method deploys.