I am developing a ETL process that extract business data from one database to a data warehouse. The application is NOT using NHibinate, Linq to Sql or Entity Framework. The application has its own generated data access classes that generate the necessary SQL statements to perform CUID.
As one can image, developers who write code that generate custom SQL can easily make mistakes.
I would like to write a program that generate testing data (Arrange), than perform the ETL process (Act) and validate the data warehouse (Assert).
I don't think it is hard to write such program. However, what I worry is that in the past my company had attempt to do something similar, and ending up with a brunch of un-maintainable unit tests that constantly fail because of many new changes to the database schema as new features are added.
My plan is to write an integration test that runs on the build machine, and not any unit tests to ensures the ETL process works. The testing data cannot be totally random generate because of business logic on determine how data are loaded to the data warehouse. We have custom development tool that generates new data access classes when there is a change in the database definition.
I would love any feedback from the community on giving me advice on write such开发者_如何学运维 integration test that is easy to easy to maintain. Some ideas I have:
Save a backup testing database in the version control (TFS), developers will need to modify the backup database when there are data changes to the source or data warehouse.
Developers needs to maintain testing data though the testing program (C# in this case) manually. This program would have a basic framework for developer to generate their testing data.
When the test database is initialize, it generate random data. Developers will need to write code to override certain randomly generated data to ensure the test passes.
I welcome any suggestions Thanks
Hey dsum, allthough I don't really know your whole architecture of the ETL, I would say, that integration-testing should only be another step in your testing process.
Even if the unit-testing in the first encounter ended up in a mess, you should keep in mind, that for many cases a single unit-test is the best place to check. Or do you want to split the whole integration test for triple-way case or sth. other further deep down, in order to guarantee the right flow in every of the three conditions?
Messy unit-test are only the result of messy production code. Don't feel offended. That's just my opinion. Unit-tests force coders to keep a clean coding style and keep the whole thing much more maintainable.
So... my goal is, that you just think about not only to perform integration testing on the whole thing, because unit-tests (if they are used in the right way) can focus on problems in more detail.
Regards, MacX
First, let's say I think that's a good plan, and I have done something similar using Oracle & PL/SQL some years ago. IMHO your problem is mainly an organizational one, not a technical:
- You must have someone who is responsible to extend and maintain the test code.
- Responsibility for maintaining the test data must be clear (and provide mechanisms for easy test data maintenance; same applies to any verification data you might need)
- The whole team should know that no code will go into the production environment as long as the test fails. If the test fails, first priority of the team should be to fix it (the code or the test, whatever is right). Train them not to work on any new feature as long as the test breaks!
- After a bug fix, it should be easy for the one who fixed it to verify that the part of the integration which failed before does not fail afterwards. That means, it should be possible to run the whole test quick and easily from any developer machine (or at least, parts of it). Quick can get a problem for an ETL process if your test is too big, so focus on testing a lot of things with as few data as possible. And perhaps you can break the whole test into smaller pieces which can be executed step-by-step.
If one wants to maintain data while performing Data integration testing in ETL, We could also go with these steps because Integration testing of the ETL process and the related applications involves in them. for eg:
1.Setup test data in the source system. 2.Execute ETL process to load the test data into the target. 3.View or process the data in the target system. 4.Validate the data and application functionality that uses the data
精彩评论