I have an ETL project with a lot of data that needs cleaning, and we're talking about a lot of complex transformations. The process has to run nightly and must finish within a fixed window (10 hours). To that end, the ETL should use all the processor cores on the system.
Which would be better for performing complex ETL transforms in a multiprocessor environment:
SSIS
or
.NET Framework 4 (let me qualify that: I can write an application using Entity Framework and parallel tasks to do the complex data transforms that are required. Writing an application to do the ETL isn't a problem; however, I'm trying to use the best tool for the job.)
I know it's an unfair question; that SSIS is a technology and dot net is a framework but still...
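For illustration only, the code-based option described in the question (apply a transform to every row while spreading the work over all cores) can be sketched in a few lines. This is a language-neutral sketch written in Python rather than .NET; `clean_row`, `clean_all`, and the row layout are hypothetical names and assumptions, not anything from the original project.

```python
from concurrent.futures import ProcessPoolExecutor


def clean_row(row):
    """Hypothetical transform: trim whitespace and normalise the name's case."""
    return {"id": row["id"], "name": row["name"].strip().title()}


def clean_all(rows, workers=None, chunksize=1000):
    """Apply clean_row to every row using a pool of worker processes.

    workers=None lets the pool default to the number of processors.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order; chunksize batches rows per worker
        # to reduce inter-process communication overhead.
        return list(pool.map(clean_row, rows, chunksize=chunksize))


if __name__ == "__main__":
    sample = [{"id": i, "name": f"  alice smith {i} "} for i in range(10)]
    for cleaned in clean_all(sample)[:2]:
        print(cleaned)
```

Whether this beats a packaged ETL tool depends on where the time goes: the sketch only parallelises the transform step, and the extract and load steps would still need their own batching.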
Yes, working with SSIS is a chore and every project for which I have used it has amazed me by how much longer it took than expected. To be fair, I suppose that a solution to most any problem eventually could be fashioned using either one given enough time.
Using either tool usually involves doing some research and learning in each project. Learning about .NET leaves me edified. Struggling with patchy work-arounds and arcane code hacks to make SSIS work leaves me deflated.
What could be more elementary in software coding than reading from and writing to variables in memory? How complex could it possibly be in any language? How many restrictions on what, when, and where could there be on performing such a rudimentary task? To find the answer, search the internet for the phrase "ssis write to variables in script". SSIS takes complexity to a whole new level, even for the simplest of operations! God help you if you have to write to a package variable within a data flow task.
I'll say no.
I started to write an ETL job and got stymied by the first column of data: a formatted date time. SSIS was unable to make heads or tails of it.
Perhaps you can spend weeks trying to figure out how to convince SSIS to do what you want, but it's much easier just to get it done.
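As a rough illustration of how little code a formatted date-time column needs in a general-purpose language, here is a minimal sketch in Python. The post does not say what the column actually looked like, so the layout in `SOURCE_FORMAT` is purely an assumption, and `parse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed layout, e.g. "20110315 23:45:00"; the real format is not given in the post.
SOURCE_FORMAT = "%Y%m%d %H:%M:%S"


def parse_timestamp(text, fmt=SOURCE_FORMAT):
    """Return a datetime for text, or None when it does not match fmt."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError:
        # Bad values come back as None so the caller can log or reject the row.
        return None
```

Stating the expected format explicitly, instead of relying on automatic detection, is what keeps an unusual column from derailing the whole job.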
SSIS is a tool built specifically for the job you mention. It's ideal for ETL processing and has a lot of common tasks built in; in a custom .NET application you'd have to code these from scratch.