I have a situation where I have several hundreds of complex excel spreadsheets, each with multiple pivot tables running queries against a sql database. I need to be able to convert these sql queries into function calls against a proprietary data store. This is complicated at many levels, but the part I am asking about now, and seems likely to have been addressed before in computer science, is how to "parse" the sql statements into a well defined structure that I can work with programmatically.
An example of my starting point:
SELECT vwFlowDataBest.MeasurementDate, vwFlowDataBest.LocationType, vwFlowDataBest.ScheduledVolume, tblPoints.Zone, tblPoints.Name AS SOME_ALIAS_FOR_NAME, vwFlowDataBest.PointID, tblCustomerType.Name, vwFlowDataBest.OperationallyAvailable, tblPoints.County, tblPoints.State, tblConnectingParty.Name
FROM Pipe2Pipe.dbo.tblConnectingParty tblConnectingParty, Pipe2Pipe.dbo.tblCustomerType tblCustomerType, Pipe2Pipe.dbo.tblPipelines tblPipelines, Pipe2Pipe.dbo.tblPoints tblPoints, Pipe2Pipe.dbo.vwFlowDataBest vwFlowDataBest
WHERE tblCustomerType.ID = tblPoints.CustomerTypeID AND tblPipelines.ID = vwFlowDataBest.PipelineID AND tblPoints.ID = vwFlowDataBest.PointID AND tblPoints.ConnectingPartyID = tblConnectingParty.ID AND ((tblPipelines.ID=16) AND (vwFlowDataBest.ScheduledVolume<>0) AND (tblPoints.Zone In ('mid 1','mid 2','mid 3','mid 4','mid 5','mid 6','mid 7')) AND (tblCustomerType.ID=16) AND (vwFlowDataBest.MeasurementDate>={ts '2010-05-15 00:00:00'}) AND (tblPipelines.ID<155))
So for this statement, I need to programatically handle the SELECT portion, the FROM portion, and the WHERE portion, and the subordinates within each. Complications of this are things such as aliases, differentiating between a join between tables and a plain old value filter in the where clause, the grouping (brackets) within the where clause, and other issues. Dealing with the complexities of Excel pivot tables is entirely outside the scope of this question, I can figure that out.
For now, I don't mind not supporting certain sql functions, such as "group by", "having", etc...for my problem, those are small enough that if necessary I can handle those manually. But if there's a known way to handle that as well, I'd be most happy.
My feeling is that I can probably get 70% of the way there (for my problem) just by splitting the sql statement into 3 parts, and then further breaking each of those down into their logical subordinate parts and then deal with them accordingly. But as I write this I can already see holes in my plan...this feels like a tarpit of complexity and edge cases.
I can't imagine I'm the first person to want to do such a thing, so my question is, are there old, proven approaches to this sort of problem, existing libraries, inno开发者_如何学Govative approaches I could take, or any suggestions in general to apply to this task?
You seem to need a SQL parser (or at least part of one). It may be overkill for your purposes (more complete than you need), but there's a PL/SQL parser for ANTLR that might be useful.
Edit: I didn't really read that grammar as carefully as I should have before I posted the link. Doing a bit of looking, it doesn't really parse select statements at all -- it just recognizes where one is, and skips across it.
The ANTLR grammars page lists several more SQL grammars though (for the variants supported/used by MySQL, Oracle, etc.) Since you have C# and such in the tags, it's probably fair to guess you want to parse the MS SQL Server variant. There's a grammar strictly for its select
statement that may be a reasonable fit for your needs.
精彩评论