I have to create a price list parser that imports data from Excel or CSV and puts it into a database. I have no problem reading the data from the source. I need to automatically find the columns that contain the price, product title, and description.
What would you suggest? Are there common methods or libraries for this?
Data sample 1:
Intel Core 2 Duo E6300 (2.80GHz, 1066MHz, 2MB, S775) tray | 83
Intel Core 2 Duo E6500 (2.93GHz, 1066MHz, 2MB, S775) tray | 86
Data sample 2:
Title | Description | Guaranty | Price
Intel Core 2 Duo E6300 | 2.80GHz, 1066MHz, 2MB, S775 | 12 | 83
Intel Core 2 Duo E6500 | 2.93GHz, 1066MHz, 2MB, S775 | 6 | 86
Data sample 3:
UPC | Title | Price
456546545 | Intel Core 2 Duo E6300 | 83
4654654654 | Intel Core 2 Duo E6500 | out of stock
I recently wrote an address parser, and the general strategy I used was to first pull out all the items that have a distinguishable pattern. In my case I first found the postal code, which is analogous to the price in your example. From there I found the state code, and so on.
In your example I would find the price and remove it from the line. From there you will need to find some pattern in the data that would allow you to parse out the product code. Without seeing more REAL data it is hard to decide what that pattern is. In my address parser I used address suffixes (Rd, St, Court, etc.) to help identify the end of an address line.
If you can provide more data we could probably be more helpful.
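As a rough illustration of the "find the price first" idea, here is a minimal sketch that picks the column whose cells most often look like a price. The class and method names (ColumnGuesser, GuessPriceColumn) are hypothetical, and two heuristics are my own assumptions rather than anything from the question: a price has at most six digits before the decimal separator (so long UPC codes are not mistaken for prices), and ties go to the rightmost column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnGuesser
{
    // Assumption: a price is 1-6 digits, optionally followed by a '.' or ','
    // and 1-2 decimals. Longer digit runs (such as UPC codes) are rejected.
    static readonly Regex PricePattern = new Regex(@"^\d{1,6}([.,]\d{1,2})?$");

    static bool LooksLikePrice(string cell) => PricePattern.IsMatch(cell.Trim());

    // Returns the index of the column with the most price-like cells.
    // Rows are expected to be already split into cells (for example on '|').
    public static int GuessPriceColumn(IReadOnlyList<string[]> rows)
    {
        int columnCount = rows.Max(r => r.Length);
        return Enumerable.Range(0, columnCount)
            .OrderByDescending(c => rows.Count(r => c < r.Length && LooksLikePrice(r[c])))
            .ThenByDescending(c => c) // assumption: on a tie, prefer the rightmost column
            .First();
    }
}
```

With the rows of data sample 3 split on '|', this picks column index 2 even though one cell says "out of stock". For data sample 2 the Guaranty and Price columns tie, and the rightmost-column rule selects index 3. Real price lists will likely need more signals (header names, currency symbols, value ranges) than this sketch uses.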
If you're using SQL Server, I would suggest not creating a program at all and using SQL Server Integration Services, which has built-in support for CSV and Excel.
Depending on the quality of your input (that is, whether all input strings are formatted the same way), you could try the following:
string s = "Intel Core 2 Duo E6300 (2.80GHz, 1066MHz, 2MB, S775) tray | 83";
string firstPart = s.Substring(0, s.IndexOf("(")).Trim(); //returns "Intel Core 2 Duo E6300"
string secondPart = s.Substring(s.IndexOf("(") + 1, s.IndexOf(")") - s.IndexOf("(") - 1).Trim(); //returns "2.80GHz, 1066MHz, 2MB, S775"
string thirdPart = s.Substring(s.IndexOf(")") + 1, s.IndexOf("|") - s.IndexOf(")") - 1).Trim(); //returns "tray"
string fourthPart = s.Substring(s.IndexOf("|") + 1, s.Length - s.IndexOf("|") - 1).Trim(); //returns "83"
But when your data is not uniformly formatted, you might need to do some (or a lot of) checking before you can use the above functions.
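One way to build that checking in is to match the whole line against a pattern and reject lines that do not fit, instead of calling Substring on indexes that may be -1. The following is only a sketch for the "title (details) suffix | price" shape of data sample 1; the LineParser class, the TryParse method, and the group names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Expected shape: title (details) optional-suffix | price
    static readonly Regex LinePattern = new Regex(
        @"^\s*(?<title>[^(|]+?)\s*\((?<details>[^)]*)\)\s*(?<suffix>[^|]*?)\s*\|\s*(?<price>.+?)\s*$");

    // Returns false (with empty outputs) when the line does not match the
    // expected shape, so the caller can log it or try another parser.
    public static bool TryParse(string line, out string title, out string details,
                                out string suffix, out string price)
    {
        title = details = suffix = price = "";
        Match m = LinePattern.Match(line);
        if (!m.Success) return false;

        title = m.Groups["title"].Value;
        details = m.Groups["details"].Value;
        suffix = m.Groups["suffix"].Value;
        price = m.Groups["price"].Value;
        return true;
    }
}
```

For the sample line above this yields the same four parts as the Substring version, while a line without parentheses, such as the rows in data sample 3, simply returns false rather than throwing.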