开发者

TSQL Query for analyzing Text

开发者 https://www.devze.com 2022-12-13 10:57 出处:网络
I have a table that has ordernumber, cancelled date and reason. Reason field is varchar(255) field and it was written by many different sales rep and really hard to group by the reason category I need

I have a table that has ordernumber, cancelled date and reason. Reason field is varchar(255) field and it was written by many different sales rep and really hard to group by the reason category I need to 开发者_运维技巧generate a report to categorize cancelation reasons. What is the best way to analyse the reasons with TSQL?

Sample of reasons entered by sales rep

cust already has this order going out
cust can not hold for item Called to cancel order
cust doesn't want to pay for shipping
wife ordered same item from different vendor, sent email
cst made a duplicate order, sent email
cst can't hold
Cust doesn't want to go through verification process so is cancelling order
doesn't ant to hold  for Bo
doesn't want
Cust called to cancel the order  He can no longer get the product he wants 
cnt hld
will not comply with export req
cant' hold
Custs request
Cust will not hold for BO
per. cust. request.

BTW I have SQL Server 2005.


part of your problem is that this these aren't truly reason codes. sounds like an issue with your schema to me. if there aren't predefined reason codes to reference and you're allowing free text entry for each reason, then there's really no way to do this directly, outside of pulling distinct reasons back, which is probably not going to be very useful.

just an idea, can you add another column to the table, even if it's in a temp or test environment and then give the business users the ability to assign a code (e.g. 1 for mis-ships, 2 for duplicate orders, 3 for wrong item etc.) to each order cancellation. then perform the analysis on that.

i assume that's what they're expecting from you, but i don't know that i see any better way. you could always perform the analysis yourself if you have the authority/knowledge but this might be painful if you have a ton of cancellations.

edit- i see now that you've tagged this with regex... it would be possible to setup specified keywords to pull out the entries, but there'd have to be some tolerance built in and still manual analysis afterwards for items which don't fall into any specified category due to misspellings etc. /edit


+1 to @jmatthews, you really need to have reason codes that are selected and then possibly allow free-form entry for the full reason.

If this isn't an option you can look into text clustering. Don't expect that to be fast or easy though, it's still an open research topic and is related to both AI and machine learning.


Look at Term Lookup in SSIS, here is an article to read.

0

精彩评论

暂无评论...
验证码 换一张
取 消