Extracting Demographic and Contact Information from unstructured text files

Developer https://www.devze.com 2023-01-01 23:15 Source: Internet
I am looking to extract specific items out of a large pool of unstructured documents. These documents could be 1-5 pages of text formatted in various ways by the user, but in most cases would contain at least:

I'm looking for a semantic parser that can attempt to extract these elements from the documents so that I can load that information into a relational database and work with these records as contacts.

Other services I've looked at, while valuable for other purposes, do not address this specific need:

  • Alchemy API
  • Open Calais
  • Saplo

Any thoughts, suggestions or leads?


Have you found a lead to your question? I found some research articles:

www.cis.upenn.edu/~pereira/papers/crf.pdf

citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.9192&rep=rep1&type=pdf

www2.selu.edu/Academics/Faculty/aculotta/pubs/culotta04extracting.pdf

But I found no specific code examples implementing any of these ideas.

Take a look at this too: stackoverflow.com/questions/953150/general-address-parser-for-freeform-text

(sorry I excluded the http, this system is not allowing me to post more than one url/link)
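
Since none of the links above come with code, here is a minimal rule-based baseline for the goal in the question: pull a couple of contact fields out of free text and load them into a relational table. This is only a sketch and not the CRF approach from the papers. The field list in the question is not shown, so the email and phone fields, the US-style phone pattern, the `contacts` table, and the helper names `extract_contact` and `load_contacts` are all assumptions made for illustration.

```python
import re
import sqlite3

# Assumed fields: the question's own list is not available, so email and
# phone are used here purely as examples of "contact information".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed US-style phone format, e.g. (555) 123-4567 or 555-123-4567.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")


def extract_contact(text):
    """Return the first email and phone found in text (None if absent)."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


def load_contacts(conn, documents):
    """Extract fields from each document and insert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "doc_id INTEGER PRIMARY KEY, email TEXT, phone TEXT)"
    )
    rows = []
    for doc_id, text in enumerate(documents):
        fields = extract_contact(text)
        rows.append((doc_id, fields["email"], fields["phone"]))
    conn.executemany(
        "INSERT INTO contacts (doc_id, email, phone) VALUES (?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_contacts(conn, ["Jane Doe\njane.doe@example.com\n(555) 123-4567"])
    print(conn.execute("SELECT * FROM contacts").fetchall())
```

Regular expressions like these will miss names, street addresses, and anything formatted unusually, which is where the statistical models in the linked papers are meant to help. The sketch is mainly useful as a baseline to compare a learned extractor against.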
