Implementing search on a medical link list/table that allows for synonyms and abbreviations, and importing such data

Developer https://www.devze.com 2023-02-04 05:11 Source: web
I'm making a simple searchable list which will end up containing about 100,000 links on various medical topics, mostly medical conditions/diseases. On the surface this sounds easy, and I've set my tables up in the following way:

  • Links: id, url, name, topic
  • Topics (eg cardiology, paediatrics etc): id, name
  • Conditions (eg asthma, influenza etc): id, name, aliases

And possibly another table:

  • Link & condition (since 1 link can pertain to multiple conditions): link id, condition id
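One way the tables above could look in practice is sketched here with SQLite from Python. This is only a sketch under assumptions: the `condition_aliases` table (one row per synonym, instead of a single `aliases` column), the `topic_id` foreign key, the `make_demo_db` and `links_for_term` helpers and the example.org link are all made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE conditions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Assumption: one row per synonym/spelling/abbreviation,
-- replacing the single 'aliases' column from the question.
CREATE TABLE condition_aliases (
    condition_id INTEGER NOT NULL REFERENCES conditions(id),
    alias TEXT NOT NULL COLLATE NOCASE
);
CREATE INDEX idx_condition_aliases_alias ON condition_aliases(alias);
CREATE TABLE links (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    name TEXT NOT NULL,
    topic_id INTEGER REFERENCES topics(id)
);
-- One link can pertain to multiple conditions.
CREATE TABLE link_conditions (
    link_id INTEGER NOT NULL REFERENCES links(id),
    condition_id INTEGER NOT NULL REFERENCES conditions(id),
    PRIMARY KEY (link_id, condition_id)
);
"""


def make_demo_db():
    """Build an in-memory database with one made-up link and its aliases."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO topics VALUES (1, 'gastroenterology')")
    db.execute(
        "INSERT INTO conditions VALUES (1, 'gastroesophageal reflux disease')"
    )
    db.executemany(
        "INSERT INTO condition_aliases VALUES (1, ?)",
        [
            ("gastroesophageal reflux disease",),
            ("gastro-oesophageal reflux disease",),
            ("GERD",),
            ("GORD",),
            ("GOR",),
        ],
    )
    db.execute(
        "INSERT INTO links VALUES "
        "(1, 'https://example.org/reflux', 'Reflux overview (made-up)', 1)"
    )
    db.execute("INSERT INTO link_conditions VALUES (1, 1)")
    return db


def links_for_term(db, term):
    """Return URLs of links whose condition has an alias equal to the term."""
    rows = db.execute(
        """
        SELECT DISTINCT l.url
        FROM condition_aliases a
        JOIN link_conditions lc ON lc.condition_id = a.condition_id
        JOIN links l ON l.id = lc.link_id
        WHERE a.alias = ?
        ORDER BY l.url
        """,
        (term.strip(),),
    ).fetchall()
    return [url for (url,) in rows]
```

With one alias per row, every spelling or abbreviation becomes an indexed exact lookup, and `COLLATE NOCASE` makes "gord" and "GORD" equivalent for ASCII text.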

So basically, since doctors (including myself) are super fussy, I want a search for a condition to return relevant results whether the term is an abbreviation, British or American English, or an alternative ancient name (eg "angiooedema", "angioedema", "Quincke's edema" etc would give you the same results; similarly with "gastroesophageal reflux", "gastro-oesophageal reflux disease", GERD, GORD, GOR). Additionally, it would be good to group the results: first links for a diagnosis that matches the search string, then matches on link name, then finally matches on topic.
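The three-tier grouping described above (diagnosis matches, then link-name matches, then topic matches) could be sketched in plain Python as follows. The `Link` class, `ranked_search`, the alias dictionary and all sample data are hypothetical and only illustrate the ordering, not a finished search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    name: str
    topic: str
    conditions: tuple = ()  # canonical condition names attached to the link


# Made-up sample data: alias -> canonical condition name.
DEMO_ALIASES = {"asthma": "asthma", "bronchial asthma": "asthma"}

DEMO_LINKS = [
    Link("https://example.org/d", "ECG basics", "cardiology"),
    Link("https://example.org/c", "Clinic directory", "asthma and allergy"),
    Link("https://example.org/b", "Asthma action plan template", "respiratory"),
    Link("https://example.org/a", "Inhaler technique", "respiratory", ("asthma",)),
]


def ranked_search(term, links, aliases):
    """Return (tier, link) pairs: 0 = condition, 1 = link name, 2 = topic."""
    needle = term.strip().lower()
    canonical = aliases.get(needle)  # None if the term is not a known alias
    ranked = []
    for link in links:
        if canonical is not None and canonical in link.conditions:
            tier = 0
        elif needle in link.name.lower():
            tier = 1
        elif needle in link.topic.lower():
            tier = 2
        else:
            continue  # no match at all, leave the link out
        ranked.append((tier, link.name.lower(), link))
    # Sort by tier first, then alphabetically by link name within a tier.
    ranked.sort(key=lambda item: item[:2])
    return [(tier, link) for tier, _name, link in ranked]
```

Each link lands in the best tier it qualifies for, so a link tagged with the diagnosis is not repeated further down just because its name also contains the term.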

My main problem is that there are thousands, if not tens of thousands, of conditions, each with up to 20 synonyms/spellings etc. One option is to get data from MeSH, which is a sort of medical thesaurus (but in American English only, so there would have to be a way of converting from British English). The trouble is that the XML they provide is INSANE and about 250 MB. To help, they have a guide to what the data elements are.
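A file of that size does not have to be loaded in one go; it can be streamed record by record. The sketch below uses `xml.etree.ElementTree.iterparse` and assumes the MeSH descriptor layout of `DescriptorRecord` elements containing `DescriptorName/String` and `ConceptList/Concept/TermList/Term/String`. Those element names should be checked against the MeSH data-element guide, and the sample XML is hand-written for illustration, not real MeSH data.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed MeSH descriptor layout (not real data).
SAMPLE = b"""<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorName><String>Angioedema</String></DescriptorName>
    <ConceptList>
      <Concept>
        <TermList>
          <Term><String>Angioedema</String></Term>
          <Term><String>Quincke's Edema</String></Term>
        </TermList>
      </Concept>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""

TERM_PATH = "ConceptList/Concept/TermList/Term/String"


def iter_mesh_synonyms(source):
    """Yield (preferred name, sorted other terms) for each descriptor record.

    `source` is a filename or a binary file object. Records are processed
    as soon as their closing tag is seen, then cleared to limit memory use.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "DescriptorRecord":
            continue
        name = elem.findtext("DescriptorName/String")
        terms = {node.text for node in elem.iterfind(TERM_PATH) if node.text}
        yield name, sorted(terms - {name})
        elem.clear()  # drop the record's children once it has been consumed


if __name__ == "__main__":
    for name, synonyms in iter_mesh_synonyms(io.BytesIO(SAMPLE)):
        print(name, synonyms)
```

Each yielded pair could then be written to the conditions and alias tables, so the XML only has to be parsed once at import time.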

Honestly, I am at a loss as to how to tackle this most effectively as I've just started programming and working with databases and most of the possibilities of what to do seem difficult/suboptimal.

Was wondering if anyone could give me a hand? Happy to clarify anything that is unclear.


Your problem is well suited to a document-oriented search index such as Lucene. For example, you can design a schema with the following fields:

Link, Topic, Conditions

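  1. Then you can write a Lucene query such as Topic:edema and you should get all matching results. You can also use wildcard searches to match more.

  2. To match British spellings (or even misspellings) you can use the ~ (fuzzy) query, which finds terms within a certain string distance. For example, edema~0.5 matches oedema, oedoema and so on. Note that in newer Lucene versions the number after ~ is a maximum edit distance (0 to 2) rather than a similarity between 0 and 1.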
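To get a feel for what "within a certain string distance" means, here is a small stand-alone sketch of Levenshtein edit distance in Python. It is not Lucene's implementation, only an illustration of the idea; `edit_distance` and `fuzzy_matches` are made-up helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or keep if equal
                )
            )
        prev = cur
    return prev[-1]


def fuzzy_matches(query, terms, max_edits=2):
    """Return the terms within max_edits of the query, ignoring case."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t.lower()) <= max_edits]


if __name__ == "__main__":
    vocabulary = ["oedema", "oedoema", "eczema", "asthma"]
    print(fuzzy_matches("edema", vocabulary, max_edits=1))
    print(fuzzy_matches("edema", vocabulary, max_edits=2))
```

Here "oedema" is one edit from "edema" and "oedoema" is two, but "eczema" is also two edits away. A loose fuzzy threshold can therefore pull in unrelated conditions, which is a reason to prefer an explicit synonym list where one is available.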

Apache Lucene is a Java library with ports available for most major languages. Apache Solr is a full-fledged search server built on the Lucene library, and it is easy to integrate into your platform of choice because it has a RESTful API.

Summary: my recommendation is to use Apache Solr as an adjunct to your MySQL db.
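Because Solr is queried over HTTP, the application side can be as small as building a URL for its `select` handler. The sketch below only constructs the URL and sends nothing. The core name `links`, the field name `conditions`, the `localhost:8983` address and the `solr_select_url` helper are assumptions for illustration, and the term is assumed to be a single plain word with no Lucene special characters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, field, term, max_edits=None, rows=20):
    """Build a Solr select URL for a field:term query.

    Assumes `term` is one plain word; a real application would need to
    escape Lucene query syntax characters before building the query.
    """
    query = f"{field}:{term}"
    if max_edits is not None:
        query += f"~{max_edits}"  # fuzzy query, e.g. conditions:edema~1
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(solr_select_url("http://localhost:8983", "links", "conditions",
                          "edema", max_edits=1))
```

The JSON response from such a URL could then be joined back to the MySQL rows by link id.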


It's hard. Your best bet is to use MeSH, and then perhaps Soundex to match British English terms.
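For reference, a minimal sketch of classic American Soundex in Python is below, so the idea can be tried on a few spelling pairs. The `soundex` function is written from the standard rules rather than taken from any particular database, whose built-in version may differ in edge cases.

```python
# Consonant groups of classic American Soundex.
_CODES = {}
for _letters, _digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code, or '' if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")  # the first letter's code counts for repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


if __name__ == "__main__":
    for pair in (("haemorrhage", "hemorrhage"),
                 ("paediatrics", "pediatrics"),
                 ("oedema", "edema")):
        print(pair, [soundex(w) for w in pair])
```

Because Soundex always keeps the first letter, it pairs "haemorrhage" with "hemorrhage" but not "oedema" with "edema" (O350 versus E350), so spellings that differ at the start of the word would still need explicit aliases.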
