How to make a small engine like Wolfram|Alpha?

Developer https://www.devze.com 2022-12-29 19:08 Source: Internet
Let's say I have three models/tables: operating_systems, words, and programming_languages:

# operating_systems
name:string created_by:string family:string
Windows     Microsoft         MS-DOS
Mac OS X    Apple             UNIX
Linux       Linus Torvalds    UNIX
UNIX        AT&T              UNIX

# words
word:string definitions:string
window      (serialized hash of definitions)
hello       (serialized hash of definitions)
UNIX        (serialized hash of definitions)

# programming_languages
name:string created_by:string example_code:text
C++         Bjarne Stroustrup #include <iostream> etc...
HelloWorld  Jeff Skeet        h
AnotherOne  Jon Atwood        imports 'SORULEZ.cs' etc...

When a user searches hello, the system shows the definitions of 'hello'. This is relatively easy to implement. However, when a user searches UNIX, the engine must choose between word and operating_system. Also, when a user searches windows (lowercase 'w'), the engine chooses word, but should also show: "Assuming 'windows' is a word. Use as an <a href="etc..">operating system</a> instead."

Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.


Note: it doesn't need to be able to perform calculations as WA can do.


Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table.

# terms
Id Name     Type               Priority
1  window   word               false
2  Windows  operating_system   true

Then you can see how close a match the user's search term is. For example, "Windows" would be a 100% match with term 2, so assume that, but it is also a close match to term 1, so suggest that as an alternative. You'd have to write your own rules engine that decides how closely a word matches (i.e. what gets assumed with "windows" vs "Windows"?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity, so that it learns what users are more likely referring to.
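To make the idea concrete, here is a minimal sketch of such a rules engine in Python. The scoring values (1.0, 0.9, 0.7), the crude trailing-"s" rule, and the names Term, score, and interpret are all assumptions made for illustration; only the terms table itself comes from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: int
    name: str
    type: str
    priority: bool


# The two rows of the terms table shown above.
TERMS = [
    Term(1, "window", "word", False),
    Term(2, "Windows", "operating_system", True),
]


def score(query: str, term: Term) -> float:
    """Hypothetical matching rules; tune these for your own data."""
    if query == term.name:
        return 1.0  # exact match, including case
    q, n = query.lower(), term.name.lower()
    if q == n:
        return 0.9  # same word, different case
    if q.rstrip("s") == n.rstrip("s"):
        return 0.7  # crude plural handling: ignore trailing "s"
    return 0.0


def interpret(query: str, terms=TERMS):
    """Return (assumed_term, alternatives), best match first.

    Ties on score are broken by the Priority flag.
    """
    scored = [(score(query, t), t) for t in terms]
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (pair[0], pair[1].priority),
        reverse=True,
    )
    if not ranked:
        return None, []
    return ranked[0][1], [t for _, t in ranked[1:]]


assumed, alternatives = interpret("Windows")
print("Assuming", assumed.type, "| alternatives:", [t.type for t in alternatives])
```

With these particular rules, "windows" is also assumed to be the operating system, with the word offered as the alternative. If you want lowercase input to prefer the word instead, that is a change to score, not to the table.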


What about building a cache in the form of a database table that holds all the keywords?

The search query would be something like this:

SELECT * FROM keywords WHERE keyword = '<YourKeyWord>'   /* mysql */

The keywords table would contain some kind of reference to your modules.

The advantage of this approach is, of course, fast searching.

You may use two queries in order to simulate the behaviour you ask for:

  • Exact match (no problem in mysql)
  • Case insensitive search
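The two-query idea can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL so that the example is self-contained; the module column, the sample rows, and the lookup function are assumptions for illustration. Note that in MySQL, whether = is case sensitive depends on the column's collation, so the first query may need adjusting there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: each keyword points at the module (table) it belongs to.
conn.execute("CREATE TABLE keywords (keyword TEXT, module TEXT)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [
        ("Windows", "operating_systems"),
        ("window", "words"),
        ("hello", "words"),
        ("UNIX", "operating_systems"),
        ("UNIX", "words"),
    ],
)


def lookup(conn, query):
    """Run both queries and return (exact_matches, case_insensitive_only)."""
    # Query 1: exact match. SQLite compares TEXT case-sensitively by default.
    exact = conn.execute(
        "SELECT keyword, module FROM keywords WHERE keyword = ? ORDER BY module",
        (query,),
    ).fetchall()
    # Query 2: case-insensitive match, skipping rows already found by query 1.
    loose = conn.execute(
        "SELECT keyword, module FROM keywords "
        "WHERE keyword COLLATE NOCASE = ? AND keyword <> ? ORDER BY module",
        (query, query),
    ).fetchall()
    return exact, loose


print(lookup(conn, "UNIX"))
print(lookup(conn, "windows"))
```

Exact matches can be shown as the assumed interpretation, and rows found only by the second query as the "use as ... instead" suggestions. Passing the keyword as a bound parameter rather than pasting it into the SQL string also avoids quoting problems with user input.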


Wolfram Alpha is far more complex than your example. I'm not certain of its inner workings (I have done very little reading on it), but I believe it is a very large and complex automated inference system. Inference systems are rather trivial to implement (Prolog is basically a general-purpose one that you can put whatever data you need into), but they're very hard to make useful.
