python: fast dictionary word lookup with wildcards*

Developer https://www.devze.com 2023-04-10 14:10 Source: web

Given a text that is split into a list of words, I want to look up each word in a dictionary of words, which is likewise read from a text file and split('\n').
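When plain membership checks are slow, the usual cause is that `word in some_list` scans the whole list for every word. A minimal sketch, assuming the dictionary file's contents have already been read into a string (the helper names are hypothetical), that stores the words in a `set` so each exact lookup is an average O(1) hash probe:

```python
def load_dictionary(text):
    # Split on newlines, as in the question, and drop empty lines.
    return {line for line in text.split("\n") if line}


def lookup_words(words, dictionary):
    # Set membership is a hash lookup, not a linear scan like `in` on a list.
    return [w for w in words if w in dictionary]
```

This only covers exact matches; prefix queries such as `dep*` need an ordered or tree-shaped structure.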

Rather than checking whether each word is contained in the dictionary (which is gruesomely slow), I need to select a list of elements based on wildcards (the '*' is always at the end, i.e. no permuterm solution is required). For instance, the solution should select all dictionary entries starting with 'dep' without traversing the entire dictionary list.
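Because the wildcard is always at the end, `dep*` is a prefix query. One standard way to answer it without scanning everything is binary search over a sorted list, relying on the same ordering idea a B-tree uses. A minimal sketch with the standard-library `bisect` module, assuming the dictionary fits in memory (`prefix_matches` is a hypothetical helper name):

```python
import bisect


def prefix_matches(sorted_words, prefix):
    """Return all entries of sorted_words (ascending order) that start with prefix."""
    # All words starting with `prefix` form one contiguous block that begins
    # at the first position where `prefix` could be inserted.
    lo = bisect.bisect_left(sorted_words, prefix)
    hi = lo
    # Walk forward only through the matching block: O(log n + k) overall.
    while hi < len(sorted_words) and sorted_words[hi].startswith(prefix):
        hi += 1
    return sorted_words[lo:hi]
```

Usage, assuming a hypothetical `words.txt`: sort once with `words = sorted(open("words.txt").read().split("\n"))`, then call `prefix_matches(words, "dep")` per query.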

Performance is of the essence here. I thought of a B-tree, but:

  1. What would be the best package and data type for a fast implementation in Python?
  2. Please provide code examples.


Use a DAWG (directed acyclic word graph), which wastes less space than a trie. There are a few Python implementations, but for a start take a look here.
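A DAWG is essentially a trie in which identical suffix subtrees are merged, which is where the space saving comes from. The following is a minimal sketch of the incremental construction for sorted input described by Daciuk et al.; it is not the API of any particular Python package, and all class and method names are hypothetical.

```python
import itertools


class DawgNode:
    _ids = itertools.count()

    def __init__(self):
        self.id = next(DawgNode._ids)
        self.final = False  # True if a word ends at this node
        self.edges = {}     # character -> DawgNode

    def signature(self):
        # Nodes with the same finality and the same labelled edges to the same
        # (already canonical) children accept the same suffixes, so they can merge.
        return (self.final,
                tuple(sorted((ch, child.id) for ch, child in self.edges.items())))


class Dawg:
    """Minimal DAWG built incrementally; words must arrive in sorted order."""

    def __init__(self):
        self.root = DawgNode()
        self.previous_word = ""
        self.unchecked = []   # (parent, char, child) along the last word's path
        self.minimized = {}   # signature -> canonical node

    def insert(self, word):
        if self.previous_word and word <= self.previous_word:
            raise ValueError("insert words in strictly ascending order")
        # Length of the prefix shared with the previous word.
        common = 0
        for a, b in zip(word, self.previous_word):
            if a != b:
                break
            common += 1
        # The previous word's path below the shared prefix can no longer
        # change, so merge it into the register of canonical nodes.
        self._minimize(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            child = DawgNode()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True
        self.previous_word = word

    def finish(self):
        # Call once, after the last insert, to minimize the final path.
        self._minimize(0)

    def _minimize(self, down_to):
        # Deepest node first, so each child is already canonical when its
        # parent's signature is computed.
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            sig = child.signature()
            if sig in self.minimized:
                parent.edges[ch] = self.minimized[sig]
            else:
                self.minimized[sig] = child

    def starts_with(self, prefix):
        """Yield every stored word that begins with prefix."""
        node = self.root
        for ch in prefix:
            node = node.edges.get(ch)
            if node is None:
                return
        # Only the part of the graph below the prefix is visited.
        stack = [(node, prefix)]
        while stack:
            node, path = stack.pop()
            if node.final:
                yield path
            for ch, child in node.edges.items():
                stack.append((child, path + ch))


def build_dawg(words):
    dawg = Dawg()
    for word in sorted({w for w in words if w}):
        dawg.insert(word)
    dawg.finish()
    return dawg
```

For example, with `tap`, `taps`, `top`, `tops`, the `a` and `o` edges end up pointing at the same node, so the shared suffixes `p` and `ps` are stored only once.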


You want a trie. Use the PyTrie package.
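A trie answers a prefix query by walking one node per character of the prefix and then enumerating only the subtree below it, so the rest of the dictionary is never touched. A minimal dict-of-dicts sketch (this is not PyTrie's API; the class and method names are hypothetical):

```python
class Trie:
    """Minimal dict-of-dicts trie; the key None marks the end of a word."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-word marker (cannot clash with a character key)

    def starts_with(self, prefix):
        """Yield every stored word that begins with prefix (in no particular order)."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return
            node = node[ch]
        # Depth-first walk of the subtree below the prefix node only.
        stack = [(node, prefix)]
        while stack:
            node, path = stack.pop()
            for ch, child in node.items():
                if ch is None:
                    yield path
                else:
                    stack.append((child, path + ch))
```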
