information-retrieval
A software/hardware structure of the Google Search/Maps Linux-clusters?
I am particularly interested in how one can deal with a huge amount of information for a commercial service like Google Search or Google Maps. We all know they use (or "did" at least) a kind of Linux c... [more]
2023-01-03 09:52 Category: Q&A
Ngram IDF smoothing
I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents. I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distingui... [more]
2023-01-02 20:59 Category: Q&A
Writing a program to scrape forums
I need to write a program to scrape forums. Should I write the program in Python using the Scrapy framework, or should I use PHP cURL? [more]
2023-01-02 04:21 Category: Q&A
Create a dataset: extract features from text documents (TF-IDF)
I have to create a dataset from some text files, writing them as vectors of features. Something like this: [more]
2023-01-01 20:11 Category: Q&A
entity set expansion python
Do you know of any existing implementation, in any language (preferably Python), of an entity set expansion algorithm such as the one from Google Sets (http://labs.google.com/sets)? [more]
2022-12-29 05:06 Category: Q&A
How to estimate the quality of a web page?
I'm doing a university project that must gather and combine data on a user-provided topic. The problem I've encountered is that Google search results for many terms are polluted with low quality au... [more]
2022-12-29 00:18 Category: Q&A
Find Tables in PDFs
Are there any tools or tricks for automatically extracting tables from PDFs? Are there any C# libraries that could do that? Or do you know of other ways this could be handled? [more]
2022-12-28 11:02 Category: Q&A
Retrieve some info from the web automatically
I need to retrieve some info from the web. For example, I can visit weather.com and search for my zip code to get an HTML file that contains the temperature or something. I need to make a python sc... [more]
2022-12-27 13:42 Category: Q&A
Information Retrieval database formats?
I'm looking for some documentation on how Information Retrieval systems (e.g., Lucene) store their indexes for speedy "relevancy" lookups. My Google-fu is failing me: I've found a page which descri... [more]
2022-12-26 12:45 Category: Q&A
Assistance with building an inverted-index
It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words using the first two letters of the word as a key and any words with the two letters sav... [more]
2022-12-25 06:43 Category: Q&A