information-retrieval
Are there any APIs that'll let me search by image?
I have an image and I want to search to see what it is. Are any APIs available for that? I believe there are quite a few. You want to search for Content-based Image Retrieval (CBIR). Wikipe... [Details]
2023-03-28 06:36 Category: Q&A
How can I extract only the main textual content from an HTML page?
Update: Boilerpipe appears to work really well, but I realized that I don't need only the main content because many pages don't have an article, but only links with some short description to the ent... [Details]
2023-03-27 00:10 Category: Q&A
How should I think about search engine indices?
I am using Elasticsearch and do not understand exactly what an index is. For example, if I have 3 models (a backpack, a shoe and a glove), do I put each model in its own index or do I index attribute... [Details]
2023-03-21 18:05 Category: Q&A
Jaccard Similarity in Lucene
I need to calculate the similarity of a query and document in Lucene using Jaccard similarity over n-grams. As Jaccard similarity is a very common measure in IR, I expected to find a... [Details]
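For readers unfamiliar with the measure, here is a minimal sketch of Jaccard similarity over character n-grams in plain Python. It does not use Lucene; the function names `char_ngrams` and `jaccard` and the choice of character bigrams are assumptions made for illustration only.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased string.

    Whitespace is collapsed to single spaces first (an illustrative
    normalization choice, not something Lucene prescribes).
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


query_grams = char_ngrams("night")  # {"ni", "ig", "gh", "ht"}
doc_grams = char_ngrams("nacht")    # {"na", "ac", "ch", "ht"}
similarity = jaccard(query_grams, doc_grams)  # 1 shared bigram out of 7
```

The same `jaccard` function works for word n-grams if the sets are built from token tuples instead of substrings.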
2023-03-20 09:49 Category: Q&A
Calculating probability distribution
I have a simple (maybe stupid) question. I want to calculate the Kullback–Leibler divergence between two documents. It requires the probability distribution of each document. [Details]
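One common way to get such a distribution is a smoothed unigram model over the vocabulary shared by both documents. The sketch below assumes whitespace tokenization and add-one (Laplace) smoothing; the names `unigram_distribution` and `kl_divergence` and the sample sentences are hypothetical, and other smoothing schemes are equally valid.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed unigram probabilities over a fixed vocabulary.

    Smoothing keeps every probability non-zero, so the KL divergence
    below never divides by zero. Assumes every token is in `vocabulary`.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum over terms of P(t) * log(P(t) / Q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


doc_a = "the cat sat on the mat".split()
doc_b = "the dog sat on the log".split()
vocab = set(doc_a) | set(doc_b)

p = unigram_distribution(doc_a, vocab)
q = unigram_distribution(doc_b, vocab)
kl_pq = kl_divergence(p, q)  # not symmetric: generally differs from kl_divergence(q, p)
```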
2023-03-18 08:43 Category: Q&A
Fast in-memory inverted index
I am looking for a fast in-memory implementation of a generic inverted index. All I need is to store features with weights for a couple million entities and use the inverted index to compute similarit... [Details]
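To make the data structure concrete, here is a minimal sketch of a weighted inverted index in plain Python that scores entities by dot product with a query. The class name `InvertedIndex`, its methods, and the sample entities are hypothetical; a dictionary of lists like this illustrates the idea but is not tuned for millions of entities.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each feature to a postings list of (entity_id, weight) pairs."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, entity_id, features):
        """Index one entity given a {feature: weight} mapping."""
        for feature, weight in features.items():
            self.postings[feature].append((entity_id, weight))

    def scores(self, query):
        """Dot-product score for every entity sharing a feature with the query.

        Only postings of the query's features are visited, which is the
        point of the inverted layout. Results are sorted best first.
        """
        acc = defaultdict(float)
        for feature, q_weight in query.items():
            for entity_id, weight in self.postings.get(feature, ()):
                acc[entity_id] += q_weight * weight
        return sorted(acc.items(), key=lambda kv: kv[1], reverse=True)


index = InvertedIndex()
index.add("e1", {"red": 1.0, "round": 0.5})
index.add("e2", {"round": 2.0})
ranked = index.scores({"round": 1.0, "red": 2.0})
```

If the stored weights are pre-normalized to unit length, the same dot product yields cosine similarity.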
2023-03-18 01:50 Category: Q&A
What is the difference between crawling, parsing, indexing, and search from a Python libraries perspective? [closed]
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. [Details]
2023-03-14 02:44 Category: Q&A
Cosine similarity and tf-idf
I am confused by the following comment about TF-IDF and cosine similarity. I was reading up on both, and then on the wiki under Cosine Similarity I found this sentence: "In case of information retrieva... [Details]
2023-03-11 15:26 Category: Q&A
Inferring templates from a collection of strings
I am indexing a set of websites which have a very large number of pages (tens of millions) that are generated from a small number of templates. I am looking for an algorithm to learn the templates tha... [Details]
2023-03-11 03:06 Category: Q&A
HP Universal Configuration Management Database UCMDB Starting Point
I'm working with a contractor-installed UCMDB instance that was put in before I started. What are some good starting points that I should read to get up to speed so that I can ask good questions of th... [Details]
2023-03-09 22:18 Category: Q&A