
Multilingual queries in ElasticSearch

Developer https://www.devze.com 2023-04-04 20:03 Source: Internet

Let's say we have the following mapping in ElasticSearch.

{
  "content": {
    "properties": {
      "id": {
        "type": "string",
        "index": "not_analyzed",
        "store": "yes"
      },
      "locale_container": {
        "type": "object",
        "properties": {
          "english": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "english",
                "search_analyzer": "english",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "english",
                "search_analyzer": "english",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "german": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "german",
                "search_analyzer": "german",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "german",
                "search_analyzer": "german",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "russian": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "russian",
                "search_analyzer": "russian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "russian",
                "search_analyzer": "russian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "italian": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "italian",
                "search_analyzer": "italian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "italian",
                "search_analyzer": "italian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          }
        }
      }
    }
  }
}

When a user queries the index, we can read her culture from her settings, so we know which analyzer to use. How can we formulate a query that searches only the "title" and "text" fields in her own language (say, German) and uses the German analyzer to tokenize the search query?


I've simplified the example to use the standard analyzer for 'English' and the simple analyzer (no stopword removal) for 'French'. For a document like this:

{
  "id": "abc",
  "locale_container": {
    "english": {
      "title": "abc to ABC",
      "text": ""
    },
    "french": {
      "title": "def to DEF",
      "text": ""
    }
  }
}

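The following query-string queries do the trick. Each one addresses a field by its full path, so the analyzer configured for that field in the mapping is applied to the query text: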

  • locale_container.english.title:abc -> returns the document
  • locale_container.french.title:def -> returns the document as well
  • locale_container.english.title:to -> doesn't return anything, since 'to' is a stopword
  • locale_container.french.title:to -> returns the document
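
The same idea can be applied to the original question by building the field paths from the user's language. The following is only a minimal sketch in Python that constructs a `query_string` request body as a plain dictionary; it does not talk to a cluster. The helper name `build_locale_query` and the sample search term are hypothetical, and the sketch assumes that the language key in the mapping (for example `german`) is also the name of the analyzer, as it is in the mapping above.

```python
import json


def build_locale_query(user_query, language, fields=("title", "text")):
    """Build a query_string body limited to one language's fields.

    Hypothetical helper: assumes the mapping layout from the question,
    i.e. fields live at locale_container.<language>.<field>.
    """
    # Full paths such as "locale_container.german.title"
    paths = [f"locale_container.{language}.{name}" for name in fields]
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "fields": paths,
                # Assumption: the analyzer is named after the language.
                # This is optional, since each field's search analyzer
                # from the mapping is used when no override is given.
                "analyzer": language,
            }
        }
    }


# Example: a user whose settings say "german"
body = build_locale_query("Haus", "german")
print(json.dumps(body, indent=2))
```

The resulting body can be sent to the index's `_search` endpoint. Because only the German field paths are listed, titles and texts in other languages are not searched.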