Elasticsearch aggregation query example 2

Counting documents that match different conditions. Request:

curl -X POST \
  http://elasticsearch:9200/devslicejobs/task/_search \
  -H 'Authorization: Basic ZWxhc3RpYzpzbGljZWpvYnNfMTIz' \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "batchid": 4813
                    }
                }
            ]
        }
    },
    "aggs": {
        "tasks": {
            "filters": {
                "filters": {
                    "pending": {"bool": {"must": [{"term": {"status": 1}}]}},
                    "assign": {"bool": {"must": [{"term": {"status": 2}}]}},
                    "total": {"bool": {}}
                }
            }
        }
    }
}'

Response:

{
    "took": 29,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 101,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "tasks": {
            "buckets": {
                "assign": {
                    "doc_count": 12
                },
                "pending": {
                    "doc_count": 1
                },
                "total": {
                    "doc_count": 101
                }
            }
        }
    }
}

Related documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/search-aggregations-bucket-filters-aggregation.html
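
As a rough sketch of how the request and response above might be handled on the client side, the following Python snippet builds the same filters aggregation body from a mapping of bucket names to status codes, and reads the per-bucket counts out of a response shaped like the one shown. The helper names build_status_counts_query and bucket_counts are invented for this illustration, the response literal is copied from the example, and no request is actually sent.

```python
import json


def build_status_counts_query(batch_id, statuses):
    """Build a size-0 search body with one named filter bucket per status.

    `statuses` maps a bucket name to a status code (hypothetical helper).
    """
    buckets = {
        name: {"bool": {"must": [{"term": {"status": code}}]}}
        for name, code in statuses.items()
    }
    # An empty bool query matches every document, giving the overall count.
    buckets["total"] = {"bool": {}}
    return {
        "size": 0,
        "query": {"bool": {"must": [{"term": {"batchid": batch_id}}]}},
        "aggs": {"tasks": {"filters": {"filters": buckets}}},
    }


def bucket_counts(response, agg_name="tasks"):
    """Flatten the named buckets of a filters aggregation into {name: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


# Response body copied from the example above (trimmed to the aggregations).
sample_response = json.loads("""
{
  "aggregations": {
    "tasks": {
      "buckets": {
        "assign": {"doc_count": 12},
        "pending": {"doc_count": 1},
        "total": {"doc_count": 101}
      }
    }
  }
}
""")

body = build_status_counts_query(4813, {"pending": 1, "assign": 2})
counts = bucket_counts(sample_response)
print(json.dumps(body, indent=2))
print(counts)
```

Because the buckets are named, a new status can be counted by adding one entry to the mapping, without touching the code that reads the response.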

Elasticsearch aggregation query example 1

Request:

curl -X POST \
  http://elasticsearch:9200/devorders/order/_search \
  -H 'Authorization: Basic ZWxhc3RpYzpzbGljZWpvYnNfMTIz' \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 0,
    "sort": [
        {
            "orderid": {
                "order": "desc"
            }
        }
    ],
    "query": {
        "bool": {
            "must_not": [],
            "must": [
                {
                    "terms": {
                        "provinceid": [
                            "370"
                        ]
                    }
                },
                {
                    "term": {
                        "batchid": 4813
                    }
                },
                {
                    "terms": {
                        "status": [
                            1,
                            2,
                            3,
                            4,
                            40,
                            5
                        ]
                    }
                },
                {
                    "range": {
                        "origin_salary": {
                            "gte": 0
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "sum_by_salary": {
            "sum": {
                "field": "salary"
            }
        },
        "sum_by_origin": {
            "sum": {
                "field": "origin_salary"
            }
        }
    }
}'

Response:

{
    "took": 49,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 10,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "sum_by_origin": {
            "value": 0.0
        },
        "sum_by_salary": {
            "value": 115.0
        }
    }
}

Also included here is the request for an example that uses a script:

{
    "size": 0,
    "aggs": {
        "sum_by_salary": {
            "sum": {
                "script": {
                    "lang": "painless",
                    "inline": "doc['hirenum'].value * doc['salary'].value"
                }
            }
        }
    }
}
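
To make the script request above more concrete, here is a small Python sketch of what the sum aggregation would compute: the expression hirenum * salary is evaluated for each matching document and the results are added up. The three sample documents are invented purely for illustration, and in Elasticsearch the computation runs inside the cluster rather than on the client.

```python
# Hypothetical documents; only the two fields read by the script matter.
docs = [
    {"hirenum": 2, "salary": 10.0},
    {"hirenum": 1, "salary": 25.5},
    {"hirenum": 3, "salary": 4.0},
]


def script_value(doc):
    """Mirror of the Painless expression doc['hirenum'].value * doc['salary'].value."""
    return doc["hirenum"] * doc["salary"]


def sum_by_script(documents):
    """Add up the per-document script values, like a sum aggregation with a script."""
    return sum(script_value(doc) for doc in documents)


total = sum_by_script(docs)
print(total)  # 57.5
```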

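Elasticsearch – Query and filter context

The behavior of a query clause depends on whether it is used in query context
or in filter context.

Query context

A clause in query context answers the question "How well does this document
match this query clause?" Besides deciding whether the document matches at all,
the clause also computes a _score that expresses how well the document matches
relative to other documents.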

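Filter context

A clause in filter context answers the question "Does this document match this
query clause?" The answer is a plain yes or no, and no score is calculated.
Filter context is mostly used for filtering structured data, for example:

  • Does this timestamp fall between 2015 and 2016?
  • Is the status field set to "published"?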

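Frequently used filters are cached by Elasticsearch to speed up later searches.
Filter context is in effect whenever a query clause is passed to a filter
parameter, such as the filter or must_not parameters of the bool query, the
filter parameter of the constant_score query, or the filter aggregation.

Below is an example query. The two match clauses run in query context and
contribute to the score, while the term and range clauses run in filter context: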

GET /_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "title":   "Search"        }}, 
        { "match": { "content": "Elasticsearch" }}  
      ],
      "filter": [ 
        { "term":  { "status": "published" }}, 
        { "range": { "publish_date": { "gte": "2015-01-01" }}} 
      ]
    }
  }
}
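
The difference between the two contexts can be illustrated with a toy in-memory model. In the sketch below, the filter conditions only decide whether a document is kept, while the must conditions both have to match and contribute to a score. The scoring function (counting occurrences of the search term) is a deliberately naive stand-in and not how Elasticsearch scores documents; the sample documents and function names are invented for this illustration.

```python
from datetime import date

# Hypothetical documents for illustration only.
docs = [
    {"title": "Search basics", "content": "Elasticsearch search and more Elasticsearch",
     "status": "published", "publish_date": date(2015, 6, 1)},
    {"title": "Search tips", "content": "Elasticsearch in practice",
     "status": "published", "publish_date": date(2016, 3, 9)},
    {"title": "Search draft", "content": "Elasticsearch notes",
     "status": "draft", "publish_date": date(2016, 1, 1)},
    {"title": "Search history", "content": "Elasticsearch origins",
     "status": "published", "publish_date": date(2014, 12, 31)},
]


def passes_filters(doc):
    """Filter context: a plain yes/no answer, no score involved."""
    return doc["status"] == "published" and doc["publish_date"] >= date(2015, 1, 1)


def naive_score(doc, field, term):
    """Query context stand-in: count how often the term occurs in the field."""
    return doc[field].lower().split().count(term.lower())


def search(documents):
    results = []
    for doc in documents:
        if not passes_filters(doc):
            continue  # filtered out; never scored
        title_score = naive_score(doc, "title", "Search")
        content_score = naive_score(doc, "content", "Elasticsearch")
        if title_score == 0 or content_score == 0:
            continue  # every must clause has to match
        results.append((title_score + content_score, doc["title"]))
    # Highest score first, as in a relevance-sorted result list.
    return sorted(results, reverse=True)


ranked = search(docs)
print(ranked)  # [(3, 'Search basics'), (2, 'Search tips')]
```

The draft and the 2014 document are dropped by the filter step alone, so they never reach the scoring step.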

Elasticsearch – Custom analyzers

An analyzer is composed of:

  • zero or more character filters
  • exactly one tokenizer
  • zero or more token filters

Configuration example:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}

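Character filters

A character filter preprocesses a string before it is passed to the tokenizer.
For example, it could convert the Arabic digits in "2018年" into the Chinese
numerals "二零一八年". Elasticsearch ships with several built-in character
filters that can be used to build custom analyzers.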

Tokenizer

A tokenizer receives a string and breaks it into individual tokens. Tokenizers
can be used to build custom analyzers. The ik plugin provides two tokenizers:
ik_smart and ik_max_word.

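Token filters

A token filter receives the tokens produced by the tokenizer and processes
them, for example by changing their case, removing tokens, or adding tokens.
Elasticsearch has several built-in token filters that can be used to build
custom analyzers.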
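
The three stages described above can be sketched as a small Python pipeline applied to the sample text from the _analyze request. Each function is only a rough approximation of the corresponding built-in component (html_strip, the standard tokenizer, lowercase, asciifolding), written for illustration; these are not Elasticsearch's actual implementations, and the function names are invented.

```python
import re
import unicodedata


def html_strip(text):
    """Character filter stand-in: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    """Tokenizer stand-in: split on non-word characters."""
    # Compose accents first so an accented letter stays inside its token.
    return re.findall(r"\w+", unicodedata.normalize("NFC", text))


def lowercase(tokens):
    """Token filter stand-in: lowercase every token."""
    return [token.lower() for token in tokens]


def ascii_fold(tokens):
    """Token filter stand-in: strip combining accents after NFKD decomposition."""
    folded = []
    for token in tokens:
        decomposed = unicodedata.normalize("NFKD", token)
        folded.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return folded


def analyze(text):
    # Order matters: character filters, then the tokenizer, then token filters.
    text = html_strip(text)
    tokens = tokenize(text)
    for token_filter in (lowercase, ascii_fold):
        tokens = token_filter(tokens)
    return tokens


terms = analyze("Is this <b>déjà vu</b>?")
print(terms)  # ['is', 'this', 'deja', 'vu']
```

Running the character filter before tokenization is what keeps the <b> tag from turning into a token of its own.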