elasticsearch – 一朝入尘世，恍惚逝百年

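Over the past few days I set up an ES cluster to analyze a large backlog of accumulated log data. I wrote a short document with sphinx and published it on the intranet. Below is part of it, copied over; unfortunately much of the formatting was lost. I made a few small edits and removed sensitive information.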

///////////////////////////////////////////////////////////

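These examples cover only the most basic usage, to help readers get the simplest queries working. Detailed, flexible usage is beyond the scope of this document; see the official documentation or ask your usual search engine.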

Using the curl command-line tool
Using the official ES Python library

Using the curl command-line tool

NOTICE:

[*The more sensitive authentication part is omitted*]

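Search the access logs for a keyword, regardless of which field it appears in. In the example below, any document with superuser in any of its fields is returned. Adjust the parameters in the URI as needed. Replacing _all with user_name restricts the search to the user_name field. This uses Lucene query syntax; see: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html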

curl "https://endpoint/_search?q=_all:superuser"
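The same URI search can be assembled programmatically. The following is a minimal sketch in Python, assuming the placeholder endpoint from the curl command above; the helper name `uri_search_url` is made up for illustration and is not part of any library.

```python
from urllib.parse import urlencode


def uri_search_url(endpoint, keyword, field="_all"):
    """Build a URI-search URL equivalent to the curl command above."""
    # Lucene query-string syntax: "<field>:<keyword>".
    query = f"{field}:{keyword}"
    # urlencode percent-escapes characters such as ":" and spaces.
    params = urlencode({"q": query})
    return f"{endpoint.rstrip('/')}/_search?{params}"


# Search every field, then only the user_name field.
print(uri_search_url("https://endpoint", "superuser"))
print(uri_search_url("https://endpoint", "superuser", field="user_name"))
```

Escaping the query this way matters once the keyword contains spaces or other reserved characters, which the bare curl form does not handle.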

Using the ES query DSL, search the access logs for a keyword in a specific field. In the example below, superuser must appear in the user_name field.

curl "https://endpoint/_search?pretty" -H "Content-Type: application/json" -X POST -d '
{
   "query":{
       "bool":{
             "filter":[
                 { "term":{ "user_name": "superuser" } }
             ]
       }
   }
}'
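For readers who prefer a script to a shell command, the request above could be sketched with Python's standard library as follows. The endpoint is the same placeholder as in the curl command, authentication is left out just as it is in this document, and `build_search_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_search_request(endpoint, body):
    """Return an unsent POST request carrying a JSON search body."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same DSL body as the curl example: a term filter on user_name.
term_filter_body = {
    "query": {"bool": {"filter": [{"term": {"user_name": "superuser"}}]}}
}
request = build_search_request("https://endpoint", term_filter_body)

# To actually send it (authentication omitted here, as in the text):
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```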

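The search above uses the ES filter syntax. Filters are among the faster ways to query, since they skip relevance scoring. If the field being searched is analyzed (tokenized), use the match syntax instead, as follows: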

curl "https://endpoint/_search?pretty" -H "Content-Type: application/json" -X POST -d '
{
   "query":{
       "match":{
             "user_name": "superuser"
       }
   }
}'
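The choice between the two query forms can be captured in a small helper. This is only a sketch: the `analyzed` flag is an assumption supplied by the caller, who in practice would look it up in the index mapping.

```python
def field_query(field, value, analyzed):
    """Pick a query body depending on whether the field is analyzed."""
    if analyzed:
        # Analyzed (tokenized) fields: match analyzes the input first.
        return {"query": {"match": {field: value}}}
    # Non-analyzed fields: exact term lookup in filter context.
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


exact = field_query("user_name", "superuser", analyzed=False)
full_text = field_query("user_name", "superuser", analyzed=True)
```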

Adding a time restriction to a search

If you need to restrict the search to a time interval:

curl "https://endpoint/_search?pretty" -H "Content-Type: application/json" -X POST -d '
{
   "query":{
       "bool":{
           "must":{
               "match":{ "user_name": "superuser" }
           },
           "filter":{
               "range":{
                   "req_time":{
                       "gte": "2017-01-01 00:00:00",
                       "lt": "2017-08-08 00:00:00"
                   }
               }
           }
       }
   }
}'
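When the interval is computed rather than typed by hand, the body can be generated from datetime objects. A minimal sketch, assuming the req_time field accepts the "YYYY-MM-DD HH:MM:SS" format used in the curl example; `match_in_time_range` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed to match the date format of the req_time mapping.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def match_in_time_range(field, value, start, end, time_field="req_time"):
    """Match `value` in `field`, limited to start <= time_field < end."""
    time_range = {
        "gte": start.strftime(TIME_FORMAT),  # inclusive lower bound
        "lt": end.strftime(TIME_FORMAT),     # exclusive upper bound
    }
    return {
        "query": {
            "bool": {
                "must": {"match": {field: value}},
                "filter": {"range": {time_field: time_range}},
            }
        }
    }


body = match_in_time_range(
    "user_name", "superuser", datetime(2017, 1, 1), datetime(2017, 8, 8)
)
```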

Pagination

ES paginates with the "from" and "size" parameters. For example, 10 results per page, returning page 1:

curl "https://endpoint/_search?pretty" -H "Content-Type: application/json" -X POST -d '
{
     "from": 0,
     "size": 10,
     "query": {
         "term": {
              "user_name": "superuser"
         }
     }
}'

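Because of how ES is implemented, deep pagination is discouraged: when size is large and from points at a late page, ES may fail to respond in time. By default, from + size may not exceed 10000 (the index.max_result_window setting).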
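Translating a page number into "from" and "size" is simple arithmetic, sketched below. The guard assumes the default index.max_result_window of 10000, which caps from + size; a cluster may be configured differently, and `page_body` is a hypothetical helper name.

```python
# Default value of the index.max_result_window setting (assumed unchanged).
MAX_RESULT_WINDOW = 10000


def page_body(query, page, page_size=10):
    """Wrap `query` with from/size for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("page is too deep for from/size; use scroll instead")
    return {"from": offset, "size": page_size, "query": query}


first_page = page_body({"term": {"user_name": "superuser"}}, page=1)
third_page = page_body({"term": {"user_name": "superuser"}}, page=3)
```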

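Returning all matching documents

If more than a thousand documents match, it is hard to retrieve all of them with the method above. Use the scroll and scan method instead. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html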
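The scroll flow is a loop: open a scroll with the first search, keep requesting the next batch with the returned scroll id until a batch comes back empty, then clear the scroll. The sketch below shows that loop only. `send(method, path, payload)` is a hypothetical transport callable (for example a thin wrapper around an HTTP client with the omitted authentication) that returns the decoded JSON response, and the index name is supplied by the caller.

```python
def scroll_all(send, index, body, keep_alive="1m"):
    """Yield every hit for `body` by walking the scroll API."""
    # The first request opens the scroll context and returns batch one.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Later batches are fetched by scroll id, not by query.
            page = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release server-side resources once done or on early exit.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Writing it as a generator lets the caller process hits batch by batch without holding the whole result set in memory.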

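For other advanced usage, such as bool operations and complex, flexible combinations of query conditions, see the official documentation.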

Using the official ES Python library

Too lazy to type it all out; see the official documentation: https://elasticsearch-py.readthedocs.io/
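As a starting point, a match query through the library might look like the sketch below. It assumes a client object exposing `search(index=..., body=...)`, which is the call shape of elasticsearch-py releases from around the time of writing; newer releases may prefer passing `query=` directly. The helper name `search_user` and the index name are hypothetical.

```python
def search_user(client, index, user_name, size=10):
    """Run a match query through an elasticsearch-py style client."""
    body = {"size": size, "query": {"match": {"user_name": user_name}}}
    response = client.search(index=index, body=body)
    # Each hit stores the original document under "_source".
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

With the library installed, the client would typically be created with `Elasticsearch("https://endpoint")` from the `elasticsearch` package, plus whatever authentication options the cluster requires, and then passed in as `search_user(client, "my-index", "superuser")`.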

Tag: elasticsearch
