脚本之家,脚本语言编程技术及教程分享平台!
分类导航

Python|VBS|Ruby|Lua|perl|VBA|Golang|PowerShell|Erlang|autoit|Dos|bat|

服务器之家 - 脚本之家 - Golang - go语言实现Elasticsearches批量修改查询及发送MQ操作示例

go语言实现Elasticsearches批量修改查询及发送MQ操作示例

2022-09-25 11:57Jeff的技术栈 Golang

这篇文章主要为大家介绍了go语言实现Elasticsearches批量修改查询及发送MQ操作示例,有需要的朋友可以借鉴参考下,希望能够有所帮助,祝大家多多进步,早日升职加薪

update_by_query批量修改

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
POST post-v1_1-2021.02,post-v1_1-2021.03,post-v1_1-2021.04/_update_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "join_field": {
              "value": "post"
            }
          }
        },
        {
          "term": {
            "platform": {
              "value": "toutiao"
            }
          }
        },
        {
          "exists": {
            "field": "liked_count"
          }
        }
      ]
    }
  },
  "script":{
    "source":"ctx._source.liked_count=0",
    "lang":"painless"
  }
}

索引添加字段

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
PUT user_tiktok/_doc/_mapping?include_type_name=true
{
  "post_signature":{
    "StuClass":{
      "type":"keyword"
    },
    "post_token":{
      "type":"keyword"
    }
  }
}
PUT user_toutiao/_mapping
{
  "properties": {
    "user_token": {
      "type": "text"
    }
  }
}

查询es发送MQ

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
from celery import Celery
from elasticsearch import Elasticsearch
import logging
import arrow
import pytz
from elasticsearch.helpers import scan, streaming_bulk
import redis
pool_16_8 = redis.ConnectionPool(host='10.0.3.100', port=6379, db=8, password='EfcHGSzKqg6cfzWq')
rds_16_8 = redis.StrictRedis(connection_pool=pool_16_8)
logger = logging.getLogger('elasticsearch')
logger.disabled = False
logger.setLevel(logging.INFO)
es_zoo_connection = Elasticsearch('http://eswriter:e s密码@e sip:4000', dead_timeout=10,
                                  retry_on_timeout=True)
logger = logging.getLogger(__name__)
class ES(object):
    index = None
    doc_type = None
    id_field = '_id'
    version = ''
    source_id_field = ''
    aliase_field = ''
    separator = '-'
    aliase_func = None
    es = None
    tz = pytz.timezone('Asia/Shanghai')
    logger = logger
    @classmethod
    def mget(cls, ids=None, index=None, **kwargs):
        index = index or cls.index
        docs = cls.es.mget(body={'ids': ids}, doc_type=cls.doc_type, index=index, **kwargs)
        return docs
    @classmethod
    def count(cls, query=None, index=None, **kwargs):
        index = index or cls.index
        c = cls.es.count(doc_type=cls.doc_type, body=query, index=index, **kwargs)
        return c.get('count', 0)
    @classmethod
    def upsert(cls, doc, doc_id=None, index=None, doc_as_upsert=True, **kwargs):
        body = {
            "doc": doc,
        }
        if doc_as_upsert:
            body['doc_as_upsert'] = True
        id = doc_id or cls.id_name(doc)
        index = index or cls.index_name(doc)
        cls.es.update(index, id, cls.doc_type, body, **kwargs)
    @classmethod
    def search(cls, index=None, query=None, **kwargs):
        index = index or cls.index
        return cls.es.search(index=index, body=query, **kwargs)
    @classmethod
    def scan(cls, query, index=None, **kwargs):
        return scan(cls.es,
                    query=query,
                    index=index or cls.index,
                    **kwargs)
    @classmethod
    def index_name(cls, doc):
        if cls.aliase_field and cls.aliase_field in doc.keys():
            aliase_part = doc[cls.aliase_field]
            if isinstance(aliase_part, str):
                aliase_part = arrow.get(aliase_part)
            if isinstance(aliase_part, int):
                aliase_part = arrow.get(aliase_part).astimezone(cls.tz)
            if cls.version:
                index = '{}{}{}{}{}'.format(cls.index, cls.separator, cls.version, cls.separator,
                                            cls.aliase_func(aliase_part))
            else:
                index = '{}{}{}'.format(cls.index, cls.separator, cls.aliase_func(aliase_part))
        else:
            index = cls.index
        return index
    @classmethod
    def id_name(cls, doc):
        id = doc.get(cls.id_field) and doc.pop(cls.id_field) or doc.get(cls.source_id_field)
        if not id:
            print('========', doc)
        assert id, 'doc _id must not be None'
        return id
    @classmethod
    def bulk_upsert(cls, docs, **kwargs):
        """
        批量操作文章, 仅支持 index 和 update
        """
        op_type = kwargs.get('op_type') or 'update'
        chunk_size = kwargs.get('chunk_size')
        if op_type == 'update':
            upsert = kwargs.get('upsert', True)
            if upsert is None:
                upsert = True
        else:
            upsert = False
        actions = cls._gen_bulk_actions(docs, cls.index_name, cls.doc_type, cls.id_name, op_type, upsert=upsert)
        result = streaming_bulk(cls.es, actions, chunk_size=chunk_size, raise_on_error=False, raise_on_exception=False,
                                max_retries=5, request_timeout=25)
        return result
    @classmethod
    def _gen_bulk_actions(cls, docs, index_name, doc_type, id_name, op_type, upsert=True, **kwargs):
        assert not upsert or (upsert and op_type == 'update'), 'upsert should use "update" as op_type'
        for doc in docs:
            # 支持 index_name 作为一个工厂函数
            if callable(index_name):
                index = index_name(doc)
            else:
                index = index_name
            if op_type == 'index':
                _source = doc
            elif op_type == 'update' and not upsert:
                _source = {'doc': doc}
            elif op_type == 'update' and upsert:
                _source = {'doc': doc, 'doc_as_upsert': True}
            else:
                continue
            if callable(id_name):
                id = id_name(doc)
            else:
                id = id_name
            # 生成 Bulk 动作
            action = {
                "_op_type": op_type,
                "_index": index,
                "_type": doc_type,
                "_id": id,
                "_source": _source
            }
            yield action
class tiktokEsUser(ES):
    index = 'user_tiktok'
    doc_type = '_doc'
    id_field = '_id'
    source_id_field = 'user_id'
    es = es_zoo_connection
from kombu import Exchange, Queue, binding
def data_es_route_task_spider(name, args, kwargs, options, task=None, **kw):
    return {
        'exchange': 'tiktok',
        'exchange_type': 'topic',
        'routing_key': name
    }
class DataEsConfig_download(object):
    broker_url = 'amqp://用户:密码@ip:端口/'
    task_ignore_result = True
    task_serializer = 'json'
    accept_content = ['json']
    task_default_queue = 'default'
    task_default_exchange = 'default'
    task_default_routing_key = 'default'
    exchange = Exchange('tiktok', type='topic')
    task_queues = [
        Queue(
            'tiktok.user_avatar.download',
            [binding(exchange, routing_key='tiktok.user_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.download',
            [binding(exchange, routing_key='tiktok.post_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.spider',
            [binding(exchange, routing_key='tiktok.post.spider')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.save',
            [binding(exchange, routing_key='tiktok.post.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user.save',
            [binding(exchange, routing_key='tiktok.user.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.invalid',
            [binding(exchange, routing_key='tiktok.post_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user_avatar.invalid',
            [binding(exchange, routing_key='tiktok.user_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.comment.save',
            [binding(exchange, routing_key='tiktok.comment.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
    ]
    task_routes = (data_es_route_task_spider,)
    enable_utc = True
    timezone = "Asia/Shanghai"
# 下载app
tiktok_app = Celery(
    'tiktok',
    include=[
        'task.tasks',
    ]
)
tiktok_app.config_from_object(DataEsConfig_download)
# 发任务生产者,更新舆情user历史信息
def send_post():
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["region", "sec_uid", "post_signature"]
    }
    # query = {
    #     "query": {
    #         "bool": {
    #             "must": [
    #                 {"exists": {
    #                     "field": "post_signature"
    #                 }},
    #                 {
    #                     "match": {
    #                         "region": "MY"
    #                     }
    #                 }
    #             ]
    #         }
    #     },
    #     "_source": ["region", "sec_uid", "post_signature"]
    # }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.post.spider', args=(item,))
def send_sign_token():
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    },
                    {
                        "range": {
                            "create_time": {
                                "gte": "2021-01-06T00:00:00",
                                "lte": "2021-01-06T01:00:00"
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["user_id", "sec_uid"]
    }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.user.sign_token', args=(item,))
if __name__ == '__main__':
    send_post()
    # send_sign_token()

以上就是go语言实现Elasticsearches批量修改查询及发送MQ操作示例的详细内容,更多关于go实现Elasticsearches修改查询发送MQ的资料请关注服务器之家其它相关文章!

原文链接:https://www.cnblogs.com/guyouyin123/p/14656798.html

延伸 · 阅读

精彩推荐
  • Golang详解Golang开启http服务的三种方式

    详解Golang开启http服务的三种方式

    这篇文章主要介绍了详解Golang开启http服务的三种方式,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们...

    L千年老妖35452020-07-21
  • GolangGo 加密解密算法小结

    Go 加密解密算法小结

    加密解密在实际开发中应用比较广泛,常见的加解密分为三种,本文就详细的介绍一下Go 加密解密算法,具有一定的参考价值,感兴趣的可以了解一下...

    无风的雨7432022-08-31
  • Golanggolang中命令行库cobra的使用方法示例

    golang中命令行库cobra的使用方法示例

    这篇文章主要给大家介绍了关于golang中命令行库cobra的使用方法,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需...

    wangbin5102020-05-18
  • Golanggolang gorm的Callbacks事务回滚对象操作示例

    golang gorm的Callbacks事务回滚对象操作示例

    这篇文章主要为大家介绍了golang gorm的Callbacks事务回滚对象操作示例,有需要的朋友可以借鉴参考下,希望能够有所帮助,祝大家多多进步早日升职加薪...

    Jeff的技术栈8492022-09-20
  • GolangGo语言中append函数用法分析

    Go语言中append函数用法分析

    这篇文章主要介绍了Go语言中append函数用法,对比使用append函数与不使用append函数的两个实例,详细分析了Go语言中append函数的功能,需要的朋友可以参考下 ...

    脚本之家3582020-04-13
  • Golang解析Go的Waitgroup和锁的问题

    解析Go的Waitgroup和锁的问题

    大家在学习go语言的时候,都知道go语言支持并发,使用 goroutine,使用关键字 go 即可,接下来通过本文给大家分享Go的Waitgroup和锁的问题,需要的朋友可以参...

    呦呦鹿鸣4912021-06-27
  • Golanggo实现for range迭代时修改值的操作

    go实现for range迭代时修改值的操作

    这篇文章主要介绍了go实现for range迭代时修改值的操作,具有很好的参考价值,希望对大家有所帮助。一起跟随小编过来看看吧...

    lein_wang8422021-05-30
  • GolangGo语言如何操纵Kafka保证无消息丢失

    Go语言如何操纵Kafka保证无消息丢失

    Kafka是由Apache软件基金会开发的一个开源流处理平台,由Scala和Java编写。该项目的目标是为处理实时数据提供一个统一、高吞吐、低延迟的平台。其持久化...

    Golang梦工厂4422021-09-14