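Preface
To receive alert messages promptly, we considered several options and ultimately chose DingTalk. The options were:
- Phone calls
- SMS
- WeCom (Enterprise WeChat)
- DingTalk
- Slack
Phone calls and SMS require integrating with a provider's API and cost money. WeCom requires organization verification, which is a cumbersome process. Slack is inconvenient to use inside mainland China. DingTalk is fairly simple to integrate and has ready-made solutions. Our ops team is small, so it is sufficient for us, and as an instant messaging tool it also delivers alerts quickly enough.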
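Installation steps
Set up DingTalk
Install DingTalk, create a DingTalk group, and add a custom robot to the group. For the robot's security setting, restrict requests by IP address and enter your public IP address; this must be the SNAT public IP, that is, the address your outbound traffic appears to come from. You will then get a webhook address similar to this one: https://oapi.dingtalk.com/robot/send?access_token=c7bxxxxa42f680cxxxxxc3032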
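Before deploying anything else, it can be useful to confirm that the robot address actually accepts messages. The following is a minimal sketch using only the Python standard library; the function names (`sign_url`, `build_text_message`, `send_message`) are made up for this example. `sign_url` is only relevant if the robot uses the "sign" security setting instead of the IP restriction described above, and the request has to be sent from a machine whose traffic leaves through the allowed public IP.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def sign_url(webhook_url, secret, timestamp_ms=None):
    """Append timestamp/sign query parameters to a robot URL.

    Only needed for robots that use the "sign" security setting.
    The string to sign is "<timestamp>\n<secret>", signed with
    HMAC-SHA256 using the secret as key, then base64- and URL-encoded.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


def build_text_message(content, at_mobiles=None, at_all=False):
    """Build the JSON body of a plain text robot message."""
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"atMobiles": list(at_mobiles or []), "isAtAll": at_all},
    }


def send_message(webhook_url, message, timeout=5):
    """POST the message to the robot and return the decoded JSON reply."""
    data = json.dumps(message).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a real HTTP request, so it is left commented out;
# replace the placeholder with your own robot address):
# print(send_message("<your robot webhook URL>", build_text_message("test alert")))
```

If the reply reports an error, the usual suspects are a wrong access token or a source IP that does not match the one configured on the robot.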
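Install the DingTalk webhook service
DingTalk's webhook expects messages in its own format, so we use timonwong/prometheus-webhook-dingtalk:v1.4.0 as a webhook service that receives alerts from Alertmanager and converts them into the webhook format DingTalk understands. The following is the OpenShift installation template: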
apiVersion: v1
kind: Template
metadata:
  name: dingtalk-template
  annotations:
    description: dingtalk-alert
parameters:
  - name: NAMESPACE
    value: monitoring
  - name: DING_TALK_URL
    value: https://oapi.dingtalk.com/robot/send?access_token=c7b19xxxf5370270b10d255934c3032
objects:
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: ${NAMESPACE}
      annotations:
        deployment.kubernetes.io/revision: '1'
      labels:
        app: webhook-dingtalk
        prometheus: ipaas
      name: webhook-dingtalk
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 3
      selector:
        matchLabels:
          app: webhook-dingtalk
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: webhook-dingtalk
            prometheus: ipaas
        spec:
          containers:
            - args:
                - '--web.listen-address=0.0.0.0:8060'
                - '--web.enable-ui'
                - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'
              env:
                - name: TZ
                  value: CST-8
              image: 'timonwong/prometheus-webhook-dingtalk:v1.4.0'
              imagePullPolicy: IfNotPresent
              livenessProbe:
                failureThreshold: 3
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                tcpSocket:
                  port: 8060
                timeoutSeconds: 1
              name: webhook-dingtalk
              ports:
                - containerPort: 8060
                  name: tcp-8060
                  protocol: TCP
              readinessProbe:
                failureThreshold: 3
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                tcpSocket:
                  port: 8060
                timeoutSeconds: 1
              resources:
                limits:
                  cpu: 500m
                  memory: 500Mi
                requests:
                  cpu: 100m
                  memory: 100Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
                - mountPath: /etc/prometheus-webhook-dingtalk/config.yml
                  name: webhook-dingtalk
                  subPath: config.yml
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
            - configMap:
                defaultMode: 420
                name: webhook-dingtalk
              name: webhook-dingtalk
  - apiVersion: v1
    data:
      config.yml: |
        ## Request timeout
        # timeout: 5s
        ## Customizable templates path
        # templates:
        #   - contrib/templates/legacy/template.tmpl
        ## You can also override default template using `default_message`
        ## The following example to use the 'legacy' template from v0.3.0
        # default_message:
        #   title: '{{ template "legacy.title" . }}'
        #   text: '{{ template "legacy.content" . }}'
        ## Targets, previously was known as "profiles"
        targets:
          webhook1:
            url: ${DING_TALK_URL}
            # secret for signature
            secret: SEC000000000000000000000
          webhook2:
            url: ${DING_TALK_URL}
          webhook_legacy:
            url: ${DING_TALK_URL}
            # Customize template content
            message:
              # Use legacy template
              title: '{{ template "legacy.title" . }}'
              text: '{{ template "legacy.content" . }}'
          webhook_mention_all:
            url: ${DING_TALK_URL}
            mention:
              all: true
          webhook_mention_users:
            url: ${DING_TALK_URL}
            mention:
              mobiles: ['133xxxxx195']
    kind: ConfigMap
    metadata:
      namespace: ${NAMESPACE}
      labels:
        app: webhook-dingtalk
      name: webhook-dingtalk
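To make the conversion step more concrete, here is a rough Python sketch of the kind of translation such a webhook service performs: it takes the JSON payload that Alertmanager posts and turns it into a DingTalk markdown message. This is not how prometheus-webhook-dingtalk is implemented (it renders its own templates, as the `message` settings in the config hint at); the function name `alertmanager_to_dingtalk` and the chosen title and line layout are assumptions made purely for illustration.

```python
def alertmanager_to_dingtalk(payload):
    """Convert an Alertmanager webhook payload (dict) into a DingTalk
    markdown message (dict). Illustrative layout only."""
    status = payload.get("status", "unknown")
    alerts = payload.get("alerts", [])
    group_name = payload.get("groupLabels", {}).get("alertname", "alerts")
    title = f"[{status.upper()}:{len(alerts)}] {group_name}"

    # One heading line, then one bullet per alert in the group.
    lines = [f"### {title}"]
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        summary = annotations.get("summary") or annotations.get("description") or ""
        name = labels.get("alertname", "unknown")
        severity = labels.get("severity", "n/a")
        lines.append(f"- **{name}** ({severity}): {summary}")

    return {
        "msgtype": "markdown",
        "markdown": {"title": title, "text": "\n".join(lines)},
    }
```

The point of the sketch is only the shape of the two sides: Alertmanager sends a group of alerts with labels and annotations, while DingTalk expects a `msgtype` plus a title and a text body.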
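Modify alertmanager.yaml
With the steps above, the webhook service is set up. What remains is to configure the receivers and route in Alertmanager. Note that the URL assumes a Service named webhook-dingtalk exposing port 8060 in the monitoring namespace; the template above only creates the Deployment and the ConfigMap, so create that Service if it does not exist yet. For reference: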
global:
  resolve_timeout: 5m
  smtp_require_tls: false
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
  receiver: dingtalk
receivers:
  - name: dingtalk
    webhook_configs:
      - send_resolved: true
        url: http://webhook-dingtalk.monitoring.svc:8060/dingtalk/webhook1/send
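To check the chain without waiting for a real alert, one option is to post a hand-built payload to the same URL that Alertmanager uses. The sketch below builds a payload in the shape of Alertmanager's webhook format (version 4) and posts it with the standard library. The helper names, the alert name `TestAlert`, and the `example.invalid` URLs are placeholders invented for this example, and the script has to run somewhere that can resolve the in-cluster service name, for example from a pod in the cluster.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(alertname="TestAlert", severity="warning", receiver="dingtalk"):
    """Build a single firing alert in the shape Alertmanager sends to webhooks."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    labels = {"alertname": alertname, "severity": severity}
    annotations = {"summary": "Test alert sent by hand to check the DingTalk pipeline"}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": receiver,
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example.invalid",  # placeholder
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "http://prometheus.example.invalid/graph",  # placeholder
            }
        ],
    }


def post_payload(url, payload, timeout=5):
    """POST the payload as JSON and return (HTTP status, response body text)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Example call (performs a real HTTP request, so it is left commented out):
# print(post_payload(
#     "http://webhook-dingtalk.monitoring.svc:8060/dingtalk/webhook1/send",
#     build_test_payload(),
# ))
```

If the message shows up in the DingTalk group, the webhook service and the robot are working, and any remaining problem is on the Alertmanager side (routing, grouping, or the receiver URL).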
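Unless otherwise stated, all posts on this blog are licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International license. Please keep the original link and author when reposting.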