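Preface

To receive alert messages promptly, we considered several options and finally chose DingTalk. The options were:

  • Phone calls
  • SMS
  • WeCom (企业微信)
  • DingTalk
  • Slack

Phone calls and SMS require integrating with a provider's API and cost money. WeCom requires organization verification, which is a cumbersome process. Slack is inconvenient to use from mainland China. DingTalk is fairly simple to integrate and has ready-made solutions. Our operations team is small, so DingTalk is sufficient for us. As an instant messaging application, it also delivers alerts in a timely manner.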

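Installation steps

Set up DingTalk

Install DingTalk, create a DingTalk group, and add a custom robot to it. In the robot's security settings, restrict requests by IP address and enter the public IP address; this is the public SNAT IP, that is, the address the cluster's outbound traffic appears to come from. You then get a webhook address similar to this: https://oapi.dingtalk.com/robot/send?access_token=c7bxxxxa42f680cxxxxxc3032

Install the DingTalk webhook service

Because the DingTalk webhook expects a specific message format, we use the timonwong/prometheus-webhook-dingtalk:v1.4.0 webhook service to receive alerts from Alertmanager and convert them into the webhook format that DingTalk recognizes. The following is the installation template for OpenShift: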

apiVersion: v1
kind: Template
metadata:
  name: dingtalk-template
  annotations:
    description: dingtalk-alert
parameters:
- name: NAMESPACE
  value: monitoring
- name: DING_TALK_URL
  value: https://oapi.dingtalk.com/robot/send?access_token=c7b19xxxf5370270b10d255934c3032
objects:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    namespace: ${NAMESPACE}
    annotations:
      deployment.kubernetes.io/revision: '1'
    labels:
      app: webhook-dingtalk
      prometheus: ipaas
    name: webhook-dingtalk
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 3
    selector:
      matchLabels:
        app: webhook-dingtalk
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: webhook-dingtalk
          prometheus: ipaas
      spec:
        containers:
        - args:
          # Listen on port 8060 and load the config mounted from the ConfigMap
          - '--web.listen-address=0.0.0.0:8060'
          - '--web.enable-ui'
          - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'
          env:
          # China Standard Time (UTC+8)
          - name: TZ
            value: CST-8
          image: 'timonwong/prometheus-webhook-dingtalk:v1.4.0'
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 8060
            timeoutSeconds: 1
          name: webhook-dingtalk
          ports:
          - containerPort: 8060
            name: tcp-8060
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 8060
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 500m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          # Mount only config.yml from the ConfigMap volume
          - mountPath: /etc/prometheus-webhook-dingtalk/config.yml
            name: webhook-dingtalk
            subPath: config.yml
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: webhook-dingtalk
          name: webhook-dingtalk
- apiVersion: v1
  data:
    config.yml: |
      ## Request timeout
      # timeout: 5s

      ## Customizable templates path
      # templates:
      #   - contrib/templates/legacy/template.tmpl

      ## You can also override default template using `default_message`
      ## The following example to use the 'legacy' template from v0.3.0
      # default_message:
      #   title: '{{ template "legacy.title" . }}'
      #   text: '{{ template "legacy.content" . }}'

      ## Targets, previously was known as "profiles"
      targets:
        webhook1:
          url: ${DING_TALK_URL}
          # secret for signature
          secret: SEC000000000000000000000
        webhook2:
          url: ${DING_TALK_URL}
        webhook_legacy:
          url: ${DING_TALK_URL}
          # Customize template content
          message:
            # Use legacy template
            title: '{{ template "legacy.title" . }}'
            text: '{{ template "legacy.content" . }}'
        webhook_mention_all:
          url: ${DING_TALK_URL}
          mention:
            all: true
        webhook_mention_users:
          url: ${DING_TALK_URL}
          mention:
            mobiles: ['133xxxxx195']
  kind: ConfigMap
  metadata:
    namespace: ${NAMESPACE}
    labels:
      app: webhook-dingtalk
    name: webhook-dingtalk
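
The template above defines a Deployment and a ConfigMap but no Service, while the Alertmanager URL used later (webhook-dingtalk.monitoring.svc:8060) resolves through a Service of that name. The original setup may have created it separately. As a minimal sketch, assuming the Service is called webhook-dingtalk and selects pods by the app: webhook-dingtalk label, an item like the following could be appended to the template's objects list:

```yaml
# Assumed addition, not part of the original template:
# exposes the webhook pods inside the cluster on port 8060.
- apiVersion: v1
  kind: Service
  metadata:
    name: webhook-dingtalk        # must match the host name used in the Alertmanager URL
    namespace: ${NAMESPACE}
    labels:
      app: webhook-dingtalk
  spec:
    selector:
      app: webhook-dingtalk       # matches the pod label set by the Deployment
    ports:
    - name: tcp-8060
      port: 8060
      targetPort: 8060
      protocol: TCP
```

With the default NAMESPACE value of monitoring, this makes the service reachable as webhook-dingtalk.monitoring.svc on port 8060.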

Modify alertmanager.yaml

With the steps above, the webhook service is in place. Next, configure the receivers and route in Alertmanager. For reference:

global:
  resolve_timeout: 5m
  smtp_require_tls: false
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
  receiver: dingtalk
receivers:
- name: dingtalk
  webhook_configs:
  - send_resolved: true
    url: http://webhook-dingtalk.monitoring.svc:8060/dingtalk/webhook1/send
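
The ConfigMap defines several targets, and each one is reachable under its own path, /dingtalk/<target>/send. As a hedged sketch of how that could be used, the variant below sends alerts carrying a severity: critical label to the webhook_mention_all target and everything else to webhook1. The severity label and the dingtalk-critical receiver name are assumptions for illustration; they do not appear in the original setup.

```yaml
global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
  receiver: dingtalk               # default receiver for all alerts
  routes:
  # Assumed label: alerts must carry severity=critical for this child route to match.
  # Newer Alertmanager versions prefer `matchers` over the deprecated `match`.
  - match:
      severity: critical
    receiver: dingtalk-critical
receivers:
- name: dingtalk
  webhook_configs:
  - send_resolved: true
    url: http://webhook-dingtalk.monitoring.svc:8060/dingtalk/webhook1/send
- name: dingtalk-critical          # hypothetical receiver name
  webhook_configs:
  - send_resolved: true
    # Uses the webhook_mention_all target, which mentions everyone in the group
    url: http://webhook-dingtalk.monitoring.svc:8060/dingtalk/webhook_mention_all/send
```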
