Prometheus deployment

Alertmanager deployment: https://i4t.com/4197.html
DingTalk deployment: https://www.cnblogs.com/wangxu01/articles/11654836.html

1 Prometheus deployment

cd /data
wget https://s3-gz01.didistatic.com/n9e-pub/prome/prometheus-2.28.0.linux-amd64.tar.gz -O prometheus-2.28.0.linux-amd64.tar.gz
tar xf prometheus-2.28.0.linux-amd64.tar.gz
mv prometheus-2.28.0.linux-amd64.tar.gz /tmp/
mv prometheus-2.28.0.linux-amd64 prometheus
cd prometheus

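1.1 Edit the Prometheus configuration file

Change the targets address in the self-scrape job so that it is not limited to local access. Note that targets is the address Prometheus scrapes; the address the server itself listens on is set by --web.listen-address (default 0.0.0.0:9090).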

cat /data/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus-98'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['0.0.0.0:9090']

1.2 Enable start on boot

cat > /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/data/prometheus/prometheus  --config.file=/data/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service
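Because the unit above starts Prometheus with --web.enable-lifecycle, the HTTP endpoints /-/healthy and /-/reload are available. The following is a minimal sketch of checking the service and reloading it after a config change, assuming the /data/prometheus layout used here. The function names and the PROM_DIR variable are made up for this example.

```shell
# Assumption: the tarball was unpacked to /data/prometheus, so promtool
# sits next to the prometheus binary.
PROM_DIR="${PROM_DIR:-/data/prometheus}"

# Build a Prometheus endpoint URL from HOST:PORT and a path.
prom_url() {
    printf 'http://%s%s\n' "$1" "$2"
}

# Succeeds when the server answers on its health endpoint.
prom_healthy() {
    curl -fsS "$(prom_url "$1" /-/healthy)"
}

# Validate the config first; trigger a reload only if the check passes.
prom_reload() {
    "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml" &&
        curl -fsS -X POST "$(prom_url "$1" /-/reload)"
}
```

For example, prom_healthy localhost:9090 checks the local instance, and prom_reload localhost:9090 applies an edited prometheus.yml without a full restart.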

2 Configure exporters

 vim /data/prometheus/prometheus.yml   # edit the Prometheus configuration
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100","192.168.124.134:9100"]

Restart Prometheus to apply the change:

systemctl restart prometheus.service
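  • Configure the job in the format shown above, and mind the indentation.
  • job_name can be any name.
  • targets lists the node_exporter addresses.
  • Save and exit after editing, then restart Prometheus so the configuration takes effect.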
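Since indentation is the easiest thing to get wrong in this job definition, one option is to generate it instead of typing it. The helper below is a hypothetical sketch (render_job is not part of Prometheus) that prints a static scrape job in the same format for any list of targets.

```shell
# Hypothetical helper: print a static scrape job for prometheus.yml.
# Usage: render_job JOB_NAME HOST:PORT [HOST:PORT ...]
render_job() {
    job=$1
    shift
    list=""
    for target in "$@"; do
        # Join targets as a quoted, comma-separated YAML flow sequence
        list="${list:+$list,}\"$target\""
    done
    printf '  - job_name: "%s"\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets: [%s]\n' "$list"
}

render_job node_exporter localhost:9100 192.168.124.134:9100
```

The output is indented for direct placement under scrape_configs. Review it before appending it to /data/prometheus/prometheus.yml.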

Adding a DingTalk robot

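Step 1

Open the robot management page. Taking the PC client as an example: open DingTalk on the PC, click your avatar, and select "Robot Management".

Step 2

On the robot management page, select the "Custom" robot, enter a name for the robot, and choose the group it will send messages to. You can also set an avatar for the robot.

Step 3

Complete the required security settings (select at least one), tick "I have read and agree to the Custom Robot Service and Disclaimer Terms", and click "Finish". Three security setting methods are currently available; they are described below.

Step 4

After completing the security settings, copy the robot's Webhook address, which can be used to send messages to this group. The format is as follows: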

https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
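Once the Webhook address is copied, sending a message is an HTTP POST with a JSON body. The sketch below sends a plain "text" message with curl. The function names are hypothetical, and the escaping covers only backslashes and double quotes. If the robot uses the custom-keyword security setting, the message content must contain one of the configured keywords.

```shell
# Build the JSON body for a DingTalk "text" message.
# Minimal escaping only: backslashes and double quotes. Newlines and other
# control characters are not handled in this sketch.
dingtalk_text_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"msgtype":"text","text":{"content":"%s"}}' "$escaped"
}

# POST the message. $1 is the Webhook address, $2 is the message text.
send_dingtalk_text() {
    curl -fsS -H 'Content-Type: application/json' \
        -d "$(dingtalk_text_payload "$2")" "$1"
}
```

Usage: send_dingtalk_text 'https://oapi.dingtalk.com/robot/send?access_token=XXXXXX' 'alert: disk usage high', with XXXXXX replaced by the real token.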

Log file monitoring with alerts sent to a DingTalk robot

Git repository: https://github.com/51daticom/marx-agent

1 Quick start

1.1 Build, method 1:

git clone https://github.com/51daticom/marx-agent.git
cd marx-agent
export GO111MODULE=on
export GOPROXY=https://goproxy.io
go build -o marx-agent

1.2 Method 2:

go get -u github.com/51daticom/marx-agent
ls "$(go env GOPATH)/bin/marx-agent"

1.3 Configuration

mv config.nginx.example.in config.ini;
vim config.ini
[pro]
buf = 1
whiteList = ""
blackList = "\ 500\ ","\ 502\ ","\ 501\ " # status codes that trigger an alert (regex match)
log = /var/log/nginx/access.log  # path of the log file to monitor
wxpush = https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key={{youkey}} # WeCom (Enterprise WeChat) robot webhook address
dingpush =  # DingTalk robot webhook address

#/var/log/nginx/access.log format data such as:
#127.0.0.1 - - [21/Jul/2020:05:57:48 +0800] "GET /thinkphp/html/public/index.php HTTP/1.1" 500 47 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0;en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6)" "-" "0.001" "0.001
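To see what the blackList patterns would catch before starting the agent, the same match can be approximated with grep. This is only a sketch: the source does not show how marx-agent applies the patterns internally, and count_blacklisted is a made-up name.

```shell
# Count log lines that contain 500, 501 or 502 surrounded by spaces,
# approximating the three blackList patterns with one extended regex.
# Reads the log from standard input.
count_blacklisted() {
    grep -cE ' (500|501|502) '
}
```

Usage: count_blacklisted < /var/log/nginx/access.log. The pattern matches any space-delimited field, so a response size of exactly 500 bytes on a 200 response would also be counted.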

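1.4 Start monitoring

./marx-agent config.ini pro

Metric                 Implementation       Example
Pod performance        cAdvisor             CPU and memory utilization of containers
Node performance       node-exporter        CPU and memory utilization of nodes
K8s resource objects   kube-state-metrics   pod/deployment/service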

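Service discovery
Prometheus discovers scrape targets from the Kubernetes API and stays in sync with the cluster state:
targets are obtained dynamically, and the API is queried in real time to check whether each one still exists.

Official documentation
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

Roles supported by service discovery:
node: discovers the nodes in the cluster
pod: discovers running containers and their ports
service: discovers created services, with their IPs and ports
endpoints: discovers the endpoints behind each service, that is, the backing pods and their ports
ingress: discovers created ingress entry points and rules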
Monitoring k8s with Prometheus
Deploying Prometheus in k8s

nodes
http://192.168.0.39:9100/metrics

alertmanager
http://192.168.0.39:9093/#/alerts

grafana
http://192.168.0.39:3000/

RocketMQ monitoring

Shell script monitoring: http://t.zoukankan.com/duanxz-p-3890046.html
Prometheus monitoring: https://blog.csdn.net/sinat_14840559/article/details/119782996

nginx monitoring

https://blog.csdn.net/hanjinjuan/article/details/119733953