Prometheus deployment
Alertmanager deployment: https://i4t.com/4197.html
DingTalk deployment: https://www.cnblogs.com/wangxu01/articles/11654836.html
1 Prometheus deployment
cd /data
wget https://s3-gz01.didistatic.com/n9e-pub/prome/prometheus-2.28.0.linux-amd64.tar.gz -O prometheus-2.28.0.linux-amd64.tar.gz
tar xf prometheus-2.28.0.linux-amd64.tar.gz
mv prometheus-2.28.0.linux-amd64.tar.gz /tmp/
mv prometheus-2.28.0.linux-amd64 prometheus
cd prometheus
1.1 Edit the Prometheus configuration file
Change the targets address; otherwise it can only be accessed from the local machine.
cat /data/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus-98'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['0.0.0.0:9090']
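Since indentation mistakes in prometheus.yml are easy to make, it may help to validate the file before starting the service. A minimal sketch, assuming the promtool binary shipped in the same tarball sits in /data/prometheus. The PROM_DIR variable and the check_prom_config function name are illustrative and not part of the original notes.

```shell
# Validate a Prometheus config before (re)starting the service.
# PROM_DIR follows the install path used above; override it if yours differs.
PROM_DIR="${PROM_DIR:-/data/prometheus}"

check_prom_config() {
    local promtool="$PROM_DIR/promtool"
    local config="${1:-$PROM_DIR/prometheus.yml}"
    if [ ! -x "$promtool" ]; then
        echo "promtool not found at $promtool" >&2
        return 2
    fi
    # promtool exits non-zero when the file has syntax or semantic errors
    "$promtool" check config "$config"
}
```

Typical use: run check_prom_config and only restart Prometheus when it returns 0.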
1.2 Enable start on boot
cat > /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/data/prometheus/prometheus --config.file=/data/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service
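Because the unit starts Prometheus with --web.enable-lifecycle, the HTTP management endpoints can be used to check health and to reload the configuration without a full restart. A minimal sketch, assuming Prometheus listens on localhost:9090. PROM_URL and both function names are illustrative.

```shell
# Base URL of the Prometheus server (assumption: default port on this host).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Returns 0 when the /-/healthy endpoint answers with HTTP 2xx.
prom_healthy() {
    curl -fsS --max-time 5 "$PROM_URL/-/healthy" >/dev/null
}

# Asks Prometheus to re-read its config; requires --web.enable-lifecycle.
prom_reload() {
    curl -fsS --max-time 5 -X POST "$PROM_URL/-/reload"
}
```

For example, `prom_healthy && echo up` after the restart, and `prom_reload` after later edits to prometheus.yml.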
2 Configure exporters
vim /data/prometheus/prometheus.yml # edit the Prometheus configuration
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100","192.168.124.134:9100"]
Restart to apply the change
systemctl restart prometheus.service
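- Follow the format shown above and mind the indentation.
- job_name can be any name.
- targets lists the node_exporter addresses.
- Save and exit after editing, then restart Prometheus so the configuration takes effect.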
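After the restart, one way to confirm that the new targets are being scraped is to query the built-in up metric through the HTTP API. A minimal sketch, assuming Prometheus is reachable on localhost:9090. PROM_URL and prom_query are illustrative names.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query and print the raw JSON response.
prom_query() {
    # -G sends the url-encoded data as a GET query string
    curl -fsS --max-time 5 -G "$PROM_URL/api/v1/query" \
        --data-urlencode "query=$1"
}
```

Calling `prom_query 'up{job="node_exporter"}'` should list one series per target. A value of "1" means the last scrape succeeded, "0" means it failed.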
Adding a DingTalk robot
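1 Step 1
Open the robot management page. Taking the PC client as an example: open DingTalk on the PC, click your avatar, and select "Robot Management".
Step 2
On the robot management page, choose the "Custom" robot, enter a robot name, and select the group that will receive the messages. You can also set an avatar for the robot.
Step 3
Complete the required security settings (choose at least one), tick "I have read and agree to the Custom Robot Service and Disclaimer Terms", and click "Finish". Three kinds of security settings are currently available: custom keywords, signature, and IP address range.
Step 4
After completing the security settings, copy the robot's Webhook address, which can be used to send messages to this group. The format is: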
https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
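To check that the webhook works, a text message can be posted to it with curl. A minimal sketch: the function names are illustrative, the JSON escaping only handles backslashes and double quotes, and if the robot uses the keyword security setting the message text must contain one of the configured keywords.

```shell
# Build the JSON body for a DingTalk "text" message.
# Only backslashes and double quotes are escaped; multi-line content
# would need a real JSON encoder such as jq.
dingtalk_payload() {
    local content
    content=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"msgtype":"text","text":{"content":"%s"}}' "$content"
}

# $1: full webhook URL copied in step 4, $2: message text
dingtalk_send() {
    curl -fsS --max-time 10 -H 'Content-Type: application/json' \
        -d "$(dingtalk_payload "$2")" "$1"
}
```

Example: `dingtalk_send 'https://oapi.dingtalk.com/robot/send?access_token=XXXXXX' 'test alert'`, with XXXXXX replaced by the real token.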
Log file monitoring with alerts sent to a DingTalk robot
Git repository: https://github.com/51daticom/marx-agent
1 Quick start
1.1 Build, method 1:
git clone https://github.com/51daticom/marx-agent.git
cd marx-agent
export GO111MODULE=on
export GOPROXY=https://goproxy.io
go build -o marx-agent
1.2 Build, method 2:
go get -u github.com/51daticom/marx-agent
ls ${GOPATH}/bin/marx-agent
1.3 Configuration
mv config.nginx.example.in config.ini;
vim config.ini
[pro]
buf = 1
whiteList = ""
blackList = "\ 500\ ","\ 502\ ","\ 501\ " #status codes that trigger an alert (regex match)
log = /var/log/nginx/access.log #path of the log file to monitor
wxpush = https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key={{youkey}} #WeCom (Enterprise WeChat) robot webhook URL
dingpush = #DingTalk robot webhook URL
#/var/log/nginx/access.log format data such as:
#127.0.0.1 - - [21/Jul/2020:05:57:48 +0800] "GET /thinkphp/html/public/index.php HTTP/1.1" 500 47 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0;en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6)" "-" "0.001" "0.001
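The blackList patterns match a status code surrounded by spaces, as in the sample access-log line. To get a feel for which lines such patterns would hit, grep can serve as a stand-in. This is only an illustration with a hypothetical helper name, not how marx-agent itself performs the match.

```shell
# Count log lines containing one of the blacklisted status codes
# delimited by spaces (the same idea as "\ 500\ " in blackList).
# Note: grep -c prints 0 and exits with status 1 when nothing matches.
count_blacklisted() {
    grep -cE ' (500|501|502) ' "$1"
}
```

Example: `count_blacklisted /var/log/nginx/access.log` prints the number of matching lines in the current log.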
1.4 Start monitoring
./marx-agent config.ini pro
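| Monitored metric | Implementation | Example |
| --- | --- | --- |
| pod performance | cadvisor | CPU and memory utilization of containers |
| node performance | node-exporter | CPU and memory utilization of nodes |
| k8s resource objects | kube-state-metrics | pod/deployment/service |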
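Service discovery
Prometheus discovers its scrape targets from the Kubernetes API and always stays in sync with the cluster state.
Targets are obtained dynamically, and the API is queried in real time to check whether each target still exists.
Official documentation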
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
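Resource types (roles) supported by auto-discovery:
node - auto-discovers the nodes in the cluster
pod - auto-discovers running containers and their ports
service - auto-discovers created service IPs and ports
endpoints - auto-discovers the pod containers behind a service's endpoints
ingress - auto-discovers created ingress entry points and rules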
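The roles listed above map to the role field of kubernetes_sd_configs. A minimal sketch that writes a scrape-config fragment to a scratch file for review, assuming Prometheus runs inside the cluster with a service account. The file path and job names are illustrative, and the prometheus.io/scrape annotation is a common convention rather than anything built into Prometheus.

```shell
# Write a hypothetical scrape-config fragment to a scratch file.
fragment="${TMPDIR:-/tmp}/k8s-sd-fragment.yml"
cat > "$fragment" << 'EOF'
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
EOF
```

The jobs would then be merged under the existing scrape_configs key in prometheus.yml, not appended as a second scrape_configs block.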
Monitoring k8s with Prometheus
Deploying Prometheus in k8s
nodes
http://192.168.0.39:9100/metrics
alertmanager
http://192.168.0.39:9093/#/alerts
grafana
http://192.168.0.39:3000/
RocketMQ monitoring
Shell script monitoring: http://t.zoukankan.com/duanxz-p-3890046.html
Prometheus monitoring: https://blog.csdn.net/sinat_14840559/article/details/119782996
nginx monitoring