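1 The Prometheus philosophy
Every alert should be dealt with immediately, and no alert should stay unresolved for a long time. In practice this shows up as high-frequency data collection and automatic resolution of alerts (5 minutes by default, Alertmanager's resolve_timeout setting).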
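The automatic-resolution rule fits in a few lines. This is a simplified sketch, not Alertmanager's source. It assumes that an alert arriving without endsAt gets an end time of its arrival time plus resolve_timeout, and that an alert counts as resolved once the current time reaches its end time. The function names effective_ends_at and is_resolved are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Alertmanager's default resolve_timeout.
RESOLVE_TIMEOUT = timedelta(minutes=5)


def effective_ends_at(received_at: datetime, ends_at: Optional[datetime],
                      resolve_timeout: timedelta = RESOLVE_TIMEOUT) -> datetime:
    """End time of an alert: its own endsAt, or arrival time + resolve_timeout if it has none."""
    return ends_at if ends_at is not None else received_at + resolve_timeout


def is_resolved(now: datetime, received_at: datetime, ends_at: Optional[datetime],
                resolve_timeout: timedelta = RESOLVE_TIMEOUT) -> bool:
    """An alert is treated as resolved once the current time reaches its end time."""
    return now >= effective_ends_at(received_at, ends_at, resolve_timeout)


if __name__ == "__main__":
    received = datetime.now(timezone.utc)
    # An alert posted without endsAt is still firing after 4 minutes...
    print(is_resolved(received + timedelta(minutes=4), received, None))
    # ...and resolved after 5 minutes.
    print(is_resolved(received + timedelta(minutes=5), received, None))
```

In this model, an alert that keeps being re-posted gets a fresh received_at each time. It therefore never reaches its end time while the source keeps sending it.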
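2 Calling the Alertmanager API
The following command creates an alert by hand. startsAt and endsAt are RFC 3339 timestamps and should reflect the current time. Use either UTC or an explicit offset, such as the +08:00 used below. If endsAt is omitted, as it is here, Alertmanager resolves the alert on its own once resolve_timeout (5 minutes by default) has passed without the alert being posted again.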
curl -H "Content-Type: application/json" -X POST -d '[{"labels":{"field1": "value1", "field2": "value2", "field3": "value3"},"annotations":{"desc": "xxxx"},"generatorURL":"http://1.1.1.1","startsAt":"2022-08-10T20:57:46.000+08:00"}]' "http://127.0.0.1:9093/api/v2/alerts"
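The same request can be sent from code. The following is a minimal Python sketch that uses only the standard library and assumes the endpoint from the curl command above. Instead of hard-coding startsAt, it fills it in from the current UTC time. build_alert, post_alerts and ALERTMANAGER_URL are hypothetical names chosen for this illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same endpoint as the curl example; adjust host/port for your deployment.
ALERTMANAGER_URL = "http://127.0.0.1:9093/api/v2/alerts"


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339, e.g. 2022-08-10T12:57:46+00:00."""
    return ts.isoformat(timespec="seconds")


def build_alert(labels: dict, annotations: dict, generator_url: str,
                starts_at: Optional[datetime] = None,
                ends_at: Optional[datetime] = None) -> dict:
    """Build one alert object in the shape accepted by POST /api/v2/alerts."""
    alert = {
        "labels": labels,
        "annotations": annotations,
        "generatorURL": generator_url,
        # Default to the current time in UTC rather than a hard-coded timestamp.
        "startsAt": rfc3339(starts_at or datetime.now(timezone.utc)),
    }
    if ends_at is not None:
        # Without endsAt, Alertmanager resolves the alert after resolve_timeout.
        alert["endsAt"] = rfc3339(ends_at)
    return alert


def post_alerts(alerts: list, url: str = ALERTMANAGER_URL) -> int:
    """POST a list of alerts (the body is always a JSON array) and return the HTTP status."""
    body = json.dumps(alerts).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, post_alerts([build_alert({"field1": "value1", "field2": "value2", "field3": "value3"}, {"desc": "xxxx"}, "http://1.1.1.1")]) sends a request equivalent to the curl command, with a fresh startsAt.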
3 Alertmanager alert JSON
For each notification, Alertmanager sends the receiver a single JSON document, and multiple alerts are collected in the alerts array. For example:
{
  "receiver": "email",
  "status": "firing",
  "alerts": [
    {"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3a"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"},
    {"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3b"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12346"},
    {"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3c"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12347"}
  ],
  "groupLabels": {"field1": "value1"},
  "commonLabels": {"field1": "value1", "field2": "value2"},
  "commonAnnotations": {"desc": "xxxx"},
  "externalURL": "http://prometheus:9093",
  "version": "4",
  "truncatedAlerts": 0
}
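In this payload, commonLabels and commonAnnotations are the key/value pairs that every alert in the alerts array shares. groupLabels, by contrast, come from the route's group_by setting; this example presumably groups by field1, which is an assumption. Alertmanager fills these fields in itself. The sketch below only shows how they relate to the alerts array, for instance for a receiver that wants to sanity-check a payload. common_pairs and summarize are hypothetical helper names.

```python
import json
from typing import Dict, List


def common_pairs(maps: List[Dict[str, str]]) -> Dict[str, str]:
    """Key/value pairs present with the same value in every map (empty if there are none)."""
    if not maps:
        return {}
    common = dict(maps[0])
    for m in maps[1:]:
        # Keep a pair only if this map has the same key with the same value.
        common = {k: v for k, v in common.items() if m.get(k) == v}
    return common


def summarize(payload_json: str) -> Dict[str, Dict[str, str]]:
    """Recompute commonLabels and commonAnnotations from a payload's alerts array."""
    alerts = json.loads(payload_json)["alerts"]
    return {
        "commonLabels": common_pairs([a["labels"] for a in alerts]),
        "commonAnnotations": common_pairs([a["annotations"] for a in alerts]),
    }
```

Applied to the three alerts above, only field1 and field2 survive in commonLabels because field3 differs between them.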
When an alert recovers, its own status field is set to resolved. The top-level status of the JSON document becomes resolved only once every alert in the alerts array is resolved.
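On the receiver side, this rule means the top-level status alone does not say which alerts recovered; the per-alert status fields do. The following is a minimal sketch of a hypothetical receiver hook that sorts alerts by status and checks the top-level status against them. The function names are illustrative only, and the HTTP server that would call handle_notification is left out. It also rejects unexpected values, so a misspelled status such as "resolve" fails loudly instead of being silently miscounted.

```python
import json
from typing import Dict, List


def split_by_status(payload: dict) -> Dict[str, List[dict]]:
    """Sort the alerts of one notification into 'firing' and 'resolved' lists."""
    parts: Dict[str, List[dict]] = {"firing": [], "resolved": []}
    for alert in payload["alerts"]:
        status = alert["status"]
        if status not in parts:
            raise ValueError(f"unexpected alert status: {status!r}")
        parts[status].append(alert)
    return parts


def group_status(payload: dict) -> str:
    """Top-level status: 'resolved' only when no alert in the array is still firing."""
    return "firing" if split_by_status(payload)["firing"] else "resolved"


def handle_notification(body: bytes) -> str:
    """Hypothetical receiver hook: parse one JSON body and summarize it."""
    payload = json.loads(body)
    parts = split_by_status(payload)
    if group_status(payload) != payload["status"]:
        raise ValueError("top-level status does not match the alerts array")
    return f'{payload["status"]}: {len(parts["firing"])} firing, {len(parts["resolved"])} resolved'
```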
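4 Parameters
- group_wait: after the first alert of a new group arrives, Alertmanager waits this long before sending the first notification. Any other alerts that are assigned to the same group during this period are sent to the receiver together in the same JSON. Every new group incurs this delay.
- group_interval: after the first notification, the group is re-evaluated every group_interval. A new JSON is sent to the receiver if the group has changed since the last notification, meaning that new alerts have fired or existing ones have resolved.
- repeat_interval: if the group has not changed at all, the same notification is sent to the receiver again only after repeat_interval has elapsed.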
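The three parameters define a fixed schedule of evaluation ticks for a group: the first tick falls at t0+group_wait and the following ones every group_interval after that. A notification can only go out on one of these ticks. The sketch below computes that schedule, assuming the ticks never drift (real timers can shift slightly). Times are plain seconds after t0, and flush_times and first_tick_at_or_after are hypothetical names.

```python
import math
from typing import List


def flush_times(t0: int, group_wait: int, group_interval: int, count: int) -> List[int]:
    """First `count` evaluation times of a group: t0+group_wait, then every group_interval."""
    first = t0 + group_wait
    return [first + k * group_interval for k in range(count)]


def first_tick_at_or_after(t: int, t0: int, group_wait: int, group_interval: int) -> int:
    """Earliest evaluation time >= t; a change at time t cannot be reported earlier."""
    first = t0 + group_wait
    if t <= first:
        return first
    k = math.ceil((t - first) / group_interval)
    return first + k * group_interval
```

With the values from 4.1 (t0 = 0 for 10:00:00, group_wait 30, group_interval 60), flush_times(0, 30, 60, 4) gives [30, 90, 150, 210], which correspond to 10:00:30, 10:01:30, 10:02:30 and 10:03:30. An alert recovering at 10:01:40 (t = 100) is first reported at first_tick_at_or_after(100, 0, 30, 60) = 150, which is 10:02:30.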
4.1 Example
Assume group_wait is set to 30 seconds, group_interval to 1 minute, and repeat_interval to 10 minutes.
- The first alert arrives at 10:00:00 (t0) and the second at 10:00:20, so the first JSON is sent at 10:00:30 (t0+group_wait):
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- A third alert fires at 10:00:40, so the second JSON is sent at 10:01:30 (t0+group_wait+group_interval):
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- The first alert recovers at 10:01:40, so the third JSON is sent at 10:02:30 (t0+group_wait+group_interval*2):
{"receiver": "email", "status": "firing", "alerts": [{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- The other two alerts also recover at 10:02:40, so the fourth JSON is sent at 10:03:30 (t0+group_wait+group_interval*3). The first alert no longer appears, because Alertmanager removes a resolved alert from the group once a notification reporting it has been sent:
{"receiver": "email", "status": "resolved", "alerts": [{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
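Suppose the first JSON is sent at 10:00:30, none of the events in the second to fourth bullets happen afterwards, and the alerts never recover. In that case, the first JSON is sent again at 10:10:30 (t0+group_wait+repeat_interval). Because repeat_interval is only checked on group_interval ticks, the actual resend may land on a later tick.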
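The 4.1 timeline can be reproduced with a small discrete simulation. This is a simplified model, not Alertmanager's dispatcher code, and it rests on four assumptions:
1. send_resolved is enabled.
2. A group counts as changed when its firing or resolved alerts differ from what was last notified.
3. Resolved alerts leave the group once they have been reported.
4. A repeat goes out on the first tick at which at least repeat_interval has passed since the last send.

Alert and simulate are hypothetical names, and times are seconds after 10:00:00.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alert:
    name: str
    fires_at: int                       # seconds after 10:00:00
    resolves_at: Optional[int] = None   # None means the alert never recovers

    def status(self, now: int) -> str:
        if self.resolves_at is not None and now >= self.resolves_at:
            return "resolved"
        return "firing"


def simulate(alerts: List[Alert], group_wait: int, group_interval: int,
             repeat_interval: int, until: int) -> List[Tuple[int, str, Dict[str, str]]]:
    """Return (send_time, top_level_status, {alert_name: status}) for each notification."""
    tick = min(a.fires_at for a in alerts) + group_wait   # first flush: t0 + group_wait
    sent: List[Tuple[int, str, Dict[str, str]]] = []
    dropped: set = set()                                   # resolved alerts already reported
    last_firing: Optional[Dict[str, str]] = None           # firing alerts in the last notification
    last_sent_at: Optional[int] = None
    while tick <= until:
        group = {a.name: a.status(tick) for a in alerts
                 if a.fires_at <= tick and a.name not in dropped}
        if not group:
            break  # empty group: Alertmanager would remove it
        changed = group != last_firing
        repeat_due = last_sent_at is not None and tick - last_sent_at >= repeat_interval
        if changed or repeat_due:
            status = "firing" if "firing" in group.values() else "resolved"
            sent.append((tick, status, group))
            last_sent_at = tick
            last_firing = {n: s for n, s in group.items() if s == "firing"}
            # Alerts reported as resolved leave the group after a successful send.
            dropped |= {n for n, s in group.items() if s == "resolved"}
        tick += group_interval  # later flushes: every group_interval
    return sent


if __name__ == "__main__":
    # Scenario from 4.1: alert1 recovers at 10:01:40, alert2 and alert3 at 10:02:40.
    scenario = [Alert("alert1", 0, 100), Alert("alert2", 20, 160), Alert("alert3", 40, 160)]
    for t, status, group in simulate(scenario, 30, 60, 600, until=600):
        print(f"10:{t // 60:02d}:{t % 60:02d}", status, group)
```

Running it prints four notifications, at 10:00:30, 10:01:30, 10:02:30 and 10:03:30; in the last one only alert2 and alert3 appear, both resolved. With two alerts that never recover, the model sends at 10:00:30 and again at 10:10:30.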