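Notes on Optimizing TIME_WAIT in a Go HTTP Client
Business flow
Analysis
Container monitoring showed a large number of short-lived connections between the service and the event bus's load balancer, so let's go back to the code.
Code that sends the request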
func SendToKEvent(ev *KEvent) error {
	data, err := json.Marshal(ev.Data)
	if err != nil {
		return err
	}
	log.Println(string(data))
	if !sendEvent {
		log.Println("------ SEND_EVENT IS DISABLED ------")
		return nil
	}
	defer util.TimeCost("SendToKEvent")()

	body := bytes.NewReader(data)
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	req, err := http.NewRequest(http.MethodPost, ev.Url, body)
	if err != nil {
		return err
	}
	// WithContext returns a copy of the request, so the result must be kept;
	// otherwise the 2-second timeout is never applied.
	req = req.WithContext(ctx)
	req.Header.Set("Content-Type", "application/json; charset=utf-8")
	req.Header.Set("......", "......")
	for k, v := range ev.ExtMap {
		req.Header.Set(k, v)
	}

	resp, err := httpc.HttpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// The event bus treats any 2xx status as success.
	if resp.StatusCode >= 300 || resp.StatusCode < 200 {
		return fmt.Errorf("req failed, resp=%v", resp)
	}
	return nil
}
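A side note on the context handling in the request code: Request.WithContext returns a shallow copy of the request and leaves the receiver unchanged, so the timeout only takes effect if the returned value is the one that gets sent. The sketch below illustrates this; the function name deadlineAttached and the URL are hypothetical and exist only for this example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// deadlineAttached builds a request, calls WithContext with a 2-second
// timeout, and reports whether the request that would be sent carries a
// deadline. keepResult selects between keeping and dropping the copy that
// WithContext returns.
func deadlineAttached(keepResult bool) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Hypothetical URL; no network traffic is generated here.
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/events", nil)
	if err != nil {
		return false, err
	}
	if keepResult {
		req = req.WithContext(ctx) // keep the copy that carries ctx
	} else {
		req.WithContext(ctx) // copy is dropped; req still uses context.Background()
	}
	_, ok := req.Context().Deadline()
	return ok, nil
}

func main() {
	dropped, _ := deadlineAttached(false)
	kept, _ := deadlineAttached(true)
	fmt.Println("result dropped, deadline attached:", dropped)
	fmt.Println("result kept, deadline attached:", kept)
}
```

On Go 1.13 and later, http.NewRequestWithContext builds the request with the context in one step and avoids the problem entirely.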
HTTP client code
var (
	HttpClient = &http.Client{
		Transport: &http.Transport{
			Proxy: http.ProxyFromEnvironment,
			DialContext: func(ctx context.Context, network, addr string) (conn net.Conn, e error) {
				return (&net.Dialer{
					Timeout:   10 * time.Second,
					KeepAlive: 90 * time.Second,
				}).DialContext(ctx, network, addr)
			},
			ForceAttemptHTTP2:     true,
			TLSHandshakeTimeout:   5 * time.Second,
			ResponseHeaderTimeout: 30 * time.Second,
			MaxIdleConnsPerHost:   10,
			IdleConnTimeout:       90 * time.Second,
			ExpectContinueTimeout: 1 * time.Second,
		},
	}
)
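The code looks fine at first glance, but it reminded me of a problem I had dealt with earlier in the Golang ES client:
https://jiankunking.com/tcp-state-diagram.html
The TIME_WAIT section of that article turned out to describe exactly this problem. The net/http documentation for Response.Body says: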
https://pkg.go.dev/net/http#Response
// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.
Adjusted code
func SendToKEvent(ev *KEvent) error {
	......
	resp, err := httpc.HttpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	io.Copy(ioutil.Discard, resp.Body) // <-- add this line: read the body to completion so the connection can be reused
	......
	return nil
}
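The effect of draining the body can be observed locally. The following is a minimal sketch, not the service's code: it starts an httptest server, sends a few sequential requests, and uses httptrace to count how many of them were served over a reused connection. The function name countReused and the JSON payload are assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// countReused sends n sequential requests to a local test server and returns
// how many of them were served over a reused (keep-alive) connection.
// When drain is true the response body is read to completion before Close.
func countReused(n int, drain bool) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"ok"}`) // non-empty body, like the event bus reply
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()

	reused := 0
	for i := 0; i < n; i++ {
		trace := &httptrace.ClientTrace{
			// GotConn fires once a connection has been obtained for the request.
			GotConn: func(info httptrace.GotConnInfo) {
				if info.Reused {
					reused++
				}
			},
		}
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			return 0, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		if drain {
			io.Copy(io.Discard, resp.Body) // read to EOF so the connection returns to the idle pool
		}
		resp.Body.Close()
	}
	return reused, nil
}

func main() {
	withDrain, _ := countReused(3, true)
	withoutDrain, _ := countReused(3, false)
	fmt.Println("reused connections with drain:", withDrain)
	fmt.Println("reused connections without drain:", withoutDrain)
}
```

With the body drained, every request after the first should reuse the same connection. Without draining, the transport may discard the connection on Close, as the documentation quoted above warns, so each request tends to open a new one.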
After redeploying, the number of TIME_WAIT connections dropped considerably, but a dozen or so remained:
bash-5.0# netstat -anp |grep TIME
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:10964 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:45738 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:21178 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:37354 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:10966 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:37352 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:61524 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:61526 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:21180 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:33256 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:45736 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:33254 TIME_WAIT -
bash-5.0#
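When the list is longer than a screenful, it helps to group the TIME_WAIT entries by peer address. The sketch below parses netstat text in the column layout shown above; the function name timeWaitByPeer is hypothetical, and the sample input is two lines copied from the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleNetstat holds two lines copied from the netstat output above.
const sampleNetstat = `tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:10964 TIME_WAIT -
tcp 0 0 ::ffff:172.16.3.247:8080 ::ffff:10.200.76.64:45738 TIME_WAIT -`

// timeWaitByPeer counts TIME_WAIT connections per foreign address (port
// stripped). It assumes the columns are:
// proto recv-q send-q local-address foreign-address state [pid/program].
func timeWaitByPeer(netstatOutput string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(netstatOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 6 || fields[5] != "TIME_WAIT" {
			continue
		}
		peer := fields[4]
		// Drop the trailing ":port"; the last colon separates host and port
		// even for IPv4-mapped addresses such as ::ffff:10.200.76.64:10964.
		if i := strings.LastIndex(peer, ":"); i >= 0 {
			peer = peer[:i]
		}
		counts[peer]++
	}
	return counts
}

func main() {
	for peer, n := range timeWaitByPeer(sampleNetstat) {
		fmt.Printf("%s: %d connection(s) in TIME_WAIT\n", peer, n)
	}
}
```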
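Two things to note here:
- 172.16.3.247 is the IP of the service Pod
- 10.200.76.64 is the IP of the host node the Pod runs on
In other words, there are short-lived connections between the Pod and its host. What are these connections doing?
A packet capture answers that.
Focus on frame No. 502: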
Frame 502: 177 bytes on wire (1416 bits), 177 bytes captured (1416 bits)
Ethernet II, Src: ee:ee:ee:ee:ee:ee (ee:ee:ee:ee:ee:ee), Dst: b6:cd:6a:f8:69:5e (b6:cd:6a:f8:69:5e)
Internet Protocol Version 4, Src: 10.200.76.64, Dst: 172.16.3.247
Transmission Control Protocol, Src Port: 29978, Dst Port: 8080, Seq: 1, Ack: 1, Len: 111
Hypertext Transfer Protocol
    GET /healthz HTTP/1.1\r\n    <-- note this line: this is the liveness check endpoint configured for the service
    Host: 172.16.3.247:8080\r\n
    User-Agent: kube-probe/1.21\r\n
    Accept: */*\r\n
    Connection: close\r\n    <-- note this line
    \r\n
    [Response in frame: 506]
    [Full request URI: http://172.16.3.247:8080/healthz]
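Connection
Connection: keep-alive
Once a page has finished loading, the TCP connection used to transfer HTTP data between the client and the server stays open. If the client requests another page from the same server, it reuses the already established connection.
Connection: close
Once a request has completed, the TCP connection used to transfer HTTP data between the client and the server is closed. The client has to establish a new TCP connection before it can send another request.
From these descriptions, a request whose header carries Connection: close uses a short-lived connection.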
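The difference between the two header values can be reproduced from Go. In the sketch below, setting Request.Close to true makes the Go client send Connection: close, and the test server's ConnState hook counts how many TCP connections were opened. The function name connsOpened is an assumption for this example; the /healthz path mirrors the probe seen in the capture.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connsOpened sends n sequential requests to a local test server and returns
// how many TCP connections the server accepted. When closeEach is true every
// request is sent with "Connection: close".
func connsOpened(n int, closeEach bool) (int, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// StateNew is reported once for every newly accepted connection.
	srv.Config.ConnState = func(c net.Conn, state http.ConnState) {
		if state == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()

	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL+"/healthz", nil)
		if err != nil {
			return 0, err
		}
		req.Close = closeEach // true adds "Connection: close" to the request
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		io.Copy(io.Discard, resp.Body) // drain so keep-alive reuse is possible
		resp.Body.Close()
	}
	return int(atomic.LoadInt64(&opened)), nil
}

func main() {
	keepAlive, _ := connsOpened(3, false)
	closed, _ := connsOpened(3, true)
	fmt.Println("connections opened with keep-alive:", keepAlive)
	fmt.Println("connections opened with Connection: close:", closed)
}
```

With keep-alive the three requests share one connection; with Connection: close each request opens its own, which is the pattern the probe traffic produces.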
Now look at the service's Deployment configuration:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
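At this point the picture is clear. Kubernetes calls the service's liveness check endpoint every 10 seconds, each time over a short-lived connection, and the TIME_WAIT duration is taken here to be 120s (Linux itself hard-codes it to 60s via TCP_TIMEWAIT_LEN).
With a 120s TIME_WAIT and one probe every 10 seconds, the service should hold steady at roughly 11-13 TIME_WAIT connections. With a 60s TIME_WAIT the same count would require two probes per period, which would be consistent with the source ports in the netstat output arriving in adjacent pairs (10964/10966, 21178/21180, and so on).
Either way, this accounts for everything that was observed.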
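The steady-state estimate is simple arithmetic: each probe leaves one connection in TIME_WAIT, and each of those lingers for the TIME_WAIT duration. A small sketch follows, with a hypothetical function name and with the TIME_WAIT duration and the number of probes treated as assumptions rather than measured values.

```go
package main

import "fmt"

// steadyStateTimeWait estimates how many connections sit in TIME_WAIT at any
// moment when `probes` short-lived connections are closed every periodSeconds
// and each one lingers for timeWaitSeconds.
func steadyStateTimeWait(timeWaitSeconds, periodSeconds, probes int) int {
	if periodSeconds <= 0 {
		return 0
	}
	return timeWaitSeconds / periodSeconds * probes
}

func main() {
	// One probe every 10s with an assumed 120s TIME_WAIT.
	fmt.Println(steadyStateTimeWait(120, 10, 1))
	// Two probes every 10s with Linux's 60s TIME_WAIT.
	fmt.Println(steadyStateTimeWait(60, 10, 2))
}
```

Both parameter sets give 12, which matches the dozen connections seen in the netstat output; the observed count drifts by one or so depending on when the snapshot is taken.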
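Conclusion
- After a Go HTTP client request completes, the code should still read the response body to completion, even if the business logic does not care about its contents.
- Any service with a liveness check configured will have short-lived connections, and their number depends on the configured check interval.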