1. Environment
- Nginx
- PHP-FPM
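2. Background
Requests in the production environment occasionally fail with 504 Gateway Time-out. An initial look suggested that the timeout is related to the Nginx and PHP-FPM configuration. The potentially relevant settings on the production server, with their actual values, are listed below (all values are in seconds):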
2.1 Nginx
- fastcgi_connect_timeout 3000;
Defines a timeout for establishing a connection with a FastCGI server. It should be noted that this timeout cannot usually exceed 75 seconds.
- fastcgi_send_timeout 3000;
Sets a timeout for transmitting a request to the FastCGI server.
- fastcgi_read_timeout 3000;
Defines a timeout for reading a response from the FastCGI server.
2.2 PHP-FPM
- request_terminate_timeout = 100
The timeout for serving a single request after which the worker process will be killed.
- max_execution_time = 300
Maximum execution time of each script, in seconds.
- The two settings do very similar jobs. The difference:
Apparently they're both doing the same thing at different levels. max_execution_time is honored by PHP itself and request_terminate_timeout is handled by the FPM process control mechanism. So whichever is set to the lowest value will kick in first. Also Apache has the idle-timeout parameter that it observes and will give up on the PHP process after that time.
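As a rough illustration of "whichever is set to the lowest value will kick in first", the sketch below picks the smallest limit that a request of a given duration would exceed, using the values listed above. It is a simplification rather than a description of how Nginx or PHP enforce their limits: it treats every limit as wall-clock time for the whole request, whereas fastcgi_read_timeout applies between two successive reads and max_execution_time on Linux does not count time spent sleeping. The function name `first_timeout` is made up for this example.

```python
# Limits in seconds, copied from the configuration listed above.
TIMEOUTS = {
    "fastcgi_read_timeout (Nginx)": 3000,
    "max_execution_time (PHP)": 300,
    "request_terminate_timeout (PHP-FPM)": 100,
}


def first_timeout(timeouts, request_seconds):
    """Return (name, limit) of the lowest limit the request exceeds, or None.

    Simplifying assumption: every limit is compared against the total
    wall-clock duration of the request.
    """
    exceeded = [
        (limit, name)
        for name, limit in timeouts.items()
        if request_seconds > limit
    ]
    if not exceeded:
        return None
    limit, name = min(exceeded)
    return name, limit


print(first_timeout(TIMEOUTS, 90))   # None
print(first_timeout(TIMEOUTS, 120))  # ('request_terminate_timeout (PHP-FPM)', 100)
```

Under this model a 90-second request stays below every configured limit, so none of these settings should cut it off.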
3. Troubleshooting
3.1 Test file
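<?php
// Sleep for 90 seconds, then print the PHP configuration.
// 90 seconds is below every timeout listed above, so in theory the request should not time out.
sleep(90);
phpinfo(); // phpinfo() prints its own output; wrapping it in var_dump() would only append bool(true)
3.2 Simulate a normal request
Requesting the test file through the domain name returns 504 Gateway Time-out after roughly 30 or 60 seconds, every time. Both 30 and 60 seconds are also below the configured values, so why does the request time out?
3.3 Check the PHP log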
[01-Feb-2020 23:43:31] WARNING: [pool www] child 6639, script '/www/wwwroot/youxuan/core/web/test.php' (request: "GET /web/test.php") executing too slow (36.118452 sec), logging
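There is only a warning and no error, so PHP-FPM itself did not fail. The warning shows up once the request has run for more than 30 seconds because of this PHP-FPM setting:
- request_slowlog_timeout = 30
PHP-FPM is ruled out for now.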
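To check whether other requests trip the same slow-log threshold, the warning can be parsed mechanically. The sketch below is a minimal example that assumes every warning has the same layout as the single log line quoted above; the pattern and the function name `parse_slow_warning` are illustrative, not part of PHP-FPM.

```python
import re

# The PHP-FPM warning quoted above.
LINE = (
    "[01-Feb-2020 23:43:31] WARNING: [pool www] child 6639, script "
    "'/www/wwwroot/youxuan/core/web/test.php' (request: \"GET /web/test.php\") "
    "executing too slow (36.118452 sec), logging"
)

# Layout inferred from that one line only.
PATTERN = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+), script '(?P<script>[^']+)' "
    r".*executing too slow \((?P<seconds>[\d.]+) sec\)"
)


def parse_slow_warning(line):
    """Return pool, worker pid, script path and elapsed seconds, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "pool": match.group("pool"),
        "pid": int(match.group("pid")),
        "script": match.group("script"),
        "seconds": float(match.group("seconds")),
    }


print(parse_slow_warning(LINE))
```

Here the elapsed time (about 36 seconds) is above request_slowlog_timeout = 30 but well below request_terminate_timeout = 100, which matches the conclusion that the worker was only logged as slow, not killed.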
3.3 Check the Nginx log
[01/Feb/2020:21:08:17 +0800] "GET /web/test.php HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" 114.247.177.131 "50.875" "50.875"
The Nginx log records 499, not 502 or the 504 that the browser received, so something does not add up.
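The status code and timings can be pulled out of such a line with a small parser. The sketch below works on the access-log line quoted above. The log_format behind that line is not shown in this article, so the pattern is inferred from the single example, and treating the two trailing quoted numbers as timings in seconds (for example $request_time) is an assumption. The name `parse_access_line` is hypothetical.

```python
import re

# The access-log line quoted above; its log_format is not shown in the article.
LINE = (
    '[01/Feb/2020:21:08:17 +0800] "GET /web/test.php HTTP/1.1" 499 0 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" '
    '114.247.177.131 "50.875" "50.875"'
)

PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '  # request line
    r'(?P<status>\d{3}) (?P<bytes>\d+) '           # status code and body bytes
    r'.* "(?P<t1>[\d.]+)" "(?P<t2>[\d.]+)"$'       # two trailing quoted numbers
)


def parse_access_line(line):
    """Return path, status and the two trailing timings, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "status": int(match.group("status")),
        # Assumed to be timings in seconds; the article does not say which variables.
        "timings": (float(match.group("t1")), float(match.group("t2"))),
    }


print(parse_access_line(LINE))
# {'path': '/web/test.php', 'status': 499, 'timings': (50.875, 50.875)}
```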
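3.4 Nginx HTTP status code 499
Nginx logs 499 when it detects that the client has closed the connection before a response was sent (for example, when a user closes the browser).
499 therefore means that the client side dropped the request. But I never closed my browser, so why was 499 logged?
Note: to tell the client and the server of a request apart, the side that initiates the request is the client and the side that receives it is the server. The client that Nginx sees is whatever connects to it, which is not necessarily the browser.
3.5 A breakthrough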
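Following an expert's suggestion, I accessed the server directly by IP address instead of through the domain name. Sure enough, direct IP access never timed out, while access through the domain timed out every time. That pins the problem on the domain.
3.6 Investigate the domain
Pinging the domain returned an address that is not the cloud host's. We had bought a CDN service earlier and the cloud host sits behind it, so the CDN is the likely cause.
3.7 Investigate the CDN service
Looking through the CDN settings revealed the culprit: the CDN has a back-to-origin timeout of 30 seconds.
3.8 Increase the back-to-origin timeout
After raising the back-to-origin timeout, requests through the domain no longer time out either.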
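The comparison from 3.5 (same request, once through the domain and once straight to the origin IP) can be scripted so that it is easy to repeat after a configuration change. The sketch below is one possible way to do it with Python's standard library. It assumes plain HTTP on port 80, and `time_request` is a name chosen for this example. Sending the site's domain in the Host header lets the origin pick the right virtual host even when it is contacted by IP.

```python
import http.client
import time


def time_request(address, path, host_header, port=80, timeout=120.0):
    """GET `path` from `address`, sending `host_header` as the Host header.

    Returns (status, elapsed_seconds). `address` may be the domain (the
    request then goes through whatever the domain resolves to, e.g. a CDN)
    or the origin server's IP (the request bypasses the CDN).
    """
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    start = time.monotonic()
    try:
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        response.read()  # drain the body so the timing covers the full response
        return response.status, time.monotonic() - start
    finally:
        conn.close()
```

With placeholder values such as `www.example.com` for the domain and `203.0.113.10` for the origin (both hypothetical), calling `time_request("www.example.com", "/web/test.php", "www.example.com")` and then `time_request("203.0.113.10", "/web/test.php", "www.example.com")` would reproduce the comparison: a 504 after about 30 seconds on the first call and a 200 after about 90 seconds on the second would point at a layer in front of the origin.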