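As I have probably mentioned before, we rebuilt our entire production architecture, with Spring Boot at the application layer. A while back, for third-party reasons, we started the production beta somewhat hastily and ahead of schedule. Ops then noticed a problem: the server's HTTPS port had accumulated a large number of connections in CLOSE_WAIT.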
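To watch this symptom without extra tooling, here is a minimal sketch that counts CLOSE_WAIT sockets on one local port by reading `/proc/net/tcp` and `/proc/net/tcp6`. It assumes a Linux host, where the `st` column is a hex TCP state and `08` means CLOSE_WAIT. The class name `CloseWaitCounter` and the default port 443 are hypothetical choices for illustration, not something taken from our setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts CLOSE_WAIT sockets on a local port (Linux only).
class CloseWaitCounter {
    // In /proc/net/tcp the state column is hex; 08 is CLOSE_WAIT.
    static final String CLOSE_WAIT = "08";

    static long countCloseWait(List<String> procNetTcpLines, int localPort) {
        long count = 0;
        for (String line : procNetTcpLines) {
            String[] fields = line.trim().split("\\s+");
            // Data rows start with an index such as "0:"; skip the header and blanks.
            if (fields.length < 4 || !fields[0].endsWith(":")) {
                continue;
            }
            String local = fields[1];               // hex address, then ':' and hex port
            int idx = local.lastIndexOf(':');
            if (idx < 0) {
                continue;
            }
            int port = Integer.parseInt(local.substring(idx + 1), 16);
            if (port == localPort && CLOSE_WAIT.equals(fields[3])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 443; // assumed HTTPS port
        long total = 0;
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(file);
            if (Files.isReadable(path)) {
                total += countCloseWait(Files.readAllLines(path), port);
            }
        }
        System.out.println("CLOSE_WAIT sockets on port " + port + ": " + total);
    }
}
```

Running it periodically shows whether the count keeps growing, which is the pattern ops reported.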


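  My first reaction was that Spring Boot had a bug. The project runs as two services, HTTP and HTTPS, each started as a JAR, and the HTTP one showed no problem. The services on the old architecture, which serve HTTPS from a standalone Tomcat, were fine as well. At the time I took this as a reasonable sign that nothing was wrong at the socket level, so I started digging into the Spring Boot code.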

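  After some debugging and analysis (I will write that process up separately if I get the chance), I still had not found the cause, but I did notice a pattern. For every affected connection, in the doRun method of SocketProcessor, an inner class of org.apache.tomcat.util.net.NioEndpoint, the handshake state stayed at handshake == SelectionKey.OP_READ, so the socket kept waiting for a read and was never closed.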

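  At this point the problem looked like it sat at the socket level, but I still believed it was Spring Boot's fault. The Tomcat code that Spring Boot pulls in for this part is the embedded build (tomcat-embed-core-8.5.4), yet it is no different from the full distribution, and the full distribution did not show this problem.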



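  I still felt this was passing the buck, but I had no evidence that it was not a Tomcat problem. So I went through the code again to try to prove it, and found nothing.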


The problem occurs for TLS connections when the connection is dropped after the socket has been accepted but before the handshake is complete. The socket ended up in a loop:
- timeout -> ERROR event
- process ERROR (this is the new bit from r1746551)
- try to finish handshake
- need more data from client
- register with poller for READ
- wait for timeout
- timeout ... ... and around you go.
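The trigger described above (a connection dropped after accept but before the TLS handshake completes) can be approximated with a plain TCP client that connects and closes without sending a single TLS byte. This is only a sketch for poking at a test instance. The class name `HandshakeDropper`, the default host `localhost`, and the default port 8443 are assumptions, and I have not confirmed that this alone reproduces the CLOSE_WAIT build-up on every affected version.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical reproduction aid: opens TCP connections to an HTTPS port
// and closes them before any TLS handshake data is sent.
class HandshakeDropper {

    static void connectAndDrop(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            // The TCP connection is established here; the server can accept it.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // No ClientHello is written. Leaving the try block closes the socket,
            // so the server sees the peer go away mid-handshake.
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";           // assumed target
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8443;   // assumed HTTPS port
        int attempts = args.length > 2 ? Integer.parseInt(args[2]) : 10;
        for (int i = 0; i < attempts; i++) {
            connectAndDrop(host, port, 2000);
        }
        System.out.println("Dropped " + attempts + " connections to " + host + ":" + port);
    }
}
```

After running it against a test server, checking the socket states on the server side should show whether the dropped connections are ever released.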



exclude module: "spring-boot-starter-tomcat"


[group: 'org.springframework.boot', name: 'spring-boot-starter-jetty', version: '1.4.0.RELEASE'],



if (socket.isHandshakeComplete() || event == SocketEvent.STOP) {
    handshake = 0;
} else {
    handshake = socket.handshake(key.isReadable(), key.isWritable());
    // The handshake process reads/writes from/to the
    // socket. status may therefore be OPEN_WRITE once
    // the handshake completes. However, the handshake
    // happens when the socket is opened so the status
    // must always be OPEN_READ after it completes. It
    // is OK to always set this as it is only used if
    // the handshake completes.
    event = SocketEvent.OPEN_READ;
}


if (socket.isHandshakeComplete()) {
    // No TLS handshaking required. Let the handler
    // process this socket / event combination.
    handshake = 0;
} else if (event == SocketEvent.STOP || event == SocketEvent.DISCONNECT ||
        event == SocketEvent.ERROR) {
    // Unable to complete the TLS handshake. Treat it as
    // if the handshake failed.
    handshake = -1;
} else {
    handshake = socket.handshake(key.isReadable(), key.isWritable());
    // The handshake process reads/writes from/to the
    // socket. status may therefore be OPEN_WRITE once
    // the handshake completes. However, the handshake
    // happens when the socket is opened so the status
    // must always be OPEN_READ after it completes. It
    // is OK to always set this as it is only used if
    // the handshake completes.
    event = SocketEvent.OPEN_READ;
}
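
The difference between the two snippets above is the extra branch that treats STOP, DISCONNECT and ERROR as a failed handshake. To see why that branch breaks the timeout loop, here is a deliberately simplified model. It is not Tomcat's code: the class `HandshakeLoopSketch`, its `Event` enum and the method names are all hypothetical, and the handshake attempt is hard-coded to "need more data" because the client is assumed to be gone.

```java
import java.nio.channels.SelectionKey;

// Simplified, hypothetical model of the two decision branches shown above.
class HandshakeLoopSketch {
    enum Event { OPEN_READ, STOP, DISCONNECT, ERROR }

    // The peer has gone away, so every handshake attempt still needs more data.
    static int attemptHandshake() {
        return SelectionKey.OP_READ;
    }

    // First snippet: only STOP skips the handshake attempt.
    static int oldDecision(boolean handshakeComplete, Event event) {
        if (handshakeComplete || event == Event.STOP) {
            return 0;
        }
        return attemptHandshake();
    }

    // Second snippet: STOP, DISCONNECT and ERROR count as a failed handshake.
    static int newDecision(boolean handshakeComplete, Event event) {
        if (handshakeComplete) {
            return 0;
        }
        if (event == Event.STOP || event == Event.DISCONNECT || event == Event.ERROR) {
            return -1;
        }
        return attemptHandshake();
    }

    // Returns how many timeouts pass before the socket is closed,
    // or -1 if it is still open after maxTimeouts.
    static int timeoutsUntilClosed(boolean useFix, int maxTimeouts) {
        for (int i = 1; i <= maxTimeouts; i++) {
            // Each poller timeout is delivered as an ERROR event.
            int handshake = useFix
                    ? newDecision(false, Event.ERROR)
                    : oldDecision(false, Event.ERROR);
            if (handshake == -1) {
                return i; // handshake treated as failed: the socket gets closed
            }
            // Otherwise the socket is registered for READ again and waits
            // for the next timeout.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("old logic: " + timeoutsUntilClosed(false, 1000));
        System.out.println("new logic: " + timeoutsUntilClosed(true, 1000));
    }
}
```

In this model the old branch never reaches a close however many timeouts fire, while the new branch closes the socket on the first ERROR event, which matches the loop described in the Tomcat bug report quoted earlier.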







