Spring Boot 환경에서 "Too many open files" Exception 발생되는 현상

Created 2022-02-28, Last modified 2022-03-01

Problem

Spring Boot 프레임워크를 이용한 서비스 운영 시, 아래와 같은 예외 로그가 확인됐다.

2022-02-28 08:24:33.103 [ERROR] [scheduling-1|util.WebClientUtil] Exception thrown while GET Request : (${주소}) io.netty.channel.ChannelException -  
io.netty.channel.ChannelException: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
        at io.netty.channel.unix.Socket.newSocketStream0(Socket.java:421) ~[netty-transport-native-unix-common-4.1.50.Final.jar!/:4.1.50.Final]
        at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:319) ~[netty-transport-native-epoll-4.1.50.Final-linux-x86_64.jar!/:4.1.50.Final]
        at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:323) ~[netty-transport-native-epoll-4.1.50.Final-linux-x86_64.jar!/:4.1.50.Final]
        at io.netty.channel.epoll.EpollSocketChannel.<init>(EpollSocketChannel.java:45) ~[netty-transport-native-epoll-4.1.50.Final-linux-x86_64.jar!/:4.1.50.Final]
        at reactor.netty.resources.DefaultLoopEpoll.getChannel(DefaultLoopEpoll.java:45) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:187) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:169) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpResources.onChannel(TcpResources.java:215) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.http.client.HttpClientConnect$HttpTcpClient.connect(HttpClientConnect.java:141) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientDoOn.connect(TcpClientDoOn.java:60) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.tcp.TcpClient.connect(TcpClient.java:202) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.http.client.HttpClientFinalizer.connect(HttpClientFinalizer.java:77) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at reactor.netty.http.client.HttpClientFinalizer.responseConnection(HttpClientFinalizer.java:94) ~[reactor-netty-0.9.8.RELEASE.jar!/:0.9.8.RELEASE]
        at org.springframework.http.client.reactive.ReactorClientHttpConnector.connect(ReactorClientHttpConnector.java:111) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
        at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.exchange(ExchangeFunctions.java:104) ~[spring-webflux-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
        at org.springframework.web.reactive.function.client.DefaultWebClient$DefaultRequestBodyUriSpec.lambda$exchange$0(DefaultWebClient.java:338) ~[spring-webflux-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
        at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44) ~[reactor-core-3.3.6.RELEASE.jar!/:3.3.6.RELEASE]
        at reactor.core.publisher.Mono.subscribe(Mono.java:4219) ~[reactor-core-3.3.6.RELEASE.jar!/:3.3.6.RELEASE]
        at reactor.core.publisher.Mono.block(Mono.java:1678) ~[reactor-core-3.3.6.RELEASE.jar!/:3.3.6.RELEASE]
        (...생략...)
        at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_242]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) [spring-context-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_242]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
        Suppressed: java.lang.Exception: #block terminated with an error
                at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99) ~[reactor-core-3.3.6.RELEASE.jar!/:3.3.6.RELEASE]
                at reactor.core.publisher.Mono.block(Mono.java:1679) ~[reactor-core-3.3.6.RELEASE.jar!/:3.3.6.RELEASE]
                (...생략...)
                at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) ~[?:?]
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_242]
                at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
                at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) [spring-context-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
                at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
                at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_242]
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_242]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
                at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files

아래는 위 프로세스가 사용 중인 file descriptor를 조회한 것으로 TCP socket과 관련된 것이 65,500개가 되는 것을 알 수 있다.

$ lsof -p ${PID} | wc
  65576  655682  5903280
$ lsof -p ${PID} | grep TCP | wc
  65499  654990  5895033

결론: Spring Boot 프레임워크 프로세스에서 file descriptor를 모두 사용(최대 65,535개)하고 있는 상태에서 추가적으로 사용하려고 할 때 예외가 발생한 것으로 확인되며, 실제 file descriptor 목록을 조회시 TCP socket과 관련된 file descriptor만 65,000개가 넘었다.

Analysis

Grafana에서 위 프로세스의 file descirptor 관련 panel 조회 결과는 아래와 같다.

  • metric: process_files_open_files

  • type: counter

위 panel을 통해 알 수 있듯이 해당 프로세스에서 02/22 ~ 02/28일 까지 지속적으로 개수가 증가한 것을 확인할 수 있으며, 초 당 증가율(increase rate per second)은 0.1 ~ 0.15, 분 당 증가율(increase rate per minute)은 60을 곱한 6 ~ 7.5임을 알 수 있다. 실제로 위 프로세스는 주기적으로 특정 URL과 통신을하며 이 주기가 10초다.

결론: 프로세스가 주기적으로 HTTP 통신하는 부분에서 사용할 TCP socket 관련 file descriptor에 대한 누수(leak)이 발생한 것으로 추측된다.

Resolution

위 stack trace에서 알 수 있듯이 io.netty.channel.unix.Socket.newSocketStream0(Socket.java:421)에서 io.netty.channel.ChannelException가 발생한 것으로 이를 토대로 구글링 시 spring-boot, reactor-netty GitHub issue를 찾을 수 있었다.

이는 reactor-netty 0.9.8.RELEASE 버전에서 발생하는 이슈로 channel 클래스에 대한 정보를 얻으려고 할 때 newSocketStream()이 호출되어 발생하는 이슈라고 한다(실제 위 프로세스에서 사용하는 버전과 동일한 것을 확인했다). 해당 이슈 해결에 대한 pull request는 아래와 같다.

결론: reactor-netty 0.9.9.RELEASE 버전에서 해당 이슈는 해결된 것으로 보이며, 실제 프로세스에서도 해당 library 버전으로 변경해 socket 누수가 발생하지 않은 것을 확인할 수 있었다.

Etc

Last updated