We've been using SoftEther for about 10 years now. Originally we ran a single server on a single fiber line, and it never went down once in ~7 years apart from updates. About three years ago we expanded to a cluster with two fiber lines, two members, and a controller that does not accept connections itself.
Ever since then, about once every 4 to 6 weeks, one or the other of the cluster members gets into a state where it stops accepting connections but does not go down. The controller continues to assign connections to it, and all of them connect, get an IP, and then disconnect immediately. Restarting the vpnserver process on the cluster member resolves the issue. Stopping the vpnserver process on that member also allows the other member to take the connections successfully. There are no errors in the logs, and the logs seem to indicate that the client disconnected rather than being dropped by the server.
We started preemptively restarting the process on the members on alternate weekends, but that actually made it worse: it accelerated the odd behavior rather than resolving it. If we restart a member before the issue occurs, the issue then occurs within about 48 hours every time. If we wait until the issue occurs before restarting the member, it is good again for 4 to 6 weeks. Restarting the whole machine rather than just the vpnserver process makes no difference.
There does appear to be a slow memory leak in vpnserver, but we do not come anywhere near running out of RAM. There are a lot of logs, but again, we come nowhere near running out of disk space. The cluster members are dedicated to vpnserver and do not run anything else. The members are physical machines; the controller is a VM. The controller was hosted on Citrix Hypervisor but is now on Proxmox. The members have dedicated 10Gb connections inside and outside. We are using RADIUS with MFA via a Microsoft NPS server.
90% of the time everything works great, but when one of the cluster members gets stuck, nobody can connect. Existing connections on either member are not affected. As long as a user remains connected, their connection seems to continue functioning indefinitely even while the member is actively dropping new connections. We also have some other standalone SoftEther servers for specific purposes, all running in VMs, using the same settings apart from clustering, and they never have this issue. All servers are on the latest version. The OS on all servers is Ubuntu 22.04.5 LTS.
We've been fighting this issue for years now and it's generally low priority, but the downtime tends to occur at the most obnoxious possible times when it does occur. Any ideas would be appreciated. Thank you.
Cluster members stop accepting new connections
- tlogiudice
- Posts: 1
- Joined: Thu Apr 28, 2022 8:24 pm
