On September 29th, 2025, a major incident affected the AP-Auth service. Access Points (APs) across multiple sites experienced disconnections and instability. The incident was detected through a sudden surge in incoming network traffic, which saturated the AP-Auth service and caused widespread connectivity issues. The support and engineering teams were alerted and began investigating the root cause.
The incident was triggered by an unexpected, sustained spike in incoming traffic to the load balancer, with traffic volume seven times higher than in the previous month. This overload saturated the AP-Auth service’s TCP connection pool, so APs were disconnected en masse when they tried to renew their JWT tokens.
Further investigation showed that the load exceeded the service’s configured limits, which exhausted the resources available to it. There was no evidence of database or CPU bottlenecks.
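To illustrate the mechanism, the following minimal sketch shows how a fixed connection limit behaves when demand grows sevenfold. It is not the AP-Auth implementation: the function `admit`, the `PoolResult` type, and the baseline and limit values are hypothetical, and it assumes for simplicity that concurrent connection demand scales in proportion to traffic volume. Only the factor of seven comes from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolResult:
    accepted: int  # connections that obtained a slot in the pool
    rejected: int  # connections turned away because the pool was full


def admit(concurrent_requests: int, pool_limit: int) -> PoolResult:
    """Admit connections up to pool_limit; anything beyond it is rejected."""
    if concurrent_requests < 0 or pool_limit < 0:
        raise ValueError("values must be non-negative")
    accepted = min(concurrent_requests, pool_limit)
    return PoolResult(accepted=accepted, rejected=concurrent_requests - accepted)


# Hypothetical figures, chosen only to illustrate the effect.
BASELINE_CONNECTIONS = 1_000  # assumed normal concurrent demand
POOL_LIMIT = 4_000            # assumed configured connection limit
SURGE_FACTOR = 7              # traffic increase stated in this report

for label, demand in (
    ("baseline", BASELINE_CONNECTIONS),
    ("surge", BASELINE_CONNECTIONS * SURGE_FACTOR),
):
    result = admit(demand, POOL_LIMIT)
    print(f"{label}: accepted={result.accepted} rejected={result.rejected}")
```

Under these assumed numbers, the baseline demand fits within the limit, while the surge leaves a large share of connection attempts without a slot. That matches the observed pattern of APs failing to renew their tokens while the database and CPU remained healthy.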
Upon detection of the incident, the engineering team took several immediate actions:
These actions stabilized the service and allowed APs to reconnect progressively.
To prevent similar incidents, the following corrective actions have been implemented:
In addition, the recent migration on November 18th moved the AP-Auth service to a new-generation server that offers better scalability and stability, with improved monitoring and capacity for higher load.
We sincerely appreciate your patience and continued support.