AP authentication issue

Incident Report for VUSION Cloud

Postmortem

What happened?

On September 29th, 2025, a major incident affected the AP-Auth service. Access Points (APs) across multiple sites experienced disconnections and instability. The incident was detected through a sudden surge in incoming network traffic, which saturated the AP-Auth service and caused widespread connectivity issues. The support and engineering teams were alerted and began investigating the root cause.

What went wrong, and why?

The incident was triggered by an unexpected, sustained spike in incoming traffic to the load balancer, with traffic volume seven times higher than in the previous month. This overload saturated the AP-Auth service’s TCP connection pool, causing mass disconnections as APs tried to renew their JWT tokens.

Further investigation showed that the load exceeded the configured system limits, which exhausted the available resources. There was no evidence of database or CPU bottlenecks.
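
As an illustration of how such a limit can be checked, the sketch below compares a process's open file descriptors with its soft open-files limit. It assumes a Linux or other POSIX host. The report does not state which limit was hit first, and the helper names `fd_headroom` and `current_fd_usage` are hypothetical, not part of AP-Auth.

```python
import os
import resource


def fd_headroom(open_fds: int, soft_limit: int) -> float:
    """Return the fraction of the open-file limit still free (0.0 to 1.0)."""
    if soft_limit == resource.RLIM_INFINITY:
        return 1.0  # no limit configured, so headroom is never exhausted
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return max(0.0, 1.0 - open_fds / soft_limit)


def current_fd_usage() -> tuple[int, int]:
    """Return (open descriptors, soft RLIMIT_NOFILE) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd is Linux-specific; /dev/fd is the fallback on other systems.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir)), soft


if __name__ == "__main__":
    used, limit = current_fd_usage()
    print(f"{used} descriptors open, headroom {fd_headroom(used, limit):.1%}")
```

Each TCP connection consumes one file descriptor, so headroom close to zero means new connections are likely to be refused even when CPU and database load look normal.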

How did we respond?

Upon detection of the incident, the engineering team took several immediate actions:

  • Increased system limits for TCP thread pool size and open files.
  • Implemented automated restarts of the AP-Auth service at regular intervals to restore partial connectivity.
  • Enhanced real-time monitoring of TCP connection states and log analytics.

These actions stabilized the service and eventually allowed APs to progressively reconnect.
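
The monitoring of TCP connection states mentioned above could look roughly like the following sketch. It assumes a Linux host, where `/proc/net/tcp` lists one socket per line with a hexadecimal state code in the fourth column. The function name `count_tcp_states` and the sample data are made up for illustration and do not describe the actual VUSION tooling.

```python
from collections import Counter

# State codes used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from text in /proc/net/tcp format."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


# Hypothetical sample: one listener, one established and two half-closed sockets.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000
   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000
   2: 0100007F:1F90 0100007F:C351 08 00000000:00000000
   3: 0100007F:1F90 0100007F:C352 08 00000000:00000000
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

On a live host the same function can be fed `open("/proc/net/tcp").read()`. A steadily growing count of `CLOSE_WAIT` or `SYN_RECV` sockets is a typical early sign that a service is not keeping up with its connections.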

How are we preventing this in the future?

To prevent similar incidents, the following corrective actions have been implemented:

  • System parameters have been reviewed and increased to handle higher loads.
  • Real-time dashboards and alerts have been improved to monitor traffic and resource usage, enabling earlier detection of anomalies.

In addition, the migration on November 18th moved the AP-Auth service to a new-generation server that offers enhanced scalability and stability, allowing for better monitoring and higher load handling.
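
To illustrate the kind of traffic alert described above, the sketch below flags a surge when the current request rate exceeds a multiple of the recent baseline. The function name `is_traffic_surge`, the use of the median as baseline, and the default factor of 3 are assumptions chosen for the example, not the thresholds used in production.

```python
from statistics import median


def is_traffic_surge(current_rate: float,
                     baseline_rates: list[float],
                     factor: float = 3.0) -> bool:
    """Return True when current_rate exceeds factor times the median baseline."""
    if not baseline_rates:
        raise ValueError("baseline_rates must not be empty")
    return current_rate > factor * median(baseline_rates)


if __name__ == "__main__":
    baseline = [100.0, 110.0, 90.0]  # hypothetical requests per second
    print(is_traffic_surge(700.0, baseline))  # sevenfold increase
    print(is_traffic_surge(150.0, baseline))  # ordinary fluctuation
```

The median is used so that a single earlier spike in the baseline window does not raise the threshold. With these example values, a sevenfold increase like the one seen in this incident is flagged, while a 50% fluctuation is not.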

We sincerely appreciate your patience and continued support.

Posted Nov 19, 2025 - 15:50 UTC

Resolved

Our metrics show that the incident has been resolved and APs can connect as expected.

On behalf of VUSION Group, we would like to thank you for your understanding.
Posted Sep 29, 2025 - 18:08 UTC

Monitoring

Dear customers,

A workaround has been implemented and our metrics show that Access Points can successfully connect. We will continue to monitor closely until all services are restored and will notify you of any changes.

Thank you for your understanding.
Posted Sep 29, 2025 - 16:24 UTC

Update

Dear customers,

The issue is still ongoing. Our engineers are working to restore services.
We will notify you of any change.

Thank you for your understanding.
Posted Sep 29, 2025 - 14:59 UTC

Identified

Dear customers,

Our Access Point authentication server is experiencing issues. Customers may have difficulty configuring Access Points, and APs that require tokens may not receive them in time, causing them to go offline.

We will keep you updated as the situation progresses.

Thank you for your understanding.
Posted Sep 29, 2025 - 11:55 UTC
This incident affected: Europe (VUSION Cloud API - Europe) and Americas (VUSION Cloud API - Americas).