[Europe] Offline access-points for a subset of customers
Incident Report for VUSION Cloud
Postmortem

On November 12th, 2023, a subset of customers experienced a disruption to their Access Points (APs), which lost connection to the V:Cloud and appeared as offline in VUSION Manager.

The disruption was caused by an internal error in the AP management module, which prevented the service from running properly. Cloud engineers identified a workaround, and all APs came back online shortly thereafter.

We have taken the following steps to reduce the likelihood and impact of similar incidents in the future:

  • Scheduled a change to reduce the risk of such an internal error;
  • Fine-tuned alerts to detect similar issues and their symptoms as early as possible, in order to avoid impact on customers;
  • Escalated the issue to Microsoft Azure for root cause analysis;
  • Started improving our internal processes, in particular by training our engineers and support teams on how to react to this type of scenario.

Please note that all operations resumed successfully after the APs came back online, no data or transmissions were lost, and no action was required from customers.

On behalf of SES-imagotag, the entire Cloud team would like to apologize for any inconvenience caused. Please be assured that we are continually working to enhance the quality of our platform and improve the user experience.

Thank you for your understanding and continued support.

Posted Nov 17, 2023 - 20:17 UTC

Resolved
Dear customers,

On November 12th, 2023, at 05:00 UTC, a subset of customers experienced a disruption to their Access Points (APs), which lost connection to VUSION Cloud and appeared as offline in VUSION Manager.
Impacted customers are those whose APs target seshftweup001t004hfcoredns.westeurope.cloudapp.azure.com:7354
The issue was resolved at approximately 08:00 UTC by our engineers.
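
To check whether a given AP was among those affected, its configured target can be compared with the endpoint above. The following is a minimal Python sketch, assuming the target is available as a "host:port" string. The function name and the way the target string is obtained are hypothetical and are not part of any VUSION tooling.

```python
# Endpoint named in this notice as the target of impacted APs.
AFFECTED_HOST = "seshftweup001t004hfcoredns.westeurope.cloudapp.azure.com"
AFFECTED_PORT = 7354


def targets_affected_endpoint(target: str) -> bool:
    """Return True if a 'host:port' string matches the affected endpoint.

    Hypothetical helper: how the target string is read from an AP's
    configuration is an assumption and is left to the caller.
    """
    # Split on the last colon so the port is separated from the hostname.
    host, sep, port = target.strip().rpartition(":")
    if not sep or not port.isdigit():
        return False
    # Hostnames are case-insensitive; ignore a trailing root dot.
    return host.lower().rstrip(".") == AFFECTED_HOST and int(port) == AFFECTED_PORT
```

For example, calling targets_affected_endpoint with the exact endpoint string above returns True, while a different host or port returns False.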

Please note that all operations resumed successfully after the APs came back online, no data or transmissions were lost, and no action was required from customers.

Engineers are working on fixing the underlying issue so that it does not occur again. Improved alerting has already been set up to identify any issue as early as possible. We are committed to fixing the root cause of the problem as quickly as possible and to improving our processes so that we can react to incidents as swiftly as possible. We appreciate your patience and understanding as we work on the underlying issue. A post-mortem analysis will be published in the coming weeks.

On behalf of SES-imagotag, the whole Cloud team would like to apologize for the incident and to thank you for your understanding. Please be assured that we are committed to continuous improvement and that we strive to provide the best possible experience.
Posted Nov 12, 2023 - 06:00 UTC