[Europe] Offline access-points for a subset of customers
Incident Report for VUSION Cloud
Postmortem

On November 6th, 2023, at 4:25 UTC, a subset of customers experienced a disruption to their Access Points (APs), which lost their connection to the V:Cloud and appeared as offline in VUSION Manager.

The disruption was caused by an internal error in the AP management module, which prevented the service from running properly. Additionally, when our engineers attempted to apply our recovery procedure to the affected endpoint via Microsoft Azure, the operation failed with an error pointing to another module. This prevented the team from applying corrective action. The issue was escalated to Microsoft Azure. In parallel, Cloud engineers identified a new workaround, and all APs were back online shortly after 8:40 UTC.

We have taken the following steps to reduce the likelihood and impact of similar incidents in the future:

  • Scheduled a change to reduce the risk of such an internal error;
  • Fine-tuned alerts to detect any similar issue earlier, in order to avoid customer impact;
  • Created alerts to detect such symptoms as early as possible;
  • Escalated the reboot issue to Microsoft Azure for root cause analysis;
  • Started improving our internal processes, in particular by training our engineers and support teams on how to react to, communicate about, and fix this type of scenario.

Please note that all operations resumed successfully after the APs came back online, that no data or transmissions were lost, and that no action was required from customers.

On behalf of SES-imagotag, the entire Cloud team would like to apologize for any inconvenience caused. Please be assured that we are continually working to enhance the quality of our platform and improve the user experience.

Thank you for your understanding and continued support.

Posted Nov 17, 2023 - 20:14 UTC

Resolved
The incident has been resolved. All services are back to normal.

The incident was linked to our cloud service provider, Microsoft Azure. A root cause analysis (RCA) will be communicated when available. We will keep monitoring services.

The SES-imagotag team apologizes for any disruption this may have caused.
Posted Nov 06, 2023 - 10:40 UTC
Monitoring
Services are coming back online. We are continuing to monitor.
Posted Nov 06, 2023 - 08:59 UTC
Update
We are continuing to investigate this issue.
Posted Nov 06, 2023 - 08:44 UTC
Investigating
Dear customers,

We have identified that a subset of transmitters is offline. Impacted customers are using the endpoint ap-eu.vusion.io:64106.

Our cloud engineers are working to identify a possible fix to bring them back online.

On behalf of SES-imagotag, the whole team would like to apologize for the impact this may have.
Posted Nov 06, 2023 - 08:43 UTC
This incident affected: Europe (VUSION Cloud API - Europe).