On January 9th and 10th 2018 we experienced two major service outages affecting all customers using ScriptRunner for Jira Cloud. Between 18:30 and 22:40 on January 9th our add-on was unavailable, communication with customer Jira instances was broken, and Scheduled Jobs did not execute. Between 00:30 and 09:00 on January 10th no customer scripts were executed, including scripts from the Script Console, Script Listeners, Escalation Service and Scheduled Jobs.
We understand that these outages caused problems for the business procedures and daily workflows that depend on ScriptRunner, and we are sorry.
Our team is now working to improve our procedures, infrastructure and software to prevent these problems from occurring again.
We identified several problems during our root cause analysis:
Additionally, we did not update this StatusPage during the outages and we did not have a clear internal plan to follow during out-of-hours outages.
Changes We're Making
We have made it clear internally who should be contacted during out-of-hours outages. We will make sure that engineers are readily available out of hours.
We are modifying our alerting systems to better report service outages.
We have updated our service autoscaling to prevent scaling down until we have resolved the problem where new services don't start successfully.
We are investigating alternatives to our logging infrastructure, as well as intermediate fallback steps we can implement in the meantime.