High RDS CPU on multiple environments

low · Atlassian · Sep 9, 2024 08:56 · Duration: 9h 16m
compute, database
Database Overload, Capacity Issue

Summary

Earlier today, we experienced RDS problems for Jira, Jira Service Management, and Jira Work Management. The issue has been resolved, and the services are operating normally.

Impact

none

Timeline

Sep 9, 2024 08:56

[investigating] We are investigating latency and 5xx errors for Jira, Jira Service Management, and Jira Work Management caused by high RDS CPU on several EU-Central environments, and will provide updates here soon.

via statuspage
+2h
Sep 9, 2024 10:56

[monitoring] We've taken steps to address the issue by scaling up the RDS instances, and we will continue to monitor the systems to ensure everything remains stable.

via statuspage
+5h 5m
Sep 9, 2024 16:01

[identified] We are observing different RDS problems in different environments for Jira, Jira Service Management, and Jira Work Management. We are actively working on this and will provide more updates here soon.

via statuspage
+18m
Sep 9, 2024 16:19

[investigating] We are observing different RDS problems in different environments for Jira, Jira Service Management, and Jira Work Management. We are actively working on this and will provide more updates here soon.

via statuspage
+54m
Sep 9, 2024 17:13

[monitoring] We identified the root cause of the RDS problems causing issues for Jira, Jira Service Management, and Jira Work Management. We mitigated the issue and are now monitoring the fix.

via statuspage
+59m
Sep 9, 2024 18:12

[resolved] Earlier today, we experienced RDS problems for Jira, Jira Service Management, and Jira Work Management. The issue has been resolved, and the services are operating normally.

via statuspage
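The 10:56 update says the RDS instances were scaled up but does not say how. The following is a minimal sketch of one common way to request a one-step vertical scale-up, assuming a boto3-style RDS client. `NEXT_CLASS`, `scale_up`, and `RecordingClient` are hypothetical names for illustration, not Atlassian's tooling, and the instance-class ladder is only an example.

```python
# Hypothetical scale-up ladder: the class names are real AWS RDS classes,
# but this particular ladder is an illustrative assumption.
NEXT_CLASS = {
    "db.r6g.large": "db.r6g.xlarge",
    "db.r6g.xlarge": "db.r6g.2xlarge",
    "db.r6g.2xlarge": "db.r6g.4xlarge",
}


def scale_up(rds_client, instance_id: str, current_class: str) -> dict:
    """Request a one-step vertical scale-up of an RDS instance.

    rds_client is expected to behave like boto3.client("rds"); it is passed
    in so the function can be exercised without AWS credentials.
    """
    target = NEXT_CLASS.get(current_class)
    if target is None:
        raise ValueError(f"no larger class configured for {current_class!r}")
    # ApplyImmediately=True applies the change now rather than at the next
    # maintenance window; a class change may cause a brief interruption.
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=target,
        ApplyImmediately=True,
    )


class RecordingClient:
    """Stand-in for an RDS client that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def modify_db_instance(self, **kwargs):
        self.calls.append(kwargs)
        return {"DBInstance": {"DBInstanceIdentifier": kwargs["DBInstanceIdentifier"]}}
```

In a real session you would pass `boto3.client("rds")` instead of `RecordingClient()`. Vertical scaling buys headroom quickly, but it does not by itself explain why CPU rose, which matches the later updates where the root cause was identified separately.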

Lessons Learned

📊 Incidents related to compute and database have occurred 41 times across all providers in the past year, making this one of the most common failure categories in cloud infrastructure.

💡 This incident is categorized as Database Overload and Capacity Issue. Consider preventive measures specific to this category, such as alerting on sustained database CPU and keeping capacity headroom on database instances.
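One preventive measure for this category is alerting on sustained, rather than momentary, CPU pressure. The sketch below shows the evaluation logic of an "M out of N periods" rule, assuming an 80% threshold and 3 breaching periods out of 5. These values and the `CpuAlarm` class are illustrative assumptions, not taken from the report. In AWS the equivalent is a CloudWatch alarm on the `CPUUtilization` metric in the `AWS/RDS` namespace, configured with `EvaluationPeriods` and `DatapointsToAlarm`.

```python
from collections import deque


class CpuAlarm:
    """Hypothetical 'M out of N datapoints' alarm for per-period CPU averages.

    Simplified: unlike CloudWatch, there is no INSUFFICIENT_DATA state;
    too few samples simply evaluates to OK.
    """

    def __init__(self, threshold=80.0, datapoints_to_alarm=3, evaluation_periods=5):
        if not 0 < datapoints_to_alarm <= evaluation_periods:
            raise ValueError("need 0 < datapoints_to_alarm <= evaluation_periods")
        self.threshold = threshold
        self.m = datapoints_to_alarm
        # Holds whether each of the most recent N periods breached the threshold.
        self.window = deque(maxlen=evaluation_periods)

    def observe(self, cpu_percent: float) -> str:
        """Record one period's average CPU and return 'ALARM' or 'OK'."""
        self.window.append(cpu_percent > self.threshold)
        breaching = sum(self.window)  # True counts as 1
        return "ALARM" if breaching >= self.m else "OK"


alarm = CpuAlarm()
states = [alarm.observe(v) for v in [50, 85, 90, 60, 95]]
# states == ["OK", "OK", "OK", "OK", "ALARM"]
```

Requiring several breaching periods avoids paging on a single spike, at the cost of alerting a few periods later than a single-datapoint rule would.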