Delayed APM Distribution Metrics, Data Streams Monitoring Metrics & Monitor Notifications

highDatadogAPMDec 4, 2024 18:48Duration: 3h 1m
api
API Issue

Summary

This incident has been resolved.

Impact

major

Timeline

Dec 4, 2024 18:48

[investigating] We are investigating increased latency in processing APM Distribution Metrics and Data Streams Monitoring Metrics as well as monitors notifications based on these metrics, which began at 17h47 UTC. As a result of this issue, some users may see delays or gaps for these metrics on graphs, including APM pages as well as delayed monitor notifications.

via statuspage
+0m
Dec 4, 2024 18:49

[identified] The issue has been identified and a fix is being implemented.

via statuspage
+4m
Dec 4, 2024 18:53

[identified] We are continuing to work on a fix for this issue.

via statuspage
+12m
Dec 4, 2024 19:05

[identified] Data Streams Monitoring metrics and associated monitor notifications based on these metrics have recovered.

via statuspage
+38m
Dec 4, 2024 19:43

[monitoring] A fix has been implemented and we are monitoring the results.

via statuspage
+2h 6m
Dec 4, 2024 21:50

[resolved] This incident has been resolved.

via statuspage

Lessons Learned

Datadog has experienced 33 incidents in the past year. This frequency suggests systemic reliability challenges that may warrant additional monitoring.

📊Incidents related to api have occurred 186 times across all providers in the past year. This is one of the most common failure categories in cloud infrastructure.

💡This incident is categorized as: API Issue. Consider implementing preventive measures specific to this failure category.