Class BackoffLatencyAlerter

java.lang.Object
com.dieselpoint.norm.latency.BackoffLatencyAlerter
All Implemented Interfaces:
LatencyAlerter

public abstract class BackoffLatencyAlerter extends Object implements LatencyAlerter
Reporting latency issues to external services carries two dangers: a) the reporting itself takes a significant amount of time and may create customer experience issues, and b) you end up with millions of latency alerts when a database goes bad.

This class implements a basic "exponential backoff with jitter" algorithm. Subclasses can simply implement alertLatencyFailureAfterBackoffAndJitter(DbLatencyWarning, long) to take advantage of the exponential backoff facility.

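For example, with a subclass such as CloudWatchAlerter, var cwAlerter = new CloudWatchAlerter( Duration.ofMillis( 500 ), Duration.ofMinutes( 10 ) ); will initially alert at 500ms intervals, then 1000ms (1 second), 2 seconds, 4 seconds, and so on, doubling until the interval reaches the 10 minute maximum.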
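The doubling schedule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: BackoffScheduleSketch and its methods are hypothetical names, and the "full jitter" variant shown (a random wait between zero and the current interval) is only one possible strategy; the source does not say which one this class uses.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the library: a doubling schedule with a cap.
final class BackoffScheduleSketch {

    // Doubles the current interval, never exceeding the maximum.
    static Duration next(Duration current, Duration max) {
        Duration doubled = current.multipliedBy(2);
        return doubled.compareTo(max) > 0 ? max : doubled;
    }

    // Lists the reporting intervals, starting at min and ending at max.
    static List<Duration> schedule(Duration min, Duration max) {
        if (min.isZero() || min.isNegative()) {
            throw new IllegalArgumentException("min must be positive");
        }
        List<Duration> intervals = new ArrayList<>();
        Duration current = min;
        while (current.compareTo(max) < 0) {
            intervals.add(current);
            current = next(current, max);
        }
        intervals.add(max);
        return intervals;
    }

    // Assumed "full jitter" variant: a random wait in [0, interval].
    static Duration withFullJitter(Duration interval) {
        long millis = ThreadLocalRandom.current().nextLong(interval.toMillis() + 1);
        return Duration.ofMillis(millis);
    }
}
```

With Duration.ofMillis( 500 ) and Duration.ofMinutes( 10 ), schedule returns 500ms, 1s, 2s, 4s and so on up to 512s, followed by the 10 minute cap.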

For more information, refer to "Exponential Backoff And Jitter" by Amazon Web Services.

When implementing your alerter, remember to swallow errors. You don't want your platform slowing down or failing because the monitoring service is failing. See the alertLatencyFailureAfterBackoffAndJitter(DbLatencyWarning, long) documentation for further steps to ensure that monitoring doesn't accidentally become a significant overhead.
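The advice about swallowing errors might look like the sketch below. To keep it compilable on its own, it uses stand-in types: WarningStandIn and MetricsClient are hypothetical and are not part of this library. A real alerter would extend BackoffLatencyAlerter and receive a DbLatencyWarning instead.

```java
// Hypothetical stand-in for DbLatencyWarning, used only so the sketch compiles alone.
interface WarningStandIn {
    String describe();
}

// Hypothetical client for a remote monitoring service.
interface MetricsClient {
    void publish(String message) throws Exception;
}

final class SafeAlerterSketch {
    private final MetricsClient client;

    SafeAlerterSketch(MetricsClient client) {
        this.client = client;
    }

    // Mirrors the shape of alertLatencyFailureAfterBackoffAndJitter(DbLatencyWarning, long).
    boolean alertLatencyFailureAfterBackoffAndJitter(WarningStandIn warning, long numberOfAlertsSwallowed) {
        try {
            client.publish(warning.describe() + " (alerts swallowed: " + numberOfAlertsSwallowed + ")");
            return true;
        } catch (Exception e) {
            // Swallow the error: a failing monitor must not fail the caller.
            // Returning false reports that the notification did not succeed.
            return false;
        }
    }
}
```

Catching Exception broadly is a deliberate choice here, since any failure of the monitoring service should be reported through the boolean return value and never propagate to the code being monitored.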

  • Constructor Details

    • BackoffLatencyAlerter

      public BackoffLatencyAlerter(Duration minimumReportingInterval, Duration maximumReportingInterval)
  • Method Details

    • alertLatencyFailure

      public void alertLatencyFailure(DbLatencyWarning warning)
      Specified by:
      alertLatencyFailure in interface LatencyAlerter
    • alertLatencyFailureAfterBackoffAndJitter

      public abstract boolean alertLatencyFailureAfterBackoffAndJitter(DbLatencyWarning warning, long numberOfAlertsSwallowed)
      Parameters:
      warning - the latency warning
      numberOfAlertsSwallowed - the number of alerts that were swallowed during the exponential backoff period. This might (or might not) be interesting to report alongside the current issue. It'll definitely give you a sense of how bad things have gone!
      Returns:
      true if notifying the remote service was successful, false otherwise. If false, calls to the reporting method are automatically backed off in the same way as latency failures, so that a slowdown or failure in monitoring does not impact the actual customer experience.
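To illustrate where a value like numberOfAlertsSwallowed could come from, here is a minimal sketch of a gate that forwards an alert only when the current interval has elapsed and counts the alerts it drops in between. BackoffGateSketch is a hypothetical class, not the library's code. It takes the clock as a parameter for testability, and it leaves out jitter and the extra back-off after a false return.

```java
import java.time.Duration;

// Hypothetical gate, not the library's implementation.
final class BackoffGateSketch {
    private final Duration max;
    private Duration interval;
    private long nextAllowedMillis = 0;
    private long swallowed = 0;

    BackoffGateSketch(Duration min, Duration max) {
        this.interval = min;
        this.max = max;
    }

    // Returns the number of alerts swallowed since the last forwarded alert
    // if this alert may be forwarded now, or -1 if this alert is swallowed.
    synchronized long tryAcquire(long nowMillis) {
        if (nowMillis < nextAllowedMillis) {
            swallowed++;
            return -1;
        }
        long count = swallowed;
        swallowed = 0;
        nextAllowedMillis = nowMillis + interval.toMillis();
        // Double the interval for the next round, capped at the maximum.
        Duration doubled = interval.multipliedBy(2);
        interval = doubled.compareTo(max) > 0 ? max : doubled;
        return count;
    }
}
```

In this sketch, a non-negative result is what would be passed along as numberOfAlertsSwallowed when the alert is forwarded.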