I’m sure anyone using Redis has seen log output like this from time to time:
Users can’t sign up… and our payment processor is failing… and users can’t log in… and our SSO is broken! A low-level exception is bubbling all the way up, and it looks like we have lots of issues, but in fact there’s only one root cause.
If you’re using Redis throughout your application and letting exceptions bubble up, those timeouts can trigger multiple alerts from different entry points. If you’re using something like Key Transactions or Keyword Log Alerts to monitor the most important parts of your system, you’ll get several alarms indicating that multiple things have failed. I’m not picking on Redis in particular; any component that is subject to intermittent failures can cause this if it’s used liberally throughout the application: a SQL database, cloud storage, and so on. When one of these services goes down, you suddenly start to get alerts saying that multiple parts of your application have failed, even though only one dependency has.
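To make the fan-out concrete, here is a minimal sketch in Python. It does not talk to a real Redis server: `CacheTimeoutError`, `cache_get`, and the three handlers are hypothetical stand-ins, invented purely to show how one unhandled low-level failure can surface as a separate alert at every entry point that touches the shared dependency.

```python
class CacheTimeoutError(Exception):
    """Hypothetical stand-in for a low-level timeout from a shared dependency."""


def cache_get(key):
    # Simulate the shared cache being unreachable for every caller.
    raise CacheTimeoutError(f"timed out reading {key!r}")


# Three unrelated entry points that all happen to rely on the same cache.
def sign_up(email):
    return cache_get(f"signup:{email}")


def take_payment(order_id):
    return cache_get(f"payment:{order_id}")


def log_in(username):
    return cache_get(f"session:{username}")


def run_entry_points():
    """Call each entry point and record which ones would raise an alert."""
    alerts = []
    calls = ((sign_up, "a@example.com"), (take_payment, 42), (log_in, "alice"))
    for handler, arg in calls:
        try:
            handler(arg)
        except CacheTimeoutError as exc:
            # Nothing lower down handled the error, so each entry point reports it.
            alerts.append((handler.__name__, type(exc).__name__))
    return alerts


alerts = run_entry_points()
for entry_point, error in alerts:
    print(f"ALERT: {entry_point} failed with {error}")
```

Each handler reports its own failure, yet every alert carries the same exception type from the same helper.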
Things are a little different if you are using Railtown.
When these errors start occurring, you’ll get one notification on your preferred channel (Slack, Teams, Zapier, email, etc.) or, if you are already in the app, you’ll see new errors in your deployed environments right on the home page:
Here we can see the error is not affecting the production environment, even though it’s running the same build as the test environment. Good news. Now we can dive into the Error Bucket and see what’s going on.
We can see several entry points for the same error, so instead of three alerts or key transaction failures we just have one root cause to investigate:
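As a rough illustration of the idea of grouping by root cause (this is not Railtown’s actual implementation, which this post does not describe), the sketch below buckets captured exceptions by a simple fingerprint: the exception type plus the innermost stack frame that raised it. The `fingerprint` and `bucket_errors` helpers, and the three example entry points, are assumptions made up for this example.

```python
import traceback
from collections import defaultdict


def fingerprint(exc):
    """Identify an error by its type and the innermost frame that raised it."""
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, innermost.filename, innermost.name, innermost.lineno)


def bucket_errors(captured):
    """Group (entry_point, exception) pairs that share a fingerprint."""
    buckets = defaultdict(list)
    for entry_point, exc in captured:
        buckets[fingerprint(exc)].append(entry_point)
    return dict(buckets)


# Hypothetical shared dependency and the entry points that rely on it.
def read_cache(key):
    raise TimeoutError(f"timed out reading {key!r}")


def sign_up():
    return read_cache("signup")


def take_payment():
    return read_cache("payment")


def log_in():
    return read_cache("session")


captured = []
for handler in (sign_up, take_payment, log_in):
    try:
        handler()
    except TimeoutError as exc:
        captured.append((handler.__name__, exc))

buckets = bucket_errors(captured)
for (error_type, _file, function, _line), entry_points in buckets.items():
    print(f"{error_type} in {function}: seen via {', '.join(entry_points)}")
```

All three failures originate on the same line of `read_cache`, so they collapse into a single bucket that lists the three entry points. A fingerprint this simple is deliberately naive; a real system would need something more robust to code changes that shift line numbers.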
Now we can dig into those entry points and see the associated error logs and stack traces. Railtown has already linked them together for you.
This is just the start of how Railtown can make your life easier. If you’ve linked Jira or Azure DevOps to your account, we’ll try to match these errors to recent changes in your application via the tickets or work items. Perhaps, in this case, you took a ticket to reduce the connection timeout to the Redis server and the infrastructure in the test environment is not powerful enough to handle that change. Our smart AI ticket matching can figure this out, but that’s a story for another time.
I hope this post has explained how Railtown’s Error Buckets can reduce the noise and take some of the stress out of dealing with intermittent errors in your application’s dependencies.
If you don’t already have a railtown.ai account, sign up and let us help you improve the quality of your software and increase your developer velocity.