How do you normally communicate with your teammates day to day? Like most teams, you probably use a chat tool like Slack for quick, informal exchanges in topic-specific channels.
So why not chat with your AWS application infrastructure in the same way, in the same collaboration tool your DevOps team already uses?
Forget about endless email chains
A key factor in efficient DevOps is keeping your entire team up to date on operational events, security findings and application alerts. The goal is always to respond as quickly as possible. To achieve this, you need to get the right information to the right people, typically in real time.
Slack is handy here: it gives you dedicated channels and keeps you from getting lost in endless email chains. Your team can act on events directly and keep everybody in the loop just by replying. This decreases response times and lets you address issues before they become serious. Think of situations like system overloads, budget alerts or even security events.
Enough talk - let's do this
All you need is the right integration between your infrastructure resources and your chat client.
"Yeah, sounds great... let me quickly build our own integration! I promise you all the bells and whistles you can imagine, and it won't take longer than 1-2 days!" - Random keen engineer
Well, yes: we have built such DIY integrations in the past. They worked and mostly fit our requirements, but they always needed maintenance and led to duplicated code across projects. In the end, they were less fancy than we had hoped, and building, testing and hardening them for production took longer than planned.
That's why we were excited to test AWS Chatbot ever since it launched last year. After playing around with it, we decided to take a deep dive into AWS Chatbot for these main reasons:
- it's fully managed, so we don't have to spend our own resources maintaining it
- it brings monitoring, alerting and even interactions with our AWS resources directly to Slack channels
- it reduces context switches between email, chat, phone, the AWS console, etc.
- the setup is really easy and takes only a few minutes
- it's designed for cross-account usage
- it comes at no additional charge; you only pay for the underlying resources such as SNS topics
The architecture diagram shows how simple this is at a high level:
Events emitted by your application or by the AWS services you use are simply published to Amazon SNS topics. AWS Chatbot subscribes to these topics and takes care of formatting each notification nicely before sending it to the connected Slack channel.
While you can configure almost everything in AWS Chatbot with CloudFormation, the first step is, unfortunately, still a manual one that you have to do in the AWS Management Console. So, open AWS Chatbot in the console for the initial configuration.
Granting AWS Chatbot the permissions to access your Slack workspace
Luckily, this needs to be done only once. Of course, make sure you are logged into the right Slack workspace.
Once this is done, AWS Chatbot is ready to use in your Slack workspace.
Example: Forward AWS GuardDuty findings to Slack
With Amazon SNS as the input channel for AWS Chatbot, a wide range of notification scenarios and use cases opens up, e.g.:
- security events
- AWS Lambda invocation errors
- usage spikes of your Amazon EC2 instances
- budget and cost alarms
- CI/CD pipeline events
- etc.
For our example, let's assume we want to be notified about unusual user behavior or other potential security events in our account.
To help us with this, we use AWS GuardDuty, a managed threat detection service provided by AWS that works without any architectural or performance impact on your workloads. It analyses AWS CloudTrail logs, VPC Flow Logs and DNS query logs and generates security findings from them. If you are interested in how it works in detail and what kinds of findings you get, check its documentation or, even better, just enable it in your account and let it work for you (disclaimer: it comes with additional costs).
Since we favor infrastructure as code (IaC) over clicking through the AWS Management Console, we use the AWS Cloud Development Kit (CDK) for TypeScript to first create a stack with the SNS topic.
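A minimal sketch of such a stack, assuming CDK v2 (`aws-cdk-lib` plus `constructs`); the names `NotificationStack` and `NotificationTopic` and the display name are our own placeholders, not taken from the original project:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';

// Stack that owns the SNS topic all notifications are published to
export class NotificationStack extends cdk.Stack {
  // Exposed so that other constructs (event rules, alarms, AWS Chatbot) can use it
  public readonly topic: sns.Topic;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.topic = new sns.Topic(this, 'NotificationTopic', {
      displayName: 'Infrastructure notifications', // hypothetical display name
    });
  }
}

// App entry point (usually in bin/ of a CDK project)
const app = new cdk.App();
new NotificationStack(app, 'NotificationStack');
```

Exposing the topic as a public property keeps the stack the single owner of the topic while other parts of the app only reference it.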
Now we are ready to enable AWS GuardDuty in our account and create a CloudWatch Events rule (now part of Amazon EventBridge) that publishes all GuardDuty findings to the topic we just created. We do this in a separate CDK construct for better reusability and code structure. See this article for a deeper look into CDK constructs.
To bring the pieces together, we instantiate the GuardDuty construct in our stack.
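Here is one way the construct and its use in the stack could look, again assuming CDK v2. `GuardDutyToSns` and its props are hypothetical names; `CfnDetector` is the low-level GuardDuty resource, and the rule matches events with source `aws.guardduty` and detail type `GuardDuty Finding`:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as guardduty from 'aws-cdk-lib/aws-guardduty';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';

export interface GuardDutyToSnsProps {
  // Topic that receives all GuardDuty findings
  readonly topic: sns.ITopic;
}

// Hypothetical reusable construct: enables GuardDuty and forwards its findings to SNS
export class GuardDutyToSns extends Construct {
  constructor(scope: Construct, id: string, props: GuardDutyToSnsProps) {
    super(scope, id);

    // Only one detector can exist per account and region, so deployment
    // would fail if GuardDuty is already enabled there
    new guardduty.CfnDetector(this, 'Detector', { enable: true });

    // CloudWatch Events / EventBridge rule matching every GuardDuty finding
    new events.Rule(this, 'FindingsRule', {
      eventPattern: {
        source: ['aws.guardduty'],
        detailType: ['GuardDuty Finding'],
      },
      // Every matching event is published to the topic
      targets: [new targets.SnsTopic(props.topic)],
    });
  }
}

export class NotificationStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const topic = new sns.Topic(this, 'NotificationTopic');
    // Bring the pieces together: GuardDuty findings -> SNS topic
    new GuardDutyToSns(this, 'GuardDuty', { topic });
  }
}

const app = new cdk.App();
new NotificationStack(app, 'NotificationStack');
```

Passing the topic in as `sns.ITopic` keeps the construct reusable: any stack can hand it an existing or imported topic.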
Last but not least, we add AWS Chatbot and subscribe it to the SNS topic. Again, we build a custom construct for this purpose.
At instantiation time, this only creates the internal role for AWS Chatbot; the actual subscription is created later, from outside the construct, in the stack. Finally, our stack looks like this:
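A sketch of what this could look like with CDK v2's `aws-chatbot` module. `SlackChatbot`, the configuration name and the workspace/channel IDs are hypothetical placeholders (use the IDs from your own AWS Chatbot setup), and the GuardDuty wiring is left out to keep the example short:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as chatbot from 'aws-cdk-lib/aws-chatbot';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';

export interface SlackChatbotProps {
  readonly configurationName: string;
  readonly slackWorkspaceId: string; // shown in the AWS Chatbot console after authorizing Slack
  readonly slackChannelId: string; // ID of the target Slack channel
}

// Hypothetical construct wrapping the AWS Chatbot Slack channel configuration
export class SlackChatbot extends Construct {
  public readonly channel: chatbot.SlackChannelConfiguration;

  constructor(scope: Construct, id: string, props: SlackChatbotProps) {
    super(scope, id);

    // No `role` is passed, so CDK creates an IAM role that AWS Chatbot assumes
    this.channel = new chatbot.SlackChannelConfiguration(this, 'Channel', {
      slackChannelConfigurationName: props.configurationName,
      slackWorkspaceId: props.slackWorkspaceId,
      slackChannelId: props.slackChannelId,
      loggingLevel: chatbot.LoggingLevel.ERROR,
    });
  }
}

export class NotificationStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const topic = new sns.Topic(this, 'NotificationTopic');

    const slack = new SlackChatbot(this, 'SlackChatbot', {
      configurationName: 'ops-notifications', // hypothetical
      slackWorkspaceId: 'T0123456789', // placeholder, use your own ID
      slackChannelId: 'C0123456789', // placeholder, use your own ID
    });

    // The subscription is created outside the construct, in the stack
    slack.channel.addNotificationTopic(topic);
  }
}

const app = new cdk.App();
new NotificationStack(app, 'NotificationStack');
```

Keeping `addNotificationTopic` outside the construct lets one Slack channel collect several topics without changing the construct itself.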
See it in action
Once the stack is deployed, we have to wait until GuardDuty detects unusual behavior, or we can fast-forward and generate a set of sample findings.
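Sample findings can also be generated programmatically through GuardDuty's `CreateSampleFindings` API. A minimal sketch with the AWS SDK for JavaScript v3 (`@aws-sdk/client-guardduty`), assuming credentials and region come from the default provider chain and GuardDuty is already enabled in that region; the function name `generateSampleFindings` is hypothetical:

```typescript
import {
  CreateSampleFindingsCommand,
  GuardDutyClient,
  ListDetectorsCommand,
} from '@aws-sdk/client-guardduty';

// Generates GuardDuty sample findings in the client's region and returns the detector ID.
// Assumes GuardDuty is enabled (i.e. a detector exists) in that region.
export async function generateSampleFindings(
  client: GuardDutyClient = new GuardDutyClient({}),
): Promise<string> {
  // There is at most one detector per account and region
  const { DetectorIds } = await client.send(new ListDetectorsCommand({}));
  const detectorId = DetectorIds?.[0];
  if (!detectorId) {
    throw new Error('No GuardDuty detector found; is GuardDuty enabled in this region?');
  }

  // Without FindingTypes, GuardDuty generates a sample for every supported finding type
  await client.send(new CreateSampleFindingsCommand({ DetectorId: detectorId }));
  return detectorId;
}
```

Accepting the client as a parameter makes it easy to target another region (`new GuardDutyClient({ region: 'eu-central-1' })`) or to pass a stub in tests.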
Without writing a single line of application code, we have set up security alerting with GuardDuty findings in Slack. The message shows a possible backdoor on an EC2 instance and states that there might be malicious behavior resembling a DoS attack. Clicking the message header takes you directly to the corresponding finding in AWS GuardDuty.
Let's interact with it
To try out the interactive features, we add a Lambda function to our stack and create a corresponding CloudWatch alarm that fires on execution errors. We also configure the alarm to publish a message to the SNS topic created earlier whenever it enters the ALARM state, all with only a few lines of code.
After triggering the Lambda function a few times, we get the expected ErrorAlarm notification in our Slack channel.
Conveniently, the message comes with two shortcut buttons that let us interact with AWS Chatbot, e.g. to ask it for the logs related to this error.
AWS Chatbot replies with exactly the slice of CloudWatch logs we need to inspect the error.
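A sketch of the Lambda function, alarm and SNS action in CDK v2. The demo function, which always throws so every invocation counts as an error, and all construct IDs are hypothetical; the topic stands in for the one AWS Chatbot is subscribed to:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cwActions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';

export class LambdaAlarmStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Topic that AWS Chatbot is subscribed to
    const topic = new sns.Topic(this, 'NotificationTopic');

    // Hypothetical demo function that always throws
    const fn = new lambda.Function(this, 'DemoFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline(
        "exports.handler = async () => { throw new Error('Something went wrong'); };",
      ),
    });

    // Fires as soon as there is at least one error within a one-minute period
    const alarm = new cloudwatch.Alarm(this, 'ErrorAlarm', {
      metric: fn.metricErrors({ period: cdk.Duration.minutes(1) }),
      threshold: 1,
      evaluationPeriods: 1,
      comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
      // No invocations means no error data points: treat that as OK
      treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
    });

    // Publish to the SNS topic whenever the alarm enters the ALARM state
    alarm.addAlarmAction(new cwActions.SnsAction(topic));
  }
}

const app = new cdk.App();
new LambdaAlarmStack(app, 'LambdaAlarmStack');
```

`metricErrors` is the function's built-in `Errors` metric (sum per period), so no custom metric has to be published by the function.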
For more sophisticated interactions, AWS Chatbot allows you to execute most read-only commands from the AWS CLI. You can even invoke Lambda functions to start whatever workflow you like or create support cases directly via Slack.
As always with permissions, make sure you know what you are doing and start as restrictively as possible. Especially when granting AWS Chatbot access to insights into your account, keep in mind that you are effectively giving those permissions to everybody in the connected Slack channel, too!
AWS Chatbot comes with IAM policy templates and guidelines on how to set up appropriate permissions. Still, we highly recommend figuring out what you really need and starting with a minimal setup, e.g. not allowing any command execution at all. Such a policy could look like this and is part of the example above:
Have a look at AWS Chatbot's security documentation to learn more.
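A sketch of a restrictive channel role in CDK v2. We assume read-only CloudWatch access (`cloudwatch:Describe*`, `Get*`, `List*`) is what the notification use case needs, mirroring AWS Chatbot's notification-permissions template as we understand it; everything not listed is implicitly denied. The stack name, configuration name and Slack IDs are hypothetical placeholders:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as chatbot from 'aws-cdk-lib/aws-chatbot';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class RestrictedChatbotStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Explicit channel role, so its permissions are defined (and reviewable) in one place
    const role = new iam.Role(this, 'ChatbotChannelRole', {
      assumedBy: new iam.ServicePrincipal('chatbot.amazonaws.com'),
    });

    // Only read-only CloudWatch actions (assumed to be enough for notifications);
    // commands needing any other permission are implicitly denied
    role.addToPolicy(
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: ['cloudwatch:Describe*', 'cloudwatch:Get*', 'cloudwatch:List*'],
        resources: ['*'],
      }),
    );

    new chatbot.SlackChannelConfiguration(this, 'Channel', {
      slackChannelConfigurationName: 'ops-notifications', // hypothetical
      slackWorkspaceId: 'T0123456789', // placeholder
      slackChannelId: 'C0123456789', // placeholder
      role,
    });
  }
}

const app = new cdk.App();
new RestrictedChatbotStack(app, 'RestrictedChatbotStack');
```

Passing an explicit `role` instead of relying on the generated one makes it obvious in code review what members of the channel can do through AWS Chatbot.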
Limitations and room for improvement
Since AWS Chatbot is still new and not yet as mature as other services, it naturally has its limitations.
AWS Chatbot always seems to reply in the main channel rather than in threads. This sometimes makes it difficult to follow the conversation about one specific event. As a workaround, you can use multiple channels, but the general problem remains.
No support chat via Slack
Since AWS Chatbot already lets you create support cases, it would be a natural next step to handle support chats directly in Slack as well. Unfortunately, AWS Chatbot only wraps the CLI command for creating support cases. For us, support chats in Slack would be a killer feature!
Undelivered messages & throttling
When using Slack as the main notification tool for your infrastructure, keep in mind that you need a fallback. AWS Chatbot itself doesn't yet provide dead-letter handling, e.g. via email or a dead-letter queue, but you can easily set this up yourself, again using CloudWatch alarms. AWS Chatbot also processes only 10 events per second; any events above that are throttled. In most cases this shouldn't be a big issue, but you should be aware of it.
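One way to build such a fallback, sketched in CDK v2 under explicit assumptions: AWS Chatbot reports CloudWatch metrics in the `AWS/Chatbot` namespace, including `MessageDeliveryFailure` and `EventsThrottled` with a `ConfigurationName` dimension, and publishes them in `us-east-1`. Please verify metric names and region in the AWS Chatbot documentation and in your account. The configuration name and email address are hypothetical:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cwActions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as subs from 'aws-cdk-lib/aws-sns-subscriptions';
import { Construct } from 'constructs';

export class ChatbotFallbackStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Fallback path that does not depend on Slack: a topic with an email subscription
    const fallbackTopic = new sns.Topic(this, 'FallbackTopic');
    fallbackTopic.addSubscription(new subs.EmailSubscription('ops@example.com')); // hypothetical address

    // Assumed AWS Chatbot metric names; verify them in the AWS/Chatbot namespace
    for (const metricName of ['MessageDeliveryFailure', 'EventsThrottled']) {
      const alarm = new cloudwatch.Alarm(this, `${metricName}Alarm`, {
        metric: new cloudwatch.Metric({
          namespace: 'AWS/Chatbot',
          metricName,
          dimensionsMap: { ConfigurationName: 'ops-notifications' }, // hypothetical name
          statistic: 'Sum',
          period: cdk.Duration.minutes(5),
        }),
        threshold: 1,
        evaluationPeriods: 1,
        comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
        // No data simply means nothing failed or was throttled
        treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
      });
      // Notify the fallback topic instead of the (possibly failing) Slack path
      alarm.addAlarmAction(new cwActions.SnsAction(fallbackTopic));
    }
  }
}

const app = new cdk.App();
// Assumption: AWS Chatbot reports its metrics in us-east-1, so the alarms live there too
new ChatbotFallbackStack(app, 'ChatbotFallbackStack', { env: { region: 'us-east-1' } });
```

The fallback topic is deliberately not connected to AWS Chatbot, so a Slack outage or a throttled burst still reaches someone by email.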
We wouldn't call the documentation poor, but it took us a while to understand that it follows a different paradigm. Instead of a full reference of what you can do, you interact and chat with AWS Chatbot, even for documentation purposes. The key is to simply start chatting with it; it will help you figure out the correct syntax for what you have in mind.
...to wrap it up
In short, AWS Chatbot gives you a managed, easy-to-use and inexpensive way to get notified about operational events in your infrastructure. Whatever you want to connect to it, if AWS Chatbot doesn't support it natively, you can most likely define a CloudWatch alarm for it that publishes to SNS.
All in all - it's not yet perfect, but it's a perfect start!