Logs That Work for You
Logs are a critical part of understanding what’s happened in your systems — but they’re just as valuable for what’s about to happen.
In this blog, we’ll cover how to detect anomalies in log data, manage detectors using Terraform (with our custom provider), and integrate alerts into your observability stack using CloudWatch and Slack.
What is Log Anomaly Detection?
Log anomaly detection scans events ingested into a log group and identifies unusual patterns. It uses machine learning and pattern recognition to build baselines of expected log content, then flags deviations from that norm.
Each anomaly is given a priority based on how much it deviates from expected values, and a severity based on keywords found in the logs (like FATAL, ERROR, or WARN). For example, if a value spikes by 500% but has no known keywords, it may still be flagged as high-priority due to its deviation alone.
This makes anomaly detection a powerful way to surface both known and unknown issues.
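To make the priority/severity distinction concrete, here is a toy sketch in Python. It is not how the CloudWatch model scores anomalies: the function names, the keyword ordering, and the deviation cut-offs (100% and 300%) are assumptions chosen purely for illustration.

```python
# Illustrative only: the keyword order and thresholds are assumptions,
# not values used by CloudWatch Logs.
SEVERITY_KEYWORDS = ("FATAL", "ERROR", "WARN")  # most to least severe


def severity(message):
    """Return the most severe known keyword in the message, or "NONE"."""
    upper = message.upper()
    for keyword in SEVERITY_KEYWORDS:
        if keyword in upper:
            return keyword
    return "NONE"


def priority(observed, baseline):
    """Rank an anomaly by relative deviation from its baseline."""
    if baseline <= 0:
        # No usable baseline: any occurrence is a large deviation.
        return "HIGH" if observed > 0 else "LOW"
    deviation = abs(observed - baseline) / baseline
    if deviation >= 3.0:  # hypothetical cut-off: 300% or more
        return "HIGH"
    if deviation >= 1.0:  # hypothetical cut-off: 100% or more
        return "MEDIUM"
    return "LOW"
```

With these made-up cut-offs, a value that spikes by 500% (for example from 100 to 600) ranks as HIGH even when the log line contains none of the severity keywords.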
Managing Anomaly Detectors with Terraform
We manage our infrastructure with Terraform, but the standard AWS provider didn’t support log anomaly detectors — so we built our own.
Using the terraform-plugin-sdk, we created a provider and resource that let you easily manage detectors and associate them with log groups.
Here’s a sample resource block:
resource "aws_cloudwatch_log_anomaly_detector" "detector" {
  provider             = test-log-anomaly
  name                 = "test-log-detector"
  log_group            = "LOG_GROUP_NAME"
  evaluation_frequency = "15"
}
You can consume our provider using Docker. Here's a snippet to get started:
FROM hashicorp/terraform:1.2.5 as run
RUN apk add bash
RUN mkdir -p /root/.terraform.d/plugins
COPY --from=ghcr.io/skpr/terraform-provider-log-anomaly:latest /root/.terraform.d/plugins /root/.terraform.d/plugins
RUN chmod +x /root/.terraform.d/plugins/registry.terraform.io/*/*/*/linux_amd64/terraform-provider-*
Setting Up Alarms and Slack Alerts
Detection is only part of the picture — you need to know when something’s wrong.
We created a CloudWatch alarm that triggers a Lambda function to notify our Slack channel when a high-priority anomaly is detected. That gives our team a chance to review the alert and either train the model to recognise it as expected, or keep flagging it in future.
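The Lambda side of this flow could look something like the sketch below. It is a minimal example rather than our production function. It assumes the event shape that CloudWatch documents for alarm-triggered Lambda actions (an alarmData object carrying alarmName and state), a Slack incoming webhook, and a hypothetical SLACK_WEBHOOK_URL environment variable. The build_slack_message and handler names are also just illustrative.

```python
import json
import os
import urllib.request


def build_slack_message(event):
    """Turn a CloudWatch alarm event into a Slack incoming-webhook payload."""
    # Assumed event shape: {"alarmData": {"alarmName": ..., "state": {...}}}
    alarm = event.get("alarmData", {})
    state = alarm.get("state", {})
    name = alarm.get("alarmName", "unknown alarm")
    value = state.get("value", "UNKNOWN")
    reason = state.get("reason", "no reason given")
    return {"text": f"[{value}] {name}: {reason}"}


def handler(event, context):
    """Lambda entry point: post the formatted message to Slack."""
    payload = build_slack_message(event)
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # hypothetical variable name
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return {"status": response.status}
```

Keeping the message formatting in its own function means it can be unit tested without network access. The function also needs a resource-based permission that allows CloudWatch alarms to invoke it.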
Here’s a Terraform snippet for the CloudWatch alarm:
resource "aws_cloudwatch_metric_alarm" "anomaly_detector" {
  alarm_name          = "test-anomaly-detector"
  alarm_description   = "High priority log anomalies have been detected."
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "5"
  period              = "60"
  namespace           = "AWS/Logs"
  metric_name         = "AnomalyCount"
  dimensions = {
    "LogAnomalyDetector" = "test-log-detector"
    "LogAnomalyPriority" = "HIGH"
  }
  statistic                 = "Sum"
  threshold                 = 1
  insufficient_data_actions = []
  alarm_actions = [
    # ARN of the Lambda function that posts to Slack (must be a quoted string).
    "arn:aws:lambda:us-west-2:123456789012:function:my-function",
  ]
  actions_enabled = true
}
Right now, it’s configured with a threshold of a single high-priority anomaly, evaluated over five consecutive 60-second periods (period is in seconds). Over time, we’ll refine this to reduce noise and focus only on actionable alerts.
Next Steps
We’re just getting started. Next on the roadmap: CLI tools for anomaly tracking, developer-specific notifications, and smarter training feedback to improve model accuracy.
For more information on feature updates, watch this space.