Of all the amazing announcements made during the re:Invent keynotes, one of the most impressive was Thursday’s announcement of X-Ray.
In short, X-Ray is Amazon Web Services’ attempt to provide application monitoring across multiple disconnected services. And if it works as advertised, it’s a potential game-changer for DevOps.
As the cliche goes, microservice architecture’s greatest strength is also its greatest weakness. Having multiple independent services that are chained together into a solution certainly provides resilience, flexibility, scalability, portability, swifter development and many other benefits. But it does come at a cost: generally, more complexity, additional latency and more potential points of failure and trouble.
It’s also difficult to understand how the individual portions of a microservice-based application perform as a unit. I can easily monitor how each microservice in an application performs, but I don’t necessarily understand how a problem at one point in a workflow affects the performance of the other microservices.
For example, if I have a microservice that handles authentication for several other microservices in a solution, what’s the effect of a 300 millisecond delay in that authentication microservice on the performance of the application as a whole? I know that 300 ms delay is being amplified because it’s central to several concurrent tasks, but how badly does it hurt overall application performance, really?
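A quick back-of-envelope sketch makes the amplification concrete. The numbers and call counts here are hypothetical, chosen only to show how the same per-call delay compounds very differently depending on how the workflow invokes the auth service:

```python
# Hypothetical: an auth microservice adds 300 ms to every call,
# and a workflow authenticates at three separate steps.
auth_delay_s = 0.300
calls_in_workflow = 3

# If the steps run one after another, the delay compounds:
serial_penalty = auth_delay_s * calls_in_workflow
print(f"serialized penalty: {serial_penalty * 1000:.0f} ms")      # 900 ms

# If the steps run concurrently, the workflow only waits once:
concurrent_penalty = auth_delay_s
print(f"concurrent penalty: {concurrent_penalty * 1000:.0f} ms")  # 300 ms
```

Without tracing, you can't tell which of these two worlds you're living in; that's exactly the gap X-Ray targets.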
This applies to the underlying services within non-microservice-based architectures, too. For example, I may have a solution that is served from a single EC2 instance, but it may rely on DynamoDB and S3 to get things done. Latency among those requests affects the performance of my solution, too.
X-Ray aims to provide a means of measuring and understanding performance among instances and services. To accomplish this task, a JSON payload, containing request identification and related metadata, is added to the HTTP header of each request. This payload, which Amazon calls a segment, is used to derive performance data for the request. According to Amazon, the granularity of this data is highly flexible, from a simple latency report all the way down to the performance of specific lines of code.
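To picture what a segment carries, here's an illustrative sketch. The field names (`trace_id`, `start_time`, and so on) mimic the general shape of a request-identification payload with timing metadata; the exact wire format is defined by AWS, not by this example:

```python
import json
import time
import uuid

def make_segment(name, trace_id, start_time, end_time):
    """Build a minimal, segment-like JSON record for one request.

    Illustrative only: field names are assumptions chosen to show the
    idea of request identification plus metadata, not the AWS schema.
    """
    return {
        "name": name,                 # service that handled the request
        "id": uuid.uuid4().hex[:16],  # unique id for this segment
        "trace_id": trace_id,         # shared by every segment in one transaction
        "start_time": start_time,     # epoch seconds when work began
        "end_time": end_time,         # epoch seconds when work finished
    }

now = time.time()
segment = make_segment("auth-service", "trace-0001", now, now + 0.3)
print(json.dumps(segment, indent=2))
```

Because every hop in the request path carries the same trace identifier, the backend can later stitch the individual records back into a single picture of the transaction.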
This data is sampled and aggregated by transaction. In other words, X-Ray groups data from end to end; that is, from the first resource call to the last one needed to satisfy a unique request from the initiating service. So if I have a solution that requires calls to three EC2 instances, two SQS queues and one DynamoDB table to send data back to the caller, all the performance data will be grouped together for that one transaction.
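Conceptually, that grouping step is just a bucket-by-trace-id operation. A minimal sketch, using made-up segment records and field names, shows how scattered per-service data rolls up into one end-to-end transaction:

```python
from collections import defaultdict

# Hypothetical segment records from several services (times in seconds).
segments = [
    {"trace_id": "t-1", "name": "ec2-frontend", "start_time": 0.00, "end_time": 0.12},
    {"trace_id": "t-1", "name": "sqs-queue",    "start_time": 0.12, "end_time": 0.15},
    {"trace_id": "t-1", "name": "dynamodb",     "start_time": 0.15, "end_time": 0.19},
    {"trace_id": "t-2", "name": "ec2-frontend", "start_time": 0.00, "end_time": 0.10},
]

# Every segment sharing a trace id belongs to the same transaction.
traces = defaultdict(list)
for seg in segments:
    traces[seg["trace_id"]].append(seg)

# End-to-end duration: first call's start to last call's end.
for trace_id, segs in traces.items():
    duration = max(s["end_time"] for s in segs) - min(s["start_time"] for s in segs)
    print(f"{trace_id}: {len(segs)} segments, {duration:.2f}s end to end")
```

The payoff is that a slow DynamoDB call shows up in the context of the whole transaction, not as an isolated data point in one service's logs.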
But I won’t see data for every single transaction: I will get a “service graph” that represents the performance among these instances and services over some time-span. Default time-spans are 1, 5, 15 and 30 minutes; and 1, 3 and 6 hours. You can expand this to a custom date and time range.
After doing so, X-Ray reports back average latency, request duration, response codes and – depending on your instrumentation – some traces (that is, each transaction) that went into producing that average. It also details the performance of every instance/service that is part of the trace, so you can quickly identify the bottlenecks and trouble spots for a given workflow, rather than having to divine it from individual services’ logs.
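The service-graph summary boils down to per-service statistics over the selected window. A rough sketch with hypothetical latency samples shows why the bottleneck jumps out immediately:

```python
from statistics import mean

# Hypothetical per-service latencies (seconds) sampled over one window.
observations = {
    "auth-service": [0.30, 0.28, 0.35],
    "dynamodb":     [0.04, 0.05, 0.04],
    "s3":           [0.09, 0.11, 0.10],
}

# Each node in a service graph is summarized by its average latency
# and the number of traces behind that average.
for service, latencies in observations.items():
    print(f"{service}: avg {mean(latencies) * 1000:.0f} ms over {len(latencies)} traces")
```

One glance at the per-node averages tells you where to drill into individual traces, rather than grepping each service's logs for the slow spot.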
If you’re in DevOps, X-Ray promises to remove a significant headache and hours of work from your daily struggle.
X-Ray is in preview, so you need to sign up to use it.