Why this matters
Your service makes 5 backend calls per user request. Each call has a P99 of 50ms, which sounds great. But the combined picture is much worse: even one slow call ruins the request. If the calls are independent, the chance that at least one of the 5 lands beyond its P99 is 1 - 0.99^5, about 4.9%. That is roughly 1 request in 20 hitting the tail, not 1 in 100. Tail latency dominates user experience, and reducing it is harder than reducing average latency because the tail compounds with every call you add.
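To make the fan-out arithmetic concrete, here is a small sketch that computes the chance of hitting at least one tail outlier for different numbers of backend calls. It assumes the calls are statistically independent, which real backends only approximate; the function name `prob_at_least_one_slow` is made up for this illustration.

```python
def prob_at_least_one_slow(n_calls: int, percentile: float = 0.99) -> float:
    """Chance that at least one of n independent calls exceeds its percentile latency.

    Assumption: calls are independent. Correlated slowness (shared queues,
    GC pauses on a common host) will change the real number.
    """
    # Probability that every call stays under the percentile is percentile ** n_calls;
    # the complement is the chance that at least one call is an outlier.
    return 1.0 - percentile ** n_calls


for n in (1, 5, 20, 100):
    print(f"{n:>3} calls: {prob_at_least_one_slow(n):.1%} of requests see a P99 outlier")
```

With 5 calls this gives about 4.9%, matching the estimate above. At 100 calls the figure passes 60%, so the "rare" slow call becomes the common case.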
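Request hedging (Jeffrey Dean and Luiz André Barroso, "The Tail at Scale", 2013) is the elegant trick: send the same request to two replicas, take whichever responds first, and cancel the other. Tail latency drops dramatically because the request is slow only when both replicas are slow. To limit the extra load, the paper defers the second request until the first has been outstanding longer than the 95th-percentile expected latency for that class of request, which adds only about 5% more requests.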
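The hedging idea can be sketched with Python's asyncio. This is a minimal illustration, not a production client: `hedged_call`, `hedge_delay`, and the simulated replicas are hypothetical names introduced here. A common variant waits a short delay before sending the backup request so that most requests cost only one call; setting `hedge_delay=0` gives the send-to-both behavior described above. The sketch also assumes requests are safe to issue twice (idempotent), and it does not fall back to the other replica if the first finisher raises an error.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def hedged_call(
    replicas: Sequence[Callable[[], Awaitable[T]]],
    hedge_delay: float,
) -> T:
    """Call replicas[0]; if it has not answered within hedge_delay seconds,
    also call replicas[1]. Return the first result and cancel the loser."""
    primary = asyncio.ensure_future(replicas[0]())
    # Give the primary a head start before spending a second request.
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()

    backup = asyncio.ensure_future(replicas[1]())
    tasks = {primary, backup}
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Take whichever finished first (re-raises if that call failed).
        return next(iter(done)).result()
    finally:
        # Cancel whatever is still in flight so it stops consuming resources.
        for task in tasks:
            if not task.done():
                task.cancel()


# Simulated replicas for the demo: one stuck in the tail, one healthy.
async def slow_replica() -> str:
    await asyncio.sleep(0.5)
    return "slow"


async def fast_replica() -> str:
    await asyncio.sleep(0.01)
    return "fast"


if __name__ == "__main__":
    # The primary is slow, so the backup is sent after 50 ms and wins.
    print(asyncio.run(hedged_call([slow_replica, fast_replica], hedge_delay=0.05)))
```

Note that cancelling the asyncio task only stops the client from waiting. Whether the replica actually abandons the work depends on the RPC layer propagating the cancellation, which this sketch does not model.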