When Webhooks Go Missing: A DNS Mystery in the Cloud

Apr 21, 2025



[Figure: DNS comparison result]

Context

Last week, one of our merchants reached out with a puzzling issue: they weren’t receiving payment callbacks (aka webhooks) from our payment gateway.

Normally, this kind of problem comes down to two things—either the webhook URL is wrong, or the merchant’s server isn’t reachable. But this time, everything checked out:

✅ Correct webhook URL

✅ Server is reachable

✅ Webhook URL is reachable from local tests

So… what was going on?

The Plot Thickens

After digging deeper, we made a curious discovery: The webhook URL worked only when accessed from outside of Alibaba Cloud. But from our payment gateway—which is hosted entirely within Alibaba Cloud? Crickets. Silence. Nada.

Even stranger: This issue was isolated to just one merchant. We serve over 20,000 merchants, and no one else had reported this problem.

Theories and Findings

Here’s what we pieced together:

✅ Both our systems and the merchant’s server are hosted on Alibaba Cloud.

✅ The webhook URL works perfectly fine outside Alibaba Cloud.

✅ Only this subdomain is affected, while the main domain works fine.

✅ We verified the client’s DNS configuration, and it was set up correctly.

✅ When we pinged the server’s IP address, it responded immediately.

All of this pointed us to a classic suspect: DNS resolution.
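The checks above boil down to two independent probes: can the server be reached by IP, and does its name resolve? A minimal sketch of that triage, using only Python's standard library, might look like this. It uses a TCP connect instead of an ICMP ping (raw ICMP sockets usually require root), and the function names are hypothetical.

```python
import socket
from typing import Set


def resolve(hostname: str) -> Set[str]:
    """Resolve hostname via the system resolver; empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # info[4] is the sockaddr tuple; its first element is the address.
    return {info[4][0] for info in infos}


def tcp_reachable(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to ip:port succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(resolved: Set[str], expected_ip: str, ip_ok: bool) -> str:
    """Combine the two probes into a rough verdict."""
    if not ip_ok:
        return "network"       # unreachable even by IP: not a DNS problem
    if not resolved:
        return "dns-failure"   # server is up, but the name does not resolve
    if expected_ip not in resolved:
        return "dns-mismatch"  # name resolves, but to a different address
    return "ok"
```

For a hypothetical webhook host, `classify(resolve("hooks.merchant.example"), "203.0.113.10", tcp_reachable("203.0.113.10"))` returning `"dns-failure"` from inside the cloud, and `"ok"` from outside it, matches the pattern described here.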

We suspected our Kubernetes cluster’s DNS might be misbehaving, but we were using the standard configuration from the Kubernetes docs.

As an experiment, we switched our DNS to Google DNS (8.8.8.8). Boom. It worked.
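To compare resolvers side by side without touching the system configuration, you can send the same A-record query directly to each one. The sketch below hand-builds a minimal DNS query with Python's standard library (UDP only, no EDNS, no TCP fallback, IPv4 A records only); the function names are hypothetical. It returns the response code as well as the addresses, because a SERVFAIL from one resolver alongside a normal answer from another is consistent with a resolver failing to reach the domain's authoritative servers.

```python
import random
import socket
import struct
from typing import List, Tuple

QTYPE_A = 1
QCLASS_IN = 1


def build_query(hostname: str, qid: int) -> bytes:
    """Encode a recursive (RD=1) DNS query for the A record of hostname."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_A, QCLASS_IN)


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_response(msg: bytes) -> Tuple[int, List[str]]:
    """Return (RCODE, IPv4 addresses from A records in the answer section)."""
    _qid, flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!HHHHHH", msg, 0)
    rcode = flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        if rtype == QTYPE_A and rdlength == 4:
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # skip CNAMEs and other record types
    return rcode, addresses


def query_a(hostname: str, server: str, timeout: float = 3.0) -> Tuple[int, List[str]]:
    """Ask one specific resolver (UDP port 53) for hostname's A records."""
    qid = random.randrange(1 << 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raises a timeout error if the resolver is silent
        sock.sendto(build_query(hostname, qid), (server, 53))
        # Without EDNS, UDP replies are capped at 512 bytes (longer ones set TC).
        data, _addr = sock.recvfrom(512)
    if struct.unpack_from("!H", data, 0)[0] != qid:
        raise ValueError("response ID does not match query ID")
    return parse_response(data)
```

For example, running `query_a("hooks.merchant.example", "8.8.8.8")` next to the same call against the nameserver listed in the instance's `/etc/resolv.conf` shows whether the two resolvers disagree (the hostname is a placeholder).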

Just to be sure, we spun up a brand-new ECS instance in Alibaba Cloud’s Bangkok region (new VPC, fresh public IP) and… same issue.

At this point, the problem was clearly tied to Alibaba Cloud’s default DNS.

But we were hesitant to start switching DNS settings across the board, especially since everything was working fine for everyone else.

So We Called In Reinforcements

Their response?

It is an authoritative server issue between Alibaba Cloud and Bluehost (the client’s nameserver provider). Several regions are affected, including Kuala Lumpur and Bangkok.

The issue is not limited to our payment gateway, but also affects other services hosted on Alibaba Cloud.

The Fix (sort of)

Unfortunately, there’s no ETA from Alibaba Cloud or Bluehost on when this will be resolved.

In the meantime, we’ve started routing DNS lookups for outbound traffic through an alternate DNS provider as a temporary workaround, for this merchant and anyone else who might run into the same issue.

Workaround
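The exact configuration isn't shown here, so the following is only a sketch of the idea behind this kind of workaround: send lookups for affected zones to an alternate resolver and leave everything else on the default one. `ResolverRouter`, the zone name, and the placeholder default resolver address (192.0.2.53) are hypothetical. In a Kubernetes cluster, the same per-zone forwarding is more commonly done in the cluster DNS configuration (for example, a CoreDNS server block using the `forward` plugin) than in application code.

```python
from typing import Dict, List, Optional


class ResolverRouter:
    """Pick upstream nameservers per zone, falling back to a default list."""

    def __init__(self, default: List[str],
                 overrides: Optional[Dict[str, List[str]]] = None):
        self.default = list(default)
        # Normalise zone names once: lower-case, no trailing dot.
        self.overrides = {
            zone.lower().rstrip("."): list(servers)
            for zone, servers in (overrides or {}).items()
        }

    def route(self, hostname: str) -> List[str]:
        """Return the nameservers to use for hostname."""
        name = hostname.lower().rstrip(".")
        best_zone = None
        for zone in self.overrides:
            # Match the zone itself or a subdomain on a label boundary,
            # so "notmerchant.example" does not match "merchant.example".
            if name == zone or name.endswith("." + zone):
                # The most specific (longest) matching zone wins.
                if best_zone is None or len(zone) > len(best_zone):
                    best_zone = zone
        return self.overrides[best_zone] if best_zone is not None else self.default
```

With `ResolverRouter(["192.0.2.53"], {"merchant.example": ["8.8.8.8"]})`, lookups for `hooks.merchant.example` go to Google Public DNS while every other domain keeps using the default resolver, which limits the blast radius of the change.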

Conclusion

Have you considered what you would do if this issue occurred in your production environment?

TL;DR

  • One merchant couldn’t receive webhooks.
  • Everything looked fine—except DNS resolution within Alibaba Cloud.
  • It’s an issue between Alibaba Cloud DNS and Bluehost.
  • We’re temporarily routing DNS lookups for the merchant’s domain through an alternate DNS provider as a temporary fix.