MQTT at Scale: From “It Works” to “It’s Operable”

MQTT looks deceptively simple: clients connect, publish, subscribe—done. The trouble starts when your fleet grows: 10,000 devices becomes 250,000; “one topic per device” becomes millions of subscriptions; a single reconnect wave turns into a thundering herd. Suddenly you’re dealing with session state, fanout math, broker hot spots, throttling limits in managed services, and a cost curve driven by payload size and message rates—not just “CPU and RAM.”

This guide gives you a production mental model for MQTT at scale, plus broker options, load-testing steps, performance/cost/security trade-offs, and a checklist you can use before your next rollout.

What “MQTT at scale” really means (and why it breaks)

“Scale” is not one number. It’s at least five dimensions:

  1. Concurrent connections (devices + apps + gateways)
  2. Message throughput (pub/s, bytes/s)
  3. Fanout (one publish → how many subscribers receive it)
  4. Session state (offline queues, inflight messages, retained state)
  5. Operational complexity (multi-tenancy, auth, quotas, observability, upgrades)

A system can “handle 1 million connections” in a benchmark and still melt down in production if:

  • one topic has 50,000 subscribers (fanout storm),
  • retained messages are misused (memory/disk growth),
  • clients reconnect in waves (connection churn),
  • quotas throttle you (managed services),
  • or you can’t see what’s happening (no good metrics and traces).

The trade-offs you can’t avoid

  • QoS vs throughput: Higher QoS adds packets and state.
  • Durability vs latency: Persisting more state increases I/O pressure.
  • Simplicity vs resilience: Single broker is simple; clustering/multi-region is resilient but harder to operate.

How MQTT works at scale (the mental model)

At small scale, MQTT is “a broker and some clients.” At real scale, think of it as four layers:

1) Connection layer (TCP/TLS/WebSockets)

  • Handles handshakes, keep-alives, NAT timeouts, and reconnections.
  • MQTT over WebSockets is common for browser dashboards, but it changes performance characteristics and connection counts.

2) Session layer (identity + state)

This is where many scale problems live:

  • Persistent sessions mean the broker remembers subscriptions and may store queued messages for offline clients.
  • Expiry controls matter: MQTT 5 added explicit session and message expiry intervals, along with richer server feedback (reason codes).
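To make the session-expiry idea concrete, here is a minimal sketch of the bookkeeping it implies on the broker side. The `SessionStore` class and its method names are illustrative only, not any broker's actual API:

```python
import time

class SessionStore:
    """Toy model of broker-side session expiry (MQTT 5 Session Expiry Interval).

    Real brokers persist this state and handle far more; the structure here
    is purely illustrative.
    """

    def __init__(self):
        self._sessions = {}  # client_id -> (expiry_deadline, subscriptions)

    def disconnect(self, client_id, subscriptions, expiry_interval_s, now=None):
        # On disconnect, the session survives for expiry_interval_s seconds.
        now = time.time() if now is None else now
        self._sessions[client_id] = (now + expiry_interval_s, subscriptions)

    def resume(self, client_id, now=None):
        # Returns saved subscriptions if the session is still alive, else None.
        now = time.time() if now is None else now
        entry = self._sessions.get(client_id)
        if entry is None or now > entry[0]:
            self._sessions.pop(client_id, None)  # expired: drop all state
            return None
        return entry[1]

store = SessionStore()
store.disconnect("dev-42", ["site/1/dev-42/cmd"], expiry_interval_s=3600, now=1000.0)
print(store.resume("dev-42", now=2000.0))    # within expiry: subscriptions restored
print(store.resume("dev-42", now=10_000.0))  # past expiry: None, state is gone
```

The point of the sketch: every persistent session is memory the broker must hold, which is why setting sane expiry intervals matters at fleet scale.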

3) Routing layer (topics → subscribers)

Routing cost depends on:

  • number of subscriptions,
  • wildcard usage,
  • shared subscriptions (work distribution),
  • and whether you’re bridging across brokers/regions.

Shared subscriptions (MQTT 5) are the big “scale lever” for backend consumers: multiple consumers share one subscription so a message goes to only one of them (load balancing).

4) Storage layer (retained + offline queues + inflight)

Retained messages and offline queues are powerful—and dangerous at scale if you don’t set limits.

  • Retained messages are stored by the broker and delivered to new subscribers immediately on subscribe.

Soft checkpoint (architecture review): If your roadmap includes >50k devices, multi-tenant topics, or “store-and-forward” offline behavior, it’s worth validating your topic tree, session strategy, and quota assumptions early—before you pick tooling.

Best practices & pitfalls

1) Topic design: prevent fanout explosions

Do

  • Design topics by who consumes (not just who publishes).
  • Keep hot topics narrow. Use per-device topics for telemetry; use shared topics sparingly.
  • Put tenant/site/device hierarchy early (helps filters and routing).

Avoid

  • One “global” topic that thousands of clients subscribe to.
  • Excessive wildcards in high-fanout paths (# on broad roots).

Rule of thumb: Always estimate fanout:
effective deliveries/sec = publishes/sec × avg subscribers per publish
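Worked out in code, the rule of thumb makes the danger obvious (the traffic numbers below are hypothetical):

```python
def effective_deliveries_per_sec(publishes_per_sec, avg_subscribers_per_publish):
    """Fanout rule of thumb: deliveries the broker must actually push per second."""
    return publishes_per_sec * avg_subscribers_per_publish

# 2,000 device publishes/sec, each matched by 3 subscribers on average:
print(effective_deliveries_per_sec(2_000, 3))        # 6000
# The same publish rate into one "global" topic with 50,000 subscribers:
print(effective_deliveries_per_sec(2_000, 50_000))   # 100000000
```

Same ingress, five orders of magnitude more egress; that second scenario is the fanout storm described above.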

2) QoS strategy: pick the lowest QoS that meets business needs

  • QoS 0: fastest; can lose messages.
  • QoS 1: “at least once”; duplicates are possible.
  • QoS 2: “exactly once” in theory—but many managed hubs don’t support it.

At scale, QoS 1 + idempotency wins:

  • Add a message id / sequence id in payload.
  • Make consumers dedupe safely.
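A sketch of an idempotent consumer, assuming publishers embed a unique message id in each payload. The bounded-LRU window size is a deployment-specific choice, not a standard:

```python
from collections import OrderedDict

class IdempotentConsumer:
    """QoS 1 delivers at least once; dedupe by message id so redeliveries are no-ops.

    A bounded LRU of seen ids is a common compromise between memory use and
    dedupe accuracy; size the window to cover your redelivery horizon.
    """

    def __init__(self, window=100_000):
        self._seen = OrderedDict()
        self._window = window
        self.processed = []

    def handle(self, message_id, payload):
        if message_id in self._seen:
            return False  # duplicate redelivery: safely ignored
        self._seen[message_id] = True
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # evict the oldest id
        self.processed.append(payload)  # real processing goes here
        return True

c = IdempotentConsumer()
c.handle("dev-42:1001", b"temp=21.5")
c.handle("dev-42:1001", b"temp=21.5")  # broker redelivery after a lost PUBACK
print(len(c.processed))  # 1
```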

3) Retained messages: use like “last known value,” not history

Retained messages are great for:

  • last known device status,
  • configuration snapshots,
  • “current mode” values.

But they’re not a time-series database.

  • Each retained topic consumes broker storage.
  • At massive topic counts, retained misuse becomes a memory/disk problem.
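A toy model of why "last known value" keeps retained storage bounded: the broker stores exactly one payload per topic, so storage tracks topic count, not publish volume. The topic path is a made-up example:

```python
# Retained semantics in miniature: one payload per topic, overwritten on each
# retained publish. Storage is bounded by distinct topic count.
retained = {}

def publish_retained(topic, payload):
    retained[topic] = payload  # overwrite: "last known value", never history

for i in range(10_000):  # 10k status updates from a single device...
    publish_retained("site/1/dev-42/status", f"update-{i}".encode())

print(len(retained))                       # 1  (one topic, one stored payload)
print(retained["site/1/dev-42/status"])    # b'update-9999'
```

The failure mode is the inverse: encoding history into the topic name (e.g., a timestamp per topic) turns this dict into an unbounded store.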

4) Shared subscriptions: scale your consumers cleanly

If backend services need to process a high-rate topic stream, shared subscriptions let you run a consumer group-like pattern behind MQTT.
This is often the simplest way to scale processing without adding extra routing layers.
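A simulation of the delivery behavior, assuming a round-robin broker policy (real brokers may balance differently; the group and worker names are invented):

```python
import itertools

# MQTT 5 shared subscription filter shape: "$share/<group>/<topic filter>".
# The broker delivers each matching message to exactly ONE member of the group.
GROUP_FILTER = "$share/analytics/telemetry/#"

consumers = ["worker-0", "worker-1", "worker-2"]
rr = itertools.cycle(consumers)  # round-robin stand-in for the broker's policy

def dispatch(message):
    return next(rr)  # broker picks one group member per message

assignments = [dispatch(f"msg-{i}") for i in range(6)]
print(assignments)
# ['worker-0', 'worker-1', 'worker-2', 'worker-0', 'worker-1', 'worker-2']
```

To scale processing, you add another client subscribed to the same `$share/analytics/...` filter; no topic redesign or extra routing layer is needed.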

5) Connection churn: design for reconnect storms

At scale, devices don’t reconnect one-by-one—they reconnect in waves:

  • power outages,
  • cellular carrier resets,
  • firmware rollouts,
  • broker restarts.

Mitigations

  • Randomized reconnect backoff with jitter (clients).
  • Staggered fleet rollouts.
  • Separate “control plane” topics from “telemetry plane.”
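A common way to implement the first mitigation is "full jitter" exponential backoff on the client; the base and cap values below are illustrative:

```python
import random

def reconnect_delay(attempt, base=1.0, cap=300.0):
    """'Full jitter' backoff: a uniform random delay in [0, min(cap, base * 2^attempt)].

    Randomizing the whole interval (not just adding a small jitter term)
    spreads a fleet's retries out instead of letting them reconnect in lockstep.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# A device that lost its connection retries with growing, randomized gaps:
for attempt in range(6):
    print(f"attempt {attempt}: sleep {reconnect_delay(attempt):.1f}s")
```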

6) Observability: if you can’t measure it, you can’t scale it

Track, at minimum:

  • connections (current + churn rate),
  • publish rate and bytes/sec,
  • subscription count and wildcard share,
  • retained count and size,
  • offline queue depth,
  • auth failures,
  • latency percentiles (p50/p95/p99).

Performance, cost, and security considerations

Performance planning: the three numbers to size first

  1. Concurrent connections (peak, not average)
  2. Peak publishes/sec (burst, not average)
  3. Fanout factor (avg subscribers per publish)

Then validate with load tests that include:

  • realistic payload sizes,
  • realistic topic filters,
  • reconnect churn,
  • and your actual auth method (TLS certs, tokens, etc.).
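Before running those load tests, it helps to turn the three sizing numbers into concrete targets. A back-of-envelope sketch (the fleet figures are hypothetical):

```python
def size_broker(peak_connections, peak_publishes_per_sec, fanout, avg_payload_bytes):
    """Back-of-envelope sizing from the three numbers above. Illustrative only:
    real capacity also depends on QoS, TLS, session state, and broker internals."""
    deliveries_per_sec = peak_publishes_per_sec * fanout
    egress_bytes_per_sec = deliveries_per_sec * avg_payload_bytes
    return {
        "connections": peak_connections,
        "deliveries_per_sec": deliveries_per_sec,
        "egress_MB_per_sec": egress_bytes_per_sec / 1_000_000,
    }

# 250k devices, 5k publishes/sec at peak burst, fanout 4, 2 KB payloads:
print(size_broker(250_000, 5_000, 4, 2_000))
# -> 20,000 deliveries/sec and 40 MB/s of egress to sustain
```

If the load test can't hold these numbers with your real auth method and reconnect churn, the benchmark numbers from vendors don't matter.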

Cost modeling: message size matters more than people expect

Managed services often bill or quota by message “chunks.”

  • AWS IoT Core: payloads up to 128 KB; metered in 5 KB increments (e.g., 8 KB counts as two).
  • Azure IoT Hub: quota math depends on tier; message size used for daily quota calculations is standardized (for many tiers, 4 KB is used as the baseline unit).

Practical takeaway:
If you can reduce average payload from 8 KB → 4 KB, you can materially reduce “billed message units” in some pricing models.
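The chunked-metering math is a simple ceiling division, shown here with AWS IoT Core's 5 KB increment as the example:

```python
import math

def billed_units(payload_bytes, chunk_bytes=5 * 1024):
    """Messages metered in fixed chunks (e.g., AWS IoT Core meters in 5 KB increments)."""
    return math.ceil(payload_bytes / chunk_bytes)

print(billed_units(8 * 1024))  # 8 KB payload -> 2 billed units
print(billed_units(4 * 1024))  # 4 KB payload -> 1 billed unit
```

Note the cliff at the chunk boundary: a payload of 5 KB + 1 byte bills the same as 10 KB, so trimming payloads just under the boundary pays off disproportionately.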

Security: MQTT is secure when you operate it securely

Core must-haves:

  • TLS everywhere (prefer mutual TLS for devices).
  • Strong device identity + rotation.
  • Least-privilege topic ACLs (devices can publish only to their own topics).
  • Rate limits and anomaly detection (auth failures, reconnect spikes).
  • Regular security testing (OWASP IoT projects are a good baseline).
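A sketch of least-privilege topic authorization, using standard MQTT filter matching (`+` spans one level, `#` the remainder). The `tenant/site/device` layout follows the topic-design advice earlier; the specific names are made up:

```python
def topic_matches(filter_, topic):
    """Standard MQTT topic-filter matching: '+' matches one level, '#' the rest."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True          # multi-level wildcard: everything below matches
        if i >= len(t_parts):
            return False         # topic is shorter than the filter
        if f != "+" and f != t_parts[i]:
            return False         # literal level mismatch
    return len(f_parts) == len(t_parts)

def can_publish(device_id, topic, tenant="acme"):
    # Least privilege: a device may publish only under its own subtree.
    return topic_matches(f"{tenant}/+/{device_id}/#", topic)

print(can_publish("dev-42", "acme/site1/dev-42/telemetry"))  # True
print(can_publish("dev-42", "acme/site1/dev-99/telemetry"))  # False: not its topic
```

Real deployments enforce this in the broker's ACL or authorizer plugin rather than in application code, but the matching rule is the same.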

Real-world use cases (mini case study)

Scenario: A telemetry platform grows from 15k to 180k devices across 12 regions.

Before

  • Single broker deployment.
  • One “global telemetry” subscription consumed by a single analytics service.
  • Retained messages used for both “status” and pseudo-history.

What broke

  • Analytics consumer became the bottleneck.
  • Fanout spikes during dashboard usage caused latency jumps.
  • Retained store grew unpredictably.

After

  • Split topic tree by tenant/site/device; moved dashboards to read from a state store (not retained-as-history).
  • Introduced shared subscriptions for backend processors to scale horizontally.
  • Added reconnect jitter, rate limiting, and operational SLO dashboards.

Result (typical outcomes you should aim for)

  • Predictable latency under bursts (no “mystery” spikes).
  • Linear scaling of consumers (add replicas instead of redesigning).
  • Lower storage volatility by narrowing retained usage.

FAQs

1) What does “MQTT at scale” mean?

It means you can handle peak connections, throughput, fanout, and session state—reliably—while still being able to operate upgrades, failures, and security policies without outages.

2) How many MQTT connections can a broker handle?

It depends on broker, hardware, payload rates, and session settings. Vendors publish high-scale benchmarks (e.g., HiveMQ’s large-scale connection tests and EMQX’s WebSocket scale reports), but you should validate with your own workload shape.

3) What’s the best MQTT broker for large IoT fleets?

Start from your constraints:

  • If you want fully managed identity + integration, consider managed hubs (AWS IoT Core / Azure IoT Hub).
  • If you need full protocol control, custom routing, or on-prem, consider clustered brokers (EMQX, HiveMQ, VerneMQ).

Use a load test that matches your fanout and reconnect pattern before deciding.

4) Do I need QoS 2 for IoT?

Often, no. Many managed IoT hubs don’t support QoS 2, so production designs commonly use QoS 1 plus idempotent processing.

5) How do shared subscriptions work in MQTT 5?

Multiple clients share one subscription group so the broker distributes matching messages across them—like a consumer group—improving throughput and fault tolerance for backend processing.

6) Are retained messages bad at scale?

They’re not “bad,” but they’re easy to misuse. Retained messages are perfect for “last known state,” but using them as history can blow up broker storage and complicate ops.

7) How do I load test MQTT?

Use a benchmark tool and simulate:

  • realistic payload sizes,
  • topic trees and wildcard patterns,
  • QoS and session settings,
  • reconnect churn,
  • auth (TLS, certs, tokens).

Many broker ecosystems provide benchmark tools/guides (e.g., EMQX’s eMQTT-Bench documentation).

8) Is MQTT secure over the internet?

Yes—when you use TLS, strong identity, topic ACLs, and monitoring. Also treat IoT security like a system: test regularly and follow IoT security testing guidance.

MQTT isn’t hard to implement. The hard part is making reconnect storms, fanout spikes, and quota limits boring in production.

Conclusion

MQTT is one of the simplest protocols to start with—and one of the easiest to underestimate. The moment your fleet grows, “publish/subscribe” turns into a real system: session state, fanout, backpressure, quotas, security policies, and the operational muscle to observe and recover fast. Most outages don’t come from MQTT itself; they come from scaling assumptions that were never tested under peak reconnects, realistic topic trees, and real auth.

The good news: MQTT scales cleanly when you treat it like an operations problem early. Do the fanout math, choose the right QoS strategy, limit retained usage to “last known value,” use shared subscriptions to scale consumers, and load test the failure modes—not just the happy path. Once those are in place, MQTT becomes what it was always supposed to be: a reliable, lightweight backbone for device communication.
