Multi-tenant Ribbon SBC for a B2B operator: field report
Architecting a Ribbon 5400 SBC to host dozens of customer PBX tenants on shared infrastructure. Decisions, pitfalls, and numbers after 18 months of production.
A European B2B operator entrusts us with their SIP backbone: hosting dozens of customer PBXs (Microsoft Teams, 3CX, FreePBX, legacy IPBX) on a Ribbon 5400 SBC pair in high availability, without degrading any tenant's voice quality when another saturates its trunk. This article details the structural decisions made, the three pitfalls that cost us the most time, and what we would change with eighteen months of hindsight. All names are anonymized.
The starting context
The operator had been running a monolithic setup: a single SBC, one global addressContext, around fifty sipTrunkGroup objects stacked flat. When a new customer arrived, the ops team added a trunk group, copy-pasted a routing rule, crossed their fingers. Three symptoms surfaced during scoping:
- Coupling: any incident hits every tenant — a customer runs a load test, the SBC's CPS limit saturates, and every other tenant gets `503 Service Unavailable` on new call attempts.
- Numbering plan collisions — two customers use the same internal prefix (`9` for outbound), translation rules step on each other, and operators patch by hand, case by case.
- No per-tenant metrics — CDRs come out in a single raw file; customer reporting takes a full day of Excel extraction per request.
The goal was twofold: logical isolation between tenants (no single tenant can disrupt the others) and industrializable provisioning (a new customer = a repeatable procedure, not an artisan one-shot).
Decision 1 — One address context per segment, not per customer
First consultant reflex: one addressContext per customer = total isolation. First mistake. On the Ribbon SBC 5400, the number of addressContext objects is hard-capped, and each context carries its own routing-engine overhead.
Final decision: one address context per market segment, not per customer.
| Address context | Population | Shared policy |
|-----------------|-----------|---------------|
| AC-MICROSOFT | Teams Direct Routing tenants | Strict TLS, Opus codec priority, NAT-aware |
| AC-CLOUDPBX | 3CX cloud, Yeastar, RingCentral | TLS, RTPProxy, transcoding mediation |
| AC-LEGACY | Inherited on-premise IPBX | UDP tolerated, RFC 2833 forced, G.711 fallback |
| AC-CARRIER | Upstream SIP carrier trunks | mTLS, STIR/SHAKEN, anti-replay |
Isolation stays strong (a TLS incident on AC-LEGACY doesn't touch AC-MICROSOFT), while keeping the context count operationally manageable (4 instead of 50+).
For per-customer separation, we work inside a context with zone and sipTrunkGroup objects following a strict naming convention:
zone : ZN-<CUSTOMERID>
sipTrunkGroup : <CUSTOMERID>-CORE-PRI / <CUSTOMERID>-CORE-SEC
ipPeer : IP-<CUSTOMERID>-<ROLE>
This naming lets any ops engineer (and our scripts) instantly find every object belonging to a given customer, and remove them cleanly at termination.
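The convention is mechanical enough to encode once and reuse everywhere. A minimal sketch (an illustrative helper, not a Ribbon API) that derives every per-tenant object name, so provisioning and teardown scripts always address the same set of objects:

```python
def tenant_objects(customer_id: str, peer_roles=("CORE",)) -> dict:
    """Derive the canonical object names for one tenant from the
    naming convention: ZN-<CUSTOMERID>, <CUSTOMERID>-CORE-PRI/SEC,
    IP-<CUSTOMERID>-<ROLE>."""
    cid = customer_id.upper()
    return {
        "zone": f"ZN-{cid}",
        "sipTrunkGroups": [f"{cid}-CORE-PRI", f"{cid}-CORE-SEC"],
        "ipPeers": [f"IP-{cid}-{role}" for role in peer_roles],
    }
```

At termination, the same function yields the exact list of objects to delete, which is what makes clean offboarding possible.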
Decision 2 — Tenant numbering plan lives in SMM, not in routing
The pitfall with internal customer numbering plans is that they diverge with every new acquisition. Customer A uses 9 for outbound. Customer B uses 0. Customer C asks for 8 because their old IPBX did it that way. Trying to handle this in the SBC's routing rules leads to combinatorial explosion.
Our approach: every tenant-specific manipulation lives in SMM (SIP Message Manipulation) attached to its sipTrunkGroup, not in the routing engine.
# SMM example — Customer using prefix "9" for outbound
# Strip the 9, normalize to E.164 with default country code
condition : Header.RequestLine.URI.User starts_with "9"
action : strip first 1 char from Header.RequestLine.URI.User
action : prepend "+33" to Header.RequestLine.URI.User
The global routing engine only ever sees E.164-formatted numbers. It never has to know about Customer X's habits.
Operational benefit: adding a customer with an exotic plan = writing a dedicated SMM, attaching it to their trunk group, done. No patches inside central rules shared by everyone.
Decision 3 — Quotas per tenant, not per SBC
The legacy SBC configuration habit is to set global thresholds (MaxSessions, CPS) at the box level. When a tenant launches a test wave, they consume the entire quota.
On Ribbon, policer objects let you cap per sipTrunkGroup. Our standard:
| Tenant tier | MaxSessions | CPS | Burst |
|-------------|-------------|-----|-------|
| Pilot (≤ 50 users) | 30 | 5 | 10 |
| Standard (≤ 500 users) | 200 | 20 | 50 |
| Enterprise (> 500 users) | sized per case | sized per case | × 2 |
Beyond quota, the SBC returns 486 Busy Here to the offending tenant. Other tenants see nothing. This single measure eliminated 80% of cross-tenant incidents reported in year 1.
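The policer behavior can be pictured as a token bucket per trunk group. This is an illustrative model of the admission decision, not Ribbon's implementation:

```python
import time

class TenantPolicer:
    """Token-bucket sketch of a per-trunk-group CPS cap with burst."""

    def __init__(self, cps: int, burst: int):
        self.rate = cps            # tokens refilled per second
        self.capacity = burst      # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def admit(self) -> bool:
        # Refill proportionally to elapsed time, capped at burst size.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True            # call attempt admitted
        return False               # over quota: caller answers 486 Busy Here

p = TenantPolicer(cps=20, burst=50)  # the "Standard" tier above
```

A tenant that fires 100 INVITEs at once gets its burst of 50 admitted, then 486s, while every other trunk group's bucket is untouched.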
Three pitfalls that cost us time
Pitfall 1 — TLS and SNI
Several Microsoft Teams Direct Routing trunks started failing silently after a firmware upgrade. Root cause: Microsoft sends SNI in the TLS ClientHello. Our SBC, configured with a single certificate on the signaling interface, negotiated correctly, but Teams refused the session if the SNI didn't exactly match the expected FQDN.
Fix: configure multiple certificates on the same interface with SNI-based selection. Once in place, the failures disappeared. The initial diagnosis was misleading (logs showed the TLS handshake as OK).
Pitfall 2 — RTP that doesn't follow the route
For tenants behind restricted-cone NAT (typical of small on-premise PBXs behind a classic firewall), the SBC negotiated signaling correctly, but the return RTP stream was sent to the private IP address advertised by the customer and never arrived.
Fix: force latching (RTP source learning) on matching profiles, and enable media bypass detection to avoid wasting RTP ports on sessions that never complete. Note: latching is not a default setting on Ribbon "secure" profiles — it must be explicitly enabled.
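The latching idea itself is simple, and a minimal sketch makes the fix concrete. This is an illustrative model of the decision (Ribbon does this in the media plane, not in user code): ignore the possibly private address from SDP and lock the return stream onto the source of the first RTP packet actually received.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)

def latch_peer(sdp_addr: Addr, observed_src: Addr,
               latched: Optional[Addr]) -> Addr:
    """Where to send return RTP: once latched, stick with that address;
    otherwise latch onto the first observed packet source, ignoring the
    (possibly private, unreachable) address advertised in SDP."""
    if latched is not None:
        return latched
    return observed_src

# SDP advertises 192.168.1.10 (private, unreachable from the SBC);
# the first RTP packet arrives from the firewall's public address:
dest = latch_peer(("192.168.1.10", 4000), ("203.0.113.7", 40312), None)
```

Sticking with the first learned source (rather than re-latching on every packet) is also what limits the RTP-hijack surface that makes latching off by default on "secure" profiles.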
Pitfall 3 — CDRs that don't know who's who
First production month, customer request: "give me the outbound call count for my tenant in March." Honest answer: we didn't know. CDRs didn't carry the tenant identifier.
Rework: add an outbound SMM that injects the tenant ID into a custom header (P-Qaryon-Tenant-ID), then configure Ribbon's CDR enrichment to expose that header in the end-of-call record. Every CDR now carries the tenant. Customer reporting = one SQL query instead of a man-day of Excel export.
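Once the tenant ID lands in every record, per-tenant reporting is a trivial aggregation. A sketch over a hypothetical CSV extract of the enriched CDRs — the field names are illustrative, not Ribbon's actual CDR schema; the tenant column is the one fed by the `P-Qaryon-Tenant-ID` header:

```python
import csv
from collections import Counter
from io import StringIO

SAMPLE = """tenant_id,direction,duration_s
ACME42,outbound,120
ACME42,inbound,45
BETA07,outbound,300
ACME42,outbound,60
"""

def outbound_calls_per_tenant(cdr_text: str) -> Counter:
    """Count outbound calls per tenant from an enriched-CDR extract."""
    rows = csv.DictReader(StringIO(cdr_text))
    return Counter(r["tenant_id"] for r in rows
                   if r["direction"] == "outbound")

# outbound_calls_per_tenant(SAMPLE)["ACME42"] -> 2
```

The March question from the first month becomes a one-liner instead of a man-day of Excel.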
The numbers after 18 months
- 42 tenants active on the SBC pair, mixed segments
- Zero cross-tenant incidents reported since the per-sipTrunkGroup limit was deployed
- Provisioning a new tenant: 35 minutes on average (scripted procedure), down from 4 hours
- Monthly per-tenant report generation: 2 minutes (SQL) versus 1 person-day
What we would change in hindsight
With eighteen months of production behind us, two decisions would be made differently:
Industrialize provisioning from day one. We wrote the automation scripts in month 6, after having manually mis-provisioned about ten tenants. Reworking them took time and created unnecessary risk. If doing it again: Terraform/Ansible before the first customer, not after the tenth.
Pick the enriched CDR format from the start. All the SMM + enrichment work described above would have been five times simpler to do before going live. Once in production, changing the CDR pipeline without breaking existing customer exports demands a lot of caution and maintenance windows.
Conclusion
A multi-tenant SBC is not just stacking trunk groups in a box. The real decisions are isolation per segment rather than per customer, pushing tenant-specific behaviors into SMM rather than routing, and explicitly capping quotas per trunk group. Each decision has an upfront cost — and each one prevents several production incidents.
This type of mission is exactly what qaryon delivers: target architecture, configuration, production rollout, knowledge transfer. One point of contact from scoping to cutover.
qaryon — Consulting, audit, and deployment for unified communications. Get in touch.