Orchestration Engine

Reliability
Role: Platform & Integration Reliability Specialist
Year: 2019-2025

Client project: Tenable and ServiceNow Escalation Workflows

Technical Deep Dive

Built an at-least-once distributed orchestration system with Celery + RabbitMQ, queue isolation, and observability practices that improved MTTR by 35%.
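As a rough illustration of what at-least-once delivery and queue isolation can look like in Celery, the settings module below is a minimal sketch and not the project's actual configuration. The setting names (task_acks_late, task_reject_on_worker_lost, worker_prefetch_multiplier, task_default_queue, task_routes) are standard Celery settings; the queue names and the integrations.* task paths are hypothetical.

```python
# celeryconfig.py (hypothetical module name), loaded with
# app.config_from_object("celeryconfig") in a real Celery app.

# Acknowledge a message only after the task finishes, so a worker crash
# causes redelivery instead of silent loss (at-least-once semantics).
task_acks_late = True

# Requeue the message if the worker process dies mid-task.
task_reject_on_worker_lost = True

# Fetch one message at a time so a slow task does not hoard a backlog.
worker_prefetch_multiplier = 1

# Anything without an explicit route lands on a general-purpose queue.
task_default_queue = "default"

# One queue per integration domain (names are assumptions), so a stalled
# ServiceNow consumer cannot starve Tenable processing, and vice versa.
task_routes = {
    "integrations.tenable.*": {"queue": "tenable"},
    "integrations.servicenow.*": {"queue": "servicenow"},
}
```

Because redelivery can run a task more than once under these settings, the task bodies themselves have to be safe to repeat.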

Client Context

Tenable and ServiceNow escalation flows faced duplicate ticketing, schema drift, and operational instability during high-volume incident periods.

Execution

Implemented explicit idempotency keys, queue segmentation by integration domain, dead-letter handling, and exponential backoff with jitter to contain retries and isolate failures.
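The interplay of these mechanisms can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not the production code: idempotency_key, backoff_delay, and Processor are hypothetical names, the set stands in for a shared store with an atomic set-if-absent operation, and the list stands in for a RabbitMQ dead-letter queue.

```python
import hashlib
import json
import random


def idempotency_key(source: str, event: dict) -> str:
    """Derive a stable key from the source system and the event content."""
    # Sorting keys makes the hash independent of dict ordering.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{source}:{canonical}".encode("utf-8")).hexdigest()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class Processor:
    """Retry-safe event processing with a dead-letter fallback (illustrative only)."""

    def __init__(self, handler, max_retries: int = 3):
        self.handler = handler
        self.max_retries = max_retries
        self.seen = set()        # assumption: a shared store in a real deployment
        self.dead_letters = []   # assumption: a dead-letter queue in a real deployment

    def process(self, source: str, event: dict) -> str:
        key = idempotency_key(source, event)
        if key in self.seen:
            return "duplicate"  # redelivered event: skip, no second ticket
        for attempt in range(self.max_retries + 1):
            try:
                self.handler(event)
            except Exception:
                if attempt < self.max_retries:
                    # A real worker would reschedule the task after this delay;
                    # the sketch only computes it and retries immediately.
                    backoff_delay(attempt)
                continue
            self.seen.add(key)  # record the key only after success
            return "processed"
        # Retries exhausted: park the event for triage instead of looping forever.
        self.dead_letters.append((key, event))
        return "dead-lettered"
```

Recording the key only after a successful run keeps failed events retryable, at the cost of a window in which two concurrent deliveries could both execute; an atomic claim in the shared store would close that gap.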

Outcome

Sustained 1,000+ concurrent events, reduced operational overhead by 72%, and improved MTTR by 35% across 15+ P0-P3 escalations with better triage visibility.

Core Stack

Python
Celery
RabbitMQ
Idempotency
Observability

Metrics

concurrency: 1,000+
overhead_drop: 72%
mttr_gain: 35%