AI Inference at Scale: Reliability, Observability, Cost & Sustainability
AI inference is no longer a simple model call—it is a multi-hop DAG of planners, retrievers, vector searches, large models, tools, and agent loops. With this complexity comes new failure modes: tail-latency blowups, silent retry storms, vector store cold partitions, GPU queue saturation, exponential cost curves, and unmeasured carbon impact.
In this talk, we unveil ROCS-Loop, a practical architecture designed to close the four critical loops of enterprise AI:
•Reliability (Predictable latency, controlled queues, resilient routing)
•Observability (Full DAG tracing, prompt spans, vector metrics, GPU queue depth)
•Cost-Awareness (Token budgets, model tiering, cost attribution, spot/preemptible strategies)
•Sustainability (SCI metrics, carbon-aware routing, efficient hardware, eliminating unnecessary work)
KEY TAKEAWAYS
•Understand the four forces behind AI outages (latency, visibility, cost, carbon).
•Learn the ROCS-Loop framework for enterprise-grade AI reliability.
•Apply 19 practical patterns to reduce P99, prevent retry storms, and control GPU spend.
•Gain a clear view of vector store + agent observability and GPU queue metrics.
•Learn how ROCS-Loop maps to GCP, Azure, Databricks, FinOps & SCI.
•Leave with a 30-day action plan to stabilize your AI workloads.
⸻
AGENDA
1.The Quiet Outage: Why AI inference fails
2.X-Ray of the inference pipeline (RAG, agents, vector, GPUs)
3.Introducing the ROCS-Loop framework
4.19 patterns for Reliability, Observability, FinOps & GreenOps
5.Cross-cloud mapping (GCP, Azure, Databricks)
6.Hands-on: Diagnose an outage with ROCS
7.Your 30-day ROCS stabilization plan
8.Closing: Becoming a ROCS AI Architect
About Rohit Bhardwaj
Rohit Bhardwaj is a Director of Architecture working at Salesforce. Rohit has extensive experience architecting multi-tenant cloud-native solutions in Resilient Microservices Service-Oriented architectures using AWS Stack. In addition, Rohit has a proven ability in designing solutions and executing and delivering transformational programs that reduce costs and increase efficiencies.
As a trusted advisor, leader, and collaborator, Rohit applies problem resolution, analytical, and operational skills to all initiatives and develops strategic requirements and solution analysis through all stages of the project life cycle and product readiness to execution.
Rohit excels in designing scalable cloud microservice architectures using Spring Boot and Netflix OSS technologies using AWS and Google clouds. As a Security Ninja, Rohit looks for ways to resolve application security vulnerabilities using ethical hacking and threat modeling. Rohit is excited about architecting cloud technologies using Dockers, REDIS, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, Rohit has developed lambda architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.
Rohit has done MBA from Babson College in Corporate Entrepreneurship, Masters in Computer Science from Boston University and Harvard University. Rohit is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.
Rohit loves to connect on http://www.productivecloudinnovation.com.
http://linkedin.com/in/rohit-bhardwaj-cloud or using Twitter at rbhardwaj1.