AI Inference at Scale: Reliability, Observability, Cost & Sustainability
Vendor-aware playbooks for bursty inference and agent flows: queueing, cache design for RAG, GPU pooling vs. autoscaling, FinOps for AI, and GreenOps. The playbooks are grounded in the AI/ML perspective of Google Cloud's Well-Architected Framework, Azure Well-Architected guidance for AI workloads, and the Databricks Well-Architected Lakehouse.
Problems solved
- Latency spikes at P95/P99
- Runaway cloud/GPU costs
- Lack of observability in inference pipelines
- Inefficient vector DB and cache usage
What you’ll learn
- When to use serverless triggers, async queues, or GPU pooling
- How to instrument prompts, vector queries, and GPU utilization
- How to apply FinOps guardrails: usage attribution, rightsizing, and spot/preemptible instances
- How to add carbon-aware practices: SCI (Software Carbon Intensity) metrics and time/region shifting
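SCI refers to the Green Software Foundation's Software Carbon Intensity specification, SCI = ((E * I) + M) per R, where E is energy consumed, I is grid carbon intensity, M is embodied hardware emissions, and R is the functional unit. The following is a minimal sketch of that formula and of region shifting. It assumes one inference request as the functional unit R. The class and function names, region names, and numbers are hypothetical illustrations, not figures from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InferenceWindow:
    """Hypothetical measurements for one batch of inference requests."""
    energy_kwh: float      # E: energy drawn by GPUs/hosts for the batch (kWh)
    grid_intensity: float  # I: grid carbon intensity (gCO2eq per kWh)
    embodied_g: float      # M: amortized embodied hardware emissions (gCO2eq)
    requests: int          # R: functional unit, here one inference request


def sci_per_request(w: InferenceWindow) -> float:
    """SCI = ((E * I) + M) per R, returned in gCO2eq per request."""
    if w.requests <= 0:
        raise ValueError("requests must be positive")
    return (w.energy_kwh * w.grid_intensity + w.embodied_g) / w.requests


def greenest_region(intensity_by_region: dict[str, float]) -> str:
    """Time/region shifting: return the key (region or hour) with the lowest intensity."""
    if not intensity_by_region:
        raise ValueError("no candidate regions")
    return min(intensity_by_region, key=intensity_by_region.get)


if __name__ == "__main__":
    # Hypothetical grid intensities for two deferrable batch-inference targets.
    grid = {"region-a": 400.0, "region-b": 30.0}
    target = greenest_region(grid)
    base = InferenceWindow(2.0, grid["region-a"], 200.0, 10_000)
    shifted = InferenceWindow(2.0, grid[target], 200.0, 10_000)
    print(f"run in {target}")
    print(f"SCI before: {sci_per_request(base):.3f} gCO2eq/request")
    print(f"SCI after:  {sci_per_request(shifted):.3f} gCO2eq/request")
```

Shifting only reduces the E * I term. Embodied emissions M stay the same, which is why rightsizing and higher GPU utilization still matter alongside time/region shifting.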
Cloud anchors
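- GCP: AI/ML perspective of the Well-Architected Framework, reliability and cost pillars
- Azure: Well-Architected guidance for AI workloads, plus assessment tools
- Databricks: trade-offs across the seven pillars of the Well-Architected Lakehouse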
About Rohit Bhardwaj
Rohit Bhardwaj is a Director of Architecture at Salesforce. He has extensive experience architecting multi-tenant, cloud-native solutions with resilient microservices and service-oriented architectures on the AWS stack. He has a proven ability to design solutions and to execute and deliver transformational programs that reduce costs and increase efficiency.
As a trusted advisor, leader, and collaborator, Rohit applies problem-resolution, analytical, and operational skills to every initiative. He develops strategic requirements and solution analysis through all stages of the project life cycle, from product readiness to execution.
Rohit excels at designing scalable cloud microservice architectures with Spring Boot and Netflix OSS on AWS and Google Cloud. As a Security Ninja, he resolves application security vulnerabilities through ethical hacking and threat modeling. He enjoys architecting cloud systems with Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. He has also built Lambda Architecture solutions with Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.
Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a master's in Computer Science from Boston University and Harvard University. He is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.
Connect with Rohit at http://www.productivecloudinnovation.com, on LinkedIn at http://linkedin.com/in/rohit-bhardwaj-cloud, or on Twitter at rbhardwaj1.