Thinking out loud on hard problems
- The Infrastructure That Enforces Itself: Compliance-Grade Multi-Tenant SaaS on Amazon EKS
A deep dive into building compliance-grade multi-tenant SaaS on Amazon EKS — using Flux GitOps, Terraform Enterprise workspace versioning, Vault 2.0 workload identity, and Argo Workflows for fully automated, auditable tenant onboarding.
- Deploying Production-Grade LLM Inference on AWS EKS — A Hands-On Deep Dive
An architectural walkthrough of the GenAI on EKS workshop — vLLM, Ray Serve, Karpenter, DCGM + AMP observability, and AWS Strands Agents — with the design decisions behind each layer.
- PageIndex and Vectorless RAG — A Structural Alternative for Professional Documents
Reasoning-based retrieval as an alternative to vector similarity search for structured professional documents: how PageIndex achieves 98.7% accuracy on FinanceBench, with domain use cases in healthcare, wealth management, banking, and travel, plus an implementation pathway.
- Mamba and SSMs — What the Generation Backbone Change Means for RAG
A systems-level analysis of replacing the Transformer backbone with Mamba/SSM architectures in RAG systems, covering linear context scaling, a constant-size recurrent state in place of a growing KV cache, selective state tracking (Mamba-3), and the hybrid Transformer+Mamba pattern for enterprise deployment.
- The RAG Supporting Stack — Memory, Prompt Engineering, Fine-tuning, and Embeddings
The cross-cutting infrastructure that makes RAG systems work in practice — layered memory architecture, prompt engineering patterns, domain-specific fine-tuning, and embedding improvements including ColBERT, cross-lingual, hyperbolic, and negation-aware approaches.
- RAG in Personal Banking — Scale, AML, and Transaction Intelligence
L1–L4 RAG in personal banking — from FAQ deflection to agentic cash flow diagnosis. Covers PCI-DSS, FINTRAC/AML constraints, hybrid transaction dispute retrieval, financial health knowledge graphs, and the 10M+ daily transaction scale challenge.
- RAG in Wealth Management — Fiduciary Constraints and Retrieval Design
How L1–L4 RAG applies to wealth management platforms — from product FAQs to proactive portfolio review. Covers MiFID II suitability, IPS compliance graphs, Bloomberg/Refinitiv hybrid search, and FinBERT fine-tuning.
- RAG in Hospital Management Systems — Zero Hallucination Tolerance
Applying L1–L4 RAG to hospital management — clinical protocol lookup, drug interaction checking, differential diagnosis with knowledge graphs, and agentic patient surveillance. Covers HIPAA, HL7 FHIR, SNOMED CT, and clinical fine-tuning constraints.
- RAG in Travel & Tourism Systems — GDS, Visa Routing, and the AI Concierge
How L1 through L4 RAG applies to travel and tourism platforms — from airline FAQ bots to agentic honeymoon planners. Covers GDS integration, visa-route graphs, multilingual embeddings, and real-time inventory constraints.
- The Four RAG Levels — A Decision Framework for Enterprise Systems
L1 through L4 RAG — vanilla, hybrid, GraphRAG, and agentic — with a concrete decision framework for choosing the right retrieval level for any enterprise problem.
14 posts