Dhanasekar's Field Notes
agents
- Agent Architectures: ReAct, plan-and-execute, reflection loops, multi-agent systems, and orchestration strategies.
- Tool Calling & Function Calling: JSON tool schemas, parallel tool calls, result injection, and the tool use loop.
- Advanced Reasoning & Planning: o1-style reasoning, chain-of-thought scaling, multi-step planning, and mathematical reasoning patterns.
- Test Time Compute in Agents: Tree search, self-reflection, retry budgets, and allocating compute across agent steps.
- Memory & Context: Episodic memory, conversation history, summarization, and long-term context management.
retrieval & grounding
evaluation & monitoring
- Benchmarks & Leaderboards: MMLU, HELM, BIG-Bench, Chatbot Arena, and what leaderboard scores actually measure.
- LLM-as-Judge: Using models to evaluate model outputs, positional bias, self-preference, and calibration failures.
- Agentic Evals: Task completion metrics, trajectory evaluation, sandboxed environments, and failure modes.
- Red-teaming & Safety Evals: Adversarial prompting, systematic safety evaluation, eval coverage, and what gets missed.
alignment & safety
hardware
model architecture
- Building a Transformer: From tokenization to attention to feedforward layers, and how the core architecture actually works.
- Attention Mechanisms: Multi-head, grouped-query, multi-query, and multi-head latent attention variants and their tradeoffs.
- Modern Architecture Patterns: MoE routing, sampling strategies, structured generation, and architectural innovations.
- Emerging Architecture Patterns: State space models, Mixture of Depths, sparse attention patterns, and novel positional encodings.
memory systems
- Attention & Memory: FlashAttention, IO-aware algorithms, and why materializing attention matrices kills performance.
- KV Cache Systems: Caching, paged attention, block-sparse memory, prefix sharing, and quantized KV storage.
- Caching Strategies: Semantic caching, prefix caching, cross-request sharing, and intelligent cache eviction policies.
- Long Context Handling: RoPE scaling, sliding window attention, and what breaks at 100K+ token contexts.
model optimization
- Quantization & Compression: INT8, INT4, GPTQ, AWQ, and the precision-accuracy tradeoff in production systems.
- Knowledge Transfer: Distillation, supervised fine-tuning, QLoRA, and fitting large model capabilities into smaller ones.
- Efficient Inference: LoRA at inference time, speculative decoding, and draft model verification strategies.
serving infrastructure
- Batching & Scheduling: Continuous batching, iteration-level scheduling, and how vLLM keeps GPUs saturated.
- Prefill vs Decode: Chunked prefill, decode-only servers, disaggregation, and why separating phases improves utilization.
- Scaling & Parallelism: Tensor parallelism, pipeline parallelism, sequence parallelism, and distributed inference patterns.
- Test Time Compute & Inference Scaling: Chain-of-thought, best-of-N, process reward models, and compute-optimal inference.
- Inference Economics: GPU cost per token, batching efficiency, latency-throughput tradeoffs, and margin at scale.
data systems
- Pretraining Data: Common Crawl, deduplication, quality filtering, domain mixing ratios, and data scaling laws.
- Data Curation at Scale: Perplexity filtering, MinHash dedup, domain classifiers, and toxic content removal pipelines.
- Instruction Tuning Data: FLAN, Alpaca, ShareGPT, and what separates good instruction-following data from noise.
edge & distributed
fundamentals
- Networking Essentials: TCP/IP, HTTP, DNS, load balancers, and how data moves across the internet.
- API Design: REST, GraphQL, RPC, versioning, pagination, and designing interfaces clients actually use.
- Data Modeling: Relational vs document models, normalization, denormalization, and schema design tradeoffs.
- Database Indexing: B-trees, hash indexes, covering indexes, and when indexes hurt more than they help.
- Caching: Cache-aside, write-through, write-back, eviction policies, and the cache invalidation problem.
- Sharding: Horizontal partitioning, shard keys, rebalancing, and handling cross-shard queries.
- Consistent Hashing: Hash rings, virtual nodes, load distribution, and why naive hashing breaks at scale.
- CAP Theorem: Consistency, availability, partition tolerance, and the tradeoffs you actually make in practice.
- Numbers to Know: Latency numbers, throughput limits, modern hardware capabilities, and back-of-envelope calculations.
- Concurrency: Locks, mutexes, semaphores, race conditions, deadlocks, and lock-free data structures.
- Reliability: Failure modes, MTTF, MTTR, redundancy, replication, circuit breakers, and fault tolerance.
patterns
- Real-time Updates: WebSockets, Server-Sent Events, long polling, and pushing data to clients.
- Dealing with Contention: Optimistic locking, pessimistic locking, versioning, and handling concurrent writes.
- Multi-step Processes: Distributed transactions, sagas, event sourcing, CQRS, and maintaining consistency across services.
- Scaling Reads: Read replicas, caching layers, CDNs, and handling read-heavy workloads.
- Scaling Writes: Write sharding, write-ahead logs, batching, and handling write-heavy workloads.
- Handling Large Blobs: Object storage, chunking, resumable uploads, and serving large files efficiently.
- Managing Long Running Tasks: Job queues, async processing, task scheduling, retries, and monitoring background work.
observability & operations
- Observability Fundamentals: The three pillars, instrumentation, telemetry, and understanding system behavior.
- Metrics, Logging, Tracing: Time series data, structured logging, distributed tracing, and OpenTelemetry.
- SLIs, SLOs, SLAs: Service level indicators, objectives, agreements, error budgets, and measuring reliability.
- Incident Response: On-call, debugging production, postmortems, and learning from failures.
- Capacity Planning: Load testing, growth modeling, resource estimation, and staying ahead of demand.
data engineering
performance engineering
- Profiling & Benchmarking: CPU profiling, memory profiling, flame graphs, and finding bottlenecks.
- Optimization Techniques: Algorithmic optimization, caching strategies, lazy evaluation, and making systems faster.
- Tail Latency: P99, P999, hedged requests, and why the slowest requests matter most.
- Load Testing: Stress testing, capacity testing, soak testing, and validating system limits.
infrastructure & deployment
production readiness
- Migration Strategies: Zero-downtime deployments, data migrations, expand-contract pattern, and backwards compatibility.
- Multi-Region & Disaster Recovery: Active-active, active-passive, failover strategies, RPO, RTO, and building resilient systems.
- Cost Optimization: Resource rightsizing, reserved instances, spot instances, autoscaling, and managing cloud spend.
security fundamentals
- STRIDE Threat Modeling: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege (systematic threat analysis).
- Threat Modeling & Attack Trees: Identifying attack vectors, building threat models, risk assessment, and prioritizing security work.
- Penetration Testing: Red team exercises, vulnerability scanning, exploit chains, and validating security controls.
- CTF & Security Challenges: Capture the flag techniques, reverse engineering, binary exploitation, and offensive security skills.
privacy & compliance
- Privacy Engineering: Privacy by design, data minimization, purpose limitation, and building privacy-preserving systems.
- De-identification & Anonymization: PII removal, k-anonymity, differential privacy, tokenization, and re-identification risks.
- Data Governance: Data classification, retention policies, access controls, audit trails, and regulatory compliance.
llm security
- Prompt Injection: Direct and indirect injection, tool call hijacking, cross-context leakage, and defense strategies.
- Jailbreaks & Adversarial Prompting: GCG, many-shot jailbreaking, role-play attacks, and why aligned models stay brittle.
- Privacy & Training Data Extraction: Memorization, membership inference attacks, verbatim extraction, and differential privacy.
- Model Supply Chain: Backdoor attacks, poisoned weights, malicious fine-tunes, and risks in open-source model ecosystems.