AI Engineering Blog

Cole Hoffer

Making AI do the thing.

Practical notes on building production RAG systems, LLM evaluations, and retrieval optimization.

Using citation behavior in production RAG systems to generate labeled training data for domain-specific reranking models.

How human follow-up behavior reveals response quality, and why scoring adherence to specific instructions outperforms vague relevance scoring.

Using LLM-based classification as a second pass to filter retrieval candidates when similarity thresholds fail to generalize.

About me

Staff AI Engineer at Reforge.

This is a collection of notes on lessons learned from yelling at LLMs, professionally.

Latest

A comprehensive guide to using Reciprocal Rank Fusion (RRF) to combine BM25 and semantic search results for production RAG systems.

Using Hypothetical Document Embeddings to bridge vocabulary gaps between user queries and specialized document corpora.

Why decomposing queries into structured filters before semantic search improves retrieval precision and performance.

A comprehensive, interactive guide to understanding the BM25 ranking algorithm for AI engineers building RAG systems and search applications.

Combining chain-of-thought reasoning with logprob extraction improves LLM classification accuracy while giving you real confidence scores.