Java LangChain4j: Building AI Agents in Pure Java Without Python
LangChain4j is Java's answer to Python's LangChain — a comprehensive framework for building LLM-powered applications and autonomous AI agents using idiomatic Java. While Python dominated early AI tooling, LangChain4j brings the full agent stack to the JVM: tool execution via annotated methods, persistent conversation memory, RAG with embedded vector stores, and structured output parsing into Java records. This post covers the full framework from first steps through production-grade agent deployment in Spring Boot.
Table of Contents
- Why LangChain4j? Java vs Python for AI Agents
- AiServices: The Zero-Boilerplate Agent Interface
- @Tool Annotation: Giving Agents Java Superpowers
- Chat Memory: Per-User Conversation State
- RAG in LangChain4j: Document Ingestion to Query Answering
- Structured Output: LLM Responses as Java Records
- Streaming and Async Responses
- Spring Boot Integration: @Bean + @Service Wiring
- Key Takeaways
1. Why LangChain4j? Java vs Python for AI Agents
The "just use Python for AI" argument has a real cost in enterprise environments: a Python microservice adds a second runtime, a second CI pipeline, a second set of deployment artifacts, and a language mismatch between the AI tier and the core business logic tier. LangChain4j eliminates this cost by running the full agent stack inside the JVM, next to your existing Spring Boot services.
LangChain4j's key differentiators over raw API clients are: the AiServices interface declaration pattern (define what you want, LangChain4j wires the how), annotation-driven tool registration with automatic JSON schema generation, built-in chat memory with multiple backend options (in-memory, Redis, custom), and a unified embedding store abstraction covering pgvector, Chroma, Weaviate, and Pinecone.
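For contrast, here is a sketch of the raw-client baseline these features replace, using LangChain4j's low-level OpenAiChatModel builder (0.36.x-era API; the exact builder options are assumptions based on that version line). Prompt templating, memory, tools, and retries are all yours to hand-roll at this level, which is exactly what AiServices automates:

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class RawClientExample {
    public static void main(String[] args) {
        // Build a model client by hand — no memory, no tools, no templating
        ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o")
                .temperature(0.3)
                .build();
        // One-shot, stateless call
        String answer = model.generate("Summarize our return policy in one sentence.");
        System.out.println(answer);
    }
}
```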
2. AiServices: The Zero-Boilerplate Agent Interface
The AiServices builder is LangChain4j's flagship feature. You declare an interface with @SystemMessage, @UserMessage, and @MemoryId annotations, and LangChain4j generates a fully functional implementation at runtime — no implementation code required:
<!-- pom.xml -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-spring-boot-starter</artifactId>
<version>0.36.2</version>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-open-ai-spring-boot-starter</artifactId>
<version>0.36.2</version>
</dependency>
// Declare the agent interface — no implementation needed
@AiService
public interface CustomerSupportAgent {
@SystemMessage("""
You are a helpful customer support agent for an e-commerce platform.
Be concise, professional, and always check order status before responding.
Today's date: {{current_date}}
""")
String chat(@MemoryId String userId, @UserMessage String userMessage);
}
// Spring Boot auto-wires this — use it anywhere
@RestController
@RequiredArgsConstructor
public class SupportController {
private final CustomerSupportAgent agent;
@PostMapping("/support/chat")
public ResponseEntity<String> chat(
@RequestHeader("X-User-Id") String userId,
@RequestBody String message) {
return ResponseEntity.ok(agent.chat(userId, message));
}
}
# application.properties
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.chat-model.model-name=gpt-4o
langchain4j.open-ai.chat-model.temperature=0.3
3. @Tool Annotation: Giving Agents Java Superpowers
The @Tool annotation transforms any Java method into a callable LLM tool. LangChain4j introspects the method signature, generates a JSON schema from parameter types and @P descriptions, and handles the complete request/response cycle — including multi-step tool calls:
@Component
@RequiredArgsConstructor
public class OrderTools {
private final OrderService orderService;
private final ShippingService shippingService;
@Tool("Get the current status and details of an order by order ID")
public OrderStatus getOrderStatus(@P("The order ID, e.g. ORD-12345") String orderId) {
return orderService.getStatus(orderId);
}
@Tool("Cancel an order if it has not yet shipped")
public String cancelOrder(
@P("The order ID to cancel") String orderId,
@P("Reason for cancellation") String reason) {
return orderService.cancel(orderId, reason)
? "Order " + orderId + " has been successfully cancelled."
: "Order " + orderId + " cannot be cancelled — it has already shipped.";
}
@Tool("Get estimated delivery date for a shipped order")
public String getDeliveryEstimate(@P("The order ID") String orderId) {
return shippingService.getEstimatedDelivery(orderId)
.map(date -> "Expected delivery: " + date)
.orElse("No delivery estimate available yet.");
}
}
// Register tools with the agent
@Configuration
public class AgentConfig {
@Bean
public CustomerSupportAgent customerSupportAgent(
ChatLanguageModel chatModel,
ChatMemory chatMemory,
OrderTools orderTools) {
return AiServices.builder(CustomerSupportAgent.class)
.chatLanguageModel(chatModel)
.chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(20))
.tools(orderTools)
.build();
}
}
Validate inputs in @Tool methods before executing business logic: the LLM may pass unexpected parameter values. Use Bean Validation annotations (@NotNull, @Pattern) on tool parameters and handle validation exceptions gracefully.
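To make the schema-generation step concrete, the tool specification LangChain4j derives from getOrderStatus and sends to the model looks roughly like the fragment below. The exact JSON layout varies by version, and real parameter names (rather than arg0) require compiling with -parameters; treat this as an illustrative shape, not verbatim output:

```json
{
  "name": "getOrderStatus",
  "description": "Get the current status and details of an order by order ID",
  "parameters": {
    "type": "object",
    "properties": {
      "orderId": {
        "type": "string",
        "description": "The order ID, e.g. ORD-12345"
      }
    },
    "required": ["orderId"]
  }
}
```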
4. Chat Memory: Per-User Conversation State
Stateful conversations require per-user memory. LangChain4j separates the memory interface from its implementation, making it easy to switch from in-memory (development) to Redis (production) without changing agent code:
@Configuration
public class MemoryConfig {
// Production: Redis-backed per-user memory. Persistence plugs in via the
// ChatMemoryStore interface; the injected bean here is assumed to be a
// Redis-backed ChatMemoryStore implementation.
@Bean
@Profile("prod")
public ChatMemoryProvider redisChatMemoryProvider(ChatMemoryStore redisChatMemoryStore) {
return memoryId -> MessageWindowChatMemory.builder()
.id(memoryId)
.maxMessages(30)
.chatMemoryStore(redisChatMemoryStore)
.build();
}
// Development: in-memory (no persistence between restarts)
@Bean
@Profile("dev")
public ChatMemoryProvider inMemoryChatMemoryProvider() {
Map<Object, ChatMemory> memories = new ConcurrentHashMap<>();
return memoryId -> memories.computeIfAbsent(memoryId,
id -> MessageWindowChatMemory.withMaxMessages(20));
}
}
// Agent with per-user memory — userId is the @MemoryId key
@AiService
public interface PersonalAssistant {
@SystemMessage("You are a personal assistant. Remember user preferences across conversations.")
String chat(@MemoryId long userId, @UserMessage String message);
}
MessageWindowChatMemory keeps the last N messages. TokenWindowChatMemory keeps messages within a token budget — better for cost control since it won't accidentally exceed context window limits on long conversations.
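The message-window eviction policy is easy to picture with a plain-Java sketch. This simplified MessageWindow class (a hypothetical stand-in, not LangChain4j code) keeps only the most recent maxMessages entries, dropping the oldest first; the real MessageWindowChatMemory additionally treats the SystemMessage specially and never evicts it:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified sketch of MessageWindowChatMemory's eviction policy:
// a bounded FIFO window over the conversation history.
class MessageWindow {
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxMessages;

    MessageWindow(int maxMessages) { this.maxMessages = maxMessages; }

    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // evict oldest message first
        }
    }

    List<String> messages() { return List.copyOf(messages); }
}
```

With a window of 3, adding a fourth message silently drops the first, which is why long-running conversations "forget" their earliest turns.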
5. RAG in LangChain4j: Document Ingestion to Query Answering
LangChain4j's RAG stack covers the full pipeline from document loading through query-time retrieval. The EasyRag module provides a one-call setup; advanced use cases compose individual components:
// Document ingestion — run once, store in pgvector
@Component
@RequiredArgsConstructor
public class KnowledgeBaseIngester {
private final EmbeddingModel embeddingModel;
private final EmbeddingStore<TextSegment> embeddingStore;
public void ingest(Path documentPath) {
List<Document> documents = FileSystemDocumentLoader.loadDocuments(documentPath);
DocumentSplitter splitter = DocumentSplitters.recursive(500, 50); // chunk size, overlap
List<TextSegment> segments = splitter.splitAll(documents);
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
embeddingStore.addAll(embeddings, segments);
}
}
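The two arguments to DocumentSplitters.recursive(500, 50) are chunk size and overlap. A plain-Java sketch shows what overlap buys you: each chunk repeats the tail of the previous one so a sentence straddling a boundary is still retrievable. The real recursive splitter prefers paragraph and sentence boundaries; this hypothetical version splits on raw character offsets for clarity:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified character-offset splitter illustrating chunk size + overlap.
class FixedSizeSplitter {
    static List<String> split(String text, int maxChars, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap; // advance less than a full chunk
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last chunk reached
        }
        return chunks;
    }
}
```

Splitting "abcdefghij" with maxChars=4 and overlap=2 yields ["abcd", "cdef", "efgh", "ghij"]: every adjacent pair shares two characters.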
// Retrieval-augmented agent — answers grounded in documents
@Configuration
@RequiredArgsConstructor
public class RagAgentConfig {
@Bean
public CustomerSupportAgent ragAgent(
ChatLanguageModel chatModel,
EmbeddingModel embeddingModel,
EmbeddingStore<TextSegment> embeddingStore) {
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.75)
.build();
return AiServices.builder(CustomerSupportAgent.class)
.chatLanguageModel(chatModel)
.contentRetriever(retriever)
.chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
.build();
}
}
# pgvector EmbeddingStore configuration
langchain4j.pgvector.host=localhost
langchain4j.pgvector.port=5432
langchain4j.pgvector.database=mydb
langchain4j.pgvector.table=embeddings
langchain4j.pgvector.dimension=1536  # text-embedding-3-small output dimension
6. Structured Output: LLM Responses as Java Records
LangChain4j can return structured Java types directly from agent interfaces — no JSON parsing required. It generates a JSON schema instruction, appends it to the system prompt, and deserializes the response:
// Java records for structured output
record SentimentAnalysis(Sentiment sentiment, double confidence, List<String> reasons) {}
enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL }
record ExtractedEntities(
List<String> productNames,
List<String> orderIds,
String urgencyLevel // HIGH, MEDIUM, LOW
) {}
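Under the hood, LangChain4j instructs the model to reply with JSON matching the record's components, then deserializes that reply. For ExtractedEntities the expected response is roughly the shape below (field values here are made up for illustration; the exact schema instruction wording is version-dependent):

```json
{
  "productNames": ["UltraWidget 3000"],
  "orderIds": ["ORD-12345"],
  "urgencyLevel": "HIGH"
}
```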
@AiService
public interface TextAnalysisAgent {
// Returns structured Java record — LangChain4j handles JSON schema + parsing
@UserMessage("Analyze the sentiment of the following customer message: {{message}}")
SentimentAnalysis analyzeSentiment(@V("message") String message);
@SystemMessage("Extract all mentioned entities from customer support messages.")
@UserMessage("{{message}}")
ExtractedEntities extractEntities(@V("message") String message);
}
// Usage in a Spring service:
@Service
@RequiredArgsConstructor
public class TicketTriageService {
private final TextAnalysisAgent analysisAgent;
public TicketPriority triage(String ticketBody) {
ExtractedEntities entities = analysisAgent.extractEntities(ticketBody);
SentimentAnalysis sentiment = analysisAgent.analyzeSentiment(ticketBody);
return TicketPriority.from(entities.urgencyLevel(), sentiment.sentiment());
}
}
7. Streaming and Async Responses
LangChain4j supports streaming via TokenStream — a reactive-style API for receiving token chunks as they arrive, enabling real-time display without waiting for the full response:
@AiService
public interface StreamingAssistant {
// Return TokenStream for streaming support
TokenStream chat(@MemoryId String userId, @UserMessage String message);
}
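The TokenStream contract is a fluent callback registration followed by an explicit start() that drives delivery. This plain-Java sketch (a hypothetical SimpleTokenStream, not the LangChain4j class) mimics that flow with a fixed token list standing in for the live LLM connection:

```java
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of the TokenStream callback pattern:
// register handlers fluently, then start() pushes tokens to them.
class SimpleTokenStream {
    private final List<String> tokens;
    private Consumer<String> onNext = t -> {};
    private Runnable onComplete = () -> {};

    SimpleTokenStream(List<String> tokens) { this.tokens = tokens; }

    SimpleTokenStream onNext(Consumer<String> handler) {
        this.onNext = handler;
        return this;
    }

    SimpleTokenStream onComplete(Runnable handler) {
        this.onComplete = handler;
        return this;
    }

    void start() {
        tokens.forEach(onNext); // deliver each token as it "arrives"
        onComplete.run();       // signal end of stream
    }
}
```

Nothing happens until start() is called, which is why the controller below registers all handlers on the returned stream before starting it.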
@RestController
@RequiredArgsConstructor
public class StreamingController {
private final StreamingAssistant assistant;
@GetMapping(value = "/assistant/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter stream(
@RequestHeader("X-User-Id") String userId,
@RequestParam String message) {
SseEmitter emitter = new SseEmitter(60_000L); // 60s timeout
assistant.chat(userId, message)
.onNext(token -> {
try { emitter.send(token); }
catch (IOException e) { emitter.completeWithError(e); }
})
.onComplete(response -> emitter.complete())
.onError(emitter::completeWithError)
.start();
return emitter;
}
}
8. Spring Boot Integration: @Bean + @Service Wiring
LangChain4j's Spring Boot starter auto-configures ChatLanguageModel and EmbeddingModel beans. The @AiService annotation on agent interfaces triggers automatic registration as Spring beans, making agents injectable anywhere in the application context:
// Complete Spring Boot setup for a production LangChain4j agent
@SpringBootApplication
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
}
// Everything auto-wired via langchain4j.open-ai.* properties:
// - ChatLanguageModel bean (OpenAI GPT-4o)
// - EmbeddingModel bean (text-embedding-3-small)
// - @AiService interfaces registered as Spring beans
// - @Tool components discovered and registered automatically
// Observability: LangChain4j integrates with Micrometer for latency + token metrics
@Configuration
public class ObservabilityConfig {
@Bean
public ChatModelListener tokenMetricsListener(MeterRegistry registry) {
return event -> {
if (event instanceof LlmResponseEvent resp) {
registry.counter("llm.tokens.input").increment(resp.inputTokenCount());
registry.counter("llm.tokens.output").increment(resp.outputTokenCount());
}
};
}
}
9. Key Takeaways
- LangChain4j's AiServices.builder() + @AiService annotation eliminates agent boilerplate — declare an interface, LangChain4j wires the implementation.
- @Tool annotations on Spring beans transform existing service methods into LLM-callable tools with automatic JSON schema generation.
- Per-user ChatMemory via @MemoryId enables stateful multi-turn conversations — use MessageWindowChatMemory for development, Redis-backed for production.
- LangChain4j's RAG stack (document loaders, splitters, embedding stores) covers the full ingestion-to-retrieval pipeline with minimal code.
- Structured output returns strongly-typed Java records from LLM responses — no JSON parsing, no DTOs, just annotated interfaces.
- The Spring Boot starter auto-configures all model and embedding beans; @AiService interfaces become injectable Spring beans with zero configuration.
Leave a Comment
Using LangChain4j in production? Share your agent architecture, tool design patterns, or questions below.