Elasticsearch with Spring Boot: Full-Text Search, Aggregations & Production Guide (2026)

A complete guide to integrating Elasticsearch 8 with Spring Boot 3: from custom analyzers and Spring Data ES to faceted search, relevance tuning, bulk indexing, zero-downtime reindexing, and production cluster operations.

TL;DR: Use Elasticsearch when you need ranked full-text search, faceted navigation, or analytics at scale. Spring Data Elasticsearch 5 and the official Java client (ES 8) make it production-ready in Spring Boot 3, with type-safe queries, automatic index management, and reactive support.

1. When to Use Elasticsearch

Feature | PostgreSQL FTS | Elasticsearch | Solr
Full-text ranking | ✅ Basic (ts_rank, not BM25) | ✅ Advanced BM25, tunable | ✅ Good
Custom tokenizers | ❌ Limited | ✅ Extensive | ✅ Good
Faceted search | ❌ Manual | ✅ Native aggregations | ✅ Native facets
Horizontal scale | ❌ Complex sharding | ✅ Native clustering | ✅ SolrCloud
Spring integration | ✅ Spring Data JPA | ✅ Spring Data ES 5 | ⚠️ Limited

Decision guide: Use Elasticsearch when you need (a) ranked relevance scoring with tunable weights, (b) faceted search for e-commerce-style filtering, (c) more than 50M searchable documents, or (d) real-time analytics on log/event data alongside search.

2. Core Concepts

  • Index: A collection of documents (equivalent to a DB table). ES 8 defaults to 1 primary shard.
  • Shard: Horizontal slice of an index; each shard is an independent Lucene instance. Scale reads by adding replicas; scale writes/capacity by adding primary shards.
  • Mapping: Schema definition for field types. Always use explicit mapping in production — dynamic mapping can create unintended field types.
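  • Inverted index: Core data structure. Maps each token (word) to the list of documents containing it. Searching for "laptop" is a direct lookup in the term dictionary, so its cost is largely independent of the number of documents, whereas SQL LIKE '%laptop%' has to scan every row (O(N)).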
  • Analyzer pipeline: Character filters → Tokenizer → Token filters. Applied at index time and query time. Custom analyzers let you control how text is tokenized (e.g., edge-ngram for autocomplete).
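
To make the inverted index and the analyzer pipeline concrete, here is a minimal sketch in plain Java. It is a toy model, not how Lucene stores its data: the class name InvertedIndexSketch and the lowercase-and-split "analyzer" are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: token -> sorted set of document ids (illustration only).
class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: analyze the text and record the document id under every token.
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: analyze the term the same way, then look it up instead of scanning documents.
    SortedSet<Integer> search(String term) {
        List<String> tokens = analyze(term);
        if (tokens.isEmpty()) {
            return new TreeSet<>();
        }
        return postings.getOrDefault(tokens.get(0), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Gaming Laptop 15 inch");
        idx.index(2, "Laptop sleeve, black");
        idx.index(3, "Wireless mouse");
        System.out.println(idx.search("LAPTOP")); // prints [1, 2]
    }
}
```

The query "LAPTOP" matches because the same analysis runs at index time and at query time, which is the reason the analyzer pipeline matters for both.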

3. Spring Boot 3 Setup

// pom.xml — Spring Data Elasticsearch 5 (ES 8.x)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- IMPORTANT: RestHighLevelClient was deprecated in 7.15 and has no 8.x release; use ElasticsearchClient -->

# application.yml
spring:
  elasticsearch:
    uris: https://localhost:9200
    username: elastic
    password: ${ES_PASSWORD}
    connection-timeout: 3s
    socket-timeout: 30s
❌ BAD: Using RestHighLevelClient. It has been deprecated since ES 7.15 and there is no 8.x release of it. Spring Data ES 5 uses the new ElasticsearchClient (Java API Client) automatically. Do not add the old high-level client dependency.

4. Index Mapping with @Document & @Field

// ✅ GOOD: Explicit mapping with @Document and typed @Field annotations
@Document(indexName = "products")
// In Spring Data ES 5, shards and replicas are attributes of @Setting, not @Document
@Setting(settingPath = "es-settings.json", shards = 3, replicas = 1)  // custom analyzers
public class ProductDocument {

    @Id
    private String id;

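    // Edge n-grams at index time only; the query text is analyzed with the standard analyzer
    @MultiField(mainField = @Field(type = FieldType.Text, analyzer = "custom_edge_ngram",
                                   searchAnalyzer = "standard"),
                otherFields = {@InnerField(suffix = "keyword", type = FieldType.Keyword)})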
    private String name;

    @Field(type = FieldType.Text, analyzer = "custom_synonym")
    private String description;

    @Field(type = FieldType.Keyword)     // exact match, used in facets
    private String category;

    @Field(type = FieldType.Double)
    private double price;

    @Field(type = FieldType.Date, format = DateFormat.epoch_millis)
    private Instant createdAt;

    @Field(type = FieldType.Integer)
    private int salesCount;              // for popularity boosting

    @CompletionField(maxInputLength = 100)
    private Completion suggest;          // autocomplete
}

5. Custom Analyzers: Edge-Ngram, Synonym, HTML Strip

// es-settings.json (place in src/main/resources/)
{
  "analysis": {
    "filter": {
      "edge_ngram_filter": {
        "type": "edge_ngram",
        "min_gram": 2,
        "max_gram": 15
      },
      "synonym_filter": {
        "type": "synonym",
        "synonyms": ["mobile, phone, cell", "laptop, notebook, computer"]
      }
    },
    "analyzer": {
      "custom_edge_ngram": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "edge_ngram_filter"]
      },
      "custom_synonym": {
        "type": "custom",
        "tokenizer": "standard",
        "char_filter": ["html_strip"],
        "filter": ["lowercase", "synonym_filter", "stop"]
      }
    }
  }
}

Edge-ngram analyzer enables autocomplete: indexing "laptop" produces "la", "lap", "lapt", "lapto", "laptop", so a prefix search for "lap" matches it. Set a separate search analyzer (for example standard, via searchAnalyzer on the @Field) so the query text is not also ngrammed at search time.
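
The token expansion described above can be sketched in a few lines of plain Java. This only mimics what the lowercase and edge_ngram filters do to a single token; the class and method names are hypothetical, and the real filter has more options (such as preserve_original) that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Mimics "lowercase" + "edge_ngram" token filters for one token (illustration only).
class EdgeNgramSketch {

    // Returns every prefix of the token whose length is between minGram and maxGram.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Same min_gram / max_gram values as edge_ngram_filter in es-settings.json
        List<String> indexed = edgeNgrams("Laptop", 2, 15);
        System.out.println(indexed); // prints [la, lap, lapt, lapto, laptop]

        // The query side is NOT n-grammed: "lap" is looked up as a single token.
        System.out.println(indexed.contains("lap")); // prints true
    }
}
```

If the query were n-grammed as well, "lap" would also be searched as "la", which matches far more documents than the user intended.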

6. Full-Text Queries: match, bool, highlight

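// ✅ GOOD: NativeQuery with BoolQuery, highlights, and multi_match
// Note: Query below is Spring Data's Query, not the Java client's query_dsl.Query.
// SearchResult and mapToResult are application-specific and not shown.
import co.elastic.clients.elasticsearch._types.query_dsl.TextQueryType;
import org.springframework.data.elasticsearch.client.elc.NativeQuery;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.HighlightQuery;
import org.springframework.data.elasticsearch.core.query.Query;
import org.springframework.data.elasticsearch.core.query.highlight.Highlight;
import org.springframework.data.elasticsearch.core.query.highlight.HighlightField;

@Service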
public class ProductSearchService {
    @Autowired private ElasticsearchOperations operations;

    public SearchResult search(String query, String category, Pageable pageable) {
        // Build bool query: must = full-text, filter = category (cached, no scoring)
        Query esQuery = NativeQuery.builder()
            .withQuery(q -> q.bool(b -> {
                b.must(m -> m.multiMatch(mm -> mm
                    .query(query)
                    .fields("name^3", "description^1")  // name boosted 3x
                    .type(TextQueryType.BestFields)
                    .fuzziness("AUTO")));
                if (category != null) {
                    b.filter(f -> f.term(t -> t.field("category").value(category)));
                }
                return b;
            }))
            .withHighlightQuery(new HighlightQuery(
                new Highlight(List.of(new HighlightField("name"), new HighlightField("description"))),
                ProductDocument.class))
            .withPageable(pageable)
            .build();

        SearchHits<ProductDocument> hits = operations.search(esQuery, ProductDocument.class);
        return mapToResult(hits);
    }
}
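
The fuzziness("AUTO") setting in the query above picks the allowed edit distance from the length of each term. The sketch below restates the documented default thresholds (3 and 6) in plain Java; the class and method names are made up for illustration, and Elasticsearch does not expose such a helper.

```java
// Restates the default "AUTO" fuzziness rule: longer terms tolerate more typos.
class AutoFuzzinessSketch {

    // Maximum Levenshtein edit distance allowed for a term under fuzziness AUTO.
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length <= 2) {
            return 0; // 1-2 characters: must match exactly
        }
        if (length <= 5) {
            return 1; // 3-5 characters: one edit allowed
        }
        return 2;     // 6+ characters: two edits allowed
    }

    public static void main(String[] args) {
        for (String term : new String[] {"tv", "phone", "laptop"}) {
            System.out.println(term + " -> " + maxEdits(term));
        }
    }
}
```

This is why a typo such as "laptpo" still finds laptops, while a two-letter query like "tv" never fuzzy-matches unrelated short tokens.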

7. Aggregations: Terms, Date Histogram, Facets

// ✅ GOOD: Aggregation for category facets + price range distribution
NativeQuery aggQuery = NativeQuery.builder()
    .withQuery(q -> q.matchAll(m -> m))
    .withAggregation("categories", Aggregation.of(a -> a
        .terms(t -> t.field("category").size(20))))
    .withAggregation("price_ranges", Aggregation.of(a -> a
        .range(r -> r.field("price")
            .ranges(
                AggregationRange.of(rng -> rng.to(50.0)),
                AggregationRange.of(rng -> rng.from(50.0).to(200.0)),
                AggregationRange.of(rng -> rng.from(200.0))
            ))))
    .withMaxResults(0)  // only aggregations, no hits
    .build();

SearchHits<ProductDocument> result = operations.search(aggQuery, ProductDocument.class);

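// Parse category facets. getAggregations() returns the generic AggregationsContainer,
// so cast it to the ElasticsearchClient-based implementation first.
// (If your Spring Data ES version lacks get(String), use aggregationsAsMap().get(...).)
ElasticsearchAggregations aggregations = (ElasticsearchAggregations) result.getAggregations();
ElasticsearchAggregation catAgg = aggregations.get("categories");
catAgg.aggregation().getAggregate().sterms().buckets().array()
    .forEach(b -> System.out.println(b.key().stringValue() + ": " + b.docCount()));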
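
One detail of the price_ranges aggregation is easy to get wrong: in a range aggregation, "from" is inclusive and "to" is exclusive, so a product priced at exactly 50.0 lands in the middle bucket. The plain-Java sketch below reproduces that bucketing for the three ranges used above. The class name and bucket labels are assumptions for illustration; the actual bucket keys are generated by Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Client-side imitation of the three price ranges: [*, 50), [50, 200), [200, *).
class RangeFacetSketch {

    // "from" is inclusive and "to" is exclusive, as in the range aggregation.
    static String bucketFor(double price) {
        if (price < 50.0) {
            return "*-50.0";
        }
        if (price < 200.0) {
            return "50.0-200.0";
        }
        return "200.0-*";
    }

    // Counts documents per bucket, keeping empty buckets so the facet list stays stable.
    static Map<String, Long> priceFacets(List<Double> prices) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("*-50.0", 0L);
        counts.put("50.0-200.0", 0L);
        counts.put("200.0-*", 0L);
        for (double price : prices) {
            counts.merge(bucketFor(price), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(priceFacets(List.of(49.99, 50.0, 199.99, 200.0)));
    }
}
```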

8. Relevance Tuning: Boosting & Function Score

Raw BM25 scores rank by text similarity only. Production search needs business logic: boost recent products, popular items, or specific brands.

// ✅ GOOD: function_score with recency decay + popularity field value factor
Query functionScoreQuery = NativeQuery.builder()
    .withQuery(q -> q.functionScore(fs -> fs
        .query(inner -> inner.multiMatch(mm -> mm
            .query(searchText).fields("name^3", "description")))
        .functions(
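            // Boost by recency: gauss decay halves this function's value (decay 0.5)
            // for documents 30 days away from "now".
            // JsonData form of DecayPlacement; newer Java client releases use typed
            // date/numeric/geo decay variants instead, so adjust to your client version.
            FunctionScore.of(f -> f.gauss(g -> g
                .field("createdAt")
                .placement(p -> p
                    .origin(JsonData.of("now"))
                    .scale(JsonData.of("30d"))
                    .decay(0.5)))),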
            // Boost by sales count (popularity)
            FunctionScore.of(f -> f.fieldValueFactor(fvf -> fvf
                .field("salesCount")
                .factor(0.1)
                .modifier(FieldValueFactorModifier.Log1p)
                .missing(1.0)))
        )
        .scoreMode(FunctionScoreMode.Sum)
        .boostMode(FunctionBoostMode.Multiply)))
    .build();
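
To see how these settings combine, the arithmetic can be worked through in plain Java. The sketch assumes the documented formulas: Gaussian decay exp(-d² / (2σ²)) with σ² = -scale² / (2 ln(decay)) and offset 0, and log1p as log10(1 + factor × value). The class and method names are hypothetical, and the query score passed in stands for whatever BM25 score the multi_match produced.

```java
// Works through score_mode = sum and boost_mode = multiply for the query above.
class FunctionScoreSketch {

    // Gaussian decay with offset 0: 1.0 at the origin, exactly `decay` at distance `scaleDays`.
    static double gaussDecay(double ageDays, double scaleDays, double decay) {
        double sigmaSquared = -(scaleDays * scaleDays) / (2.0 * Math.log(decay));
        return Math.exp(-(ageDays * ageDays) / (2.0 * sigmaSquared));
    }

    // field_value_factor with modifier log1p: common logarithm of (1 + factor * value).
    static double popularity(double salesCount, double factor) {
        return Math.log10(1.0 + factor * salesCount);
    }

    // score_mode = sum adds the function values; boost_mode = multiply scales the query score.
    static double finalScore(double queryScore, double ageDays, double salesCount) {
        double functions = gaussDecay(ageDays, 30.0, 0.5) + popularity(salesCount, 0.1);
        return queryScore * functions;
    }

    public static void main(String[] args) {
        // Same text relevance, different age and popularity
        System.out.println("new, no sales:    " + finalScore(2.0, 0.0, 0.0));
        System.out.println("30 days, 90 sold: " + finalScore(2.0, 30.0, 90.0));
        System.out.println("90 days, 90 sold: " + finalScore(2.0, 90.0, 90.0));
    }
}
```

Because the two functions are summed before being multiplied in, a popular older product can still outrank a brand-new product with no sales. If recency should act as a hard dampener instead, score_mode multiply is the alternative to evaluate.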

9. Bulk Indexing & Zero-Downtime Reindexing

// ✅ GOOD: Zero-downtime reindex pattern with index aliases
// Step 1: Create new versioned index
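ElasticsearchClient client;  // injected bean (co.elastic.clients.elasticsearch.ElasticsearchClient)
String newIndex = "products-" + LocalDate.now();
// In production, pass the explicit settings and mappings here rather than relying on dynamic mapping
client.indices().create(c -> c.index(newIndex));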

// Step 2: Bulk index data to new index (batch of 500)
BulkIngester<ProductDocument> ingester = BulkIngester.of(b -> b
    .client(client)
    .maxOperations(500)
    .maxConcurrentRequests(3)
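    .listener(new BulkListener<ProductDocument>() {
        @Override public void beforeBulk(long executionId, BulkRequest request, List<ProductDocument> contexts) {}
        @Override public void afterBulk(long executionId, BulkRequest request, List<ProductDocument> contexts, BulkResponse response) {
            // errors() is true when at least one item failed; log each failed item
            if (response.errors()) {
                response.items().stream()
                    .filter(item -> item.error() != null)
                    .forEach(item -> log.error("Bulk item {} failed: {}", item.id(), item.error().reason()));
            }
        }
        @Override public void afterBulk(long executionId, BulkRequest request, List<ProductDocument> contexts, Throwable failure) {
            // The whole request failed (for example, a connection error)
            log.error("Bulk failed", failure);
        }
    }));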

productRepository.streamAll().forEach(p ->
    ingester.add(op -> op.index(i -> i.index(newIndex).id(p.getId()).document(p))));
ingester.close();  // flush remaining

// Step 3: Atomically swap alias "products" to point to new index
client.indices().updateAliases(u -> u.actions(
    Action.of(a -> a.remove(r -> r.index("products-*").alias("products"))),
    Action.of(a -> a.add(add -> add.index(newIndex).alias("products")))
));
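
The role of maxOperations(500) and the final close() call can be illustrated with a much smaller batcher. This is a simplified sketch of the batching idea only, under the assumption that a batch is sent whenever the buffer is full and once more on close. It is not how BulkIngester is implemented, which also handles concurrent requests, size limits, and timed flushes. BatcherSketch and its sender callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers operations and hands them to `sender` in batches of at most maxOperations.
class BatcherSketch<T> implements AutoCloseable {
    private final int maxOperations;
    private final Consumer<List<T>> sender;
    private final List<T> buffer = new ArrayList<>();

    BatcherSketch(int maxOperations, Consumer<List<T>> sender) {
        this.maxOperations = maxOperations;
        this.sender = sender;
    }

    // Adds one operation and sends a batch as soon as the buffer is full.
    void add(T operation) {
        buffer.add(operation);
        if (buffer.size() >= maxOperations) {
            flush();
        }
    }

    // Sends whatever is buffered; a copy is passed so the buffer can be reused.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Without this final flush the last partial batch would never be sent.
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BatcherSketch<Integer> batcher = new BatcherSketch<>(500, b -> batchSizes.add(b.size()))) {
            for (int i = 0; i < 1200; i++) {
                batcher.add(i);
            }
        }
        System.out.println(batchSizes); // prints [500, 500, 200]
    }
}
```

The last 200 documents are only sent on close, which is why the alias swap in Step 3 must come after ingester.close().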

10. Production Operations

Area | Key Action | Tool / API
Cluster health | Monitor green/yellow/red status | GET /_cluster/health
Slow queries | Enable the search slow log (e.g. warn above 100ms) | index.search.slowlog.threshold.* via PUT /<index>/_settings
JVM heap | Set -Xms = -Xmx, at most 50% of RAM and no more than 26GB (keeps compressed oops) | jvm.options.d/
Index lifecycle | ILM for log rotation (hot/warm/cold/delete) | PUT /_ilm/policy/<policy_id>
Snapshots | Daily snapshots to S3 (Elastic snapshot API) | PUT /_snapshot/<repository>
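
The JVM heap rule from the table (half of RAM, capped at 26GB, with -Xms equal to -Xmx) is simple arithmetic. The helper below is a hypothetical sketch for checking a node's sizing, not an Elasticsearch tool; the resulting flags would go into a file under jvm.options.d/.

```java
// Applies the sizing rule: heap = min(RAM / 2, 26 GiB), with -Xms equal to -Xmx.
class HeapSizingSketch {
    static final long GIB = 1024L * 1024L * 1024L;
    static final long MIB = 1024L * 1024L;
    static final long MAX_HEAP_BYTES = 26L * GIB; // cap taken from this guide

    // Half of physical RAM, but never above the cap.
    static long heapBytes(long ramBytes) {
        return Math.min(ramBytes / 2, MAX_HEAP_BYTES);
    }

    // Formats the result as JVM flags in megabytes.
    static String jvmFlags(long ramBytes) {
        long heapMib = heapBytes(ramBytes) / MIB;
        return "-Xms" + heapMib + "m -Xmx" + heapMib + "m";
    }

    public static void main(String[] args) {
        System.out.println(jvmFlags(16L * GIB));  // prints -Xms8192m -Xmx8192m
        System.out.println(jvmFlags(128L * GIB)); // prints -Xms26624m -Xmx26624m
    }
}
```

The remaining RAM is not wasted: the operating system uses it as file system cache for Lucene segments.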

11. Interview Questions & Production Checklist

Q: When would you NOT use Elasticsearch for search?

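A: In three cases. First, when your dataset is under about 1M documents: PostgreSQL full-text search (tsvector with a GIN index) is sufficient and avoids the operational overhead. Second, when you need strong ACID consistency for the search index: Elasticsearch is near-real-time (new documents become searchable only after a refresh, 1 second by default) and has no multi-document transactions. Third, when the team has no Elasticsearch expertise, because the operational complexity (cluster management, mapping migrations, JVM tuning) is significant.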

✅ Production Checklist
  • Use explicit mapping (disable dynamic)
  • Use ElasticsearchClient, not RestHighLevelClient
  • Index aliases for zero-downtime reindex
  • Set JVM heap to 50% of RAM, max 26GB
  • Use filter context (not query context) for non-scoring filters
  • Enable slow query log in production
  • Replicas = 1 minimum for HA
  • Daily snapshots to S3

Last updated: April 11, 2026