Elasticsearch is powerful but unforgiving. Small configuration mistakes can cause 10x performance degradation. As Elasticsearch consultants, we've audited dozens of clusters. These seven mistakes appear in almost every underperforming deployment.
1. Using Dynamic Mapping in Production
The mistake: Letting Elasticsearch auto-detect field types instead of defining explicit mappings.
Why it's bad:
- Elasticsearch guesses wrong. A field that should be keyword becomes text
- Strings are indexed as both text and keyword by default (doubling storage)
- Mapping conflicts when different documents have different types for the same field
- You can't change mappings after indexing without reindexing everything
The fix: Always define explicit mappings before indexing data.
PUT /my-index
{
"mappings": {
"dynamic": "strict", // Reject unmapped fields
"properties": {
"title": { "type": "text", "analyzer": "standard" },
"status": { "type": "keyword" },
"created_at": { "type": "date" },
"price": { "type": "float" }
}
}
}
Setting "dynamic": "strict" rejects documents with unmapped fields, preventing silent failures.
2. Over-Sharding
The mistake: Creating too many shards "for future scalability."
Why it's bad:
- Each shard has overhead (memory, file handles, threads)
- Queries hit all shards, so more shards = more coordination overhead
- Small shards are inefficient (Lucene prefers larger segments)
- A common anti-pattern: 50 shards for an index with 100,000 documents
The fix: Size shards between 10GB and 50GB. For most use cases:
| Data Size | Recommended Shards |
|---|---|
| < 10GB | 1 primary shard |
| 10GB - 50GB | 1-2 primary shards |
| 50GB - 200GB | 3-5 primary shards |
| > 200GB | Plan based on nodes and use case |
You can always split shards later with the Split API. You can't easily merge them.
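As a rough sketch of both ends of that advice (the index name and shard counts are illustrative): set the primary shard count explicitly at creation, then split later if the index outgrows it. Splitting requires a write block first, and the target shard count must be a multiple of the source.
PUT /logs-2025
{
  "settings": { "number_of_shards": 2, "number_of_replicas": 1 }
}
// Later, if needed: block writes, then split 2 primaries into 4
PUT /logs-2025/_settings
{ "index.blocks.write": true }
POST /logs-2025/_split/logs-2025-split
{
  "settings": { "index.number_of_shards": 4 }
}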
3. Using Queries Instead of Filters
The mistake: Putting filter clauses in the must section instead of filter.
Bad:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "elasticsearch" } },
{ "term": { "status": "published" } }, // WRONG!
{ "range": { "date": { "gte": "2025-01-01" } } } // WRONG!
]
}
}
}
Good:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "elasticsearch" } }
],
"filter": [
{ "term": { "status": "published" } }, // CORRECT
{ "range": { "date": { "gte": "2025-01-01" } } } // CORRECT
]
}
}
}
Why it matters:
- filter clauses are cached. After the first query, subsequent queries skip evaluation.
- filter clauses don't calculate relevancy scores (faster).
- This single change can improve query performance by 2-5x for filtered queries.
Rule of thumb: If a clause is yes/no (not "how relevant"), put it in filter.
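If you want to confirm the filter cache is actually being used, the index stats API exposes query cache counters (my-index is a placeholder):
// Node query cache stats (filter results are cached here)
GET /my-index/_stats/query_cache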
4. Returning Too Much Data
The mistake: Returning full documents when you only need a few fields.
Why it's bad:
- Large _source fields consume network bandwidth and memory
- Serialization/deserialization adds latency
- If you're only showing titles in search results, why return the full document body?
The fix: Use source filtering to return only what you need.
{
"query": { "match": { "content": "elasticsearch" } },
"_source": ["title", "author", "created_at"],
"highlight": {
"fields": {
"content": { "fragment_size": 150 }
}
}
}
This returns only title, author, and created_at fields, plus highlighted snippets from content. Response size drops dramatically.
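The inverse also works when you want most of a document but not its heaviest fields: exclude instead of include (the field names here are illustrative):
{
  "query": { "match": { "content": "elasticsearch" } },
  "_source": { "excludes": ["content", "attachments"] }
}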
5. Missing Refresh Interval Tuning
The mistake: Using the default 1-second refresh interval for batch indexing.
Why it's bad:
- Every refresh creates a new Lucene segment
- High refresh frequency = many small segments = slow queries
- During bulk indexing, you're wasting resources refreshing data that isn't being searched yet
The fix: Disable refresh during bulk indexing, then refresh once at the end.
// Disable refresh before bulk indexing
PUT /my-index/_settings
{ "index": { "refresh_interval": "-1" } }
// Bulk index your data...
// Re-enable refresh and force a refresh
PUT /my-index/_settings
{ "index": { "refresh_interval": "30s" } }
POST /my-index/_refresh
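For completeness, the "bulk index your data" step uses the _bulk API, whose body is newline-delimited JSON (the documents below are placeholders):
POST /my-index/_bulk
{ "index": { "_id": "1" } }
{ "title": "First document", "status": "published" }
{ "index": { "_id": "2" } }
{ "title": "Second document", "status": "draft" }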
For production search indexes that don't need real-time indexing, consider 30s or 60s intervals instead of the default 1s.
6. No Synonym or Analyzer Strategy
The mistake: Using default analyzers for domain-specific search.
Why it's bad:
- Legal search: "plaintiff" doesn't find "claimant"
- E-commerce: "laptop" doesn't find "notebook computer"
- Users get frustrated and blame "the search doesn't work"
The fix: Build domain-specific synonyms and custom analyzers.
PUT /products
{
"settings": {
"analysis": {
"filter": {
"product_synonyms": {
"type": "synonym",
"synonyms": [
"laptop, notebook, portable computer",
"phone, mobile, smartphone, cell phone",
"tv, television, flat screen"
]
}
},
"analyzer": {
"product_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "product_synonyms", "stemmer"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "product_analyzer"
}
}
}
}
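Before reindexing anything, you can sanity-check the analyzer with the _analyze API; the sample text is arbitrary:
POST /products/_analyze
{
  "analyzer": "product_analyzer",
  "text": "notebook with a flat screen"
}
The token list in the response shows the synonym expansions that both indexing and queries will see.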
Pro tip: Build your synonym list from search logs. See what users search for and what they actually click on.
7. Not Monitoring Query Performance
The mistake: No visibility into which queries are slow or why.
Why it's bad:
- Slow queries degrade over time as data grows
- One bad query pattern can bring down the cluster
- Without metrics, you're guessing at what to optimize
The fix: Enable slow query logging and monitor with Kibana.
PUT /my-index/_settings
{
"index.search.slowlog.threshold.query.warn": "1s",
"index.search.slowlog.threshold.query.info": "500ms",
"index.search.slowlog.threshold.fetch.warn": "500ms",
"index.search.slowlog.level": "info"
}
Also monitor these Elasticsearch metrics:
- Query latency (p50, p95, p99): Median vs tail latency
- Indexing rate: Documents indexed per second
- JVM heap usage: Should stay below 75%
- Segment count: High counts indicate merge pressure
- Search thread pool rejections: Indicates overload
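Most of these can be spot-checked straight from the REST API before you build dashboards (my-index is a placeholder):
// JVM heap and thread pool stats per node
GET /_nodes/stats/jvm,thread_pool
// Search thread pool rejections at a glance
GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected
// Segment counts per shard
GET /_cat/segments/my-index?v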
Quick Checklist
Before deploying Elasticsearch to production:
- Define explicit mappings (no dynamic mapping)
- Right-size your shards (10-50GB each)
- Use filter context for yes/no clauses
- Return only the fields you need
- Tune refresh interval for your use case
- Build domain-specific synonyms
- Enable slow query logging and monitoring
Need an Elasticsearch Audit?
We've optimized clusters that went from 5-second queries to under 100ms. As Elasticsearch specialists, we know where to look.
Common results from our performance audits:
- 5-10x query speed improvements
- 30-50% reduction in infrastructure costs
- Elimination of timeout errors