A law firm came to us after their search system returned results for "terminal illness" when attorneys searched for "contract termination." Their previous Elasticsearch implementation used default analyzers, no synonym handling, and zero domain-specific tuning.
The search was fast. It was also useless.
This is the gap that matters in regulated industries. Speed is a given. The hard part is precision, compliance, and domain awareness — three things that generic Elasticsearch implementations consistently miss.
We've built search systems for legal tech (Kompas — 1M+ legal documents, 100ms queries, $220K+ annual ROI), healthcare platforms with HIPAA requirements, and enterprise document management systems. This guide covers the architecture decisions that separate compliance-grade search from a default Elasticsearch deploy.
Why Regulated Industries Break Default Elasticsearch
Elasticsearch out of the box is built for e-commerce and content sites. It's optimized for recall (find everything remotely relevant) over precision (find exactly what's needed). For a product catalog, that's fine. For a legal document database or a system handling Protected Health Information, it's a liability.
Three fundamental problems surface when you deploy standard Elasticsearch in regulated environments:
1. Domain-Specific Language Isn't Optional
Legal terminology has overlapping meanings that standard analyzers can't handle. "Consideration" means one thing in contract law and something entirely different in common English. "Discovery" is a legal process, not a Netflix category. "Brief" is a document, not an adjective.
Healthcare has the same problem. "Discharge" could be a patient leaving, a wound draining, or an electrical event. "Acute" means something specific clinically. Drug names have brand and generic variants that must resolve to the same compound.
Without custom analyzers and synonym handling, your search returns noise. Legal professionals won't tolerate it — they need to find the exact precedent, statute, or clause. A wrong result in legal search isn't an inconvenience. It's a malpractice risk.
What we built for Kompas: Manual synonym lists with automated Elasticsearch integration, domain-specific analyzers for legal terminology, n-gram analyzers for partial matching, and query boosting ranked by document download frequency. The result: attorneys finding the right document in under 100ms across 1M+ legal records.
2. Audit Trails Are a Regulatory Requirement, Not a Feature
In healthcare, HIPAA requires logging every access to Protected Health Information. In legal tech, firms need audit trails for data governance and conflict checks. In financial services, SOC 2 demands comprehensive access logging.
Elasticsearch doesn't record who accessed what by default. You'll need to build it at two levels:
Cluster-level auditing: Who accessed which index, when, from what IP. Elasticsearch's built-in audit logging (available in the commercial license) captures this, but you need to configure retention policies and ensure logs themselves are tamper-proof.
Application-level auditing: Which user searched for what, which documents they viewed, which results they exported. This requires custom implementation — Elasticsearch doesn't track application-level user identity out of the box.
We pipe both layers into dedicated audit indices with append-only write patterns and Kibana dashboards that compliance officers can actually use. The audit data itself becomes searchable — which is useful when regulators come asking questions.
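As a rough illustration of the application-level layer, here is a minimal sketch of how an audit record could be built and rendered for the `_bulk` API. The field names (`user_id`, `query_text`, `document_ids`, `source_ip`) and the index name are assumptions made for the example, not a schema taken from any of the platforms above.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_event(user_id, action, query_text, document_ids, source_ip, timestamp=None):
    """Build one application-level audit record (hypothetical schema)."""
    ts = timestamp or datetime.now(timezone.utc)
    event = {
        "@timestamp": ts.isoformat(),
        "user_id": user_id,
        "action": action,  # e.g. "search", "view", "export"
        "query_text": query_text,
        "document_ids": list(document_ids),
        "source_ip": source_ip,
    }
    # Deterministic ID derived from the content: re-sending the same event
    # maps to the same _id instead of producing a second copy.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest(), event


def to_bulk_create_lines(index_name, events):
    """Render (event_id, event) pairs as NDJSON for the _bulk API.

    The `create` action is rejected when a document with the same _id
    already exists, so this writer never overwrites an existing record.
    """
    lines = []
    for event_id, event in events:
        lines.append(json.dumps({"create": {"_index": index_name, "_id": event_id}}))
        lines.append(json.dumps(event, sort_keys=True))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pair = build_audit_event("u-17", "search", "contract termination", ["doc-1"], "10.0.0.5")
    print(to_bulk_create_lines("audit-app", [pair]))
```

The `create` action only stops this writer from overwriting records. To keep the index append-only in practice, the writing account would also need a role without update or delete rights (the `create_doc` index privilege is one option worth evaluating).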
3. Access Control Goes Beyond "Admin" and "User"
Regulated industries need granular access control. A junior associate shouldn't see privileged client communications. A nurse shouldn't access records from a different department. A compliance officer needs read access to everything but write access to nothing.
This means implementing:
- Document-level security: Users only see search results they're authorized to access
- Field-level security: Sensitive fields (SSN, patient ID, privileged notes) are hidden from unauthorized roles
- Role-based access control: Mapped to your application's permission model, not just Elasticsearch's native roles
- Multi-tenancy isolation: Data from different clients, firms, or departments physically or logically separated
We've implemented multi-role access control across six different regulated platforms, including Align (legal tech, 3 roles), Eagle Eyes (DEA compliance, 4 roles), and Meducation (healthcare, 3 roles). The pattern repeats: your Elasticsearch security model must mirror your application's permission model exactly.
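To make the document-level and field-level pieces concrete, the sketch below builds a request body in the shape accepted by Elasticsearch's create-role API, where `field_security` restricts fields and `query` restricts documents. The index pattern, field names, and the `access_level` filter are hypothetical, and these features require the commercial license tiers listed in the comparison table later in this guide.

```python
def build_role(index_pattern, allowed_fields, hidden_fields, visibility_filter):
    """Build a create-role request body combining FLS and DLS.

    field_security -> field-level security (which fields are returned)
    query          -> document-level security (which documents are visible)
    """
    return {
        "indices": [
            {
                "names": [index_pattern],
                "privileges": ["read"],
                "field_security": {
                    "grant": list(allowed_fields),
                    "except": list(hidden_fields),
                },
                "query": visibility_filter,
            }
        ]
    }


# Hypothetical "junior associate" role: read-only, no privileged fields,
# and only documents marked with a general access level.
junior_associate = build_role(
    index_pattern="matters-*",
    allowed_fields=["*"],
    hidden_fields=["privileged_notes", "client_ssn"],
    visibility_filter={"term": {"access_level": "general"}},
)
```

Generating role bodies from the application's own permission table, rather than writing them by hand, is one way to keep the two models from drifting apart.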
The Architecture: Compliance-Grade Elasticsearch
Here's the architecture we use for regulated-industry Elasticsearch deployments. Each layer addresses a specific compliance or precision requirement that generic setups skip.
Compliance-Grade Search Architecture
| Layer | Generic Setup | Regulated Setup |
|---|---|---|
| Encryption | TLS in transit only | TLS in transit + AES-256 at rest + node-to-node encryption |
| Access Control | Basic auth or API key | RBAC + document-level + field-level security |
| Audit Logging | Application logs only | Cluster + application audit with append-only indices |
| Analyzers | Standard analyzer | Custom analyzers with domain synonyms + n-grams + fuzzy |
| Relevancy | Default BM25 scoring | Custom boosting + function_score + domain-weighted ranking |
| Data Retention | No policy | ILM policies meeting regulatory retention requirements |
Every row in that table represents a decision that gets skipped in a standard deployment and becomes expensive to retrofit later. Here's how we handle each one.
Encryption: Three Layers, Not One
Most Elasticsearch deployments enable TLS for the HTTP API and call it done. Regulated industries need three layers:
In transit: TLS 1.2+ on the HTTP API (client-to-cluster) and the transport layer (node-to-node). If your nodes communicate over plaintext internally, a compromised network segment exposes everything.
At rest: AES-256 encryption on the underlying storage. Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) supports this natively through AWS KMS. Self-hosted deployments need dm-crypt or equivalent volume encryption.
In application: Sensitive fields encrypted before indexing for defense-in-depth. This adds latency, so we apply it selectively — SSNs, patient identifiers, privileged communication markers — not every field.
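A minimal sketch of the selective approach follows. The encryption itself is deliberately left to a caller-supplied `encrypt` function, assumed to wrap a vetted AES-256 implementation or a KMS call. The keyed-hash token is our own addition for illustration, not something described above: an encrypted field can no longer be matched directly, so a deterministic HMAC of the value is one common way to keep exact-match lookups possible.

```python
import hashlib
import hmac


def blind_token(value, key):
    """Keyed hash of a normalized value, usable for exact-match lookups."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def protect_document(doc, sensitive_fields, encrypt, token_key):
    """Return a copy of doc with only the sensitive fields encrypted.

    encrypt   -- hypothetical callable (str -> str) supplied by the caller
    token_key -- secret bytes used for the lookup token (an assumption)
    """
    protected = dict(doc)  # shallow copy: the caller's dict is not modified
    for field in sensitive_fields:
        if protected.get(field) is None:
            continue  # nothing to protect for this document
        plaintext = str(protected[field])
        protected[field] = encrypt(plaintext)
        protected[field + "_token"] = blind_token(plaintext, token_key)
    return protected
```

At query time the application would compute `blind_token` on the search input and run a term query against the `_token` field. Every other field is indexed and analyzed as usual, which keeps the latency cost limited to the fields that need it.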
Custom Analyzers: Making Search Actually Useful
This is where most Elasticsearch consulting engagements start and generic implementations fail. The default standard analyzer splits text on word boundaries and lowercases it. It applies no stemming, knows no synonyms, and out of the box doesn't even remove stop words. That's wildly insufficient for legal or medical text.
For legal tech, we build analyzer chains that include:
- Synonym filters mapping domain terms ("NDA" → "non-disclosure agreement", "IP" → "intellectual property")
- N-gram tokenizers for partial matching (searching "terminat" finds "termination" and "terminated")
- Phrase matching with slop parameters tuned for legal phrasing
- Multi-field indexing where the same content is analyzed differently for exact match vs. fuzzy match vs. autocomplete
For healthcare, add medical synonym dictionaries (SNOMED CT, ICD-10 code descriptions), drug name resolution (brand ↔ generic), and anatomical term expansion.
The Kompas implementation uses manual synonym lists that the firm's librarians update directly, with automated sync to Elasticsearch. This creates a feedback loop: the people who know the domain shape the search behavior. No ML model required — just thoughtful query architecture.
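The analyzer chain and the synonym sync can be sketched roughly as follows. This is an illustration under assumptions, not the Kompas configuration: the analyzer and field names are made up, synonyms are applied at search time with a `synonym_graph` filter, and an `edge_ngram` token filter stands in for the n-gram tokenizers mentioned above.

```python
def render_synonym_rules(synonym_groups):
    """Turn {term: [equivalents]} into Solr-format synonym rules."""
    rules = []
    for term, equivalents in sorted(synonym_groups.items()):
        terms = [term.lower()] + [e.lower() for e in equivalents]
        rules.append(", ".join(terms))
    return rules


def build_index_body(synonym_rules):
    """Index settings and mappings with one field analyzed three ways."""
    return {
        "settings": {"analysis": {
            "filter": {
                "domain_synonyms": {"type": "synonym_graph",
                                    "synonyms": list(synonym_rules)},
                "prefix_ngrams": {"type": "edge_ngram",
                                  "min_gram": 3, "max_gram": 15},
            },
            "analyzer": {
                "legal_index": {"type": "custom", "tokenizer": "standard",
                                "filter": ["lowercase"]},
                # Synonyms are expanded at search time only
                "legal_search": {"type": "custom", "tokenizer": "standard",
                                 "filter": ["lowercase", "domain_synonyms"]},
                "legal_autocomplete": {"type": "custom", "tokenizer": "standard",
                                       "filter": ["lowercase", "prefix_ngrams"]},
            },
        }},
        "mappings": {"properties": {"body": {
            "type": "text",
            "analyzer": "legal_index",
            "search_analyzer": "legal_search",
            "fields": {
                # No synonyms or n-grams: closest to an exact match
                "exact": {"type": "text", "analyzer": "standard"},
                # Prefix n-grams at index time, plain terms at search time
                "autocomplete": {"type": "text",
                                 "analyzer": "legal_autocomplete",
                                 "search_analyzer": "legal_index"},
            },
        }}},
    }


librarian_list = {"NDA": ["non-disclosure agreement"], "IP": ["intellectual property"]}
index_body = build_index_body(render_synonym_rules(librarian_list))
```

Keeping synonyms in the search analyzer means a librarian's update changes query behavior without reindexing documents, although how the new rules get reloaded depends on how the synonyms are stored in the cluster.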
Legal Tech Search: The Kompas Pattern
We built Kompas for a legal tech company drowning in document chaos. 1M+ documents, attorneys wasting hours finding precedents, and a previous search system that only worked if you typed the exact document title.
Here's what the implementation looked like in numbers:
- 100ms average query response on 1M+ documents
- $220K+ per year recovered in search time savings
- Synonym + fuzzy matching built for legal terminology
- Autocomplete ranked by document download frequency
- Full audit logging via Kibana for compliance
- Delivered in 8-10 weeks on Laravel + Vue.js + AWS
The key architectural decisions:
Precision over recall. Legal search penalizes false positives more than false negatives. An attorney who finds 5 highly relevant documents is happier than one who finds 500 partially relevant ones. We tuned BM25 parameters, added multi-field boosting (title > headings > body), and weighted by document usage patterns.
User-driven relevancy. Download frequency as a ranking signal. If 50 attorneys downloaded a specific contract template, it should rank higher for related queries. This was more effective than any ML-based relevancy model we could have built — the firm's own behavior became the ranking algorithm.
Favorites and history. Search isn't just about the query — it's about the workflow around it. We built "My Favourites" for document bookmarking, search history for re-finding, and "Recently Added" for staying current. These UX features drove adoption more than raw search quality. You can read the full Kompas case study for the complete implementation details.
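The precision and usage-signal decisions above can be sketched as a single query body. The field names (`title`, `headings`, `body`, `download_count`) and the boost values are assumptions for illustration, not the tuned Kompas settings.

```python
def build_search_query(user_query, size=10):
    """Multi-field boosted query with a download-frequency signal."""
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": user_query,
                        # title > headings > body
                        "fields": ["title^3", "headings^2", "body"],
                        "type": "best_fields",
                        # Require every term: favors precision over recall
                        "operator": "and",
                    }
                },
                "field_value_factor": {
                    "field": "download_count",
                    # log1p dampens the signal so heavily downloaded
                    # documents do not swamp text relevance
                    "modifier": "log1p",
                    "missing": 0,
                },
                # Add the usage signal to the text score. With the default
                # (multiply), a document with zero downloads would score 0.
                "boost_mode": "sum",
            }
        },
    }


query = build_search_query("contract termination")
```

Whether the usage signal should be added or multiplied, and how strongly it should be dampened, is something to settle against real relevance judgments rather than by default.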
Healthcare Search: HIPAA-Compliant Elasticsearch
Healthcare adds a compliance layer that changes everything about how you deploy Elasticsearch. It's not enough to encrypt data and add role-based access — you need to prove you did it, document how it works, and maintain evidence that it's operating correctly.
Here's what a HIPAA-compliant Elasticsearch deployment requires beyond standard configuration:
PHI in Elasticsearch: What Counts and What Doesn't
Protected Health Information includes any individually identifiable health data: patient names, medical record numbers, dates of service, diagnosis codes tied to identifiable patients. If your Elasticsearch index contains any of this, the entire cluster falls under HIPAA requirements.
The architecture decision: do you index PHI directly, or index de-identified metadata with references back to a HIPAA-compliant database?
We've done both. For clinical search tools where providers need to find patient records quickly, PHI must be in the index for search to work. For research or analytics tools, de-identified indices with reference IDs are safer and simpler to secure.
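For the de-identified option, a rough sketch looks like this. The field names are hypothetical, and the allowlist is illustrative only: it is not a complete de-identification procedure and would need review against the applicable HIPAA de-identification standard.

```python
# Only fields on this allowlist ever reach the search index. An allowlist is
# used instead of a blocklist so a newly added PHI field is excluded by default.
SEARCHABLE_FIELDS = {"diagnosis_code", "specialty", "document_type", "year_of_service"}


def to_deidentified_doc(record, reference_id):
    """Build the document to index from a full record.

    reference_id is assumed to be an opaque identifier issued by the
    HIPAA-compliant system of record. It must not be derived from patient
    data such as a name or medical record number.
    """
    doc = {k: v for k, v in record.items() if k in SEARCHABLE_FIELDS}
    doc["record_ref"] = reference_id
    return doc


record = {
    "patient_name": "Jane Doe",
    "medical_record_number": "MRN-0001",
    "diagnosis_code": "E11.9",
    "document_type": "discharge_summary",
}
print(to_deidentified_doc(record, "ref-42"))
```

The application resolves `record_ref` against the compliant database only after the user's authorization has been checked, so the search cluster never holds the identifying fields.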
The BAA Question
If you're using a managed Elasticsearch service (AWS OpenSearch, Elastic Cloud), you need a Business Associate Agreement with the provider. AWS OpenSearch is HIPAA-eligible — but you still need to configure it correctly. Elastic Cloud offers HIPAA-eligible deployments on specific plans.
Self-hosted means you control compliance but own all the operational burden: patching, backup encryption verification, access log retention. For most healthcare SaaS startups, managed services with a BAA are the right call.
Our healthcare search stack: We pair Elasticsearch with the same compliance patterns we use across our healthcare platforms — HIPAA-grade encryption, append-only audit logging, session management with automatic timeouts, and role-based access control. We've shipped 4+ HIPAA-compliant systems including patient portals, case management, and clinical learning platforms.
Elasticsearch vs OpenSearch for Regulated Deployments
This is one of the most common questions we get in Elasticsearch consulting engagements. The short answer: both work for compliance. The real decision is about licensing, cloud provider, and which security features you need.
Elasticsearch vs OpenSearch for Compliance
| Feature | Elasticsearch | OpenSearch |
|---|---|---|
| Field-level security | Platinum/Enterprise license | Free (built-in) |
| Document-level security | Platinum/Enterprise license | Free (built-in) |
| Audit logging | Platinum/Enterprise license | Free (built-in) |
| HIPAA-eligible managed | Elastic Cloud (specific plans) | AWS OpenSearch Service |
| Encryption at rest | Manual config or Elastic Cloud | AWS KMS integration |
| License | Elastic License 2.0 / SSPL, with AGPLv3 added as an option in 2024 | Apache 2.0 (fully open source) |
For healthcare startups on AWS: OpenSearch is the path of least resistance. HIPAA-eligible, security features included free, and managed service reduces ops burden. For enterprises already on Elastic Cloud with paid licenses: Elasticsearch's native features are excellent. We cover this in depth in our Elasticsearch vs OpenSearch comparison.
Common Mistakes in Regulated Elasticsearch Deployments
We've audited failing search implementations across legal tech and healthcare. The same mistakes appear repeatedly:
Indexing PHI with no field-level security. Every user with read access sees every field. A billing clerk sees clinical notes. A researcher sees patient identifiers. One audit and you're explaining why your search architecture violates minimum necessary access.
No data retention policy. Elasticsearch indices grow indefinitely. Compliance frameworks specify retention periods: HIPAA requires retaining compliance documentation for six years, medical record retention periods are typically set by state law, and data past its retention period should be disposed of securely. Index Lifecycle Management policies solve this, but they are rarely configured during initial setup.
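A minimal sketch of such a policy, built as the request body for the ILM put-policy API, is shown below. The rollover thresholds are placeholder values, and the six-year figure is used only as an example. The right retention period depends on the regulations that apply to your data.

```python
def build_retention_policy(retention_days, rollover_max_age="30d", rollover_max_size="50gb"):
    """Build an ILM policy body: roll over while hot, delete after retention."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_size,
                        }
                    }
                },
                "delete": {
                    # For rolled-over indices, min_age is counted from rollover
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


six_year_policy = build_retention_policy(6 * 365)  # delete phase at "2190d"
```

Because the delete phase is counted from rollover, the oldest document in an index is kept slightly longer than the nominal period. That errs on the side of retention, which is usually the safer direction.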
Audit logs in the same cluster as data. If someone compromises your Elasticsearch cluster, they can modify the audit logs too. Audit indices should write to a separate, append-only destination that the application cannot modify retroactively.
Standard analyzers on domain-specific text. We covered this already, but it's worth repeating: the number one complaint we hear from legal and healthcare clients is "search doesn't find what I'm looking for." It's almost always an analyzer problem, not a data problem.
No multi-tenancy strategy. Firms with multiple practice groups, hospitals with multiple departments, SaaS platforms with multiple customers — all need data isolation. The choice between index-per-tenant, filtered aliases, or document-level security depends on scale and compliance requirements. Get this wrong early and migration is painful.
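Of the three options, filtered aliases are the quickest to sketch. The example below builds a request body for the `_aliases` API. The alias naming scheme and the `tenant_id` field are assumptions, and the field is expected to be mapped as `keyword` so the term filter matches exactly.

```python
def build_tenant_alias_actions(index_name, tenant_id):
    """Build an _aliases request body that scopes an alias to one tenant."""
    alias = f"{index_name}-tenant-{tenant_id}"
    return {
        "actions": [
            {
                "add": {
                    "index": index_name,
                    "alias": alias,
                    # Every search through the alias is filtered to this tenant
                    "filter": {"term": {"tenant_id": tenant_id}},
                    # Route the tenant's reads and writes to the same shard
                    "routing": tenant_id,
                }
            }
        ]
    }


actions = build_tenant_alias_actions("documents", "firm-a")
```

A filtered alias only isolates traffic that actually goes through it. Direct access to the underlying index still has to be blocked with roles, which is why stricter compliance requirements tend to push toward index-per-tenant or document-level security.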
What to Look For in an Elasticsearch Consultant
If you're evaluating Elasticsearch consulting services for a regulated deployment, here's what separates specialists from generalists:
Ask for a regulated-industry deployment. "We've built search" isn't enough. Ask specifically: have you built Elasticsearch for legal tech, healthcare, or financial services? Do you understand HIPAA, SOC 2, or industry-specific compliance requirements? Can you show a real deployment with audit trails and access controls?
Ask about custom analyzers. If their Elasticsearch experience is limited to default configurations, they won't deliver useful search for domain-specific content. Synonym handling, custom tokenizers, and relevancy tuning are where the real expertise lives.
Ask for measurable outcomes. Query response times, indexing throughput, relevancy metrics, ROI from improved search. Any Elastic Stack consultant worth hiring can show numbers from past engagements — not just "we made it faster" but "100ms queries on 1M+ documents with $220K+/year in recovered search time."
Ask about the full stack. Elasticsearch is one component. You need someone who can integrate it with your application layer (Laravel, Node.js, React), deploy it properly (AWS, self-hosted), and build the monitoring/alerting around it (Kibana, Logstash). A consultant who only knows the Elasticsearch API will leave gaps.
Frequently Asked Questions
Can Elasticsearch be used in HIPAA-compliant applications?
Yes. Elasticsearch supports HIPAA compliance when properly configured: encryption at rest and in transit, role-based access control, audit logging, and field-level security. The key is building compliance into the architecture from day one, not bolting it on later. We've built HIPAA-compliant search systems handling 1M+ documents with sub-100ms query times.
What makes Elasticsearch different for legal tech vs general search?
Legal search requires domain-specific synonym handling (e.g., "termination" means contract termination, not job termination), precision over recall, multi-field boosting for relevancy, and audit trails for every search query. Generic implementations miss these nuances — legal professionals won't tolerate irrelevant results the way e-commerce shoppers might.
How much does Elasticsearch consulting for regulated industries cost?
A compliance-grade Elasticsearch implementation typically costs $25,000-$50,000 over 6-10 weeks. This includes architecture design, HIPAA or compliance configuration, index mapping, relevancy tuning, audit logging, and deployment. Performance optimization on existing clusters runs $10,000-$20,000 for 2-4 weeks.
Should I use Elasticsearch or OpenSearch for regulated industries?
Both work for regulated industries. Elasticsearch provides field-level security, document-level security, and audit logging under its commercial license. OpenSearch includes the equivalent features for free, is licensed under Apache 2.0, and is available as an AWS-managed service with HIPAA eligibility. The choice often depends on your cloud provider and licensing preferences rather than compliance capability.
How do you implement audit trails in Elasticsearch?
We use Elasticsearch's built-in audit logging plus custom application-level logging. Every search query, document access, and index modification is logged with timestamp, user ID, IP address, and action details. Kibana dashboards provide real-time visibility into access patterns. For HIPAA, we implement append-only audit indices with retention policies that meet regulatory requirements.
Can Elasticsearch handle sensitive data like PHI or legal documents?
Yes, with proper configuration. We implement TLS for all cluster communication, encryption at rest for indices, field-level security to restrict PHI access, role-based access control, and comprehensive audit logging. For legal documents, we add document-level security so users only see documents they're authorized to access.
Next Step
If you're building search for a regulated industry — legal tech, healthcare, financial services — the architecture decisions you make in week one determine your compliance posture for years. Getting it right from the start costs $25K-$50K. Retrofitting it later costs 2-3x that.
We offer a free 30-minute architecture review where we'll look at your current search setup (or planned architecture), identify compliance gaps, and give you a realistic assessment of what it takes to get to production. No pitch deck — just an honest technical conversation.
Book a free architecture review →