PostgreSQL Indexing Strategies That Cut Query Time by 90%

Bad indexes don't just leave performance on the table — they actively make writes slower while failing to speed up reads. Over the years of managing PostgreSQL databases handling billions of rows at TechYantram, we've distilled the indexing decisions that matter most. This is the guide we wish we had early on.

Reading EXPLAIN ANALYZE Before You Optimize

Never add an index without first understanding what the planner does. Run:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

Critical numbers to look at:

Seq Scan vs Index Scan: A sequential scan on a large table is usually the problem
Actual Rows vs Estimated Rows: Large divergence means stale statistics — run ANALYZE table_name
Buffers hit vs read: "Buffers: shared hit=4512 read=120" — high reads mean data is going to disk
Actual time: The wall-clock cost of each node

B-tree Indexes: The Default and Its Limits

B-tree is the default index type and handles =, <, >, BETWEEN, and LIKE 'prefix%'. It does NOT help with:

LIKE '%suffix' or LIKE '%middle%' (use GIN with pg_trgm)
Array containment (@>) (use GIN)
Full-text search (use GIN)
Nearest-neighbor queries (use GiST or HNSW with pgvector)

Composite Index Column Ordering

The most common indexing mistake is wrong column order in composite indexes. The rule: put the equality columns first, then range columns.

-- Query pattern:
SELECT * FROM orders WHERE user_id = 123 AND created_at > now() - interval '30 days';

-- WRONG order: range column first
CREATE INDEX idx_wrong ON orders(created_at, user_id);

-- CORRECT order: equality column first
CREATE INDEX idx_correct ON orders(user_id, created_at);

With the correct order, PostgreSQL uses the user_id equality to narrow the search space, then scans the created_at range within that narrow band. The wrong order forces a scan of the entire created_at range first.

Partial Indexes: Indexing Only What You Query

If 95% of your queries filter on a specific value (like status = 'pending'), a partial index is dramatically smaller and faster:

-- Only indexes pending orders — maybe 2% of the table
CREATE INDEX idx_pending_orders ON orders(created_at, user_id)
WHERE status = 'pending';

-- The query must include the WHERE clause for the index to be used
SELECT * FROM orders WHERE status = 'pending' AND user_id = 123;

This index might be 50x smaller than a full index on the same columns, fitting entirely in shared buffers and surviving restarts warm.

Covering Indexes: Eliminate Heap Fetches

When all columns in a query are covered by the index, PostgreSQL can answer the query from the index alone — no heap page lookup required. This is called an Index-Only Scan:

-- Query only needs user_id, status, and total_amount
SELECT user_id, status, total_amount
FROM orders
WHERE user_id = 123;

-- Covering index includes all needed columns
CREATE INDEX idx_covering ON orders(user_id) INCLUDE (status, total_amount);

The INCLUDE clause adds non-indexed columns to the leaf pages. The planner will use an Index-Only Scan when the visibility map shows all pages are all-visible, avoiding heap I/O entirely.

GIN Indexes for Full-Text Search and Arrays

For full-text search, GIN indexes on tsvector columns are the standard approach:

-- Add a generated tsvector column
ALTER TABLE articles
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX idx_articles_fts ON articles USING GIN(search_vector);

-- Query with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & indexing') query
WHERE search_vector @@ query
ORDER BY rank DESC;

Index Bloat and Maintenance

B-tree indexes accumulate bloat from updates and deletes. Rows removed from the heap leave dead entries in the index. Monitor with:

SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Run REINDEX CONCURRENTLY index_name (PostgreSQL 12+) to rebuild bloated indexes without locking. Schedule this during low-traffic windows. autovacuum helps reclaim dead tuples from the heap but doesn't repack indexes — manual reindex is needed for heavily-updated tables.

When NOT to Add an Index

Indexes have write overhead. Every INSERT, UPDATE, and DELETE must update all indexes on the table. Rules of thumb for skipping indexes:

Tables under ~10,000 rows — sequential scans are fast enough and the planner will usually choose them anyway
Columns with very low cardinality (e.g., a boolean or a 3-value enum on a large table) — unless combined with high-cardinality columns in a composite index
Write-heavy tables with rare reads — the write amplification exceeds the read benefit
Columns rarely used in WHERE, JOIN, or ORDER BY clauses

Practical Checklist

Before any query optimization:

Run EXPLAIN (ANALYZE, BUFFERS) and look for Seq Scans on large tables
Check pg_stat_user_tables for seq_scan counts — high sequential scans are index candidates
Use pg_stat_statements to find the top 10 slowest queries by total time
Add the index, rerun EXPLAIN, and verify an Index Scan appears
Monitor write latency on the affected table to ensure you haven't degraded INSERTs

Disciplined indexing is the difference between a database that scales gracefully and one that requires constant firefighting. The patterns above are what we apply to every production PostgreSQL schema at TechYantram.