Bad indexes don't just leave performance on the table — they actively make writes slower while failing to speed up reads. Over the years of managing PostgreSQL databases handling billions of rows at TechYantram, we've distilled the indexing decisions that matter most. This is the guide we wish we had early on.
Reading EXPLAIN ANALYZE Before You Optimize
Never add an index without first understanding what the planner does. Run:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
Critical numbers to look at:
- Seq Scan vs Index Scan: A sequential scan on a large table is usually the problem
- Actual Rows vs Estimated Rows: Large divergence means stale statistics — run
ANALYZE table_name - Buffers hit vs read: "Buffers: shared hit=4512 read=120" — high reads mean data is going to disk
- Actual time: The wall-clock cost of each node
B-tree Indexes: The Default and Its Limits
B-tree is the default index type and handles =, <, >, BETWEEN, and LIKE 'prefix%'. It does NOT help with:
LIKE '%suffix'orLIKE '%middle%'(use GIN withpg_trgm)- Array containment (
@>) (use GIN) - Full-text search (use GIN)
- Nearest-neighbor queries (use GiST or HNSW with
pgvector)
Composite Index Column Ordering
The most common indexing mistake is wrong column order in composite indexes. The rule: put the equality columns first, then range columns.
-- Query pattern: SELECT * FROM orders WHERE user_id = 123 AND created_at > now() - interval '30 days'; -- WRONG order: range column first CREATE INDEX idx_wrong ON orders(created_at, user_id); -- CORRECT order: equality column first CREATE INDEX idx_correct ON orders(user_id, created_at);
With the correct order, PostgreSQL uses the user_id equality to narrow the search space, then scans the created_at range within that narrow band. The wrong order forces a scan of the entire created_at range first.
Partial Indexes: Indexing Only What You Query
If 95% of your queries filter on a specific value (like status = 'pending'), a partial index is dramatically smaller and faster:
-- Only indexes pending orders — maybe 2% of the table CREATE INDEX idx_pending_orders ON orders(created_at, user_id) WHERE status = 'pending'; -- The query must include the WHERE clause for the index to be used SELECT * FROM orders WHERE status = 'pending' AND user_id = 123;
This index might be 50x smaller than a full index on the same columns, fitting entirely in shared buffers and surviving restarts warm.
Covering Indexes: Eliminate Heap Fetches
When all columns in a query are covered by the index, PostgreSQL can answer the query from the index alone — no heap page lookup required. This is called an Index-Only Scan:
-- Query only needs user_id, status, and total_amount SELECT user_id, status, total_amount FROM orders WHERE user_id = 123; -- Covering index includes all needed columns CREATE INDEX idx_covering ON orders(user_id) INCLUDE (status, total_amount);
The INCLUDE clause adds non-indexed columns to the leaf pages. The planner will use an Index-Only Scan when the visibility map shows all pages are all-visible, avoiding heap I/O entirely.
GIN Indexes for Full-Text Search and Arrays
For full-text search, GIN indexes on tsvector columns are the standard approach:
-- Add a generated tsvector column
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
) STORED;
CREATE INDEX idx_articles_fts ON articles USING GIN(search_vector);
-- Query with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & indexing') query
WHERE search_vector @@ query
ORDER BY rank DESC;
Index Bloat and Maintenance
B-tree indexes accumulate bloat from updates and deletes. Rows removed from the heap leave dead entries in the index. Monitor with:
SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC;
Run REINDEX CONCURRENTLY index_name (PostgreSQL 12+) to rebuild bloated indexes without locking. Schedule this during low-traffic windows. autovacuum helps reclaim dead tuples from the heap but doesn't repack indexes — manual reindex is needed for heavily-updated tables.
When NOT to Add an Index
Indexes have write overhead. Every INSERT, UPDATE, and DELETE must update all indexes on the table. Rules of thumb for skipping indexes:
- Tables under ~10,000 rows — sequential scans are fast enough and the planner will usually choose them anyway
- Columns with very low cardinality (e.g., a boolean or a 3-value enum on a large table) — unless combined with high-cardinality columns in a composite index
- Write-heavy tables with rare reads — the write amplification exceeds the read benefit
- Columns rarely used in WHERE, JOIN, or ORDER BY clauses
Practical Checklist
Before any query optimization:
- Run
EXPLAIN (ANALYZE, BUFFERS)and look for Seq Scans on large tables - Check
pg_stat_user_tablesforseq_scancounts — high sequential scans are index candidates - Use
pg_stat_statementsto find the top 10 slowest queries by total time - Add the index, rerun EXPLAIN, and verify an Index Scan appears
- Monitor write latency on the affected table to ensure you haven't degraded INSERTs
Disciplined indexing is the difference between a database that scales gracefully and one that requires constant firefighting. The patterns above are what we apply to every production PostgreSQL schema at TechYantram.