As a backend developer, you’ve likely faced the challenge of optimizing database queries to keep your application fast and scalable. MySQL, one of the most popular relational databases, offers a powerful tool to achieve this: indexing. Indexes can dramatically speed up queries, but they come with trade-offs and nuances that require careful consideration. In this in-depth guide, I’ll walk you through everything you need to know about MySQL indexing, from the basics to advanced techniques like JSON and multi-valued indexes. Whether you’re a beginner or a seasoned developer, this post will equip you with practical insights to optimize your MySQL databases.
Let’s dive into the world of MySQL indexing, using a sample users
table to illustrate concepts with real-world examples.
What is a MySQL Index?
A MySQL index is a data structure, typically a B+ tree, that improves the speed of data retrieval by allowing the database to locate rows without scanning the entire table. Think of it as the index in a book: instead of flipping through every page to find a topic, you check the index to jump directly to the relevant pages.
Why Indexes Matter
-
Performance: Indexes reduce I/O and CPU usage, making queries like
SELECT ... WHERE
orJOIN
faster, especially on large tables. - Scalability: They ensure your application remains responsive as data grows.
-
Trade-offs: Indexes consume disk space and slow down write operations (
INSERT
,UPDATE
,DELETE
) because the index must be updated.
Example Table: users
To ground our discussion, let’s use a sample users
table:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50),
email VARCHAR(100),
status TINYINT,
last_login TIMESTAMP,
profile JSON,
bio TEXT
);
We’ll assume this table has 10,000 rows and add indexes to optimize various queries.
Why B+ Tree? A Quick Look at Index Structures
MySQL primarily uses B+ trees for indexes because they’re efficient for a wide range of queries:
-
Ordered Storage: B+ trees store data in sorted order, making them ideal for range queries (
WHERE id BETWEEN 100 AND 200
) and sorting (ORDER BY id
). - Balanced Structure: Ensures logarithmic search times, even for large datasets.
- Leaf Nodes: Contain actual data or pointers to rows, with intermediate nodes guiding the search.
Hash indexes, another option, are faster for equality checks (WHERE email = 'john@example.com'
) but don’t support range queries or sorting. InnoDB uses hash indexes internally (adaptive hash index), but B+ trees are the default for most user-defined indexes.
Clustered vs. Secondary Indexes in InnoDB
In InnoDB, MySQL’s default storage engine, indexes are categorized as clustered or secondary.
Clustered Index
- Definition: The clustered index determines the physical storage order of table data, typically based on the primary key. Each table has exactly one clustered index.
-
How It Works: Leaf nodes of the B+ tree contain the entire row data. For our
users
table, the clustered index onid
stores rows sorted byid
. -
Performance: Queries like
SELECT * FROM users WHERE id = 1
are fast because they access the clustered index directly, with no additional lookups. - Example:
SELECT * FROM users WHERE id = 1;
-
EXPLAIN output:
id | select_type | table | type | key | rows | Extra 1 | SIMPLE | users | const| PRIMARY | 1 |
-
type: const
: Direct access to a single row via the primary key.
-
Secondary Index
- Definition: A secondary index (non-clustered) stores the indexed column value and the primary key, pointing to the clustered index.
-
How It Works: Leaf nodes contain
(indexed_column, primary_key)
. For an index onemail
(idx_email
), a query likeWHERE email = 'john@example.com'
finds theid
inidx_email
, then looks up the full row in the clustered index (bookmark lookup). - Performance: Slightly slower than clustered index due to the extra lookup, but still much faster than a full table scan.
- Example:
CREATE INDEX idx_email ON users (email);
EXPLAIN SELECT id, username FROM users WHERE email = 'john@example.com';
-
EXPLAIN output:
id | select_type | table | type | key | rows | Extra 1 | SIMPLE | users | ref | idx_email| 1 | Using index condition
-
type: ref
: Uses the index for an equality check. -
rows: 1
: Estimates one row, indicating high selectivity.
-
Key Difference
- Clustered: Contains all row data, one per table, no lookup needed.
- Secondary: Contains indexed column and primary key, requires bookmark lookup for non-indexed columns.
Index Cardinality: The Key to Selectivity
Index cardinality is the number of unique values in an index. It’s a critical metric for the MySQL query optimizer, which uses it to estimate selectivity—how many rows a query will return.
-
High Cardinality: Indexes like
id
(10,000 unique values in 10,000 rows) oremail
(~9,900 unique values) are highly selective, returning few rows. -
Low Cardinality: Indexes like
status
(2 values: 0 or 1) are less selective, potentially returning many rows (~5,000 forstatus = 1
).
Why Cardinality Matters
The optimizer prefers high-cardinality indexes because they narrow down results efficiently. For example:
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com' AND status = 1;
-
idx_email
(cardinality ~9,900) is chosen overidx_status
(cardinality 2) because it’s more selective. - Output:
type: ref
,key: idx_email
,rows: ~1
.
Updating Cardinality with ANALYZE TABLE
Cardinality estimates can become outdated after significant data changes. Run ANALYZE TABLE
to refresh statistics:
ANALYZE TABLE users;
This updates:
- Cardinality for indexes.
- Row count and average row size.
-
Histograms (MySQL 8.0+, for non-indexed columns like
last_login
). - Clustering factor (how scattered secondary index entries are).
Without ANALYZE TABLE
, the optimizer might choose a full table scan over an index, slowing queries. For example, post-insert of 9,000 rows:
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
- Before:
type: ALL
,rows: 1000
(outdated stats). - After:
type: ref
,key: idx_email
,rows: 1
.
Decoding Query Plans with EXPLAIN
EXPLAIN
is your go-to tool for understanding how MySQL executes a query. It reveals the optimizer’s plan, including index usage, estimated rows, and extra operations.
Example
EXPLAIN SELECT id, username FROM users WHERE email = 'john@example.com' AND status = 1;
- Output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | users | ref | idx_email,idx_status | idx_email| 302 | const | 1 | Using index condition
-
Key Fields:
-
type: Access method (
const
,ref
,range
,index
,ALL
). Lower is better (ALL
= full table scan). - possible_keys: Indexes considered.
- key: Index used.
- rows: Estimated rows scanned.
-
Extra: Additional operations (e.g.,
Using filesort
for sorting,Using index
for covering index).
-
type: Access method (
Tips
- Aim for
type: ref
orconst
for equality checks,range
for ranges. - Minimize
rows
to reduce I/O. - Watch for
Using filesort
orUsing temporary
, which indicate extra processing. - Run
ANALYZE TABLE
ifEXPLAIN
shows unexpected plans.
Prefix Indexes: Saving Space with Trade-offs
A prefix index indexes only the first N characters of a string column (CHAR
, VARCHAR
, TEXT
), reducing storage but limiting query support.
When to Use
-
Long Strings: For columns like
email
(VARCHAR(100)), indexing the full column is costly. -
Prefix Queries: Queries like
WHERE email LIKE 'john.doe@%'
. - Storage Constraints: To save disk/memory or speed up writes.
-
Key Size Limits: To fit within InnoDB’s 767/3072-byte limit for
TEXT
.
Example
CREATE INDEX idx_email_prefix ON users (email(10));
EXPLAIN SELECT * FROM users WHERE email LIKE 'john.doe@%';
-
Output:
type: range
,key: idx_email_prefix
,rows: ~10
. - Supports
LIKE 'prefix%'
but notLIKE '%example.com'
.
Trade-offs
-
Advantages:
- Smaller index size (e.g., 10MB vs. 50MB).
- Faster writes due to less index maintenance.
-
Disadvantages:
- Lower cardinality (e.g., 500 unique prefixes vs. 9,900 full emails), reducing selectivity.
- Limited query support (no mid-string matches).
- Bookmark lookup for non-covered columns.
- Choosing prefix length requires testing (
SELECT COUNT(DISTINCT LEFT(email, N))
).
Full-Text Indexes: Powering Text Search
Full-text indexes are designed for keyword searches in CHAR
, VARCHAR
, or TEXT
columns, using an inverted index for word-based lookups.
When to Use
-
Text-Heavy Columns: Like
bio
(TEXT) inusers
for user profiles. -
Over
LIKE '%term%'
:LIKE
causes full table scans, slow on large tables. - Natural Language Search: Supports stemming (e.g., “running” matches “run”) and relevance ranking.
-
Complex Searches: Boolean mode for
+term
,-term
, or phrases.
Example
CREATE FULLTEXT INDEX idx_bio_full ON users (bio);
SELECT id, username, MATCH(bio) AGAINST('software engineer') AS relevance
FROM users
WHERE MATCH(bio) AGAINST('software engineer');
-
EXPLAIN:
type: fulltext
,key: idx_bio_full
,rows: ~2
. - Boolean Mode:
SELECT id FROM users WHERE MATCH(bio->>'$.bio') = 'admin' +python -engineer' IN BOOLEAN MODE);
Trade-offs
- Advantages: Fast keyword searches, relevance ranking, Boolean logic.
-
Disadvantages: Limited to text columns, overhead for stop words, requires tuning (e.g.,
innodb_ft_min_token_size
).
Indexing JSON Columns: Handling Semi-Structured Data
MySQL’s JSON data type (since 5.7) is great for dynamic data, but indexing JSON requires extracting values to scalar columns.
1. Generated Columns with Indexes
- How: Create a stored generated column to extract a JSON field, then index it.
- Example:
ALTER TABLE users
ADD COLUMN role VARCHAR(50) GENERATED ALWAYS AS (JSON_UNQUOTE(profile->>'$.role')) STORED,
ADD INDEX idx_role (role);
SELECT id FROM users WHERE role = 'admin';
-
EXPLAIN:
type: ref
,key: idx_role
,rows: ~1
.
2. Functional Indexes (MySQL 8.3+)
- How: Index a JSON expression directly.
- Example:
CREATE INDEX idx_age ON users ((JSON_UNQUOTE(profile->>'$.age')));
SELECT id FROM users WHERE profile->>'$.age' = '30';
3. Full-Text for JSON Text
- Extract text fields to a generated column with a full-text index:
ALTER TABLE users
ADD COLUMN bio_text TEXT GENERATED ALWAYS AS (JSON_UNQUOTE(profile->>'$.bio')) STORED,
ADD FULLTEXT INDEX idx_bio_fulltext (bio_text);
SELECT id FROM users WHERE MATCH(bio_text) AGAINST('software engineer');
Supported Functions
-
JSON_EXTRACT
,->
,->>
: Extract fields. -
JSON_CONTAINS
,JSON_SEARCH
: Search within JSON. -
MEMBER OF
: Check array membership (used later).
Multi-Valued Indexes: Indexing JSON Arrays
Multi-Valued Indexes (MVIs, MySQL 8.2+) are designed for JSON arrays, creating multiple index records per array element.
When to Use
-
JSON Arrays: Like
skills: ["python", "sql"]
inprofile
. -
Over Queries:
JSON_CONTAINS
scans all rows without an index.
Example
ALTER TABLE users ADD INDEX idx_skills ((CAST(profile->>'$.skills' AS CHAR(50) ARRAY)));
SELECT id, username FROM users WHERE 'python' MEMBER OF (profile->'$.skills');
-
EXPLAIN:
type: ref
,key: idx_skills
,rows: ~2
. - Alternative:
SELECT id FROM users WHERE JSON_CONTAINS(profile->'$.skills', '["python"]');
Benefits
- Replaces full scans with index lookups.
- Simplifies schemas (no need for normalized tables like
user_skills
). - Supports
MEMBER OF
,JSON_CONTAINS
,JSON_OVERFLOW
.
Trade-offs
- Increased storage (multiple entries per row).
- Slower writes due to index maintenance.
- Limited to scalar arrays, no nested arrays.
Best Practices for MySQL Indexing
-
Analyze with
EXPLAIN
: Always check query plans** to confirm index usage. -
Update Statistics: Run
ANALYZE TABLE
after data changes. -
Choose High-Cardinality Indexes: Prioritize columns like
email
overstatus
. - Use Covering Indexes: Include queried columns in the index to avoid lookups.
-
Test Prefix Lengths: For prefix indexes, use
SELECT COUNT(DISTINCT ...)
to find optimal length. -
Tune Full-Text: Adjust
min_token_size
or stop words for better results. - Optimize JSON: Use generated columns or functional indexes for scalar fields, MVIs for arrays.
- Balance Read/Write: Avoid over-indexing in write-heavy workloads.
Conclusion
MySQL indexing is a powerful tool to optimize database performance, but it’s not a one-size-fits-all solution. By understanding clustered and secondary indexes, leveraging cardinality and statistics, analyzing plans with EXPLAIN, and applying specialized indexes like prefix, full-text, JSON, and multi-valued indexes, you can tailor your indexing strategy to your application’s needs. The users
table examples show how these concepts apply in real-world scenarios, from speeding up user searches to handling JSON arrays.
Start experimenting with indexes in your MySQL databases, and use EXPLAIN
to validate your choices. Share your indexing tips or questions in the comments—I’d love to hear your experiences!
About the Author: I am a backend developer passionate about database optimization and scalable systems. Follow me on Medium for more insights on MySQL, Python, and backend development.