Introduction
Lateral flattening is the process of converting nested or hierarchical JSON structures into flat, tabular formats. Python’s pandas
library provides powerful tools for this task, particularly the json_normalize
function. This article explores how to use Python to explode arrays and flatten nested JSON data for use in analytics, databases, or machine learning pipelines.
JSON’s nested structure is ideal for web APIs and configuration files but becomes cumbersome for:
- Relational databases (e.g., PostgreSQL, MySQL).
- Tabular analysis tools (e.g., pandas, Excel).
- Machine learning models (most require 2D input).
Flattening resolves nested keys and explodes arrays into rows or columns, enabling compatibility with these systems.
Python Implementation with pandas
Key Function: json_normalize
The pandas.json_normalize
method recursively flattens nested JSON structures. Its parameters include:
-
data
: The JSON input (dict or list of dicts). -
record_path
: The key containing the array to explode. -
meta
: Fields to preserve as metadata (e.g.,id
,name
). -
meta_prefix
: A prefix for nested metadata keys (e.g.,user_
).
Example 1: Exploding a Simple Array
Input JSON:
{
"id": 1,
"name": "John Doe",
"contactIds": [1, 2, 3, 4]
}
Python Code:
import pandas as pd
data = {
"id": 1,
"name": "John Doe",
"contactIds": [1, 2, 3, 4]
}
# Explode the "contactIds" array into rows
df = pd.json_normalize(
data,
record_path="contactIds", # Array to explode
meta=["id", "name"], # Fields to retain
record_prefix="contactId_" # Optional: prefix for exploded values
)
print(df)
Output:
id name contactId_
0 1 John Doe 1
1 1 John Doe 2
2 1 John Doe 3
3 1 John Doe 4
Example 2: Handling Nested Objects
For JSON with nested objects (e.g., address.street
), json_normalize
automatically concatenates keys:
Input JSON:
{
"id": 1,
"name": "John Doe",
"address": {
"street": "123 Main St",
"city": "Anytown"
}
}
Python Code:
df = pd.json_normalize(data)
print(df)
Output:
id name address.street address.city
0 1 John Doe 123 Main St Anytown
To rename columns:
df.columns = df.columns.str.replace(".", "_")
Example 3: Complex Nesting (Arrays of Objects)
For arrays containing nested objects, json_normalize
combines key concatenation and array explosion:
Input JSON:
{
"id": 1,
"orders": [
{"item": "A", "price": 10},
{"item": "B", "price": 20}
]
}
Python Code:
df = pd.json_normalize(
data,
record_path="orders", # Explode the "orders" array
meta=["id"], # Keep "id" as metadata
meta_prefix="user_"
)
print(df)
Output:
item price user_id
0 A 10 1
1 B 20 1
Advanced Customization
Handling Missing Data
Use the errors
parameter to ignore or raise errors for missing fields:
pd.json_normalize(data, errors="ignore") # Skip missing keys
Flattening Multiple Levels
For deeply nested JSON, combine json_normalize
with recursive functions or custom logic.
Alternatives to pandas
-
flatdict
Library: Lightweight flattening without dependencies. - Manual Recursion: Custom Python functions for edge cases.
When to Avoid Flattening
- Preserving Hierarchy: Nested JSON is more efficient for tree-like data (e.g., organizational charts).
- APIs: Clients often expect nested responses.
Footnotes
- Use
max_level
injson_normalize
to control flattening depth (e.g.,max_level=2
). - Flattened JSON may increase storage size due to duplicated metadata.