Lateral Flattening of JSON Data in Python
dotnet dev

dotnet dev @dotnetdev

Joined:
Jan 27, 2025

Lateral Flattening of JSON Data in Python

Publish Date: Jan 27
0 0

Introduction

Lateral flattening is the process of converting nested or hierarchical JSON structures into flat, tabular formats. Python’s pandas library provides powerful tools for this task, particularly the json_normalize function. This article explores how to use Python to explode arrays and flatten nested JSON data for use in analytics, databases, or machine learning pipelines.

JSON’s nested structure is ideal for web APIs and configuration files but becomes cumbersome for:

  • Relational databases (e.g., PostgreSQL, MySQL).
  • Tabular analysis tools (e.g., pandas, Excel).
  • Machine learning models (most require 2D input).

Flattening resolves nested keys and explodes arrays into rows or columns, enabling compatibility with these systems.


Python Implementation with pandas

Key Function: json_normalize

The pandas.json_normalize method recursively flattens nested JSON structures. Its parameters include:

  • data: The JSON input (dict or list of dicts).
  • record_path: The key containing the array to explode.
  • meta: Fields to preserve as metadata (e.g., id, name).
  • meta_prefix: A prefix for nested metadata keys (e.g., user_).

Example 1: Exploding a Simple Array

Input JSON:

{
  "id": 1,
  "name": "John Doe",
  "contactIds": [1, 2, 3, 4]
}
Enter fullscreen mode Exit fullscreen mode

Python Code:

import pandas as pd

data = {
    "id": 1,
    "name": "John Doe",
    "contactIds": [1, 2, 3, 4]
}

# Explode the "contactIds" array into rows
df = pd.json_normalize(
    data,
    record_path="contactIds",  # Array to explode
    meta=["id", "name"],       # Fields to retain
    record_prefix="contactId_" # Optional: prefix for exploded values
)

print(df)
Enter fullscreen mode Exit fullscreen mode

Output:

   id      name  contactId_
0   1  John Doe           1
1   1  John Doe           2
2   1  John Doe           3
3   1  John Doe           4
Enter fullscreen mode Exit fullscreen mode

Example 2: Handling Nested Objects

For JSON with nested objects (e.g., address.street), json_normalize automatically concatenates keys:

Input JSON:

{
  "id": 1,
  "name": "John Doe",
  "address": {
    "street": "123 Main St",
    "city": "Anytown"
  }
}
Enter fullscreen mode Exit fullscreen mode

Python Code:

df = pd.json_normalize(data)
print(df)
Enter fullscreen mode Exit fullscreen mode

Output:

   id      name  address.street address.city
0   1  John Doe     123 Main St     Anytown
Enter fullscreen mode Exit fullscreen mode

To rename columns:

df.columns = df.columns.str.replace(".", "_")
Enter fullscreen mode Exit fullscreen mode

Example 3: Complex Nesting (Arrays of Objects)

For arrays containing nested objects, json_normalize combines key concatenation and array explosion:

Input JSON:

{
  "id": 1,
  "orders": [
    {"item": "A", "price": 10},
    {"item": "B", "price": 20}
  ]
}
Enter fullscreen mode Exit fullscreen mode

Python Code:

df = pd.json_normalize(
    data,
    record_path="orders",  # Explode the "orders" array
    meta=["id"],           # Keep "id" as metadata
    meta_prefix="user_"
)

print(df)
Enter fullscreen mode Exit fullscreen mode

Output:

  item  price  user_id
0    A     10        1
1    B     20        1
Enter fullscreen mode Exit fullscreen mode

Advanced Customization

Handling Missing Data

Use the errors parameter to ignore or raise errors for missing fields:

pd.json_normalize(data, errors="ignore")  # Skip missing keys
Enter fullscreen mode Exit fullscreen mode

Flattening Multiple Levels

For deeply nested JSON, combine json_normalize with recursive functions or custom logic.

Alternatives to pandas

  • flatdict Library: Lightweight flattening without dependencies.
  • Manual Recursion: Custom Python functions for edge cases.

When to Avoid Flattening

  1. Preserving Hierarchy: Nested JSON is more efficient for tree-like data (e.g., organizational charts).
  2. APIs: Clients often expect nested responses.

Footnotes

  1. Use max_level in json_normalize to control flattening depth (e.g., max_level=2).
  2. Flattened JSON may increase storage size due to duplicated metadata.

Comments 0 total

    Add comment