A Guide to Analyzing Risk Factor Length in 10-K Filings with FinFeedAPI

The "Risk Factors" section, known as Item 1A in Form 10-K annual reports, details the challenges and uncertainties a company faces. A simple quantitative method to gauge perceived risk is to analyze the length of this section over time. An increase in length might correspond with periods of greater uncertainty or new regulations.

This guide shows you how to:

Get metadata for 10-K filings over several years.
Extract the text content of the "Risk Factors" section for each filing.
Measure the length of the extracted content.
Analyze how the median length of Item 1A changes over the years.

This analysis is API-intensive, because it requires a separate content request for every filing you analyze.

What you need:

Python 3.x with pandas and matplotlib.
The api-bricks-sec-api-rest library.
Your personal FinFeedAPI key.

1. Environment Setup
First, you need to install the FinFeedAPI client library if it is not already on your system.


pip install api-bricks-sec-api-rest

Next, prepare your Python script. This includes importing the required libraries and configuring the API client with your key.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import time
import re
import api_bricks_sec_api_rest

# --- API Configuration ---
# IMPORTANT: Replace "YOUR_API_KEY_HERE" with your actual key.
API_KEY = "YOUR_API_KEY_HERE"
api_client_config = api_bricks_sec_api_rest.Configuration()
api_client_config.api_key['Authorization'] = API_KEY
api_client = api_bricks_sec_api_rest.ApiClient(configuration=api_client_config)

# --- Analysis Parameters ---
# Years to analyze
START_YEAR = 2018
END_YEAR = 2023

# Max filings per year to analyze (to limit API calls in this demo)
MAX_FILINGS_PER_YEAR = 20 

# Delay between extractor calls (seconds)
API_DELAY = 0.3

# --- Plotting Configuration ---
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (14, 7)
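
Here is what "API-intensive" means with these parameters. The workflow makes one metadata request per year plus up to one extractor request per filing. With the demo values, that comes to 6 metadata requests, up to 120 extractor requests, and at least 36 seconds spent in `time.sleep()` alone. The helper below is a hypothetical sketch for recomputing this when you change the parameters. It counts requests only and ignores network and server time.

```python
def estimate_api_usage(start_year: int, end_year: int, filings_per_year: int, delay: float) -> dict:
    """Rough request count and minimum sleep time for one run of this workflow."""
    years = end_year - start_year + 1
    metadata_calls = years                       # one metadata page per year
    extractor_calls = years * filings_per_year   # at most one Item 1A request per filing
    min_sleep_seconds = extractor_calls * delay  # time.sleep() only; request time comes on top
    return {
        "years": years,
        "metadata_calls": metadata_calls,
        "extractor_calls": extractor_calls,
        "min_sleep_seconds": min_sleep_seconds,
    }


# Same values as START_YEAR, END_YEAR, MAX_FILINGS_PER_YEAR and API_DELAY above
print(estimate_api_usage(2018, 2023, 20, 0.3))
```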

2. Fetching 10-K Filings Metadata
The first step is to get a list of 10-K filings for each year in your chosen range. This gives you the accession_number for each filing, which is needed to extract its content.

# Initialize the FilingMetadataApi
filing_metadata_api = api_bricks_sec_api_rest.FilingMetadataApi(api_client)
all_10k_metadata = []

for year in range(START_YEAR, END_YEAR + 1):
    print(f"\nFetching 10-K metadata for year: {year}")
    year_start = f"{year}-01-01"
    year_end = f"{year}-12-31"  # end of the same calendar year, so yearly windows do not overlap

    filings_this_year = []
    # Fetch a sample of filings for the year
    try:
        page_data = filing_metadata_api.v1_filings_get(
            form_type="10-K",
            filling_date_start=year_start,
            filling_date_end=year_end,
            page_size=MAX_FILINGS_PER_YEAR
        )
        if page_data:
            filings_this_year.extend(page_data)
    except api_bricks_sec_api_rest.ApiException as e:
        print(f"Exception when fetching metadata: {e}")

    all_10k_metadata.extend(filings_this_year)
    print(f"Finished fetching for {year}. Total records so far: {len(all_10k_metadata)}")

# Create DataFrame from metadata
if all_10k_metadata:
    metadata_df = pd.DataFrame.from_records([vars(f) for f in all_10k_metadata])
    metadata_df['filing_date'] = pd.to_datetime(metadata_df['filing_date'], errors='coerce')
    metadata_df['year'] = metadata_df['filing_date'].dt.year
else:
    metadata_df = pd.DataFrame()
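
Each loop iteration's date window should cover exactly one calendar year. If a window runs into the following year, it overlaps the next iteration's window. The same filing can then be fetched twice, which inflates the sample for the later year. The sketch below shows one way to build non-overlapping bounds. `year_bounds` and `in_year` are hypothetical helpers, not part of the FinFeedAPI client.

```python
from datetime import date
from typing import Tuple


def year_bounds(year: int) -> Tuple[str, str]:
    """Return inclusive ISO-format start and end dates for one calendar year."""
    return date(year, 1, 1).isoformat(), date(year, 12, 31).isoformat()


def in_year(filing_date: str, year: int) -> bool:
    """Check whether an ISO 'YYYY-MM-DD' date string falls inside year_bounds(year)."""
    start, end = year_bounds(year)
    return start <= filing_date <= end  # ISO date strings sort chronologically


for y in range(2018, 2021):
    print(y, year_bounds(y))
```

As an extra safeguard against duplicates, you can apply `metadata_df.drop_duplicates(subset='accession_number')` after building the DataFrame.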

3. Extracting Item 1A Content and Calculating Length
Now, loop through the metadata, call the /v1/extractor/item endpoint for each filing, and calculate the length of the returned text. A small delay is added between calls to manage the request rate.
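
The loop records two length measures: a raw character count, and a word count that treats any run of non-whitespace as one word. Trying them on a short string first shows what each one counts. For example, "risks," counts as a single word, and the extra space and the newline affect only the character count. The guide does not say whether the endpoint returns plain text or markup. If markup or table residue is present, both counts would include it. `measure_length` is a hypothetical helper that mirrors the two lines used in the loop.

```python
import re


def measure_length(text: str) -> dict:
    """Return the character count and whitespace-delimited word count of text."""
    return {
        "length_chars": len(text),                      # includes spaces, newlines and punctuation
        "length_words": len(re.findall(r"\S+", text)),  # each run of non-whitespace is one word
    }


sample = "Our business is subject to\nrisks, including  cybersecurity incidents."
print(measure_length(sample))
```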

results = [] 

if not metadata_df.empty:
    content_extraction_api = api_bricks_sec_api_rest.ContentExtractionApi(api_client)
    print(f"\nExtracting Item 1A content for {len(metadata_df)} filings...")

    for index, row in metadata_df.iterrows():
        acc_no = row['accession_number']
        year = row['year']
        print(f"  Processing {acc_no} (Year: {year})...")

        item_content = None
        try:
            content = content_extraction_api.v1_extractor_item_get(
                accession_number=acc_no,
                item_number="1A"
            )
            if content is not None:
                item_content = content
        except api_bricks_sec_api_rest.ApiException as e:
            print(f"    Could not extract Item 1A for {acc_no}: {e}")

        time.sleep(API_DELAY)

        # Calculate length if content was found
        if item_content:
            length_chars = len(item_content)
            length_words = len(re.findall(r'\S+', item_content))
            results.append({
                'year': year,
                'accession_number': acc_no,
                'length_chars': length_chars,
                'length_words': length_words
            })
        else:
            results.append({'year': year, 'accession_number': acc_no, 'length_chars': None, 'length_words': None})


    length_df = pd.DataFrame(results)
    print("\nFinished extraction and length calculation.")
else:
    length_df = pd.DataFrame()
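
The fixed `API_DELAY` spaces out requests, but a single rate-limit or transient server error still produces an empty row for that filing. One option is to retry with exponential backoff. The sketch below is a generic, hypothetical helper that works with any zero-argument callable. It does not rely on any retry feature of the FinFeedAPI client. A filing that genuinely lacks an Item 1A will fail on every attempt, so retries only help with temporary errors.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() and retry on the listed exceptions, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, then 1 s, then 2 s, ... with the defaults
    raise AssertionError("unreachable")  # the loop always returns or raises


# Demo with a stand-in function that fails twice, then succeeds
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "Item 1A text"


recorded_delays = []
print(call_with_retry(flaky, retry_on=(ConnectionError,), sleep=recorded_delays.append))
print(recorded_delays)
```

In the extraction loop, you could wrap the call as `call_with_retry(lambda: content_extraction_api.v1_extractor_item_get(accession_number=acc_no, item_number="1A"), retry_on=(api_bricks_sec_api_rest.ApiException,))`. The existing `except` block would still catch the final error.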

4. Analyzing Length Trends Over Time
The final step is to analyze the collected data. We will group the results by year and calculate the median word count. The median is used instead of the mean because a few unusually long or short filings would otherwise pull the yearly figure up or down.

if not length_df.empty:
    length_df_cleaned = length_df.dropna(subset=['length_words'])

    if not length_df_cleaned.empty:
        # Calculate median length per year
        median_length_words = length_df_cleaned.groupby('year')['length_words'].median()
        count_per_year = length_df_cleaned.groupby('year').size()

        # Create the plot
        plt.figure(figsize=(14, 7))
        plt.plot(median_length_words.index, median_length_words.values, marker='o', linestyle='-', color='blue')
        plt.title('Median Length of Item 1A (Risk Factors) in 10-K Filings Over Time')
        plt.ylabel('Median Word Count')
        plt.xlabel('Year')
        plt.grid(True, linestyle='--', alpha=0.7)

        # Add count annotations
        for year, count in count_per_year.items():
            if year in median_length_words.index:
                plt.text(year, median_length_words[year], f' n={count}', verticalalignment='bottom')

        plt.xticks(median_length_words.index.astype(int))
        plt.tight_layout()
        plt.show()

The plot shows the trend in the median length of the Risk Factors section, with annotations indicating how many filings were analyzed for each year.
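
To see why the median is the safer summary, compare it with the mean on a few invented word counts. These are not real filing data. One 60,000-word filing lifts the 2023 mean to 26,000 words, while the median moves only from 9,000 to 9,500. The sketch also computes the year-over-year change in the median, which is one simple way to quantify the trend the plot shows.

```python
import pandas as pd

# Made-up word counts for illustration only (not real filing data)
toy = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "length_words": [8000, 9000, 10000, 8500, 9500, 60000],
})

# Median, mean and sample size per year, side by side
summary = toy.groupby("year")["length_words"].agg(["median", "mean", "count"])

# Year-over-year change in the median word count, in percent (NaN for the first year)
summary["median_yoy_pct"] = (summary["median"] / summary["median"].shift(1) - 1) * 100
print(summary)
```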

Final Thoughts
This guide showed a quantitative method for analyzing the "Risk Factors" section of 10-K filings. We fetched metadata, used the /v1/extractor/item endpoint to get the text of Item 1A, calculated its length, and then visualized the trend over time.

This type of analysis can be a useful addition to qualitative reading, but it has limitations:

The analysis depends on the successful extraction of Item 1A.
Length is a proxy and does not capture the severity or type of risks discussed.
The results are sensitive to the sample of companies analyzed.
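
The first and last limitations can be checked directly. The extraction success rate per year shows whether failed extractions are concentrated in some years. Restricting the data to companies measured in every year removes changes in sample composition from the trend. The sketch below assumes each result row also carries a company identifier column named `cik`. The loop in step 3 does not store one, and the field name is an assumption about the metadata records. Both helpers are hypothetical.

```python
import pandas as pd


def extraction_rate(df: pd.DataFrame) -> pd.Series:
    """Share of filings per year for which a word count was obtained."""
    return df["length_words"].notna().groupby(df["year"]).mean()


def balanced_panel(df: pd.DataFrame, id_col: str = "cik") -> pd.DataFrame:
    """Keep only companies measured in every year that has any measured filing."""
    measured = df.dropna(subset=["length_words"])
    n_years = measured["year"].nunique()
    years_per_company = measured.groupby(id_col)["year"].nunique()
    keep = years_per_company[years_per_company == n_years].index
    return measured[measured[id_col].isin(keep)]


# Made-up example: company B has no Item 1A in 2023, company C only appears in 2023
toy = pd.DataFrame({
    "cik": ["A", "A", "B", "B", "C"],
    "year": [2022, 2023, 2022, 2023, 2023],
    "length_words": [9000, 9500, 12000, None, 7000],
})

print(extraction_rate(toy))
print(balanced_panel(toy))
```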

Despite these limitations, this method offers a programmatic way to measure one aspect of corporate risk disclosure.

Maciej Józefowicz @maciej_jozefowicz
