Dynamic Log File for Each Spider: Scrapy Logging
Ajit Kumar

Publish Date: Nov 25

By the time a spider opens, Scrapy has frozen its settings object, so assigning LOG_FILE at that point raises a TypeError. To assign a log file dynamically, you need to set up logging outside the immutable settings object. Here's how to fix this:


Updated Solution for Dynamic Logging

Instead of modifying LOG_FILE in the immutable settings, attach a file handler to Python's logging module directly within the spider_opened signal handler.

Modified DynamicLogFileExtension

import os
import datetime
import logging
from scrapy import signals

class DynamicLogFileExtension:
    @classmethod
    def from_crawler(cls, crawler):
        # Instantiate the extension
        ext = cls()

        # Connect the spider_opened signal
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        # Create a logs directory if it doesn't exist
        log_dir = "logs"
        os.makedirs(log_dir, exist_ok=True)

        # Generate a dynamic log file name
        log_file_name = f"{log_dir}/{spider.name}_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.log"

        # Set up logging to the file
        file_handler = logging.FileHandler(log_file_name, mode="w")
        file_handler.setLevel(logging.INFO)  # Adjust level as needed
        file_handler.setFormatter(
            logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
        )

        # Get Python's root logger (Scrapy's loggers propagate to it) and add the file handler
        logger = logging.getLogger()
        logger.addHandler(file_handler)

        # Debugging message
        print(f"Log file for spider {spider.name} set to: {log_file_name}")
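
One thing the extension above does not do is detach its handler when the spider finishes. If several spiders run in the same process, each earlier file would likely keep receiving the later spiders' messages. Below is a minimal sketch of the attach/detach pair using only the standard library. The helper names attach_log_file and detach_log_file are hypothetical, and wiring detach_log_file to Scrapy's spider_closed signal is an assumption about how you would integrate it, not something the original extension does.

```python
import logging
import os
import tempfile


def attach_log_file(path, level=logging.INFO):
    """Attach a FileHandler for `path` to the root logger and return it."""
    handler = logging.FileHandler(path, mode="w")
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    logging.getLogger().addHandler(handler)
    return handler


def detach_log_file(handler):
    """Remove the handler from the root logger and close its file."""
    logging.getLogger().removeHandler(handler)
    handler.close()


if __name__ == "__main__":
    # Outside Scrapy the root logger defaults to WARNING, so lower it for the demo.
    logging.getLogger().setLevel(logging.INFO)
    log_dir = tempfile.mkdtemp()

    first = attach_log_file(os.path.join(log_dir, "spider_a.log"))
    logging.getLogger("spider_a").info("written to spider_a.log only")
    detach_log_file(first)  # what a spider_closed handler would do

    second = attach_log_file(os.path.join(log_dir, "spider_b.log"))
    logging.getLogger("spider_b").info("written to spider_b.log only")
    detach_log_file(second)
```

In the extension, this would mean storing the handler on the instance in spider_opened and removing it in a second method connected to signals.spider_closed.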

Steps to Implement

  1. Add the Extension to settings.py. Register the extension in your settings.py (the path below assumes the class is defined in project_name/extensions.py):
   EXTENSIONS = {
       'project_name.extensions.DynamicLogFileExtension': 500,
   }
  2. Run the Spider. Execute the spider as usual:
   scrapy crawl your_spider_name
  3. Check the Log Directory. Logs will now appear in the logs/ directory, with one timestamped file per spider run:
   logs/
   ├── your_spider_2024-11-28_17-00-00.log
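
The file name in step 3 is simply the spider name plus a timestamp with one-second resolution. The sketch below isolates that naming rule in a hypothetical helper, build_log_path (not part of the extension), and takes the timestamp as an argument so the result is predictable.

```python
import datetime


def build_log_path(spider_name, now, log_dir="logs"):
    """Return '<log_dir>/<spider_name>_<timestamp>.log' for the given time."""
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{log_dir}/{spider_name}_{stamp}.log"


example = build_log_path("your_spider", datetime.datetime(2024, 11, 28, 17, 0, 0))
print(example)  # logs/your_spider_2024-11-28_17-00-00.log
```

Because the timestamp stops at seconds and the handler opens the file with mode="w", two runs of the same spider started within the same second would share a name and the later run would overwrite the earlier file.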

Key Differences in the Fix

  1. Avoids Mutating Immutable Settings:

    • Instead of trying to set LOG_FILE in Scrapy's settings, the code configures Python's logging system directly.
  2. Customizes Logging for Each Spider:

    • Creates a new file for each spider dynamically using the spider_opened signal.
  3. Supports Multiple Handlers:

    • If needed, you can add additional loggers or handlers (e.g., console logging).

This approach avoids the TypeError raised when code tries to modify Scrapy's immutable settings, and routes each spider's logs to its own dynamically named file.
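
To illustrate the point about multiple handlers, here is a minimal sketch that attaches both a file handler and a console handler to one logger, each with its own level. The function name add_file_and_console_handlers and the level choices are assumptions for illustration. Note that Scrapy normally sets up its own console output, so inside a real crawl an extra console handler on the root logger may print lines twice.

```python
import logging
import os
import sys
import tempfile


def add_file_and_console_handlers(logger, log_path):
    """Attach a file handler (INFO and above) and a console handler (WARNING and above)."""
    formatter = logging.Formatter(
        "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
    )

    file_handler = logging.FileHandler(log_path, mode="w")
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(logging.WARNING)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return file_handler, console_handler


if __name__ == "__main__":
    demo_logger = logging.getLogger("demo_spider")
    demo_logger.setLevel(logging.INFO)
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_spider.log")
    add_file_and_console_handlers(demo_logger, demo_path)

    demo_logger.info("goes to the file only")
    demo_logger.warning("goes to the file and to the console")
```

Each handler filters by its own level, so the file can keep the full INFO history while the console stays limited to warnings and errors.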
