Rasmus Lindqvist
08/02/2022, 7:43 AM
I have configured scrapy to log to stdout, and configured my Prefect task with log_stdout=True. Shouldn't Prefect then pick up the logs from scrapy, or am I missing something?

Rob Freedy
08/02/2022, 2:59 PM

Rasmus Lindqvist
08/02/2022, 3:14 PM
@task(
    name=<TASK_NAME>,
    log_stdout=True,
    target=get_prefect_location(
        flow_name="{flow_name}", pipeline_name="{task_name}", interval="daily"
    )
    + "/output.csv",
)
def fetch(output_dir: str) -> pd.DataFrame:
    ...

The task body has print statements, and they are visible in the logs in the UI.

Rob Freedy
08/02/2022, 5:19 PM

Kevin Grismore
08/02/2022, 5:45 PM
from contextlib import redirect_stderr
from typing import Dict

from prefect import task, get_run_logger
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

@task(name='run-scrapy')
def run_scrapy(spider: Dict):
    logger = get_run_logger()

    # Forward each formatted scrapy log line to the matching Prefect log level.
    def write(msg: str):
        DEBUG = ' DEBUG: '
        INFO = ' INFO: '
        WARNING = ' WARNING: '
        ERROR = ' ERROR: '
        if msg != '\n':
            if DEBUG in msg:
                logger.debug(msg.split(DEBUG)[1])
            elif INFO in msg:
                logger.info(msg.split(INFO)[1])
            elif WARNING in msg:
                logger.warning(msg.split(WARNING)[1])
            elif ERROR in msg:
                logger.error(msg.split(ERROR)[1])
        else:
            return None

    # Duck-type the logger as a writable stream so redirect_stderr accepts it.
    logger.write = write
    logger.flush = lambda: None
    with redirect_stderr(logger):
        # Configure inside the redirect so scrapy's handler binds the
        # replaced stderr stream.
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
        process = CrawlerRunner(settings=spider['settings'])
        d = process.crawl(spider['name'])
        d.addBoth(lambda _: reactor.stop())
        reactor.run()
import logging
import scrapy

class MySpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['https://some.website']

    def __init__(self, *args, **kwargs):
        # Raise the scrapy logger to INFO to suppress DEBUG noise.
        logger = logging.getLogger('scrapy')
        logger.setLevel(logging.INFO)
        super().__init__(*args, **kwargs)
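
The write shim keys off the ' LEVEL: ' substrings produced by the LOG_FORMAT '%(levelname)s: %(message)s'. The same file-object trick can be exercised without scrapy, Prefect, or twisted — here with a plain stdlib logger standing in for get_run_logger(), and a hypothetical list-collecting handler (ListHandler, not from the thread) to observe the result:

```python
import logging
import sys
from contextlib import redirect_stderr

records = []

class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the rerouting is observable."""
    def emit(self, record):
        records.append((record.levelname, record.getMessage()))

logger = logging.getLogger('demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(ListHandler())

def write(msg: str):
    # Map a formatted 'LEVEL: message' line onto the matching logger
    # method, mirroring the shim in the thread.
    for token, method in ((' DEBUG: ', logger.debug), (' INFO: ', logger.info),
                          (' WARNING: ', logger.warning), (' ERROR: ', logger.error)):
        if token in msg:
            method(msg.split(token)[1].rstrip('\n'))
            return

# Duck-type the logger as a writable stream, as in the task above.
logger.write = write
logger.flush = lambda: None

with redirect_stderr(logger):
    print('spider INFO: page fetched', file=sys.stderr)
    print('spider ERROR: request failed', file=sys.stderr)

assert records == [('INFO', 'page fetched'), ('ERROR', 'request failed')]
```

Anything written to stderr in the matching format is re-emitted at the corresponding level; lines that match no token (including bare newlines) are dropped silently.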
Rasmus Lindqvist
08/03/2022, 7:41 AM

Dominic Pham
08/12/2022, 10:23 PM

Rasmus Lindqvist
08/15/2022, 7:14 AM