Rasmus Lindqvist

08/02/2022, 7:43 AM
Hi there! I’m trying to forward logs from an external library, more specifically scrapy. I have configured scrapy to log to stdout, and configured my Prefect task to
. Should not Prefect then pick up the logs from scrapy or am I missing something?

Rob Freedy

08/02/2022, 2:59 PM
Hey Rasmus!! Depending on how your flow is configured, you may want to look into either an external logger or making sure that your task decorator is set up correctly. Are you using 1.0 or 2.0?

Rasmus Lindqvist

08/02/2022, 3:14 PM
Hey Rob! We are using 1.0, and for this flow we are using the functional api. This is how we have the task decorator set up:
        flow_name="{flow_name}", pipeline_name="{task_name}", interval="daily"
    + "/output.csv",
def fetch(output_dir: str) -> pd.DataFrame:
And I’ve tried
statements and they are visible in the logs in the UI

Rob Freedy

08/02/2022, 5:19 PM
Without knowing too much about how scrapy handles logging, it might be worth trying to add an extra logger as described here:

Kevin Grismore

08/02/2022, 5:45 PM
coincidentally I have been running scrapy as well (though in prefect 2.0)
scrapy sends its logs to stderr by default, which is kind of a pain
👍 1
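For context on the stderr point: a minimal sketch using only the standard library, assuming scrapy's root handler is a plain `logging.StreamHandler` created without an explicit stream. In that case the handler picks up whatever `sys.stderr` is at the moment it is created, which is why stdout capture alone would not see the output.

```python
import logging
import sys

# A StreamHandler created without a stream argument writes to sys.stderr.
default_handler = logging.StreamHandler()

# Passing a stream explicitly is the only way to get stdout instead.
stdout_handler = logging.StreamHandler(sys.stdout)

uses_stderr = default_handler.stream is sys.stderr
uses_stdout = stdout_handler.stream is sys.stdout
print(uses_stderr, uses_stdout)
```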
getting the logger is a bit different in 2.0, but here's what I did to get it working:
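from contextlib import redirect_stderr
from typing import Dict

from prefect import get_run_logger
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor


def run_scrapy(spider: Dict):
    logger = get_run_logger()

    def write(msg: str):
        # Markers that scrapy's log format puts before each message.
        DEBUG = ' DEBUG: '
        INFO = ' INFO: '
        WARNING = ' WARNING: '
        ERROR = ' ERROR: '

        # The branch bodies were missing from the pasted snippet; forwarding
        # at the matching level is the presumed intent.
        if msg != '\n':
            if DEBUG in msg:
                logger.debug(msg)
            elif INFO in msg:
                logger.info(msg)
            elif WARNING in msg:
                logger.warning(msg)
            elif ERROR in msg:
                logger.error(msg)
        return None

    # Give the logger a write() method so it can stand in for stderr.
    logger.write = write

    with redirect_stderr(logger):
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})

        process = CrawlerRunner(settings=spider['settings'])
        d = process.crawl(spider['name'])
        d.addBoth(lambda _: reactor.stop())
        # Not in the pasted snippet: CrawlerRunner needs the reactor started.
        reactor.run()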
🔥 2
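A variation on the same idea, as a rough sketch rather than anything Prefect or scrapy ships: instead of attaching `write` to the logger, wrap the logger in a small file-like object. `StderrToLogger` and `demo` are hypothetical names, and the sample lines only imitate scrapy's output. Unlike the snippet above, lines without a level marker are logged at INFO rather than dropped, which is a choice, not a requirement.

```python
import logging
import sys
from contextlib import redirect_stderr


class StderrToLogger:
    """File-like object that forwards text written to it to a logger.

    Hypothetical helper for illustration; not part of Prefect or scrapy.
    """

    # Markers checked in the same order as in the snippet above.
    LEVELS = (
        (" DEBUG: ", logging.DEBUG),
        (" INFO: ", logging.INFO),
        (" WARNING: ", logging.WARNING),
        (" ERROR: ", logging.ERROR),
    )

    def __init__(self, logger):
        self._logger = logger

    def write(self, msg: str) -> int:
        text = msg.rstrip("\n")
        if text:
            for marker, level in self.LEVELS:
                if marker in text:
                    self._logger.log(level, text)
                    break
            else:
                # No recognised marker: keep the line at INFO.
                self._logger.log(logging.INFO, text)
        return len(msg)

    def flush(self) -> None:
        # Nothing is buffered; defined so callers that flush do not fail.
        pass


def demo():
    """Send two scrapy-style lines to stderr and return what was logged."""
    records = []

    class ListHandler(logging.Handler):
        def emit(self, record):
            records.append((record.levelname, record.getMessage()))

    logger = logging.getLogger("stderr_to_logger_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(ListHandler())

    with redirect_stderr(StderrToLogger(logger)):
        print("[scrapy.core.engine] INFO: Spider opened", file=sys.stderr)
        print("[scrapy.core.scraper] ERROR: Spider error", file=sys.stderr)
    return records


records = demo()
```

Any object with `log(level, msg)` should work as the wrapped logger, so in a Prefect 2.0 task you would pass `get_run_logger()`, and in Prefect 1.0 the task logger from `prefect.context.get("logger")`.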
adding extra loggers never worked for me, presumably because the way scrapy logs by modifying the root logger is kind of a mess. you'll then have to set the log level from the spider itself if you don't want all the debug logs:
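import logging
import scrapy


class MySpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['<>']

    def __init__(self, *args, **kwargs):
        # The setLevel call was missing from the pasted snippet; INFO is an assumed level.
        logger = logging.getLogger('scrapy')
        logger.setLevel(logging.INFO)
        super().__init__(*args, **kwargs)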
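Why setting the level on the `'scrapy'` logger is enough: in the standard library, a logger with no level of its own inherits the effective level of its nearest configured ancestor. A small stdlib-only sketch; the child name `scrapy.core.engine` is used purely for illustration, and INFO is an arbitrary choice of level.

```python
import logging

# Set the level once on the parent logger.
parent = logging.getLogger("scrapy")
parent.setLevel(logging.INFO)

# A child logger without its own level falls back to the parent's.
child = logging.getLogger("scrapy.core.engine")
debug_enabled = child.isEnabledFor(logging.DEBUG)
info_enabled = child.isEnabledFor(logging.INFO)
print(debug_enabled, info_enabled)
```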

Rasmus Lindqvist

08/03/2022, 7:41 AM
Awesome! Thanks a lot 🔥 I have been struggling a lot with understanding the scrapy log config the last few days, so really great that you’ve figured it out. I will try your solution out, basically just looks like I need to change how to access the prefect logger in order to get it to work with Prefect 1.0.
🔥 1

Dominic Pham

08/12/2022, 10:23 PM
Hi I am encountering this problem too, did you manage to come up with a solution @Rasmus Lindqvist?

Rasmus Lindqvist

08/15/2022, 7:14 AM
Hi. Haven’t had the time to try out the solution suggested by Kevin. However, I’m fairly sure that it would work with just changing how you access the prefect logger, if you’re using Prefect 1.0. For Prefect 2.0, use Kevin’s solution 🙂