# ask-community
r
Hi there! I’m trying to forward logs from an external library, more specifically `scrapy`. I have configured scrapy to log to stdout, and configured my Prefect task with `log_stdout=True`. Shouldn’t Prefect then pick up the logs from scrapy, or am I missing something?
r
Hey Rasmus!! Depending on how your flow is configured, you may want to look into either an extra logger or making sure that your task decorator is set up correctly. Are you using 1.0 or 2.0? https://docs-v1.prefect.io/core/concepts/logging.html#extra-loggers https://docs.prefect.io/concepts/logs/
r
Hey Rob! We are using 1.0, and for this flow we are using the functional api. This is how we have the task decorator set up:
```python
@task(
    name=<TASK_NAME>,
    log_stdout=True,
    target=get_prefect_location(
        flow_name="{flow_name}", pipeline_name="{task_name}", interval="daily"
    )
    + "/output.csv",
)
def fetch(output_dir: str) -> pd.DataFrame:
    ...
```
And I’ve tried `print` statements, and they are visible in the logs in the UI.
r
Without knowing too much about how scrapy handles logging, it might be worth trying adding an extra logger as described here: https://docs-v1.prefect.io/core/concepts/logging.html#extra-loggers
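For 1.0 that’s just a config value; something like this, assuming scrapy’s messages actually go through a logger named `scrapy` (that name is my assumption, not something I’ve verified):
```python
import os

# Prefect 1.0 reads extra_loggers from config/env before the flow run
# starts, so set this in the agent or run environment (or export it in
# your shell), not inside the task itself. 'scrapy' is an assumed
# logger name.
os.environ["PREFECT__LOGGING__EXTRA_LOGGERS"] = "['scrapy']"
```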
k
coincidentally I have been running scrapy as well (though in prefect 2.0)
scrapy sends its logs to stderr by default, which is kind of a pain https://docs.scrapy.org/en/latest/topics/settings.html#std-setting-LOG_FILE
👍 1
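if you just want it off stderr, the LOG_FILE setting from that page will route it to a file instead; rough sketch (the path is just an example):
```python
# Scrapy settings sketch: LOG_FILE (from the linked docs) sends log
# output to a file instead of stderr. The path here is just an example.
custom_settings = {
    "LOG_FILE": "scrapy.log",
    "LOG_LEVEL": "INFO",
}
```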
getting the logger is a bit different in 2.0, but here's what I did to get it working:
```python
from contextlib import redirect_stderr
from typing import Dict

from prefect import task, get_run_logger
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor


@task(name='run-scrapy')
def run_scrapy(spider: Dict):
    logger = get_run_logger()

    # File-like write() that parses the level out of each scrapy log
    # line and re-emits the message at the matching Prefect level.
    def write(msg: str):
        DEBUG = ' DEBUG: '
        INFO = ' INFO: '
        WARNING = ' WARNING: '
        ERROR = ' ERROR: '

        if msg != '\n':
            if DEBUG in msg:
                logger.debug(msg.split(DEBUG)[1])
            elif INFO in msg:
                logger.info(msg.split(INFO)[1])
            elif WARNING in msg:
                logger.warning(msg.split(WARNING)[1])
            elif ERROR in msg:
                logger.error(msg.split(ERROR)[1])

    # Monkey-patch write() onto the logger so redirect_stderr() can
    # treat it as a writable stream.
    logger.write = write

    with redirect_stderr(logger):
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})

        process = CrawlerRunner(settings=spider['settings'])
        d = process.crawl(spider['name'])
        d.addBoth(lambda _: reactor.stop())
        reactor.run()
```
🔥 2
adding extra loggers never worked for me, presumably because scrapy’s habit of logging through the root logger makes things kind of a mess. you’ll then have to set the log level from the spider itself if you don’t want all the debug logs:
```python
import logging

import scrapy


class MySpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['https://some.website']

    def __init__(self, *args, **kwargs):
        # Bump the 'scrapy' logger to INFO so its DEBUG lines are dropped.
        logger = logging.getLogger('scrapy')
        logger.setLevel(logging.INFO)
        super().__init__(*args, **kwargs)
```
r
Awesome! Thanks a lot 🔥 I have been struggling a lot with understanding the scrapy log config over the last few days, so it’s really great that you’ve figured it out. I will try your solution; it basically looks like I just need to change how I access the Prefect logger to get it working with Prefect 1.0.
🔥 1
d
Hi, I am encountering this problem too. Did you manage to come up with a solution, @Rasmus Lindqvist?
r
Hi. I haven’t had the time to try out the solution suggested by Kevin. However, I’m fairly sure it would work if you just change how you access the Prefect logger, if you’re using Prefect 1.0. For Prefect 2.0, use Kevin’s solution 🙂
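For 1.0 that swap would presumably be something like this (untested sketch based on the 1.0 docs):
```python
import prefect

# Prefect 1.0: the run logger lives on the context rather than being
# fetched via get_run_logger() (untested sketch).
logger = prefect.context.get("logger")
```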