Rasmus Lindqvist
08/02/2022, 7:43 AM
I have configured scrapy to log to stdout, and configured my Prefect task with log_stdout=True. Shouldn't Prefect then pick up the logs from scrapy, or am I missing something?

Rob Freedy
08/02/2022, 2:59 PM

Rasmus Lindqvist
08/02/2022, 3:14 PM
@task(
    name=<TASK_NAME>,
    log_stdout=True,
    target=get_prefect_location(
        flow_name="{flow_name}", pipeline_name="{task_name}", interval="daily"
    )
    + "/output.csv",
)
def fetch(output_dir: str) -> pd.DataFrame:
    ...

The task body has print statements, and they are visible in the logs in the UI.

Rob Freedy
08/02/2022, 5:19 PM

Kevin Grismore
08/02/2022, 5:45 PM
from contextlib import redirect_stderr
from typing import Dict

from prefect import task, get_run_logger
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

@task(name='run-scrapy')
def run_scrapy(spider: Dict):
    logger = get_run_logger()

    # Forward each formatted scrapy log line to the matching Prefect log level.
    def write(msg: str):
        DEBUG = ' DEBUG: '
        INFO = ' INFO: '
        WARNING = ' WARNING: '
        ERROR = ' ERROR: '
        if msg != '\n':
            if DEBUG in msg:
                logger.debug(msg.split(DEBUG)[1])
            elif INFO in msg:
                logger.info(msg.split(INFO)[1])
            elif WARNING in msg:
                logger.warning(msg.split(WARNING)[1])
            elif ERROR in msg:
                logger.error(msg.split(ERROR)[1])
        else:
            return None

    # Duck-type the logger as a writable stream so redirect_stderr accepts it.
    logger.write = write
    logger.flush = lambda: None
    with redirect_stderr(logger):
        # Configure inside the redirect so scrapy's handler binds the
        # replaced stderr stream.
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
        process = CrawlerRunner(settings=spider['settings'])
        d = process.crawl(spider['name'])
        d.addBoth(lambda _: reactor.stop())
        reactor.run()
import logging
import scrapy

class MySpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['https://some.website']

    def __init__(self, *args, **kwargs):
        # Raise the scrapy logger to INFO to suppress DEBUG noise.
        logger = logging.getLogger('scrapy')
        logger.setLevel(logging.INFO)
        super().__init__(*args, **kwargs)
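
The write shim keys off the ' LEVEL: ' substrings produced by the LOG_FORMAT '%(levelname)s: %(message)s'. The same file-object trick can be exercised without scrapy, Prefect, or twisted — here with a plain stdlib logger standing in for get_run_logger(), and a hypothetical list-collecting handler (ListHandler, not from the thread) to observe the result:

```python
import logging
import sys
from contextlib import redirect_stderr

records = []

class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the rerouting is observable."""
    def emit(self, record):
        records.append((record.levelname, record.getMessage()))

logger = logging.getLogger('demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(ListHandler())

def write(msg: str):
    # Map a formatted 'LEVEL: message' line onto the matching logger
    # method, mirroring the shim in the thread.
    for token, method in ((' DEBUG: ', logger.debug), (' INFO: ', logger.info),
                          (' WARNING: ', logger.warning), (' ERROR: ', logger.error)):
        if token in msg:
            method(msg.split(token)[1].rstrip('\n'))
            return

# Duck-type the logger as a writable stream, as in the task above.
logger.write = write
logger.flush = lambda: None

with redirect_stderr(logger):
    print('spider INFO: page fetched', file=sys.stderr)
    print('spider ERROR: request failed', file=sys.stderr)

assert records == [('INFO', 'page fetched'), ('ERROR', 'request failed')]
```

Anything written to stderr in the matching format is re-emitted at the corresponding level; lines that match no token (including bare newlines) are dropped silently.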
Rasmus Lindqvist
08/03/2022, 7:41 AM

Dominic Pham
08/12/2022, 10:23 PM

Rasmus Lindqvist
08/15/2022, 7:14 AM