# ask-community
Pedro Machado
Hi everyone. I am working with a data science team that wants to orchestrate some workflows. They already have scripts that run inside of a Docker container. I was thinking the easiest way to add these to the flow would be to use the Docker tasks (`CreateContainer`, `StartContainer`, etc.). However, it seems the logs can only be retrieved after the container finishes running, via `GetContainerLogs`. Is this correct? If so, that's less than ideal for our use case: these are long-running processes, and we'd want to see the logs in real time. So far, I've thought of a couple of alternatives:

1. Modify the Docker tasks to use the `logs` method with `stream=True` (I haven't tested this yet, but the docs suggest it could work) — see the sketch below.
2. Add Prefect to their Docker image and create a flow that runs inside the image.

Do you see another option? What would you recommend? Thanks!
n
Hi @Pedro Machado - I'm not certain you'll be able to stream logs from the container as you're intending (though I'm not sure you won't be able to, either), but a third and potentially simpler option would be to wrap `StartContainer` in your own task that starts the container and then calls `GetContainerLogs` on an interval loop until your scripts inside the container have finished; that way you can batch the logs and store them however you need to without having to stream them.
To be clear, I think either of the methods you suggested could work; I just think this ^ might be a little easier
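A rough sketch of that polling approach, again assuming docker-py's low-level `APIClient`; the task name and the 15-second interval are placeholders:

```python
import time

import docker
import prefect
from prefect import task


@task
def run_and_batch_logs(container_id: str, poll_seconds: int = 15) -> int:
    """Start a created container, then fetch its logs in batches until it exits."""
    logger = prefect.context.get("logger")
    client = docker.APIClient(base_url="unix://var/run/docker.sock")
    client.start(container=container_id)
    since = int(time.time())
    while True:
        # Check the state *before* fetching logs so the final batch
        # (produced just before exit) is still collected below
        running = client.inspect_container(container_id)["State"]["Running"]
        chunk = client.logs(
            container=container_id, stdout=True, stderr=True, since=since
        )
        since = int(time.time())
        if chunk:
            logger.info(chunk.decode("utf-8", errors="replace").rstrip())
        if not running:
            break
        time.sleep(poll_seconds)
    return client.inspect_container(container_id)["State"]["ExitCode"]
```

One caveat: `since` has second granularity, so a line emitted right at a batch boundary can occasionally appear twice, which is usually acceptable for batched logging.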
Pedro Machado
Nice. I didn't think of this. I'll give it a try!