# ask-community
v
I'm pretty puzzled by a good way to create logging from tasks that run inside Docker containers (that reflects correctly in the Cloud UI). I have a flow spinning up a bunch of different docker containers as part of a deployment, but I'm not sure what a good way is to handle logging for the individual containers through Prefect. Right now everything is running on the same machine: a worker on Docker runs a flow with various tasks in which docker containers are created and run using the `prefect.infrastructure.container` package. The logging from the containers is piped through to the Prefect UI using `log_prints=True`. However, that means there is no proper error handling, as everything has the same severity and errors are not propagated from the containers. It seems there are a few ways to handle this:
• Parse the logs after a task is finished. This could work, but it's not real-time, and any unexpected crashes might prevent this from running as expected.
• Parse the logs while the container is running, by connecting to the containers with the python `docker` package. This might work, but I'm introducing another dependency (the `docker` package), and it would not work when the machine is remote. It does not seem to be possible using only the default `prefect` python package.
• Somehow set up logging inside the container that gets passed the relevant flow/task information from the worker, so that it can recreate a prefect logger and log everything directly to Prefect.
It feels like this should be easier and a ready-made solution should be available, but after searching a lot online and trying various options I still haven't found a satisfying one. Maybe I'm missing something about how Prefect is supposed to be set up and what the philosophy behind it is.
n
hi @Vincent Rubingh - have you tried just using `get_run_logger`? it should just work and has all the standard methods (`info()`, `debug()`, etc). `log_prints` is just sending whatever you send to `print` through `get_run_logger().info()`
v
Hey Nate, thanks for your response and sorry for the late reply. Yes, I can get a logger inside the task like that, but that's not inside my docker container that is started up inside the task. So if I set `log_prints=False`, none of the output from inside the container (`docker_container_block`) ends up in prefect:
```python
from prefect import runtime, task
from prefect.infrastructure.container import DockerContainer
from prefect.variables import Variable

@task(name="Parser", retries=1, log_prints=True)
def run__parser():

    # TODO: failures from within the docker container are not propagating to the (failed) task

    docker_container_block = DockerContainer.load("docker-container-name")
    # forward run metadata into the container as env vars
    docker_container_block.env.update(
        {
            "PREFECT_TASK_ID": runtime.task_run.id,
            "PREFECT_TASK_NAME": runtime.task_run.name,
            "PREFECT_FLOW_ID": runtime.flow_run.id,
            "PREFECT_FLOW_NAME": runtime.flow_run.name,
            "MAX_NUM_ROUNDS": Variable.get("parser_num_rounds").value,
        }
    )
    container_result = docker_container_block.run()

    # log_parser_results is a user-defined helper (not shown)
    log_result = log_parser_results(
        artifact_name="parser-report",
        service_name="Parser",
        service_path="parser/dev/logs/parser.log")

    return {}
```
As far as I understand, I cannot use `get_run_logger` inside my `docker_container_block`? That's what I was trying, so I can get the logging with the right levels to propagate correctly (especially errors). And by "inside" I mean using `get_run_logger` in the actual docker container. That's why I was trying to forward the task_id and flow_id to the docker container, as I figured with those I can recreate a prefect logger in my container(s)
n
as far as when you can use `get_run_logger`, it's just within the context of any task or flow run (pedantic detail: there's a `ContextVar` we set when we enter the flow or task run engine, so if you're in that context, you can use `get_run_logger`), i.e. inside a flow or task function
> So if I set `log_prints=False`, none of the output from inside the container (`docker_container_block`) ends up in prefect:
`log_prints` will patch the builtin `print`, which I don't see used above, so it's not clear to me why anything should be sent to the API as logs, not knowing how `log_parser_results` works at least
v
So I'll just make it as simple as possible, to make my intention clear: I have a bunch of custom docker images for a series of different services. Prefect seems like a great way to orchestrate them and run them consecutively as part of a flow. I set up a task (like the code above) that loads a docker image through a `DockerContainer` object (part of `prefect.infrastructure.container`). So far so good: I can get this to work, and I use different tasks to run different docker containers. However, I want to somehow redirect the output from inside the docker containers (which can contain many lines of info/success/debug level output as well as errors) correctly to the Prefect UI. (Either redirect by changing the code inside the task above, or change the logging inside my docker image to log to prefect directly.) So far I've only managed to redirect the output from the running docker containers by using `log_prints=True`. I haven't found another way to get, for example, an error to correctly propagate from inside my running docker container to Prefect. Is this a use case that's supported?
Or would it be better to for example log my docker container output to a file, and then read/parse from there inside the task?
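as a rough sketch of that file-parsing idea: assuming the container writes lines like `ERROR: ...` (the line format and the helper name here are made up), the task could replay each line at its proper severity instead of printing everything at INFO:

```python
import logging

# hypothetical "LEVEL: message" line format - adjust to your container's actual output
LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO,
          "WARNING": logging.WARNING, "ERROR": logging.ERROR}

def replay_log_line(line: str, logger: logging.Logger) -> int:
    """Parse a 'LEVEL: message' line and re-emit it at that severity.
    Unrecognized lines are re-emitted at INFO. Returns the level used."""
    prefix, _, message = line.partition(":")
    level = LEVELS.get(prefix.strip().upper(), logging.INFO)
    logger.log(level, message.strip() or line.strip())
    return level
```

inside a task you'd call this with `get_run_logger()` as the logger, so errors from the container's log file show up as ERROR-level logs on the task run.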
n
so I'm sorry I don't have a ton of time to step through this super diligently, but I'll make a couple observations:
• the infra blocks like `DockerContainer(...).run()` were originally added to prefect 2.x as a way of defining deployment config. they happen to be convenient wrappers for invoking work on some infra like a container, but these are removed in 3.x. generally speaking I would recommend creating actual deployments for these pieces of work that run on their own containers, and then chaining them in a parent flow, using `run_deployment` to kick them off. using these infra blocks directly is not a 1st class pattern and is likely why you're having a hard time finding info on it
• it's not clear to me what's happening in each container of yours rn, but generally, if you're running prefect code and you're in a task or flow context, `get_run_logger` should just work as long as that container has `PREFECT_API_KEY` and `PREFECT_API_URL` set as env vars, so that the API it talks to is the one you expect (again, setting these env vars just happens for you if you define deployments for these intermediate containers, but you'd have to inject them yourself if you use `DockerContainer` directly)
v
ah ok that's very helpful, I'll switch things over to prefect 3.x style then, and put these containers as their own deployments like you mentioned.
that's exactly the kind of info I needed. Thank you!
n
sure thing, in case it's a useful reference, here's a silly example that follows the general pattern I'm talking about
lastly, I'd point out for later: if chaining "child" deployments becomes cumbersome, check out event triggers that can be defined on deployments. that way (in many cases) you can get away with not having a parent "chaining" flow at all, because each downstream deployment just has a trigger defined to `expect` the `prefect.flow-run.Completed` event from the upstream