
Nate Atkins

02/14/2020, 8:27 PM
I've been working to keep exception and log data in our environment instead of sending it to Prefect Cloud. Exceptions were pretty easily handled with a decorator that catches the exception, stores it locally, and then raises an exception that has the URI of the saved exception. I have a small app that pulls down the log from the cloud and builds a simple web page that lets you click to get the full exception from the local store. I wanted to do the same thing with logging: log the message to a local file and send the message to Prefect Cloud with the URI of the locally stored log message. I can add the handler to log to a local file and I can update the formatter on the prefect logger to accomplish all of this when I have control of the application and call flow.run(). I'm having problems figuring out how to get the little stub of code that makes these changes to the logging configuration to run when the agent picks up the flow from Prefect Cloud and runs it. Is there a place to put this code so that the Agent will run it before the flow is kicked off? I guess as a last resort I can add a setup task at the beginning of the flow to make this configuration change.
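For readers following along, the exception decorator described in this message might look roughly like the sketch below. It is not Nate's actual code: the names `store_exceptions`, `StoredException`, and `EXCEPTION_DIR` are hypothetical, and the "local store" is assumed to be a plain directory of text files.

```python
import functools
import tempfile
import traceback
import uuid
from pathlib import Path

# Hypothetical local store; any directory you control would do.
EXCEPTION_DIR = Path(tempfile.gettempdir()).resolve() / "local_exceptions"


class StoredException(Exception):
    """Raised in place of the original error; its message is only a URI."""


def store_exceptions(func):
    """Save the full traceback locally and re-raise with just its URI."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            EXCEPTION_DIR.mkdir(parents=True, exist_ok=True)
            path = EXCEPTION_DIR / f"{uuid.uuid4().hex}.txt"
            path.write_text(traceback.format_exc())
            # "from None" keeps the original traceback out of the printed error.
            raise StoredException(path.as_uri()) from None

    return wrapper
```

Only the `file://` URI reaches whatever reports the failure; the traceback itself stays in the local directory.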

Jeremiah

02/14/2020, 8:35 PM
Nate - this is such a cool idea and one that I hadn’t considered before. To borrow Cloud’s existing vocabulary, you’ve essentially described the equivalent of a `ResultHandler` for logs: a way to ship the sanitized log to Cloud and recover the full log locally. We’ve been operating on the basis that logs in Cloud are either full opt-in or opt-out, but now that you’ve described this pattern I’m extremely interested in exploring it as a first-class middle ground.
BUT until that’s available, let’s see if we can’t find a place for you to do it yourself. If you’re running a fork of Prefect, you could just write your logic into `logging.py`: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/logging.py. Specifically `configure_logging()`, I think.
Using a task to modify configuration at the start of your flow will only work 1) if all your flows run in the same memory space via `LocalExecutor` and 2) if you never retry (since you won’t rerun the first task but instead will skip right to the retrying task), so I’d only adopt that solution if necessary.
@josh is there any hook (or should we add a hook) for executing setup code in the agent prior to running the flow?

Nate Atkins

02/14/2020, 8:42 PM
Good point on the "same memory space". I run in Dask so that won't work. I guess I'll go back to my other plan, which is to add it to the decorator that I'm using to capture and clean up the exceptions. It just feels funny to check and update the logging configuration in the decorator before each task function is actually called.
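The per-task check described here can be kept cheap by making it idempotent. A minimal sketch, assuming the Prefect logger is reachable as `logging.getLogger("prefect")`; the names `ensure_local_handler`, `with_local_logging`, and `LOG_PATH` are hypothetical and not from the thread:

```python
import functools
import logging
import os
import tempfile

# Assumption: an illustrative file location, not a Prefect setting.
LOG_PATH = os.path.join(tempfile.gettempdir(), "local_flow.log")


def ensure_local_handler(logger_name="prefect", path=LOG_PATH):
    """Attach a FileHandler for `path` unless one is already attached."""
    logger = logging.getLogger(logger_name)
    target = os.path.abspath(path)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger  # already configured in this process
    handler = logging.FileHandler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def with_local_logging(func):
    """Re-check the logging configuration before each task function runs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ensure_local_handler()
        return func(*args, **kwargs)

    return wrapper
```

Because the check runs inside whichever process executes the task, it does not depend on tasks sharing memory, at the cost of one handler scan per call.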

Jeremiah

02/14/2020, 8:44 PM
I agree with you - we’re going to take your use case into consideration and see how we can expose better hooks. Pragmatically, we generally assume each task is running in a separate process, so checking the config is “correct,” but we don’t want you to have to write code that isn’t nice and DRY

josh

02/14/2020, 8:44 PM
Is this something that you would want set on the process where the flow runs? Environments allow for `on_start` and `on_exit` callbacks, to which you can provide functions that run prior to the flow run and after it has finished: https://docs.prefect.io/cloud/execution/overview.html#environment-callbacks For example, the `on_start` callback could run your code before the flow starts; however, it won’t happen prior to the agent deploying it.
:upvote: 1

Nate Atkins

02/14/2020, 8:46 PM
I have to step out for a little bit. I'll take a look at that and let you know. Right now I think it applies to the entire flow.
The on_start() was exactly what I was looking for. 😀 As we move forward I think there is still value in thinking about how to add pluggable LogHandlers. It would be nice to be able to just drop in an S3, GCP, Local or Postgres log handler. Thanks for the quick response.
🎉 3

Jeremiah

02/14/2020, 9:42 PM
I agree, and glad you’re all set! It’ll be on our roadmap.

Nate Atkins

02/14/2020, 9:43 PM
I've got a little bit more exploration to do on the Exception one as well, but I think that could also follow a similar pattern.
I think these two enhancements can help the security story in the "Gotchas and Caveats" section of the FAQ. https://docs.prefect.io/cloud/faq/dataflow.html#where-does-all-of-this-take-place
:upvote: 1
Well darn! Somewhere between the on_start() and when the task is called the prefect logger gets reconfigured and my handler gets bounced.
End of on_start()
At beginning of task function()

Chris White

02/15/2020, 4:47 PM
Hey @Nate Atkins! Which environment class are you using and would you mind sharing your logic for reconfiguring the logger within `on_start`?

Nate Atkins

02/15/2020, 4:53 PM
I've been using the DaskExecutor and whatever it defaults to in a RemoteEnvironment. It got a little more complicated than I wanted for a proof of concept. I could also see other issues as mentioned in the PIN on environment callbacks working slightly differently. I also found that CloudHandler doesn't call a formatter, so the logic I had put there to add the URI never got called.
Where I got to before I decided to go down another path for now.
I now have a function to get a safe logger that replaces the call to logger=prefect.context.get("logger").
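A "safe logger" function of this kind could be sketched as follows. This is a guess at the general shape, not the code from the thread: `get_safe_logger`, `LocalStoreAdapter`, and `LOG_STORE` are hypothetical names. Since a formatter on the cloud handler is never called, the sketch rewrites the message before it reaches any handler.

```python
import logging
import tempfile
import uuid
from pathlib import Path

# Hypothetical local store for full log messages.
LOG_STORE = Path(tempfile.gettempdir()).resolve() / "local_log_messages"


class LocalStoreAdapter(logging.LoggerAdapter):
    """Save each full message locally and pass only its URI downstream."""

    def log(self, level, msg, *args, **kwargs):
        if not self.isEnabledFor(level):
            return
        # Format here, because handlers will only ever see the URI message.
        full = (str(msg) % args) if args else str(msg)
        LOG_STORE.mkdir(parents=True, exist_ok=True)
        path = LOG_STORE / f"{uuid.uuid4().hex}.log"
        path.write_text(full)
        self.logger.log(level, f"log stored at {path.as_uri()}", **kwargs)


def get_safe_logger(base_logger=None):
    """Wrap a logger; inside a task, pass prefect.context.get("logger")."""
    if base_logger is None:
        base_logger = logging.getLogger("prefect")  # assumed logger name
    return LocalStoreAdapter(base_logger, {})
```

Because the rewrite happens in the wrapper rather than in a handler or formatter, it survives the logger being reconfigured in a worker process. One caveat of this sketch: `exc_info` is still forwarded, so tracebacks would need separate handling.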

Chris White

02/15/2020, 5:38 PM
Gotcha gotcha, so I think what’s going on is that when dask submits work to a subprocess the logger is reconfigured from scratch. I believe we’ll need to bake in a hook into the prefect code that allows you to specify your own formatters so they are configured each time the logger is. I would expect that if you use the `LocalExecutor` this will work as you expect — would you be willing to open a Feature Request issue on GitHub for this? I think it’s a really great idea

Nate Atkins

02/15/2020, 5:43 PM
Yes, I'm working to get some code together that demonstrates what can be done for both logging and exceptions. They won't be a proposal for how it should be implemented, but something to anchor a discussion around and flesh out some requirements. I should be able to get something open in the next day or two.
👍 1

Chris White

02/15/2020, 5:44 PM
yea that sounds great thank you!!

Nate Atkins

02/16/2020, 5:56 PM
Here is the one for Local Log Storage. https://github.com/PrefectHQ/prefect/issues/2038
Here is the one for Local Exception Storage. https://github.com/PrefectHQ/prefect/issues/2039
I haven't dug into the Parameters side of things yet, but it seems like Prefect Cloud could take one parameter that is a URL to the locally stored parameters. When the Agent picks up a run it would get the schema of required parameters. If they are not all in the local parameter file, it would reject the job and let Prefect Cloud know which parameters are missing.
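The agent-side check proposed here could be sketched like this. Everything in it is an assumption for illustration: the local parameter file is taken to be JSON, and `missing_parameters` and `check_run` are hypothetical names, not part of Prefect.

```python
import json


def missing_parameters(required, local_params):
    """Return the required names absent from the local parameters, sorted."""
    return sorted(set(required) - set(local_params))


def check_run(required, local_param_json):
    """Decide whether a run can start, reporting only parameter names."""
    local_params = json.loads(local_param_json)
    missing = missing_parameters(required, local_params)
    # Only names are reported back; parameter values never leave this function.
    return {"accepted": not missing, "missing": missing}
```

For example, `check_run(["start_date", "region"], '{"region": "us-east"}')` would reject the run and report `start_date` as missing, without exposing the value of `region`.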
I think with ResultHandlers, LocalLoggingHandler, LocalExceptionHandler and the Local Parameters tweak, all of the "Gotchas and Caveats" have a local-only option.
• If you don't have privacy concerns with your data, you can use Prefect out of the box.
• If you do have privacy concerns, you can make N configuration changes and keep some or all of your critical data in your controlled storage environment.
I took a stab at implementing this and sent a pull request with what I did. Lots of holes and such, but it's something to start finding the places that need to be touched. https://github.com/PrefectHQ/prefect/pull/2048

Chris White

02/18/2020, 11:16 PM
wow nice thank you @Nate Atkins!