Andrew Schechtman-Rook

02/26/2020, 2:30 AM
Hi folks! I've been poking around a bunch with Prefect for the last couple months, and over that time I've written a few utilities/extensions to help Prefect work better for my particular data and modeling workflows. In case anyone else may be able to benefit I've pulled them together into a small package — https://pypi.org/project/prefect-ds/. Please feel free to take a look 🙂
👊 5
:marvin: 4
❤️ 7
🚀 7
👏 5

Jeremiah

02/26/2020, 2:55 AM
This is awesome, thanks for sharing it! If you’re game, we can look at bringing some of it directly into Prefect, there’s a lot of good ideas here.
:upvote: 2

Chris White

02/26/2020, 2:55 AM
This is really awesome @Andrew Schechtman-Rook! There are a lot of things here that I think could be incorporated almost as-is into Prefect, and some other things that we could tweak to include if you’d be interested in working together on it
:upvote: 2

Andrew Schechtman-Rook

02/26/2020, 2:57 AM
yeah, ideally I'd love to see these things in prefect — I had to do some unpleasant contortions to minimize modifications to the Prefect classes to hopefully keep incompatibilities from cropping up too much, but I wasn't sure if y'all would be interested in the directions I was going in
my spare time is hit-and-miss, but I'm happy to help on it as I have time

Jeremiah

02/26/2020, 2:59 AM
Anyone who’s implemented a custom `Result` class definitely knows what they’re doing, trust me

Chris White

02/26/2020, 2:59 AM
awesome; yea this all seems very in-scope for the core library, and the `checkpoint_handler` is very similar to a feature we have been discussing internally so I’d love to work with you on it

Andrew Schechtman-Rook

02/26/2020, 2:59 AM
lol thanks
sure, do you think the `checkpoint_handler` is the best thing to go after first?

Chris White

02/26/2020, 3:00 AM
actually i think the simplest thing could be the pandas result handler

Andrew Schechtman-Rook

02/26/2020, 3:00 AM
happy to do whatever, but I'd probably want to try for the easiest, lowest-hanging fruit first

Chris White

02/26/2020, 3:01 AM
as far as I can tell, the only thing we’d need to do to introduce it in the core library is:
- decide how to handle the pandas dependency (maybe as an extra)
- create a serializer for it (e.g., https://github.com/PrefectHQ/prefect/blob/master/src/prefect/serialization/result_handlers.py)

Andrew Schechtman-Rook

02/26/2020, 3:04 AM
would you want the handler with the full filepath specification and string formatting support, or more like the `LocalResultHandler` where the user only specifies the directory?
(also, feel free to switch to DMs, issue on a repo, email, or whatever is convenient for you)
👍 1
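
For readers comparing the two options raised in the question above, here is a minimal sketch of the difference between a handler that takes a full, format-string file path and one that takes only a directory. It is an illustration under stated assumptions, not the code from prefect-ds or Prefect: the class names `TemplatedFileHandler` and `DirectoryHandler`, the `write_fn`/`read_fn` parameters, and the template keys `task_name` and `run_date` are all hypothetical, and JSON stands in for pandas so the example runs with the standard library alone.

```python
import json
import os
import tempfile
from typing import Any, Callable


def write_json(obj: Any, path: str) -> None:
    """Stand-in writer. With pandas one might pass
    `lambda df, path: df.to_csv(path)` instead (assumption, not from the thread)."""
    with open(path, "w") as f:
        json.dump(obj, f)


def read_json(path: str) -> Any:
    """Stand-in reader. With pandas one might pass `pd.read_csv` instead."""
    with open(path) as f:
        return json.load(f)


class TemplatedFileHandler:
    """Hypothetical handler: the user gives a full path template,
    filled in with str.format at write time."""

    def __init__(self, path_template: str, write_fn: Callable, read_fn: Callable):
        self.path_template = path_template
        self.write_fn = write_fn
        self.read_fn = read_fn

    def write(self, result: Any, **context: str) -> str:
        # The caller controls the exact location through the template keys.
        path = self.path_template.format(**context)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        self.write_fn(result, path)
        return path

    def read(self, loc: str) -> Any:
        return self.read_fn(loc)


class DirectoryHandler:
    """Hypothetical handler: the user gives only a directory,
    and the handler picks a unique file name itself."""

    def __init__(self, directory: str, write_fn: Callable, read_fn: Callable):
        self.directory = directory
        self.write_fn = write_fn
        self.read_fn = read_fn

    def write(self, result: Any) -> str:
        os.makedirs(self.directory, exist_ok=True)
        # mkstemp creates an empty, uniquely named file and returns its path.
        fd, path = tempfile.mkstemp(prefix="result-", dir=self.directory)
        os.close(fd)
        self.write_fn(result, path)
        return path

    def read(self, loc: str) -> Any:
        return self.read_fn(loc)
```

The templated form gives predictable, human-readable locations (for example one file per task and date) at the cost of the caller having to supply every template key. The directory-only form needs no extra arguments but produces opaque file names.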

Chris White

02/26/2020, 3:06 AM
Honestly in this case, my bias is to include it using the configuration that you found useful for your own work because chances are someone else will find it beneficial

Andrew Schechtman-Rook

02/26/2020, 3:08 AM
ok, I'll start with that and we can adjust as needed
👍 1

Chris White

02/26/2020, 3:08 AM
For the other more intricate features I’ll do some more review and contact you outside of this thread for how we can work on bringing them in

Andrew Schechtman-Rook

02/26/2020, 3:08 AM
sounds good
I'll start working on it next time I have a free moment, we'll see when that is 😛

Chris White

02/26/2020, 3:09 AM
haha yea no worries!
👍 1