Michael Reynolds

07/19/2022, 5:32 PM
i have a component that is extremely costly to instantiate that needs to be used in multiple tasks. is there a way to distribute an instance of this as a singleton so that i do not need to reinitialize the costly processor each time the task is run? i'll provide a basic model of the python code demonstrating my dilemma
from prefect import flow, task


class MyHeavyProcessor:

    def init(self):
        # this is a super costly operation
        pass

    def _do_something(self, messages: list) -> list:
        # placeholder for the actual processing
        pass

    def process(self, messages: list) -> list:
        return self._do_something(messages)


@task(name='execute-heavy-processor')
def execute_processors(messages: list):
    # the costly processor is rebuilt on every task run
    processor = MyHeavyProcessor()
    processor.init()
    return processor.process(messages)


@task(name='execute-something-else')
def execute_something_else(message):
    pass


@flow(name='my-flow')
def my_flow():
    # my_source is assumed to be defined elsewhere (not shown)
    output = execute_processors(my_source.poll_messages())

    for m in output:
        execute_something_else(m)
passing an instance of MyHeavyProcessor fails because my heavy processor has a whole bunch of stuff that is probably not easily serializable.
i come from a spark background, and usually we would accomplish this by having a singleton object with a lazily instantiated instance of MyHeavyProcessor
wondering if there is a similar or equivalent pattern in prefect 2.0
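For reference, the lazily instantiated singleton described above can be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: `get_processor` and the `instances_built` counter are hypothetical names, the upper-casing in `process` is placeholder work, and Prefect is left out so the snippet runs on its own. The instance is cached per Python process, so it would not be shared across separate worker processes or machines.

```python
from functools import lru_cache


class MyHeavyProcessor:
    """Stand-in for the expensive component from the question."""

    instances_built = 0  # hypothetical counter, only to show how often __init__ runs

    def __init__(self):
        # Imagine a very costly setup step here.
        MyHeavyProcessor.instances_built += 1

    def process(self, messages: list) -> list:
        # Placeholder work: echo the messages back in upper case.
        return [m.upper() for m in messages]


@lru_cache(maxsize=None)
def get_processor() -> MyHeavyProcessor:
    """Build the processor on the first call, then reuse it in this process."""
    return MyHeavyProcessor()


def execute_processors(messages: list) -> list:
    # In a flow, this body would sit under a task decorator. The processor is
    # looked up inside the body instead of being passed in, so the instance
    # is never a task input or a task result.
    return get_processor().process(messages)
```

Calling `execute_processors` repeatedly in the same process reuses one instance. If several threads hit `get_processor()` at the same moment before the first call has finished, the constructor may still run more than once, so a lock around the first construction would be a reasonable addition.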

Kevin Kho

07/19/2022, 6:04 PM
I think for this to happen, we'd need the checkpoint=False to work, which is why I think you brought that up a while ago, right? We need a task that can return the singleton without pickling it

Michael Reynolds

07/19/2022, 6:04 PM
yea exactly @Kevin Kho!

Kevin Kho

07/19/2022, 6:05 PM
but yes, it works in Prefect 1 as long as you don't use Dask and you turn checkpointing off, so it should be possible in 2.0 once we get that in

Michael Reynolds

07/19/2022, 6:06 PM
hmm, i have a kind of explicit guidance to target 2.0 for a variety of other reasons...
hell, i'm even willing to work on it if you could point me to the place to get started
i'm very comfortable contributing back to open source code

Kevin Kho

07/19/2022, 6:08 PM
Michael (who responded earlier) would be the right person, but something like this is very core, so I think we'd want to keep responsibility for this piece specifically

Michael Reynolds

07/19/2022, 6:09 PM
gotcha
since there is no issue for it yet would it make sense for me to open an issue on github?

Kevin Kho

07/19/2022, 6:09 PM
You definitely can!

Michael Reynolds

07/19/2022, 6:09 PM
okay cool sounds good

Kevin Kho

07/19/2022, 6:09 PM
Might help with your personal tracking too

Michael Reynolds

07/19/2022, 8:21 PM
@Kevin Kho actually i did some digging and I think i found a superseding issue: https://github.com/PrefectHQ/prefect/issues/5888
should i skip opening a new one and just comment / track against that one?

Kevin Kho

07/19/2022, 9:22 PM
Ah yeah, that looks like the same issue. You can comment there