Michael Hadorn
11/09/2021, 7:55 AM

Michael Hadorn
11/09/2021, 7:59 AM

import prefect
from prefect import Flow, task


@task
def say(text):
    logger = prefect.context['logger']
    # swap in the commented call/definition below to change other_function's signature
    rof = other_function()
    # rof = other_function('t')
    logger.info(f"text: {text} and rof: {rof}")


def other_function():
# def other_function(my_p):
    return 'any_string'


with Flow(
    "Test function signature hash"
) as flow:
    t1 = say('my_text')


if __name__ == '__main__':
    hash = flow.serialized_hash()
    print(hash)
It goes a little in this direction: https://prefect-community.slack.com/archives/CL09KU1K7/p1634800071022600
But it's not the same.
I think the signatures of methods should be included in the normal flow hash. Or do I have to calculate it myself?

Anna Geller
If you register your flow with build=True (the default), e.g.:
flow.register(project_name="your_project", build=True)
then all the changes you made to the flow will be reflected in the storage, so that your flow will run with the most up-to-date "version" of it.
However, when it comes to versioning, the version is by default incremented only if there is a change in the flow's metadata or structure, i.e. new tasks or edges, or changes in storage, run configuration, or schedule.
If you look at your flow, changing the input parameters of a function signature doesn't change anything in the flow structure (the with Flow() as flow: ... block), which is why you see the same hash.
So you would have to calculate a hash over your inputs yourself to change this behavior.
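A minimal sketch of that idea: mix the source (and therefore the signature) of plain helper functions into the hash, and pass the result as an idempotency key at registration. flow_hash_with_sources is an illustrative helper, not a Prefect API; only flow.serialized_hash() and register's idempotency_key argument come from Prefect 1.x.

import hashlib
import inspect

def flow_hash_with_sources(flow, *helpers):
    # Start from Prefect's own hash of the serialized flow...
    h = hashlib.sha256(flow.serialized_hash().encode())
    # ...and mix in the source code of each helper function,
    # so a changed signature produces a different hash.
    for fn in helpers:
        h.update(inspect.getsource(fn).encode())
    return h.hexdigest()

# Register a new version only when the flow structure or a helper's source changed:
flow.register(
    project_name="your_project",
    idempotency_key=flow_hash_with_sources(flow, other_function),
)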
Michael Hadorn
11/09/2021, 10:39 AM

Anna Geller
"But when I run the flow, it raises an Exception at the line of the method call, complaining about the wrong number of parameters."
The exception is raised at runtime, based on runtime conditions. The flow's structure and its metadata are registered at build time.
"For me that shows that this flow data (incl. the code of the tasks) is in the storage."
Yes, if you use the default Docker storage (did I understand correctly that you use Docker storage?), then the flow's Docker image gets built every time you register the flow, regardless of whether that results in a new version, i.e. even if nothing changed in your flow, the storage still gets built.
"Or why do we not hash the full pickled storage?"
We hash the serialized flow, not the storage. Hashing pickled storage would be difficult, especially because there are so many different storage options, and many users use script storage rather than pickle storage.
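For context, the flow hash is essentially a checksum over the flow's serialized metadata. A rough sketch of the idea (not the exact Prefect source):

import hashlib
import json

def rough_serialized_hash(flow):
    # flow.serialize() returns a dict of flow metadata: tasks, edges,
    # storage and run-config info. The hash covers that dict,
    # not the contents of the built storage.
    serialized = flow.serialize()
    return hashlib.sha256(
        json.dumps(serialized, sort_keys=True, default=str).encode()
    ).hexdigest()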
"For which logic in Prefect is my real Python code executed (inside the Docker image)?"
It depends on how you defined your Docker storage:
• this example uses the default behavior, where the flow gets pickled and baked into your Docker image, which gets built every time during registration: https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows/docker_pickle_docker_run_local_image.py
• this example uses script storage without building the flow's Docker image at registration; here the user copied all flows directly into the image beforehand and only points to the path inside the image where the flow file has been stored (a sketch of this pattern follows below): https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows_no_build/docker_script_docker_run_local_image.py
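A minimal sketch of that second pattern, assuming Prefect 1.x Docker storage and a prebuilt image (the image name and path below are placeholders):

from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import Docker

with Flow("Test function signature hash") as flow:
    ...  # tasks as above

# Script storage: the flow's .py file was already copied into the image,
# so registration only records where to find it.
flow.storage = Docker(
    image_name="my-flows",                 # placeholder: your prebuilt image
    image_tag="latest",
    stored_as_script=True,                 # store the script, not a pickle
    path="/opt/prefect/flows/my_flow.py",  # placeholder: flow file location in the image
)
flow.run_config = DockerRun()

# build=False skips rebuilding the image at registration time
flow.register(project_name="your_project", build=False)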
Michael Hadorn
11/09/2021, 1:58 PM

Anna Geller
Michael Hadorn
11/09/2021, 2:11 PM