I see, thank you for the quick response.
I think in the end what we want to reduce is the overhead of spinning up a container, loading models into memory, and only then running inference. With a separate, horizontally scaled inference server this becomes a lot faster.
I guess what would be nice to have in our case is if Prefect could just send a request to the inference service instead of spinning up a container.
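Just to make the idea a bit more concrete, a rough sketch of what I have in mind is below (the endpoint URL, payload shape, and function names are hypothetical, not something we have running):

```python
import requests
from prefect import flow, task

# Hypothetical endpoint of an already-running, horizontally scaled inference service
INFERENCE_URL = "http://inference-service:8000/predict"


@task(retries=2, retry_delay_seconds=5)
def run_inference(payload: dict) -> dict:
    # The model is already loaded in the long-running service,
    # so the task only pays the cost of an HTTP round trip.
    response = requests.post(INFERENCE_URL, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


@flow
def scoring_flow(items: list[dict]):
    # Prefect handles orchestration and retries; the inference
    # service handles loading and executing the model.
    return [run_inference.submit(item) for item in items]


if __name__ == "__main__":
    scoring_flow([{"text": "example input"}])
```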
But maybe our use case does not fit very well with what Prefect offers - I’ll have to read a bit more about it and ask better questions, sorry