Hey Leonardo - we use Prefect extensively for computer vision tasks and also want to make sure we’re scaling up and down with demand.
However, I think Prefect works best as a simple worker. That is, start the worker when a request comes in, do the job, store the result, and then let Prefect destroy the worker. This does mean you have a 30s-ish delay due to Prefect's startup time.
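To give a rough idea of that run-to-completion shape, here is a minimal sketch. The function names, the placeholder model call, and the JSON-on-disk storage are all made up for illustration; only the `flow` and `task` decorators come from Prefect, and the fallback just lets the snippet run without Prefect installed.

```python
import json
import tempfile
from pathlib import Path

try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch still runs without Prefect installed:
    # the decorators become no-ops.
    def flow(fn):
        return fn

    task = flow


@task
def run_inference(image_path: str) -> dict:
    # Hypothetical placeholder for the real computer vision model call.
    return {"image": image_path, "label": "unknown"}


@task
def store_result(result: dict, out_dir: str) -> str:
    # Persist the result so it outlives the worker (assumed: a JSON file on disk).
    out_path = Path(out_dir) / (Path(result["image"]).stem + ".json")
    out_path.write_text(json.dumps(result))
    return str(out_path)


@flow
def process_image(image_path: str, out_dir: str) -> str:
    # One request in, one stored result out; nothing stays running afterwards.
    result = run_inference(image_path)
    return store_result(result, out_dir)


if __name__ == "__main__":
    print(process_image("cat.jpg", tempfile.mkdtemp()))
```

The point is that the flow holds no state and serves no requests itself, so the worker can be thrown away as soon as the result is stored.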
I wouldn’t suggest having the workers run their own HTTP server. Instead, consider going outside Prefect and using Amazon ECS with auto scaling. It does exactly what you describe. Not on Amazon? GCP and Azure have similar concepts too.