# prefect-getting-started
m
It appears that Prefect's design center is around data workflows, e.g. ETL. My application, on the other hand, is a web service that has to handle a large number of simultaneous requests. Each request is independent, so I assume that scaling would be straightforward. Each request is a pipeline of steps, e.g. preprocess some input, send off a query to an external service, postprocess the result, do some validity checking, return the results to the client. It's really just another form of data processing, but the time scale may be different from a traditional ETL workflow - here we need to get an answer back to the client in ~1 sec. Any thoughts on whether Prefect would work for this use case? Any potential limitations or drawbacks? Thanks
Sometimes some of the steps may be done in parallel - others will be pipelined. The idea is to create a set of modular tasks that can then be mixed and matched as appropriate to create new features. Prefect's notions of tasks and flows seem to fit this pattern well.
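To make the "modular tasks, mixed and matched" idea concrete, here is a minimal sketch in plain Python with asyncio. It does not use Prefect at all. Every name in it (`preprocess`, `query_service`, `postprocess`, `validate`, `run_pipeline`, `parallel`, `handle_request`, `handle_many`) is hypothetical, and the external service is faked with a short sleep.

```python
import asyncio


async def preprocess(text: str) -> str:
    # Normalize the raw input.
    return text.strip().lower()


async def query_service(query: str) -> dict:
    # Stand-in for the external service call (assumption: a real call would go here).
    await asyncio.sleep(0.01)
    return {"query": query, "hits": len(query)}


async def postprocess(result: dict) -> dict:
    # Derive an extra field from the service response.
    return {**result, "score": result["hits"] * 2}


async def validate(result: dict) -> dict:
    # Reject results that look empty.
    if result["hits"] <= 0:
        raise ValueError("empty result")
    return result


async def run_pipeline(value, steps):
    # Pipelined execution: each step receives the previous step's output.
    for step in steps:
        value = await step(value)
    return value


def parallel(*steps):
    # Combine several steps into one step that runs them concurrently
    # on the same input and returns their results as a list.
    async def run(value):
        return list(await asyncio.gather(*(step(value) for step in steps)))
    return run


async def handle_request(raw: str) -> dict:
    # One "feature" is just a particular ordering of the modular steps.
    return await run_pipeline(raw, [preprocess, query_service, postprocess, validate])


async def handle_many(raws):
    # Requests are independent, so they can be processed concurrently.
    return await asyncio.gather(*(handle_request(raw) for raw in raws))


if __name__ == "__main__":
    print(asyncio.run(handle_many(["  Hello ", "World"])))
```

A new feature would be a different list of steps passed to `run_pipeline`, with `parallel(...)` used wherever two steps do not depend on each other.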
c
If your processing time is reliably sub-second, then there’s often no point in doing anything more than processing the request directly. I’d think of this in terms of how important a single run of a pipeline is and which health metrics matter. Prefect is more appropriate for longer-running, coarse-grained tasks that are individually important enough to monitor very closely. You have separate logs for each flow/task, and when a flow or task fails, that might be a serious issue. For a service that is handling large numbers of long-running requests, you are probably more interested in monitoring health in terms of a rate of failure and will keep/analyze an aggregate log. In that case you’d want to look at using some sort of lighter-weight task queueing around your pipeline and design your APIs and pipeline steps around that (e.g., AWS SQS, Redis, RabbitMQ).
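As a rough illustration of the queue-plus-aggregate-health idea, here is a minimal sketch that uses an in-process `asyncio.Queue` as a stand-in for a real broker such as SQS, Redis, or RabbitMQ. The names `process`, `worker`, `run_batch`, and `failure_rate` are hypothetical, and the "pipeline" is a placeholder that fails on empty input.

```python
import asyncio
from collections import Counter


async def process(request: str) -> str:
    # Placeholder pipeline (assumption): fails on empty input.
    if not request:
        raise ValueError("empty request")
    return request.upper()


async def worker(queue: asyncio.Queue, stats: Counter) -> None:
    # Pull requests off the queue forever; record outcomes in aggregate
    # rather than keeping a separate log per request.
    while True:
        request = await queue.get()
        try:
            await process(request)
            stats["ok"] += 1
        except Exception:
            stats["failed"] += 1
        finally:
            queue.task_done()


async def run_batch(requests, n_workers: int = 4) -> Counter:
    queue: asyncio.Queue = asyncio.Queue()
    stats: Counter = Counter()
    workers = [asyncio.create_task(worker(queue, stats)) for _ in range(n_workers)]
    for request in requests:
        queue.put_nowait(request)
    await queue.join()  # wait until every queued request has been handled
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return stats


def failure_rate(stats: Counter) -> float:
    # The health metric of interest here is a rate, not any individual run.
    total = stats["ok"] + stats["failed"]
    return stats["failed"] / total if total else 0.0


if __name__ == "__main__":
    result = asyncio.run(run_batch(["a", "", "b", "c"]))
    print(dict(result), failure_rate(result))
```

With a real broker the queue would live outside the process and the counters would go to a metrics system, but the shape (workers, shared queue, aggregate failure rate) would be similar.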
m
That makes sense. I was more interested in the flow/task infrastructure that would allow for a modular toolbox of components for our various features. Some of the more advanced pieces, such as the dashboards, persistence, assets, may not be as important. Monitoring of individual flow runs is important for development and evaluation. In production, we'd probably have to do a different kind of visualization that did aggregate stats across the many runs - performance, error rates, etc.
"Prefect is more appropriate for longer-running, coarse-grained tasks that are individually important enough to monitor very closely."
I do see that this is the design center, but are there actually reasons why shorter-running, smaller tasks/flows would suffer?
c
I guess I’d say to go fire off 10,000 hello world flows and see what you think about the experience in the cloud UI at that point. They do a great job with the UI, but I think huge numbers of jobs still make it kind of unmanageable. I have one thing I fire off every 15 min and even that kind of buries the other activities in the UI. Beyond that, I would caution against putting a SAAS in the loop for high-volume, critical operations - especially when it is a use case that the builders of the SAAS didn’t really set out to solve for.
Also, I’m just a user of Prefect and this is all just my personal opinion. The makers of it may think about this differently and actually be interested in supporting your use case. Dunno!
m
By SASS, did you mean SAAS? Otherwise I'm not familiar with that acronym.
c
Yes
m
Thanks, Chris, for your comments and candor. I appreciate all input! I just posted here because it seemed like an appropriate place. Do you have any suggestions as to how I might get a more "official" opinion?
c
Dunno. I would have thought you’d have already gotten one!
m
Thanks, anyway