Laura Lorenz (she/her)

07/16/2020, 10:04 PM
Thread for your thoughts: Storage config from Cloud/Server
We are also considering enabling people to list a flow storage source (like a GitHub repo or an S3 bucket) directly in the UI in Cloud/Server, and in so doing basically do half of the ‘register’ step by letting Prefect know where the storage location is. As a user you would store/push your flow files to wherever and then log in to Cloud/Server to provide the location. We still need a Python interpreter to get involved at some point to parse the flow object and get out some of the metadata, so technically it is not fully registered until that happens. We would probably have the agent be responsible for finishing off the registration for anything that got pre-registered via Cloud/Server as just a pure storage location. Do you think that this would be too confusing? Do you prefer controlling registration entirely, or do you like the idea of having things ‘pre-registered’ like this?


07/17/2020, 8:28 AM
Personally, I think this is confusing, at least more confusing than handling storage options alongside registration. Maybe I am failing to see the appeal of it. To me, I am going to provide a location either way, either in the UI or while constructing a
object. Users are still expected to serialize and push flows to the locations, which means the location should be specified at this level anyway. Why then enter the location manually into the UI, when
passes on the location seamlessly?

Alex Cano

07/17/2020, 2:10 PM
When I first interacted with Prefect, I originally thought “man, why can’t this just be like Airflow” where you point it at a directory and there’s some mechanism of it “just working”. I think offering a way to do this is fine, but I’d think of it as a stepping stone to understanding proper flow registration and versioning rather than a replacement. After being involved with Prefect for a while, I love that flows are versioned and registered manually. I think this is partially because of a proper SWE background, so I’m probably more familiar with CI/CD practices than your average data scientist, so it might be some bias coming through. I think having some kind of recipe in the docs for either transitioning from the “my flows exist here, you figure it out” to “I’ll handle this now”, or even just a template for a CI process that handles registering and pushing the new flows would be helpful as well!
:upvote: 1

Laura Lorenz (she/her)

07/20/2020, 2:22 PM
Ok nice, gotcha, thanks again for your input everyone 🙂 If all goes well, @emre, people following this workflow would NOT personally serialize and push flows to the locations; starting an agent would ‘do it for them’, and that is fundamentally the appeal. The idea even came out of the thought experiment of “how can we get someone to run something on cloud without ever opening a terminal” (this doesn’t really get us all the way there, but it does take out the independent registration step at the terminal). I think @Alex Cano hit the prospective use case on the head: this is more of an easy on-ramp that is more “clicky” to just get that first one up (maybe even off a repo of examples, before you even write any flow code yourself, so total newbies). We definitely think at some point people would need to/want to transition to the “I’ll handle this now” situation. Tbh we have been most concerned with getting in the way of people ready for, or already doing, the “I’ll handle this now” side of it if they try (or someone in their org tries) to mix and match registration workflows. Definitely will take these thoughts back. Since this is more of an additional workflow than totally replacing something (like the environments idea), it’s probably a bit lower risk to go through with it, but if we confuse people more it will have done the total opposite of what we wanted 😅