# introductions
e
Hello! I'm Elliot - technical founder of a pre-seed startup looking for product/market fit, experienced software engineer and engineering manager, teller of dad jokes. I worked at Spotify for a bunch of years and my team there used our home-grown feature flagging and analytics systems. At my last company we used LaunchDarkly. I'm excited about getting to know Prefect now.
👋 9
🚀 6
🙌 7
a
What's up Elliot! Have been in your boat before, welcome to the community!
r
Hey there Elliot - Welcome! I recently worked at LaunchDarkly so can relate to joining the Prefect train. Thanks for trying us out!
e
Oh sheesh. I'm adding feature flagging and workflow orchestration and it's Friday end-of-day and wires got crossed in my head.
We went from Luigi to Flyte and Airflow on different teams.
👍 1
I'm looking at flipt / OpenFeature for feature flagging.
j
Welcome to the group, Elliot! Happy to have you here no matter what your tech needs.🎉
r
haha that's great.. I was literally saying to someone "Prefect for feature flagging?" 😅. Welcome.
e
Yeah, right?
I have a pipeline that went from a "try it in a notebook and with individual python scripts" to "one big script to do all the things" and now I'm breaking it into smaller pieces and looking to use Prefect to manage the pipeline through subflows. The different stages are used to take data, refine it with LLMs and content from web searches, insert into databases. Am I thinking about Prefect in the right way?
👍 3
yess 3
a
Yep, that's exactly the right way
e
Sweet. Thx thx.
Any suggestions for "free" web searches that I can use as an alternative to SerpAPI?
a
free search engine lookups or are you trying to scrape a particular type of the world
are you trying to go "English -> search results -> snag those pages -> extract/summarize" or something?
e
more like "<a thing some business did> press release filetype:pdf". I have a lot of those things and am looking for "real data" to back that up.
The work pool feature also seems friendlier than "write some celery thing and a bunch of cronjobs"
a
yeah dude it's sick. I ran a pretty large scale scraping operation at my last startup and getting from celery hell into Prefect was 🤌 (also why I later joined, haha)
> for "real data" to back that up.
so Bing has an RSS feed for its news that's "free" (in the sense that it's against their TOS but they don't care until you're big, and then you can just pay folks for it legitly)
e
A question.... because we're an early-stage startup, I am the data team and the web team and the guy who changes the batteries on my co-founder's boyfriend's TV remote control. Can you help me articulate why it makes sense to use a hosted (or digitalocean-hosted...) Prefect instance?
Hah about the Bing thing. Hopefully Microsoft acquires us before we get big enough for them to complain about that. Are you talking about an RSS news feed?
a
> use a hosted Prefect instance
like vs free tier on prefect cloud
> about an RSS news feed?
yeah, but tl;dr you can just hit bing.com/somethingIforget/?topic=what+you+need and it gives you like the top 10 things that relate to your query
🙌 1
e
yeah, i mean "prefect cloud" or "a prefect server instance I am running on digital ocean"
a
hosting your own means you can build indexes on queries you need to tune, which makes sense when you're really sweating the thing. same if you have security / compliance stuff where all the data needs to live in your VPC, etc. prefect cloud is also multi-tenant, which means you're in some sense sharing resources with other folks. that isn't really an issue, but I think that's the usual reason why folks roll their own. I ran Prefect OSS on RDS in AWS for a bit and it was fine but I ran out of AWS credits and then realized I could just run my startup on prefect cloud free-tier and it was fine.
right now OSS doesn't have "push work pools" which is a fancy phrase for "serverless / autoscaling workloads". So if you hate celery and having to host workers all the time, that's maybe an argument for the cloud life.
👌 1
e
We're hand-waving about compliance right now. It will come, but not yet.
Is there any way to adjust the frequency of running of scheduled deployments in the web interface?
a
image.png
e
thank you
a
btw as you're going through this if anything feels rough around the edges lmk
thinking a lot about the UI recently
e
I got a little worried that this thing is running running running because i didn't realize that the 2 viz on the left are just based on the runs, not on time. and the one on the right too, i guess.
and this kinda made me think that this was running because of the "every minute every day". I tried to turn it off there, but can't. I think it's not running because it's "not ready".
a
woof, yeah I can see how that's confusing
e
Can you point me at any example "best practices" on github for structuring LLM+search+data pipelines?
Is "AskMarvin" you guys? https://www.askmarvin.ai/
a
Sure is!
on my phone so a little curt, but tl;dr wrap every HTTP call in a task so you get retries / caching / idempotency for each atomic piece of failure. Also look at .map or TaskRunners, which let you do massively parallelized operations.
so openai calls, give them their own task. scraping a single page, give that its own task
my usual suggestion is think through where you expect the most failure, and that's usually where I start encapsulating them in a task
sorry if this is a little generic 😅
e
haha. all good.
I was just thinking about using Prefect for the pipelines. It feels like it might be useful in the "retrieval" part of our app too. But maybe not.
I'm using the retries in Instructor now. Not making any "direct" calls to openai. https://python.useinstructor.com/
a
yeah those retries are local or "client"-side.
which means like "if your machine dies while you're on the 3rd retry you're f'd"
might be a corner case a small scale but as you start beefing this up you want the "state" of your retry stored somewhere else
corner case at* small scale
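To make the client-side vs. stored-state distinction concrete, here is a rough stdlib-only sketch. It is not how Prefect or Instructor implement retries. A JSON file stands in for wherever the retry state would live (a database, an orchestrator's backend), and `run_with_durable_retries` and its parameters are made up for illustration. The attempt count is written to disk before each call, so a process that dies mid-retry picks up the count where it left off instead of starting over.

```python
import json
from pathlib import Path


def run_with_durable_retries(key, fn, state_path, max_attempts=3):
    """Call fn(), recording every attempt on disk before making it."""
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {}
    attempts = state.get(key, 0)          # survives a crash of this process
    while attempts < max_attempts:
        attempts += 1
        state[key] = attempts
        path.write_text(json.dumps(state))  # persist before the risky call
        try:
            result = fn()
        except Exception:
            continue                      # the failed attempt is already recorded
        state.pop(key, None)              # success: clear the stored counter
        path.write_text(json.dumps(state))
        return result
    raise RuntimeError(f"{key}: gave up after {attempts} attempts")
```

With purely in-memory retries, a machine that dies on the 3rd attempt restarts at attempt 1. Here a restarted process reads the stored count and only uses the attempts that are left.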
e
you're talking about data pipelines, right?
hrm. thinking... we've got web code calling an API which does the agentic and RAG and LLM-calling stuff. So the "client" I am thinking about here is the API in my app.
thx for talking through this stuff. I need to go feed my kids.
a
🙇
y
@Elliot Winard Are you sure you need Sub Flows? With Prefect 3.0 you should be able to define Tasks within Tasks. Creating a new sub flow might be too much, and you lose context.
e
Thx. Not sure I need Sub Flows. I'm just getting my head around the tool.
👍 1