prefect-community
  • l

    Luuk

    07/26/2021, 4:59 AM
    Hi all, small question: is it possible to name the runs themselves? I'm using Prefect Core and it's making up the names for the runs itself. I would prefer to set my own, e.g. with timestamp_variable, so that I can use the same flow for multiple configs (yaml files).
    k
    1 reply · 2 participants
  • s

    Samuel Tober

    07/26/2021, 8:24 AM
    Hi all! I am having problems with running Prefect through a conda environment. I am creating the env from a .yml file and packaging it in a Docker container. However, when I build the Docker container and run a flow that imports my dependencies, the modules are not found by Prefect (ModuleNotFoundError). Everything works fine when I just pip install in the Dockerfile; it's when I create a conda env that things go wrong. Please find attached a screenshot of my Docker and environment file. Thank you in advance!
    s
    k
    7 replies · 3 participants
  • m

    Martin Felder

    07/26/2021, 8:30 AM
    Hi! I would be very interested in this as well. I already posted a corresponding discussion topic:
  • m

    Martin Felder

    07/26/2021, 8:30 AM
    https://github.com/PrefectHQ/prefect/discussions/4812
    k
    10 replies · 2 participants
  • m

    Martin Felder

    07/26/2021, 8:31 AM
    Sorry, I'm kind of a Docker noob. I opted for a miniconda container and installed Prefect afterwards. My code is not installed via pip but simply lives on the PYTHONPATH.
  • b

    Bruno Murino

    07/26/2021, 9:06 AM
    Hi everyone — I have an ECS Service with 1 container running a Prefect ECS Agent, but the memory consumption of this process steadily increases until it eventually dies of OOM and restarts. Does anyone know why this happens and how to fix it? The container entrypoint is simply:
    import prefect
    from prefect.agent.ecs.agent import ECSAgent
    
    agent = ECSAgent(
        region_name="eu-west-1",
        cluster="knightsbridge",
        labels=["ecs"],
        launch_type="EC2",
        agent_address="<http://0.0.0.0:8009>",
    )
    
    agent.start()
    k
    4 replies · 2 participants
  • d

    Dotan Asselmann

    07/26/2021, 11:42 AM
    Hi all, is it possible to configure Lazarus to fail a flow only after N > 3 scheduling attempts, or to wait longer between reschedules? I have flows that are stuck in a queue due to compute limitations, and Lazarus kills them before they get to run.
    k
    4 replies · 2 participants
  • h

    Hilary Roberts

    07/26/2021, 12:58 PM
    What is the best practice for splitting a large workflow with many upstream and downstream datasets into smaller flows? We are starting to set up our first couple of flows to manage the pipelines for our data warehouse. Like in most data infrastructures, we will probably have several upstream datasets which feed into the pipelines for datasets further downstream, which in turn feed into more datasets even further downstream, and so on. My question is how best to organise a large workflow like this? You could create one massive flow, but that wouldn’t be very nice to manage. So what is the best way to split this up in Prefect? Some ideas:
    • Import a task from the upstream flow into the downstream flow. I don’t think this has the intended effect: you end up re-registering the upstream flow every time you do the import, and I think it just duplicates the imported task rather than making the downstream flow actually wait for the upstream flow.
    • Create a waiter task that succeeds once the upstream data has landed.
    • Send some kind of event from the upstream flow that triggers the downstream flow. (Not completely sure how I’d do that.)
    • Create a parent flow that triggers a bunch of sub-flows.
    Is there a “correct” way to do this? What are people’s experiences? Sorry if I missed an obvious piece of documentation or discussion somewhere.
    k
    2 replies · 2 participants
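    One sketch of the parent-flow option from the list above, using the create_flow_run / wait_for_flow_run tasks that ship with Prefect 1.x. The flow and project names here are made up, both child flows are assumed to already be registered, and raise_final_state may require a recent 1.x release:
    from prefect import Flow
    from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

    with Flow("warehouse-parent") as parent_flow:
        # Kick off the upstream flow and block until it reaches a final state.
        upstream_id = create_flow_run(flow_name="load-upstream-datasets", project_name="warehouse")
        upstream_done = wait_for_flow_run(upstream_id, raise_final_state=True)

        # Only trigger the downstream flow once the upstream run has finished successfully.
        downstream_id = create_flow_run(flow_name="build-downstream-datasets", project_name="warehouse")
        downstream_id.set_upstream(upstream_done)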
  • p

    Pedro Machado

    07/26/2021, 1:39 PM
    Hi everyone. Are there any docs that explain how one might configure Prefect Cloud to support different environments (dev, stage, prod)? For example, do we separate using different tenants/projects/agent-labels? How should we manage different sets of secrets and access to those? I'd appreciate any suggestions. We are using Kubernetes for execution. Thanks!
    m
    k
    2 replies · 3 participants
  • m

    Madison Schott

    07/26/2021, 2:14 PM
    Hi, I am getting this error when trying to start my local agent. I've already authenticated using my token in the cli.
    RuntimeError: Error while contacting API at https://api.prefect.io
    m
    43 replies · 2 participants
  • b

    Bruno Murino

    07/26/2021, 2:42 PM
    Hi everyone — I have a flow that runs correctly with “prefect run”, however if I try to run via local agent then I get a weird error and the flow hangs in the “submitted” state
    k
    s
    53 replies · 3 participants
  • j

    Jelle Vegter

    07/26/2021, 3:39 PM
    Hi all, I'm new to Prefect and trying to set up a flow with Azure Blob Storage. How can I convert the downloaded file to a pandas dataframe? I don't think I'm properly understanding and using the Class. Thanks!
    k
    5 replies · 2 participants
  • l

    Leon Kozlowski

    07/26/2021, 4:37 PM
    Hi all, question on deploying flows with multiple agents. If I have version 1 of my flow deployed to an agent with label ‘prod’ and I want to test out a new feature on an agent with label ‘dev’, will this new version 2 deployment on the dev agent have any effect on my version 1 flow already running on the agent with label ‘prod’?
    k
    7 replies · 2 participants
  • p

    Pedro Machado

    07/26/2021, 4:58 PM
    Hi. When using the new tasks
    create_flow_run
     and
    get_task_run_result
    , how can I assign a different name to different instances so that I can see which task instance is running in the UI without having to look at the logs?
    k
    m
    5 replies · 3 participants
  • i

    itay livni

    07/26/2021, 6:12 PM
    Hi - Is there a way to delete an Agent from the Cloud UI? Some time ago I remember being able to do it. I also tried to delete an agent using GraphQL ... Is this right?
    mutation {
      delete_agent(input: {agent_id: "<id>"}) { success }
    }
    Thanks in advance
    k
    n
    3 replies · 3 participants
  • m

    Madison Schott

    07/26/2021, 7:32 PM
    How do we store secrets using Prefect Cloud? I can't find anything in the UI.
    k
    5 replies · 2 participants
  • b

    Bruno Centeno

    07/26/2021, 7:46 PM
    Hi, how do I run multiple tasks? I have a list of IDs from my back end for which I need to rerun tasks in Prefect, but it seems like I need to do it one by one. Is there a way to pass that list so that it creates all the tasks and runs them?
    k
    23 replies · 2 participants
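    A rough sketch of the fan-out pattern this is asking about, using Prefect 1.x task mapping; the task body and the id list are placeholders:
    from prefect import Flow, task

    @task
    def rerun_for_id(record_id):
        # placeholder for whatever work needs to be redone per id
        print(f"re-running work for {record_id}")

    with Flow("rerun-by-id") as flow:
        ids = ["id-1", "id-2", "id-3"]   # the list coming from the back end
        rerun_for_id.map(ids)            # one task run per id

    flow.run()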
  • a

    An Hoang

    07/26/2021, 8:58 PM
    How do I set the location of the result upon writing? The code below doesn't work; it does not generate the
    df.parquet
    file:
    parquet_result = LocalResult(dir="./test_prefect", serializer = PandasSerializer("parquet"))
    
    @task
    def test_task(df1, df2):
        parquet_result.write(df1, location = "df1.parquet", **context)
        parquet_result.write(df2, location = "df2.parquet", **context)
    Currently I have to set the
    location
    attribute at the time of instantiating the
    LocalResult
    object. The code below works
    parquet_result_partial = partial(LocalResult, dir="./test_prefect", serializer = PandasSerializer("parquet"))
    
    @task
    def test_task(df1, df2):
        parquet_result_partial(location = "df1.parquet").write(df1, **context)
        parquet_result_partial(location = "df2.parquet").write(df2, **context)
    So it seems the
    location
    kwarg to
    Result.write
    does not do anything. Is this by design? Or am I missing something?
    k
    1 reply · 2 participants
  • b

    Billy McMonagle

    07/26/2021, 9:21 PM
    Hi there! I have a shared task that I would like to run from a bunch of different flows. The only differences between the flows are the flow names, the schedules, plus a single parameter. Is this a reasonable way to accomplish this?
    from prefect import Flow, Parameter, task
    from prefect.schedules import CronSchedule
    
    
    @task
    def my_task(my_parameter):
        print(f"my_parameter value is {my_parameter}")
    
    
    with Flow("my-flow-1", schedule=CronSchedule("0 * * * *")) as flow1:
        param = Parameter("my_parameter", default="flow-1-parameter-value")
        my_task(param)
    
    with Flow("my-flow-2", schedule=CronSchedule("1 * * * *")) as flow2:
        param = Parameter("my_parameter", default="flow-2-parameter-value")
        my_task(param)
    k
    l
    4 replies · 3 participants
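    If the number of near-identical flows grows, one variation on the snippet above is to build them from a small helper function; a sketch reusing the same imports and shared task:
    def build_flow(name, cron, param_default):
        # One flow per (name, schedule, parameter default) combination, all sharing my_task.
        with Flow(name, schedule=CronSchedule(cron)) as flow:
            param = Parameter("my_parameter", default=param_default)
            my_task(param)
        return flow

    flow1 = build_flow("my-flow-1", "0 * * * *", "flow-1-parameter-value")
    flow2 = build_flow("my-flow-2", "1 * * * *", "flow-2-parameter-value")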
  • m

    Madison Schott

    07/26/2021, 10:43 PM
    Hi, I am getting this error when trying to set up AWS with ECS:
    with Flow("user_brand_campaigns", storage=STORAGE, config=RUN_CONFIG) as user_profile_flow:
    TypeError: __init__() got an unexpected keyword argument 'config'
    k
    1 reply · 2 participants
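    For reference, in Prefect 1.x the Flow constructor takes run_config rather than config, which is what this TypeError is pointing at. A sketch, where STORAGE and RUN_CONFIG are assumed to be the Storage and RunConfig objects from the original snippet:
    from prefect import Flow

    with Flow("user_brand_campaigns", storage=STORAGE, run_config=RUN_CONFIG) as user_profile_flow:
        ...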
  • a

    Andre Muraro

    07/27/2021, 12:50 AM
    Hey everyone, we're using a test setup very similar to the one detailed here. The only differences are that we're not using any tags for the local agent, most flows use LocalDaskExecutor, and we used compose scale to get a total of 10 agents. We have observed a few issues and would like to clear them up before we decide to adopt Prefect.
    • Several tasks are logging multiple times, in tandem with the number of running agents. When we used 3 agents, we would get 3 log messages for each task status (or custom messages too, for that matter); when we scaled to 10 agents, we get 10 messages, as shown in the screenshot. Not sure if related, but the Docker logs also seemed to show that an agent that only picked up a single flow nevertheless deployed every flow that had been scheduled for that 20h slot. So it looks like every agent is trying to pick up every flow?
    • I have also seen a few cases where the server showed statuses for tasks that would break their dependency chain (example below): earlier tasks killed by the zombie killer, but later ones marked as success. Also, many long-running tasks show as failures but generate their proper outputs. Maybe my basic setup can't handle all the requests being thrown at the API? For long-running tasks, is the only option to turn off the zombie killer? From the source code it appears the 10-minute heartbeat delay is hard-coded, so will a task expected to run for 20 minutes be forced to fail every time?
    I'd appreciate it if anyone could clear these up and guide me towards a proper configuration. Thanks!
    k
    m
    13 replies · 3 participants
  • j

    jake lee

    07/27/2021, 8:28 AM
    Hello! Prefect newbie here! Our company is considering adding Prefect to our existing data pipeline process, and I was wondering: is it possible to configure Prefect with multiple EC2 instances? We have different servers with ML training jobs and batch jobs, and we would like to use one Prefect deployment to manage all tasks across the different instances. I saw the concept of an ‘agent’, so in this case would it mean running an agent on each server and connecting them to a master node? Thanks in advance!
    y
    k
    10 replies · 3 participants
  • n

    Nicholas Hemley

    07/27/2021, 10:20 AM
    Hey folks, I have a question concerning the MySQLFetch task, since it appears I can't supply a file to the query parameter, e.g. task = MySQLFetch(query="./data/test-query.sql", ... Is this on the roadmap? Otherwise I will have to load each file and pass it into the task as a string manually ... thanks for any pointers!
    a
    2 replies · 2 participants
  • j

    Jelle Vegter

    07/27/2021, 10:31 AM
    If I use Prefect Cloud to schedule flows and have a local agent for execution, do I need to use .run() while registering the flow and keep that terminal open?
    y
    k
    19 replies · 3 participants
  • r

    Rinze

    07/27/2021, 12:23 PM
    Untitled.py
  • r

    Rinze

    07/27/2021, 12:23 PM
    Hi, I'm trying to connect to SQL Server but am running into some issues. This is a condensed version of my code ^^. Any ideas why I get this TypeError?
    Traceback (most recent call last):
      File "C:\Users\***\code\pipeline\src\flows_combined_entries.py", line 76, in <module>
        sqlserver.run(query='select * from julitest.entries where id = 50299',
      File "C:\Users\***\code\pipeline\venv\lib\site-packages\prefect\utilities\tasks.py", line 441, in method
        return run_method(self, *args, **kwargs)
      File "C:\Users\***\code\pipeline\venv\lib\site-packages\prefect\tasks\sql_server\sql_server.py", line 90, in run
        executed = cursor.execute(query=query, vars=data)
    TypeError: execute() takes no keyword arguments
    s
    k
    4 replies · 3 participants
  • s

    Sean Talia

    07/27/2021, 1:58 PM
    Has anyone ever seen this error code before? One of my scheduled flows failed early this AM – after the flow was successfully downloaded, it tried to start up its first task, and then ~10 minutes later failed with the error:
    prefect.utilities.exceptions.ClientError: [{'path': ['flow_run_by_pk'], 'message': 'request to http://hasura:3000/v1alpha1/graphql failed, reason: read ECONNRESET', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'request to http://hasura:3000/v1alpha1/graphql failed, reason: read ECONNRESET', 'type': 'system', 'errno': 'ECONNRESET', 'code': 'ECONNRESET'}}}]
    I haven't run into this one before...this is on Cloud
    k
    3 replies · 2 participants
  • p

    Pedro Machado

    07/27/2021, 2:16 PM
    Hi everyone. What is the recommended way to "forward" the parent flow's labels (at run time) to a child flow?
    k
    5 replies · 2 participants
  • m

    Mohamed Hajji

    07/27/2021, 2:23 PM
    Hello everyone, what's the default directory for the config file on Windows? I can't find it. When running prefect config, I see that it's pulling its config from C:\Users\username\.prefect, but this directory does not exist on my machine. Thanks!
    PS C:\Users\username> prefect config
    {"debug": false, "home_dir": "C:\\Users\\username/.prefect", "backend": "server", "server": {"host": "<http://localhost>", "port": 4200, "host_port": 4200, "endpoint": "<http://localhost:4200>", "database": {"host": "localhost", "port": 5432, "host_port": 5432, "name": "prefect_server", "username": "prefect", "password": "test-password", "connection_url": "<postgresql://prefect>:test-password@localhost:5432/prefect_server", "volume_path": "C:\\Users\\Mohamed Hajji/.prefect/pg_data"}, "graphql": {"host": "0.0.0.0", "port": 4201, "host_port": 4201, "debug": false, "path": "/graphql/"}, "hasura": {"host": "localhost", "port": 3000, "host_port": 3000, "admin_secret": "", "claims_namespace": "hasura-claims", "graphql_url": "<http://localhost:3000/v1alpha1/graphql>", "ws_url": "<ws://localhost:3000/v1alpha1/graphql>", "execute_retry_seconds": 10}, "ui": {"host": "<http://localhost>", "port": 8080, "host_port": 8080, "endpoint": "<http://localhost:8080>", "apollo_url": "<http://localhost:4200/graphql>"}, "telemetry": {"enabled": true}}, "cloud": {"api": "<http://localhost:4200>", "endpoint": "<https://api.prefect.io>", "graphql": "<http://localhost:4200/graphql>", "use_local_secrets": true, "heartbeat_interval": 30.0, "check_cancellation_interval": 15.0, "diagnostics": false, "request_timeout": 15, "send_flow_run_logs": true, "logging_heartbeat": 5, "queue_interval": 30.0, "api_key": "", "tenant_id": "", "agent": {"name": "agent", "labels": [], "level": "INFO", "auth_token": "", "agent_address": "", "resource_manager": {"loop_interval": 60}}}, "logging": {"level": "INFO", "format": "[%(asctime)s] %(levelname)s - %(name)s | %(message)s", "log_attributes": [], "datefmt": "%Y-%m-%d %H:%M:%S%z", "extra_loggers": []}, "flows": {"eager_edge_validation": false, "run_on_schedule": true, "checkpointing": false, "defaults": {"storage": {"add_default_labels": true, "default_class": "prefect.storage.Local"}}}, "tasks": {"defaults": {"max_retries": 0, "retry_delay": null, "timeout": null}}, "engine": {"executor": {"default_class": "prefect.executors.LocalExecutor", "dask": {"address": "", "cluster_class": "distributed.deploy.local.LocalCluster"}}, "flow_runner": {"default_class": "prefect.engine.flow_runner.FlowRunner"}, "task_runner": {"default_class": "prefect.engine.task_runner.TaskRunner"}}}
    k
    6 replies · 2 participants
  • m

    Michael Warnock

    07/27/2021, 2:32 PM
    I'm able to run a flow on Coiled, but only if I run it with
    flow.run(executor=ex)
    as opposed to
    create_flow_run
    on a prefect client. When doing the latter, I get no apparent attempt to start the cluster, and I see task output on my docker agent (which, interestingly, is full of s3 permissions related crashes that don't happen if I don't specify my coiled/dask executor). When running with
    flow.run
    obviously the flow doesn't appear in my dashboard. What's the expected behavior? Am I supposed to have some other kind of agent running?
    k
    m
    63 replies · 3 participants
k

Kevin Kho

07/27/2021, 2:37 PM
Hey @Michael Warnock, yes it is expected behavior that
flow.run()
does not appear on the dashboard. This is only for local testing. When you’re ready for production, you register the flow. The
client.create_flow_run
takes in a flow id to start, so you would need to register first before you can use it. After registration, you can start a flow by clicking the “Quick Run” button in the UI; that will attempt to pass the flow to an agent. In your case, you want to spin up a local agent to execute the flow. Through the CLI it would be
prefect agent local start
. This agent will pick up and execute the flow runs. It will also pick up the scheduled flow runs. Just make sure agent labels match the flow labels.
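Putting the above together, a minimal sketch of the register-then-run path (the project name and label are illustrative, and `flow` is assumed to be defined elsewhere):
from prefect import Client

flow.run()  # local test only; this never shows up in the UI

# Register the flow, then ask the backend for a run. An agent with a matching
# label (e.g. started with `prefect agent local start --label my-label`) picks it up.
flow_id = flow.register(project_name="my-project", labels=["my-label"])
Client().create_flow_run(flow_id=flow_id)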
m

Michael Warnock

07/27/2021, 2:47 PM
I've registered and run flows that way before- my problem is with trying to use the DaskExecutor configured for Coiled. I assigned it to .executor in my flow before registering, and I don't see anywhere else to specify it (register or create_flow_run seem like the likely places). What kind of run_config should I use (I'm using DockerRun)?
k

Kevin Kho

07/27/2021, 2:50 PM
You don’t need to specify it elsewhere. When the Flow runs, it will spin up the executor. For the RunConfig, if there is a specific container you need to run the file on top of then DockerRun is good. You would need a docker agent to run the flow.
m

Michael Warnock

07/27/2021, 2:54 PM
Well, as I said, when I run it with
create_flow_run
I see no attempt to spin up or connect to the dask cluster (which log would that appear in? I also see no indication on the coiled dashboard that a cluster has been started, or any new ec2 instances), and the docker-agent logs are full of errors related to s3 permissions that I don't get if the executor isn't specified.
those s3 permissions errors also don't occur in the dask workers when I use
flow.run
- it actually works, modulo a thread safety issue I'm in the process of fixing
k

Kevin Kho

07/27/2021, 2:59 PM
Are you using S3 Storage?
m

Michael Warnock

07/27/2021, 3:03 PM
docker
s3 is for the inputs and outputs of the task
k

Kevin Kho

07/27/2021, 3:08 PM
The spinning up and the connection to the Dask cluster would appear in the Prefect UI under the Flow Logs. The executor shouldn’t be related to S3 permissions, do you have the necessary env variables in the Docker container for AWS access?
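One way to make AWS credentials available inside the flow-run container is through the run config's env argument; a sketch, where the image name is illustrative and the values would normally come from a secret store rather than the local environment:
import os
from prefect.run_configs import DockerRun

flow.run_config = DockerRun(
    image="my-registry/my-image:tag",
    env={
        "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
        "AWS_DEFAULT_REGION": "eu-west-1",
    },
)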
m

Michael Warnock

07/27/2021, 3:11 PM
yes- it works both under Docker without the Dask executor, and within the Coiled 'software_environment' (which specifies a Docker image to start from that's almost identical, built from the same Dockerfile) when I use flow.run - it's only when using create_flow_run [with the Dask/Coiled executor] that it happens, and it's not a simple credentials-not-found error - it's something about not having permission for ObjectHead - I don't have the error handy right now.
k

Kevin Kho

07/27/2021, 3:17 PM
The
flow.run()
will not spin up the container locally to run the flow, so I think the environment variables that you have on your local machine are providing that authentication, and those are not in the container, which is why the errors are happening.
m

Michael Warnock

07/27/2021, 3:23 PM
no. The same calls to s3 work inside the docker image that prefect builds if I do a
create_flow_run
without the dask-executor. If I fail to provide the credentials, I get an error along the lines of "credentials not found". If I do a
flow.run(executor=my-dask-executor)
an image is built by coiled, and run on the cluster it spins up. These tasks access s3 just fine. Something other than my credentials being there is wrong.
m

Michael Adkins

07/27/2021, 3:41 PM
Hey Michael, we'll need some error logs to determine what's going on here. Does this work if you disable writing task results to S3?
m

Michael Warnock

07/27/2021, 3:44 PM
ok- I passed credentials through the docker agent command line, which is running locally. I don't understand why it would be necessary, unless the tasks are running locally, and they certainly appear to be, though I have some activity on the dask cluster (which could easily be a not-cleanly-killed job I just had running)
yeah- there's no question the whole flow is running locally now (without the S3 error, because of the credentials-to-docker-agent thing). It's ignoring the executor when I use create_flow_run. Do I maybe not want the DockerRun run_config? Trying that.
m

Michael Adkins

07/27/2021, 3:48 PM
How did you set the executor for your flow?
m

Michael Warnock

07/27/2021, 3:48 PM
nope- still running locally.
flow.executor = my_exec
before registering it
m

Michael Adkins

07/27/2021, 3:51 PM
And how did you register your flow? Did you set the executor before the register call?
We don't persist executor settings to the database so it'll need to be in the pickled flow object. Are you using
stored_as_script=True
with your Docker storage?
(Sorry slack displayed your 'before registering it' message after I sent mine 🤷)
m

Michael Warnock

07/27/2021, 3:52 PM
yeah, yours all just came in a lump; netsplit 🙂
flow = ford.flow

executor = ford.get_coiled_executor(image_uri=image_uri, region_name=worker_config.region_name)
flow.executor = executor

#flow.run_config = ECSRun(image=docker_image)
#flow.run_config = DockerRun()#image=docker_image)

flow_id = flow.register(project_name='feature-generator')

prefect_client = Client()
prefect_client.create_flow_run(
    flow_id=flow_id,
    parameters=dict(job_spec=job_spec),
)
#flow.run(executor=executor, parameters=dict(job_spec=job_spec))

flow.storage = Docker(
    dockerfile="./Dockerfile",
    prefect_directory="/usr/src/app",
    stored_as_script=True,
    path="/usr/src/app/feature_generator/ford.py"
)
m

Michael Adkins

07/27/2021, 3:55 PM
Aha 😄
You are storing the flow as a script, so when your flow runs we are executing
/usr/src/app/feature_generator/ford.py
then extracting the flow from the variables in the file. Since you are setting the executor in a different file, the executor is never set on your flow run.
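In other words, the executor assignment has to live in the stored script itself. A sketch of what the end of ford.py might look like (get_coiled_executor is the helper from this conversation, not a Prefect API, and the arguments are placeholders; where the image URI comes from is the open question addressed later in the thread):
# feature_generator/ford.py -- the file referenced by `path=` in the Docker storage above
from prefect import Flow

with Flow("feature-generator") as flow:
    ...  # task definitions

# Because the flow is stored as a script, this module is re-executed when the
# flow is loaded from storage, so anything set here exists at run time.
flow.executor = get_coiled_executor(image_uri=..., region_name=...)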
m

Michael Warnock

07/27/2021, 3:56 PM
oooohh 🤦
m

Michael Adkins

07/27/2021, 3:57 PM
It's tricky that some things are persisted to the backend when you call
flow.register
and some aren't. Executors are not persisted to allow more customizable options (ie we don't have to know how to serialize/deserialize it)
m

Michael Warnock

07/27/2021, 4:04 PM
ok, so, I don't know the docker image uri to pass to coiled, until I'm ready to run the flow. how do I break out of this chicken and egg?
m

Michael Adkins

07/27/2021, 4:07 PM
Perhaps something like
executor = ford.get_coiled_executor(image_uri=os.environ.get("IMAGE_URI"), region_name=worker_config.region_name)
    flow.executor = executor
I guess personally, I'd use S3 storage for your flow then just build your docker image yourself though
m

Michael Warnock

07/27/2021, 4:09 PM
how does using s3 storage for the flow help, if the executor isn't serialized?
and how do I populate the environment of the agent(?) when I only get the image-uri as part of the CI process that's starting the flow?
Sorry if I'm being dense; I haven't quite grokked the prefect architecture yet. Anyway, I need to run; will be back in an hour or so.
m

Michael Adkins

07/27/2021, 4:39 PM
If you use S3 storage then you don't have a chicken/egg issue, the image_uri is just from your prebuilt image. Yeah setting the env var would be tricky. You can set them per run, but I don't see a clear way to do it in an automated fashion. One cool thing you could do is use the prefect kv-store and set a key on registration to the image URI then pull the key when your flow is loaded from storage (it would have to be script-based storage so it executes).
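A sketch of that kv-store idea (the key name is made up; the KV store lives in prefect.backend and is a Prefect Cloud feature, and get_coiled_executor is the helper from earlier in the thread):
# At registration time (e.g. in the CI job that just pushed the image):
from prefect.backend import set_key_value
set_key_value(key="feature_generator_image_uri", value=image_uri)

# Inside the flow script, which is re-executed when loaded from script-based storage:
from prefect.backend import get_key_value
flow.executor = get_coiled_executor(image_uri=get_key_value("feature_generator_image_uri"))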
m

Michael Warnock

07/27/2021, 6:14 PM
Sorry- I still don't get it. If I use S3 storage, the flow script/blob gets put there when I register, but without the executor; then I create_flow_run, and an agent grabs it and executes it (without the executor, or with the chicken and egg problem). What am I missing? Non-docker storage is a problem for other reasons, or I'd just try it.
m

Michael Adkins

07/27/2021, 6:16 PM
I'm imagining something like this:
-> CI builds a docker image and pushes it to a predetermined URI
-> Downstream CI task registers your flow
-> Your flow script sets the executor to your predetermined URI (or you retrieve this from your earlier push step)
-> Your flow script is pushed to S3
-> You create a flow run; it pulls your flow from S3 and has the executor set correctly
m

Michael Warnock

07/27/2021, 6:19 PM
what code makes step 3 happen?
m

Michael Adkins

07/27/2021, 6:21 PM
Just set the executor in the same file that your flow is defined in
m

Michael Warnock

07/27/2021, 6:21 PM
you're describing the chicken-egg scenario then; how does it being on s3 help?
m

Michael Adkins

07/27/2021, 6:22 PM
Because the docker image is not built at registration time
It's built before by you
m

Michael Warnock

07/27/2021, 6:26 PM
that's still the chicken-egg scenario. I have the URI at registration time, but it's associated with that parameterized flow; it's created by the CI pipeline, which then executes the script that registers and runs the flow. I can't add it to the flow code before it's registered, and if my flow script can be pushed to S3 by some other step, I don't see what it is (manual copying?)
the flow's tasks depend on the version of the code that's embedded in the docker image by the CI
m

Michael Adkins

07/27/2021, 6:28 PM
Prefect is building your Dockerfile into an image right now at registration time; this gives you a URI, yes?
m

Michael Warnock

07/27/2021, 6:29 PM
no- coiled is building a docker image, starting with the image the CI builds and puts in ECR, which I pass in the executor config
well, I suppose prefect is probably building a docker image too, based on the local dockerfile- this is what's so confusing, but I'm not using a uri from that
m

Michael Adkins

07/27/2021, 6:32 PM
Okay. Why can it not work like this?
• Build an image from your Dockerfile with your flow's requirements
• Build another image derived from above using coiled
• Push image to ECR; set URI in environment variable
• In your flow script, set the executor to use the URI from the environment
• Register your flow, use pickle-based S3 storage
  ◦ Your flow object will be frozen to S3 with the correctly configured executor
If you want to use script based storage:
• In your flow script, set the executor loading a URI from the environment
• When you register your flow, attach a
RunConfig
with the environment variable set to the URI from CI
I'd recommend the second, as pickle-based storage is often very confusing.
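A sketch of that second option, showing the registration side and the flow-script side together (IMAGE_URI is a made-up variable name, and get_coiled_executor is the helper from earlier in the thread):
# registration script, run by CI after it has pushed the image and knows image_uri
from prefect.run_configs import DockerRun

flow.run_config = DockerRun(image=image_uri, env={"IMAGE_URI": image_uri})
flow.register(project_name="feature-generator")

# flow script (stored as a script): re-executed at run time inside the container,
# where the run config above has set IMAGE_URI in the environment
import os
flow.executor = get_coiled_executor(image_uri=os.environ["IMAGE_URI"])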
m

Michael Warnock

07/27/2021, 6:35 PM
ok- I still have no earthly idea why I would use s3 storage, but I guess the answer I was actually looking for is that I can set the environment variable for the agent in the RunConfig
m

Michael Adkins

07/27/2021, 6:40 PM
If we are building the image for you at registration time (aka you are using Docker storage), you will not be able to set the value in the default run config for the flow. Building the image yourself and using something else to store your flow (literally any other storage) ensures that you have an image URI at registration time that you can set in a run config. But 🤷 you can also separate the
build()
/
register(build=False)
steps if you're married to Docker storage.
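If you do stay on Docker storage, that split build/register approach might look roughly like this (the attribute names used to reconstruct the image URI are an assumption about the Docker storage object, and `flow` is assumed to be defined elsewhere):
from prefect.run_configs import DockerRun
from prefect.storage import Docker

flow.storage = Docker(
    dockerfile="./Dockerfile",
    prefect_directory="/usr/src/app",
    stored_as_script=True,
    path="/usr/src/app/feature_generator/ford.py",
)

built = flow.storage.build()  # build and push the image now
image_uri = f"{built.registry_url}/{built.image_name}:{built.image_tag}"  # assumed attributes

flow.run_config = DockerRun(env={"IMAGE_URI": image_uri})
flow.register(project_name="feature-generator", build=False)  # reuse the image built above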
m

Michael Warnock

07/27/2021, 6:42 PM
what does "default run config for the flow" mean exactly? If I pass env={} to DockerRun in the code I pasted above, will the flow script the agent runs not see those variables?
m

Michael Adkins

07/27/2021, 6:43 PM
If you set the
DockerRun
after you register the flow, it will not be attached to the flow
m

Michael Warnock

07/27/2021, 6:43 PM
see the code above; I call register afterward
m

Michael Adkins

07/27/2021, 6:44 PM
Then how are you going to have an image based on the one Prefect builds?
Very confused about this chicken/egg thing but it sounds like you've got it
m

Michael Warnock

07/27/2021, 6:45 PM
I'm not- I use the one my CI builds to pass to coiled as the base image for what they will build: that part works- it's only the agent that has this problem, because the executor didn't get there
so I've moved the executor to the other file as you described, and I just need to have it pull from the environment now, and I think it should work- I'll let you know in a minute
m

Michael Adkins

07/27/2021, 6:46 PM
I see; there's a third docker image in the mix 😄
m

Michael Warnock

07/27/2021, 8:04 PM
I had to make coiled credentials available and call some coiled stuff only on the agent, but it's working. Thanks for the help!