Sean Talia
12/02/2020, 5:24 PM
`DockerRun` run config, `Docker` for storage) with labels, and the labels show up in the Prefect UI, but not in the terminal output? E.g. I execute:
with Flow(
    "docker-flow", run_config=DockerRun(labels=["data-science"]), storage=Docker()
) as flow:
and then register the flow. The flow shows up in the UI with the `data-science` label, but the terminal output when registering the flow looks like:
Flow URL: <flow_url>
└── ID: <id>
└── Project: test-project
└── Labels: []
This was causing me to believe that the labels weren't getting attached to the flow, but they are in fact there.
Anish Chhaparwal
12/02/2020, 5:27 PM
git_clone = ShellTask(log_stderr=True)
with Flow('QETL') as flow:
    git_url = Parameter("git_url",
                        default="https://github.com/ieee8023/covid-chestxray-dataset")
    git_clone(command="git clone {url} {target}".format(url=git_url))
flow.run()
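One thing worth checking in the snippet above, independent of Prefect: `str.format` runs immediately, while the flow is being defined, and it raises `KeyError` for any placeholder that gets no argument, so `{target}` needs a value too. It also formats whatever object it is handed at that moment. Assuming a `Parameter` is a placeholder object at definition time rather than the runtime string, the URL would not be substituted. A minimal plain-Python sketch; `FakeParameter` and `build_command` are hypothetical stand-ins, not Prefect classes or functions:

```python
class FakeParameter:
    """Hypothetical stand-in for a value that is only known at run time."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<FakeParameter: {self.name}>"


template = "git clone {url} {target}"

# A placeholder with no matching argument raises KeyError right away.
try:
    template.format(url="https://example.com/repo.git")
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# Formatting a placeholder object embeds its repr, not the eventual value.
early = template.format(url=FakeParameter("git_url"), target="repo")


def build_command(url, target):
    """Build the command once the real values are known (e.g. inside a task)."""
    return template.format(url=url, target=target)


late = build_command("https://example.com/repo.git", "repo")
```

Under these assumptions, building the command string inside a task that receives the parameter as an input would defer the formatting until the value exists.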
ale
12/02/2020, 6:24 PM
Andrew Hannigan
12/02/2020, 6:53 PM
Aiden Price
12/02/2020, 10:25 PM
`run_config`
I can get `KubernetesRun()` to work with no problems at all. But I'm unsure which `run_config` to use for my pre-existing Dask cluster. I presume I need the `DaskExecutor` but do I use a `LocalRun` with it like I used to use a `LocalEnvironment`? I'm a fan of this change by the way. Thank you all.
Andrew Hannigan
12/02/2020, 11:03 PM
Matt Drago
12/03/2020, 5:03 AM
Joël Luijmes
12/03/2020, 12:31 PM
Dimitris Stafylarakis
12/03/2020, 1:22 PM
`requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.40/images/create?tag=latest&fromImage=...`) and `docker.errors.ImageNotFound` when I try to register the flow. Others have mentioned similar errors in the past, but as far as I saw this was when trying to deploy the flow, not when building the image. Any clues/hints/tips would be appreciated! I'm on OSX using docker-desktop btw, just the default config. Thanks in advance 🙂
Payam Vaezi
12/03/2020, 3:05 PM
`Kubernetes Error: Back-off pulling image "4988c227fd72"` error when I want to pull an image I built on my local Docker. I'm passing the image ID in the environment of the GraphQL call: `"metadata": {"image": "4988c227fd72"}`. Any idea why my local Kubernetes agent can't pull a local image?
Daniel Rothenberg
12/03/2020, 3:41 PM
`dask-gateway` to spin up distributed workflows. Major kudos to the team!
I have a question though about scaling out this workflow. In my particular application, I basically need to apply this training/prediction workflow for O(10,000) different "assets". Put another way - my workflows each have a `Parameter` called "asset". I see two different ways of scaling my toy with one asset to the much larger set of a few thousand:
1. Use the `.map()` capabilities on my tasks and simply "map" a list of "assets" for the flow to run through; or
2. Convert my "asset" `Parameter` to be a list of all the assets, and do all the mapping inside each of my tasks in the workflow - by leaning on the fact that each task already has access to the dask cluster I'm running the workflow on
I'm not sure which is the "recommended" approach here. The training/modeling flow generates, for each asset, artifacts on the order of 1MB or so that have to get passed around, but I'm worried about the performance of Prefect's scheduler if it needs to map to tasks over a list of several thousand items. On the other hand, things get... complicated (with my naive workflow for saving trained models and whatnot)... if I have to manually do all the book-keeping inside each task in the flow.
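A possible middle ground between the two options above, sketched in plain Python under assumptions not stated in the thread: split the asset list into fixed-size batches, so that anything mapping over the list sees a few hundred batch items instead of several thousand individual assets, while the book-keeping stays per batch. The helper name `chunk_assets`, the asset identifiers, and the batch size are all hypothetical:

```python
from typing import Iterator, List, Sequence


def chunk_assets(assets: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` assets."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(assets), batch_size):
        yield list(assets[start:start + batch_size])


# Hypothetical asset identifiers; a real flow would receive these as a parameter.
assets = [f"asset-{i:05d}" for i in range(10_000)]
batches = list(chunk_assets(assets, batch_size=50))
```

Each batch could then be the unit that gets mapped, with the per-asset loop living inside the task. Whether that actually eases scheduler load for a given setup would need to be measured.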
Thoughts? Does anyone have experience scaling out ML training/prediction workflows on Prefect that can offer some insight?
Ben Fogelson
12/03/2020, 3:56 PM
Chris Jordan
12/03/2020, 3:58 PM
@task(name="get next batch of records")
def get_batch(result=PrefectResult()):
    ...
    return len(records)
with Flow("import_flow") as flow:
    num_of_records = get_batch()
    if num_of_records.read() > 0: # this particular syntax doesn't do it, and I'm asking for how to read this here (if I should be)
        kickoff_task = StartFlowRun(project_name="imports", flow_name="import_flow") # StartFlowRun also doesn't seem to spawn a new task - is this the right way to call this?
Brian Mesick
12/03/2020, 5:05 PM
Usage: prefect execute [OPTIONS] COMMAND [ARGS]...
Try 'prefect execute -h' for help.
Error: No such command 'flow-run'.
Looking through the history of the Slack it looks like this was a known issue with 0.13.0 agents, but I’m not seeing any other possible explanations. Any hints?
Julio Venegas
12/03/2020, 5:35 PM
`ENDRUN(Finished(message="all rows from file read"))` when it has finished reading all the lines in a big file (I’m not using the LOOP signal, the Flow is an exercise to properly use `Result` objects).
Should I create a state handler and trigger a final reference task or is there something like `ENDRUN` but at the Flow level? Any other alternatives?
Richard Hughes
12/03/2020, 7:35 PM
Alex Joseph
12/03/2020, 8:06 PM
`target` option, but I don't seem to be seeing any option to add the function arguments to template the output.
For example, I'd like to have:
@task(target="{single_pattern}_{config['a']}.txt", checkpoint=True, result=LocalResult(dir="cache/"))
def run_spark_job(single_pattern, config):
    ...
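A plain-Python note on the template string above, assuming the `target` string ends up being rendered with `str.format` (an assumption about Prefect's internals, not something confirmed in this thread): inside a replacement field, dictionary keys are written without quotes. `{config[a]}` looks up the key `a`, while `{config['a']}` looks up a key that literally contains the quote characters. `render_target` and the sample inputs are hypothetical:

```python
def render_target(template: str, **task_inputs) -> str:
    """Hypothetical stand-in: render a target template from task inputs."""
    return template.format(**task_inputs)


inputs = {"single_pattern": "2020-12-*", "config": {"a": "daily"}}

# Unquoted key inside the replacement field: looks up config["a"].
good = render_target("{single_pattern}_{config[a]}.txt", **inputs)

# Quoted key: str.format searches for the key "'a'" (quotes included) and fails.
try:
    render_target("{single_pattern}_{config['a']}.txt", **inputs)
    quoted_error = None
except KeyError as exc:
    quoted_error = exc.args[0]
```

This only covers the string syntax. Whether the task's arguments are made available to the template at all is a separate question for the library.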
I can template parameters, date etc., but I don't seem to have a way to template the arguments themselves. What's the correct way to handle this situation? Thanks 🙂
Lee Mendelowitz
12/03/2020, 10:33 PM
JD Margulici
12/04/2020, 1:14 AM
Andrew Hannigan
12/04/2020, 1:44 AM
Severin Ryberg [sevberg]
12/04/2020, 9:23 AM
Neeraj Sharma
12/04/2020, 11:59 AM
David Kuda
12/04/2020, 4:03 PM
`prefect server start --postgres-port=5435 --ui-port=8081` -> that works.
And now I want to change it so the `config.toml` handles it. So in my home directory/.prefect (~/.prefect) I have the file `config.toml` with the following content:
[server]
[database]
host = "localhost"
port = 5435
host_port = 5435
url = "https://localhost:5435"
connection_url = "postgresql://prefect:test-password@localhost:5435/prefect_server"
name = "prefect_server"
username = "prefect"
password= "test-password"
volume_path = "/Users/david/.prefect/pg_data"
I have tried many things with this config.toml file … but nothing has worked. When I try `prefect server start`, I receive an error: `ERROR: for t_postgres_1 Cannot start service postgres: Ports are not available: listen tcp 0.0.0.0:5432: bind: address already in use`. What’s going wrong?
When I write something wrong, say I use a colon instead of an equal sign, `prefect server start` returns yet another error, which is good.
I have examined the possible config options with `prefect config`; I did not see any other setting for database / postgres.
I am following the docs very precisely, I even did the YouTube tutorials which were hosted by Laura, which were all great.
Best Regards from Berlin!
Hui Zheng
12/04/2020, 6:38 PM
`flow.storage = Docker()` for building our flow deployment.
2. We then use `flow.storage.build(push=False)` to build the Docker container locally and test-run it locally.
3. Lastly, we use `flow.register()` to deploy the flow to Prefect Cloud.
All 3 steps work fine on my local machine. I could deploy the flow to Prefect Cloud successfully. My colleagues want to do the same on their local machines; however, when two of my colleagues did it, they ran into a healthcheck issue at step 3 (see issue detail in the thread).
We are using the same code base and the same build-context environment. We have set up another Docker container in which we build and deploy Prefect flows, so that we all have the same run-time and libraries when doing build-and-deploy. This is a high-priority issue for us, because currently only I can do the flow deployment. It puts our production at high risk of downtime if my colleagues cannot patch and deploy the Prefect flow during an emergency incident.
cc: @jars @Julie Sturgeon
Prefect version: `prefecthq/prefect:0.13.15-python3.8`
DJ Erraballi
12/04/2020, 6:54 PM
DJ Erraballi
12/04/2020, 6:56 PM
DJ Erraballi
12/04/2020, 6:56 PM
DJ Erraballi
12/04/2020, 6:59 PM
No heartbeat detected from the remote task; marking the run as failed.
Daniel Nussbaum
12/04/2020, 7:18 PM
Alex Joseph
12/04/2020, 7:41 PM
`S3List` with `prefix` and `delimiter` (trying to list "folders" in an S3 prefix, say) I get an empty array:
S3List(config['input_bucket_name']).run(prefix=test, delimiter="/", max_items=10)
If I run it manually using boto, I'm able to get results:
import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket=config['input_bucket_name'], Delimiter='/', Prefix=test)
for prefix in result.search('CommonPrefixes'):
    print(prefix.get('Prefix'))
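For context on the comparison above: when a delimiter is given, S3's list operation reports keys that contain the delimiter after the prefix as rolled-up `CommonPrefixes` rather than as `Contents`. The plain-Python model below imitates that grouping on an in-memory key list. `list_with_delimiter` and the sample keys are hypothetical, and whether `S3List` returns only the `Contents` part is an assumption that would explain the empty result:

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str, delimiter: str
) -> Tuple[List[str], List[str]]:
    """Model S3 listing: return (contents, common_prefixes) for the given keys."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            # No delimiter after the prefix: the key is listed as an object.
            contents.append(key)
        else:
            # Delimiter found: the key is rolled up into a "folder" prefix.
            rolled = prefix + rest[:cut + len(delimiter)]
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
    return contents, common_prefixes


# Hypothetical bucket layout with two "folders" under data/.
keys = ["data/2020/a.csv", "data/2020/b.csv", "data/2021/c.csv"]
contents, folders = list_with_delimiter(keys, prefix="data/", delimiter="/")
```

In this model every key sits below a "folder", so `contents` comes back empty while `folders` holds the two prefixes, which matches the difference between the two calls in the message.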
Is this expected behavior?
`CommonPrefixes`
rather than the `Contents` (or get both)
nicholas
12/04/2020, 9:40 PM
`S3List` task for that? 🙂
Alex Joseph
12/05/2020, 5:45 PM