prefect-community #prefect-community

Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and 
keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good

I was wondering what the Prefect pattern would be here to scatter the object ahead of time?

Jacob Blanco

01/19/2022, 1:52 AM

Is it possible to have more than 1 license admin per license?

Akharin Sukcharoen

01/19/2022, 7:46 AM

I tried to upgrade Prefect but hit this error:

Pulling postgres ... done
Pulling hasura ... download complete
Pulling graphql ... download complete
Pulling towel ... download complete
Pulling apollo ... download complete
Pulling ui ... download complete
ERROR: for graphql failed to register layer: symlink ../753a933895769d16a9f50e547a17525d552c3f71551ee123b7fc73ca9fefbc69/diff /var/lib/docker/overlay2/l/B5VLSRNEDH3NABDXEFXCIRNLR2: no such file or directory
ERROR: for towel failed to register layer: symlink ../753a933895769d16a9f50e547a17525d552c3f71551ee123b7fc73ca9fefbc69/diff /var/lib/docker/overlay2/l/B5VLSRNEDH3NABDXEFXCIRNLR2: no such file or directory
ERROR: for hasura failed to register layer: symlink ../9e83ca61c81268bc9fb45be76dec1d0897716ce0633ce79aabbb49c98c9802bc/diff /var/lib/docker/overlay2/l/DQRRSWXFNYBRCBO2G4WTIKOQSO: no such file or directory
ERROR: for apollo failed to register layer: symlink ../0027aacacd89f06a1944e4198231f069a200aada88a290932f113a0cb9ab04b1/diff /var/lib/docker/overlay2/l/DAHDL3BQWHWRBVM7IV5DDI36H3: no such file or directory
ERROR: for ui failed to register layer: symlink ../ebf2b0dd143761035387a4197bd103c4238b4d3fe664cf622ea2b3126b2eaba7/diff /var/lib/docker/overlay2/l/AHLFWYQ3FLDKVRIL7PPFZM3XXZ: no such file or directory
ERROR: failed to register layer: symlink ../ebf2b0dd143761035387a4197bd103c4238b4d3fe664cf622ea2b3126b2eaba7/diff /var/lib/docker/overlay2/l/AHLFWYQ3FLDKVRIL7PPFZM3XXZ: no such file or directory
Exception caught; killing services (press ctrl-C to force)
Removing network prefect-server
WARNING: Network prefect-server not found.

How can I fix it? Thank you.

Thomas Pedersen

01/19/2022, 8:14 AM

Hey, have there been any significant changes in Git storage between the Docker images prefecthq/prefect:0.15.7-python3.9 and prefecthq/prefect:latest-python3.9? My flow runs fine under 0.15.7 but fails under latest with "No git repository was found at https://@git.*************" Update: After several hours of debugging I found that the issue was introduced in dulwich version 0.20.29. Downgrading dulwich to 0.20.28 in the Docker image fixes it.

Samay Kapadia

01/19/2022, 8:59 AM

Hi prefects. I’m trying to execute a RunNamespacedJob task in my Kubernetes setup but I’m running into this error:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "jobs.batch \"dummy\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"jobs/status\" in API group \"batch\" in the namespace \"default\"",
  "reason": "Forbidden",
  "details": {
    "name": "dummy",
    "group": "batch",
    "kind": "jobs"
  },
  "code": 403
}

For context, I’ve applied the yaml from

prefect agent kubernetes install --rbac

so all the permissions should work in theory. I'm stuck on what could be wrong.
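The 403 above says that the default service account in the default namespace is not allowed to get jobs/status in the batch API group. A minimal sketch of a Role and RoleBinding that would grant it, built as Python dicts and printed as JSON so the output can be piped to kubectl apply -f -. The role name prefect-job-status is hypothetical, and the verb list is an assumption about what the task needs, so trim it to what you actually want to allow.

```python
import json


def build_rbac_manifests(namespace, service_account, role_name):
    """Return a v1 List holding a Role and a RoleBinding for batch jobs."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["batch"],
                # "jobs/status" is the subresource named in the 403 message.
                "resources": ["jobs", "jobs/status"],
                # Assumed verb set; reduce it if the task needs less.
                "verbs": ["get", "list", "watch", "create", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Namespace and service account are taken from the error message;
    # "prefect-job-status" is a made-up name for illustration.
    manifests = build_rbac_manifests("default", "default", "prefect-job-status")
    print(json.dumps(manifests, indent=2))
```

It is also worth checking which service account the flow-run job actually uses: the error names system:serviceaccount:default:default, which may not be the account the --rbac YAML was bound to.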

Muddassir Shaikh

01/19/2022, 9:23 AM

How can I avoid these unwanted tasks in the logs and task runs (in the GUI as well)?

[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:50+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:50+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...

Muddassir Shaikh

01/19/2022, 9:25 AM

https://prefect-community.slack.com/archives/CL09KU1K7/p1642584221233500

Tony Waddle

01/19/2022, 10:14 AM

Hi, A question around best practices when setting up flows... In one scenario I have a flow which will typically run several steps. Some of those steps are used by other processes. To keep the Flows 'DRY' I have created three separate flows so I can call them in isolation. I have then created a parent flow which can run all three child flows. This works fine with wait_for_flow_run/set_upstream and allows the parent flow to give a view of the child flows and their progress/logs. I wonder if this is the best approach as despite it functioning well I lose observability. I've enabled stream_logs so the parent Flow sees all of the child logging, but the names aren't very intuitive and the cloud doesn't easily allow you to navigate to the child flows. See attached screenshot. You end up with a long list of 'create_flow_run'/'wait_for_flow_run' instead of the friendly task names. I can rename the create_flow_run however the new name only shows on the child flow instance, not on the parent. Because of this it feels I am going against the grain with Prefect. So am I barking up the wrong tree by keeping Flows DRY - should I instead create multiple flows and only worry about keeping the Tasks DRY? (they already are so this would be easy to change)

Yueh Han Huang

01/19/2022, 10:16 AM

Hey! What is the lightest/easiest way to run a Prefect agent remotely? I have a script that runs within 20 seconds and doesn’t require much computation power or storage. It’s a low-stakes personal project. My current option is renting a server on DigitalOcean ($5/mo), but I wonder if there’s a better option (maybe a Heroku dyno? I haven’t explored it yet).

Muddassir Shaikh

01/19/2022, 12:01 PM

Hi, I am trying to register a flow from the agent machine. In the GUI the agent shows as connected, but when I try to register the flow I get this error:

File "/home/infra/prefect_server/lib/python3.8/site-packages/prefect/client/client.py", line 603, in _send_request
    response = session.post(
  File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3be9b9e700>: Failed to establish a new connection: [Errno 111] Connection refused'))

Example: my GUI server is hosted on Machine A and one agent is on Machine B. The code on Machine B should be registered and run on Machine B, but its tasks should show up in the GUI on Machine A.

Samay Kapadia

01/19/2022, 1:13 PM

[solved] When I create a flow that runs a namespaced job, prefect actually creates 2 jobs. dummy runs on the aks-spot node but prefect-job runs on the aks-system node (and I don’t want it running on the system node pool). Is there a way to configure tolerations and affinities for the prefect-job pod?

Thomas Opsomer

01/19/2022, 2:44 PM

Hello community! Is it possible to prevent some tasks from running at the same time, or to limit how many run at once? I've read the doc on concurrency limits, but it doesn't seem to handle my use case. For instance, if I have 2 tasks A and B, I'd like to either prevent them from running at the same time or limit the number of A and B that can run simultaneously.

Luis Aguirre

01/19/2022, 3:09 PM

Hi all, my team and I are developing a web app to load and process dataframes, and we'd like to use Prefect Orion instead of Jupyter kernels. Is it possible to separate contexts/sessions using Prefect? Example: User A creates a variable "var_a" and User B creates a variable "var_b". User A shouldn't be able to access "var_b". Both variables should be accessible through several flows from the same user until their session on our app is closed. Am I missing something?

Michail Melonas

01/19/2022, 3:10 PM

I’m curious as to whether it is possible to update the state of a running Flow according to the outcome of a Task. That is, if at some point during the execution of a Task a condition is met, can I update the state of the Flow/Task to Failure?

Tomek Florek

01/19/2022, 3:41 PM

Hey community! I’m trying to set up Prefect with Great Expectations, using the official task. I’m trying to run a validation on an in-memory DataFrame (the result of a previous task) but am having trouble setting it up correctly. I’m trying to use the v3 API of GE and have set up the expectation suite and checkpoint according to the guide (until context.run_checkpoint), but I'm struggling to pass in said DataFrame. Would anyone be able to offer some guidance?

Tom Shaffner

01/19/2022, 4:23 PM

Is there a way just to execute a git pull command at the start of a run, assuming the local machine has authentication already set up? My company uses Azure Devops, and as discussed in https://github.com/PrefectHQ/prefect/issues/4850, I set up a storage to pull from this via a hard-coded URL. Unfortunately the personal access token (PAT) in that URL keeps expiring in only 30 days (think I'm hitting some corporate policy limit). The standard Azure approach is to use the Git authentication manager https://docs.microsoft.com/en-us/azure/devops/repos/git/set-up-credential-managers?view=azure-devops, not SSH. I have that set up on the machines already, and a git pull would use that more reliably. Possible?

Jake

01/19/2022, 4:42 PM

Hi everyone, I’m new to prefect and I ran into an issue today. I have the kubernetes agent running in a cluster, the logs show that it is registered and waiting for flow runs. However, this agent does not appear on Prefect Cloud. Any reason why this might be?

Muddassir Shaikh

01/19/2022, 5:35 PM

Hi, here the tasks on the left branch are running fine but the tasks on the right branch are stuck in the Mapped state. Both branches call the same functions and map them to run a task.

Suresh R

01/19/2022, 6:41 PM

Hi, I am running a flow in Cloud which has two parallel tasks, but only one task is running at a time. What might be the reason?

brian

01/19/2022, 7:03 PM

Hi prefectionists! I want to use a link to a flow run that I’m running in Prefect Cloud as metadata of sorts, so I can trace long-running queries back to the flow run easily. Are there env vars populated automatically (e.g. FLOW_RUN_ID, FLOW_ID, etc.) that would allow me to construct this link, or more generally, is there any way for a task to automatically get info about its own identity?
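A minimal sketch of building such a link, assuming Prefect 1.x, where prefect.context exposes flow_run_id (and task_run_id) at run time. The URL pattern and the tenant slug "my-team" are assumptions for illustration, so compare them with a flow run URL from your own Cloud UI before relying on them.

```python
def flow_run_url(context, tenant_slug, base_url="https://cloud.prefect.io"):
    """Build a link to the current flow run from a context-like mapping.

    `context` only needs a dict-style .get(); inside a Prefect 1.x task
    you would pass prefect.context. The URL layout is an assumption.
    """
    flow_run_id = context.get("flow_run_id")
    if not flow_run_id:
        raise ValueError("flow_run_id is not set; is this running inside a flow run?")
    return f"{base_url}/{tenant_slug}/flow-run/{flow_run_id}"


if __name__ == "__main__":
    # Stand-in for prefect.context so the sketch runs without Prefect installed.
    fake_context = {"flow_run_id": "abc-123", "task_run_id": "def-456"}
    print(flow_run_url(fake_context, "my-team"))
```

Inside a task this would be called as flow_run_url(prefect.context, "my-team"), and the resulting string can be attached to the query as a comment or label.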

brian

01/19/2022, 7:57 PM

Is anyone seeing issues with prefect cloud rn? I have a task run that says

Trigger was "all_successful" but some of the upstream tasks failed.

but all the upstream tasks were successful. Am I missing something?

Martim Lobao

01/19/2022, 11:33 PM

Is it possible for a task to get its own task run id? Say I wanted each task to send a notification with its task run id, for example; would this be possible? And do task run ids persist across retries of the same flow run? (If a flow fails at a task and I restart the flow, will that task have the same task run id as before?)

Tony Yun

01/19/2022, 11:51 PM

Hi, one of our production jobs just failed twice in a row. No changes were made on our side, and it’s the first time we've seen this error. Is this a known issue? I have restarted the job a second time. Please see the thread for log exceptions.

Varuna Bamunusinghe

01/20/2022, 6:06 AM

Is there a way to download a file from S3, save it to a location, and skip re-downloading the file if it's already saved?
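One way to sketch the "skip if already saved" part is a small helper that checks the destination before calling the downloader. This is a generic illustration, not a Prefect feature: the download callable is injected so the snippet runs without AWS access, and the boto3 call in the comment uses a hypothetical bucket and key.

```python
from pathlib import Path


def download_if_missing(dest, download):
    """Call download(path) only when dest does not exist yet.

    Returns True if a download happened, False if the file was already there.
    The file is written to a temporary ".part" name first so that an
    interrupted download is not mistaken for a finished one.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    download(str(tmp))
    tmp.replace(dest)
    return True


# With boto3 the downloader could look like this (bucket and key are made up):
#   s3 = boto3.client("s3")
#   download_if_missing(
#       "/data/input.csv",
#       lambda path: s3.download_file("my-bucket", "raw/input.csv", path),
#   )
```

The check is based on file existence only; if the object in S3 can change, you would also need to compare something like size or ETag, which this sketch does not do.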

Suresh R

01/20/2022, 7:01 AM

Hi, how can I store all task results of a flow under a specific S3 prefix?

Anurag Bajpai

01/20/2022, 9:15 AM

Hi, we're running into some issues using Bitbucket cloud storage with a branch ref (as opposed to a specific commit ref). It looks like the

client.get(f"repositories/{self.workspace}/{self.repo}/refs/branches")

call to get the list of branches actually returns a paginated list, and the method is not able to find the hash corresponding to the branch if the branch is not included in the first page. Additionally, the error raised in case the branch is not found is not formatted properly (it's a string instead of an f-string).
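The two problems described above can be sketched as follows. This is not the Prefect storage code: get_json is a hypothetical stand-in for whatever performs the request and returns parsed JSON, and the response shape ("values", "next", and "target"/"hash" on each branch) is assumed from the Bitbucket Cloud 2.0 API.

```python
def find_branch_hash(get_json, workspace, repo, branch):
    """Walk every page of the branch listing and return the branch's commit hash."""
    url = f"repositories/{workspace}/{repo}/refs/branches"
    while url:
        page = get_json(url)
        for item in page.get("values", []):
            if item["name"] == branch:
                return item["target"]["hash"]
        # Follow the link to the next page; it is absent on the last page.
        url = page.get("next")
    # f-string, so the branch and repository names actually appear in the message.
    raise ValueError(f"Branch {branch!r} not found in {workspace}/{repo}")


if __name__ == "__main__":
    # Two fake pages; the wanted branch is only on the second one.
    pages = {
        "repositories/ws/repo/refs/branches": {
            "values": [{"name": "main", "target": {"hash": "aaa111"}}],
            "next": "page-2",
        },
        "page-2": {
            "values": [{"name": "feature-x", "target": {"hash": "bbb222"}}],
        },
    }
    print(find_branch_hash(pages.__getitem__, "ws", "repo", "feature-x"))
```

In the real API the next value is a full URL, so the function making the request has to accept both the relative first path and absolute follow-up URLs.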

Stefan Rasmussen

01/20/2022, 9:37 AM

Orion License: In this commit, the claim that "As Orion matures, most or all of its components will be released under the Apache 2.0 license." was removed from the Orion docs. Does this mean that Orion will not be released under Apache 2.0?

Philipp Eisen

01/20/2022, 2:32 PM

Hey, I’m running Prefect with a Kubernetes agent and a temporary Dask cluster. I’m quite frequently getting this error:

No heartbeat detected from the remote task; marking the run as failed.

Are there any obvious things to look for?

Thomas Opsomer

01/20/2022, 2:48 PM

Hi, another Prefect + K8S question here 🙂 Like the previous post, we're frequently seeing the message "No heartbeat detected...". Usually it happens in 2 situations:
• the pod that runs the tasks gets evicted / OOM killed
• the pod was running on a preemptible node that gets removed and replaced.
Is there something on the k8s agent, the k8s job specification, or elsewhere to configure to allow k8s to reschedule the job and let Prefect know about it, so that the flow would continue?

Florian Kühnlenz

01/20/2022, 4:40 PM

Hi, I have a question about DockerStorage that I did not manage to figure out on my own. I would like to pass a variable to the Docker build so that it is replaced inside the Dockerfile. It seems the env_vars in DockerStorage are not the right place. Any hints on how to do this?