vinoth paari
02/09/2022, 7:50 AM

Muddassir Shaikh
02/09/2022, 8:59 AM

Michail Melonas
02/09/2022, 10:33 AM
In my KubernetesRun run configuration, I want to specify a custom Kubernetes Job spec in order to update the envFrom: value. Is there an appropriate template spec that I can use as a starting point?
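One starting point (a hedged sketch, assuming Prefect 1.x): mirror the default job template that ships with the Kubernetes agent and add envFrom to the container named "flow"; "flow-env" here is a hypothetical Secret name.
# Sketch: a custom KubernetesRun job template that adds envFrom to the flow container.
# Assumes Prefect 1.x; the Secret name "flow-env" is a placeholder.
from prefect.run_configs import KubernetesRun

job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"labels": {}},
    "spec": {
        "template": {
            "metadata": {"labels": {}},
            "spec": {
                "containers": [
                    {
                        "name": "flow",  # the agent fills in image, command, env on this container
                        "envFrom": [{"secretRef": {"name": "flow-env"}}],
                    }
                ],
                "restartPolicy": "Never",
            },
        }
    },
}

run_config = KubernetesRun(job_template=job_template)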
Alexander Melkoff
02/09/2022, 10:57 AM

Peter Peter
02/09/2022, 2:28 PM

Andreas
02/09/2022, 2:53 PM

Alex Furrier
02/09/2022, 4:11 PM

Niels Prins
02/09/2022, 4:25 PM
Running my flow with flow.run() works fine. When I register the same flow to a local server and run it with a local Prefect agent, I get the following error:
File "/home/prinsn/Code/prefect/repo/gima-prefect-cicd/.env/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
TypeError: 'NoneType' object is not callable
I'm using:
GE API V3
prefect==0.15.13
great_expectations==0.14.5
Any pointers? =D
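A debugging sketch, assuming the Flow object (flow) from the failing script is importable locally: pickling the flow and each task with cloudpickle usually points at the object that cannot be serialized.
# Sketch: reproduce the registration-time pickling locally to find the culprit.
# Assumes `flow` is the Flow object defined in the script that fails on the agent.
import cloudpickle

try:
    cloudpickle.dumps(flow)
except Exception as exc:
    print(f"Flow itself fails to pickle: {exc!r}")

for task in flow.tasks:
    try:
        cloudpickle.dumps(task)
    except Exception as exc:
        print(f"Task {task.name!r} fails to pickle: {exc!r}")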
Sean Talia
02/09/2022, 4:33 PM
We're weighing two options:
1. Running a Docker agent on a beefed-up EC2 instance and using labels to control which DockerRun flows will or won't run on that instance
2. Using ECS on EC2, and then simply managing the execution of the flows on the EC2 instance by configuring an ECSRun flow to use an ECS task that executes on EC2
The former option seems a little more straightforward, and we could get it up and running pretty quickly, but it would involve the overhead of managing labels, new API keys for the agents, etc. The latter is probably more flexible in the end, but there's more up-front work for us since ECS on EC2 is not a workflow that we currently support. Has anyone ever deliberated over this issue or experimented with it?
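For the second option, a minimal sketch (assuming Prefect 1.x and an existing EC2-backed ECS cluster; the cluster name is a placeholder): ECSRun's run_task_kwargs are passed through to ECS run_task, so the launch type can be requested there.
# Sketch: ask ECS to place the flow's task on EC2 capacity via run_task kwargs.
# Assumes the cluster and EC2 container instances already exist.
from prefect.run_configs import ECSRun

run_config = ECSRun(
    run_task_kwargs={
        "cluster": "my-ec2-backed-cluster",  # placeholder cluster name
        "launchType": "EC2",                 # run on EC2 container instances rather than Fargate
    },
)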
Chris Reuter
02/09/2022, 7:55 PM

Chris White

Tony Yun
02/09/2022, 8:21 PM

E Li
02/09/2022, 8:33 PM

FuETL
02/09/2022, 8:42 PM
Failed to load and execute Flow's environment: StorageError('An error occurred while unpickling the flow:\n ModuleNotFoundError("No module named \'testing_flow\'")\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n - cloudpickle: (flow built with \'2.0.0\', currently running with \'1.6.0\')\n - python: (flow built with \'3.9.7\', currently running with \'3.7.10\')\nThis also may be due to a missing Python module in your current environment. Please ensure you have all required flow dependencies installed.')
Farid
02/09/2022, 8:58 PM
I'm using GitLab as storage and Kubernetes as the agent to run Prefect flows. I noticed I get dependency errors when a custom pip package is used inside the flow, in this case Soda:
Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'sodasql'")
Is there a way to address these dependency issues without using Docker images to store the flows?
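One hedged possibility, assuming the flows run on the stock Prefect images (whose entrypoint installs anything listed in the EXTRA_PIP_PACKAGES environment variable at container start) and that soda-sql is the distribution providing sodasql:
# Sketch: install extra pip packages at container start instead of baking a custom image.
# Assumes the default prefecthq/prefect image; "soda-sql" is assumed to provide the sodasql module.
from prefect.run_configs import KubernetesRun

run_config = KubernetesRun(
    env={"EXTRA_PIP_PACKAGES": "soda-sql"},
)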
Kevin Mullins
02/09/2022, 9:53 PM
I'm using create_flow_run, wait_for_flow_run, and map to fan out via sub-flows. Is there any way to get more friendly task names using these tasks so I can more easily identify which child task is still running? Right now I have to go into the parent flow, look under the logs for the URL of the created child flow run, and then navigate separately.
Ideally (maybe in Orion some day) it would be nice if the lineage between parent and child flows could be visualized; however, I realize this may be a difficult thing to do. I'm just trying to find ways to make it as user-friendly as possible for an engineer to go in and understand which sub-flows failed or are running, etc.
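A hedged sketch of one way to get friendlier names, assuming Prefect 1.x: run_name names each child flow run and task_args renames the mapped tasks; the flow, project, and region values below are placeholders.
# Sketch: name child flow runs and the mapped tasks so the fan-out is easier to follow.
from prefect import Flow, unmapped
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

regions = ["us", "eu", "apac"]  # placeholder fan-out keys

with Flow("parent") as flow:
    child_run_ids = create_flow_run.map(
        parameters=[{"region": r} for r in regions],
        run_name=[f"child-{r}" for r in regions],   # friendly flow-run names
        flow_name=unmapped("child-flow"),
        project_name=unmapped("my-project"),
        task_args={"name": "create-child-run"},     # friendly task name in the schematic
    )
    wait_for_flow_run.map(
        child_run_ids,
        task_args={"name": "wait-for-child-run"},
    )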
Joseph Mathes
02/10/2022, 1:12 AM

Joseph Mathes
02/10/2022, 1:12 AM

Marcos Lopes Britto
02/10/2022, 2:20 PM

Thomas Hoeck
02/10/2022, 3:34 PM

Chris Arderne
02/10/2022, 4:02 PM
A question about memory limits with the DaskExecutor!
If I run the code below without the DaskExecutor, it gets to 40/50GB before being Killed because of OOM. However, if I run it as-is (i.e., on the DaskExecutor), it fails at 10GB, tries three times, and then gives a distributed.scheduler.KilledWorker. Does Prefect do anything to the Dask worker memory limits? I'm facing this issue running with KubernetesRun and a DaskExecutor with a KubeCluster backend, where the worker spec has 50GB of memory for both the k8s limit and the Dask worker limit. It still fails around the 10GB mark. (In practice this is while loading PyTorch models and predicting.)
import numpy as np
import prefect
from prefect.executors import DaskExecutor
from prefect import Flow, task


@task
def test_memory_limits():
    logger = prefect.context.get("logger")
    for size in [1, 2, 5, 10, 20, 30, 40, 50]:
        logger.warning(f"Creating array with {size=} GB")
        a = np.ones((size * 1000, 1000, 1000), dtype=np.uint8)
        logger.warning(f"Created with size {a.nbytes/1e9=} GB")
        del a
        logger.warning("Deleted")


with Flow(
    "test-memory",
    executor=DaskExecutor(),
) as flow:
    _ = test_memory_limits()
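One hedged explanation: Prefect itself does not appear to set worker memory limits, but a bare DaskExecutor() starts a default distributed.LocalCluster, which spawns roughly one worker per core and splits the machine's memory between them, so each worker can end up with far less than the 50GB total (about 10GB would match four or five workers). A sketch of making those limits explicit:
# Sketch: make the local Dask cluster's worker count and per-worker memory explicit,
# instead of letting LocalCluster split host memory across one worker per core.
from prefect.executors import DaskExecutor

executor = DaskExecutor(
    cluster_kwargs={
        "n_workers": 1,
        "threads_per_worker": 1,
        "memory_limit": "50GB",  # per-worker limit; adjust to the host
    }
)
For the KubeCluster backend the same idea would apply via the dask-worker --memory-limit setting in the worker spec, though whether that is the cause there is only a guess.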
Andrew Lawlor
02/10/2022, 4:25 PM

Christopher
02/10/2022, 5:58 PM
I have an ECSAgent running (in ECS, using the base prefect image with prefect agent ecs start), but the tasks it spins up are in the wrong subnet, so I want to customise the task definition. It looks like I can pass in a task definition path, but that's a bit troublesome because now I need to store the file somewhere accessible to the agent. Is there a way to pass a task definition ARN instead? It looks like I can pass that to ECSRun, but the subnet ID is generated by Terraform, so I can't figure out how to get it into the Python...
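A hedged sketch of one way to wire this together, assuming Prefect 1.x's ECSRun: export the Terraform-generated values where the flows are registered (for example from terraform output) and read them when building the run config; the environment variable names below are placeholders.
# Sketch: pass a Terraform-created task definition ARN and subnet to ECSRun.
# PREFECT_TASK_DEFINITION_ARN / PREFECT_SUBNET_ID are placeholder env vars,
# e.g. populated from `terraform output` in the registration pipeline.
import os

from prefect.run_configs import ECSRun

run_config = ECSRun(
    task_definition_arn=os.environ["PREFECT_TASK_DEFINITION_ARN"],
    run_task_kwargs={
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": [os.environ["PREFECT_SUBNET_ID"]],
            }
        }
    },
)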
Christopher
02/10/2022, 6:10 PM

James Sutton
02/10/2022, 8:46 PM

Daniel Nilsen
02/11/2022, 7:40 AM
mutation MyMutation($input: any) {
  create_flow_run(
    input: $input
  ) {
    id
  }
}
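A hedged aside: the same operation through the Python client avoids declaring the GraphQL input type by hand; the flow ID and parameters below are placeholders.
# Sketch: create a flow run through the Python client instead of raw GraphQL.
from prefect import Client

client = Client()
flow_run_id = client.create_flow_run(
    flow_id="00000000-0000-0000-0000-000000000000",  # placeholder flow ID
    parameters={"some_param": "value"},              # optional, placeholder parameters
)
print(flow_run_id)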
Faisal k k
02/11/2022, 8:07 AM

Michael Hadorn
02/11/2022, 8:21 AM

William Edwards
02/11/2022, 12:45 PM
Is it enough to call create_flow_run in the client, or should the client be doing anything else?

Robert Kowalski
02/11/2022, 2:28 PM
docker run --mac-address="70:ca:9b:ce:67:ae" IMAGE
So I want to set a constant MAC address when defining the Docker storage as:
storage = Docker(
    env_vars={"PYTHONPATH": "$PYTHONPATH:/pipeline"},
    files={f'{parent_dir}': '/pipeline'},
    image_tag=os.environ.get('IMAGE_TAG'),
    image_name=flow_name,
    stored_as_script=True,
    path='/pipeline/flow.py',
    extra_dockerfile_commands=[]
)
Has anyone tried to achieve something similar?
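A hedged note: the MAC address is a container-runtime setting rather than something the image built by Docker storage can carry, so it has to be applied wherever the container is created. For illustration only, the docker-py equivalent of the docker run command above ("my-flow-image" is a placeholder) looks like this:
# Sketch: the docker-py equivalent of `docker run --mac-address=... IMAGE`.
# This shows that the MAC address is set when the container is created,
# not in the image that Docker storage builds.
import docker

client = docker.from_env()
container = client.containers.run(
    "my-flow-image",
    mac_address="70:ca:9b:ce:67:ae",
    detach=True,
)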