Paul
06/08/2020, 9:19 PM
STOPPED (CannotPullContainerError: Error response from daem)
To my understanding, the launched Fargate instance does not have any access to the Docker image of the Flow. Concerning the solution from
https://docs.prefect.io/orchestration/execution/storage_options.html#docker
(“Non-Docker Storage for Containerized Environments”) for rapid deployment: which image would the
metadata={"image": "repo/name:tag"}
refer to in the example given?
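A minimal sketch of the pattern that docs section describes, assuming the 0.11-era API where cloud storage is paired with an environment whose metadata names the image (the bucket and project names here are hypothetical). The image in metadata is the dependency image the Fargate task runs; the flow code itself is pulled from storage at runtime, so it does not need to be baked into the image:

from prefect import Flow
from prefect.environments.storage import S3

with Flow("fargate-flow") as flow:
    ...  # tasks go here

# Flow code is uploaded to (and later pulled from) S3, so the Fargate task only
# needs an image with prefect plus your dependencies installed; that image is
# what metadata={"image": "repo/name:tag"} on the flow's environment points at.
flow.storage = S3(bucket="my-flow-bucket")  # hypothetical bucket
flow.register(project_name="my-project")    # hypothetical project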
Darragh
06/08/2020, 9:27 PM
prefecthq/prefect:python3.7
and installs a bunch of other junk in there too. The Flow builds and registers, but when I kick it off I get the following:
Failed to set task state with error: ConnectionError(MaxRetryError("HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql/alpha
Searching in this channel I see an issue that was identified around this and fixed in April, so I’m not sure if I should still be seeing it, or if there’s config that needs to be passed to the Docker agent to run it. It’s running on a Docker agent locally on a MacBook, if that’s any help. The Prefect version installed locally is 0.11.1 [upgrading now to test], and Python in both the local environment and the Docker image is 3.7.
UPDATE: Never mind, upgrading to 0.11.5 did it!
Dan DiPasquo
06/08/2020, 10:24 PM
Sanjay Patel
06/09/2020, 1:30 AM
Ben Davison
06/09/2020, 11:44 AM
- name: PREFECT__LOGGING__FORMAT
  value: '{"level": "%(levelname)s", "message": "%(message)s"}'
And I can see the pod has the environment variable set.
kubectl exec -it prefect-scheduler-6b96994c6-qtqh5 --namespace=data -- /bin/sh -c 'echo "password: $PREFECT__LOGGING__FORMAT"' <aws:default>
password: {"level": "%(levelname)s", "message": "%(message)s"}
But the logs are still in the default format:
kubectl --namespace data logs prefect-scheduler-6b96994c6-qtqh5 -f <aws:default>
[2020-06-09 11:36:54,302] INFO - prefect-server.Scheduler | Scheduler will start after an initial delay of 275 seconds...
Does anyone have any idea? Ultimately, I'm just trying to get the logs parsed correctly in Datadog.
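One low-tech check (a debugging sketch, not a fix): Prefect maps PREFECT__LOGGING__FORMAT to config.logging.format, so you can confirm from inside the pod whether the Python process is actually picking the value up. If it is, the server's scheduler service may simply be configuring its own logger separately, which would explain the unchanged output.

import prefect

# If this prints the JSON format string, the env var is reaching Prefect's
# config system and the issue is in how the service builds its logger.
print(prefect.config.logging.format)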
Preston Marshall
06/09/2020, 1:56 PM
Howard Cornwell
06/09/2020, 3:15 PM
PrefectResult? I’ve hit a wall and flows are failing with:
Failed to set task state with error: HTTPError('413 Client Error: Payload Too Large for url: http://???:4200/graphql/graphql/alpha')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 119, in call_runner_target_handlers
    state = self.client.set_task_run_state(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1096, in set_task_run_state
    result = self.graphql(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 213, in graphql
    result = self.post(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 172, in post
    response = self._request(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 318, in _request
    response.raise_for_status()
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 413 Client Error: Payload Too Large for url: http://???:4200/graphql/graphql/alpha
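For anyone hitting the same wall: PrefectResult embeds the task's return value in the state update sent to the GraphQL API, so large outputs can exceed the server's payload limit. A sketch of the usual workaround, storing results externally instead (hypothetical bucket name, assuming the 0.11+ Results API):

from prefect import Flow, task
from prefect.engine.results import S3Result  # GCSResult / LocalResult work the same way

s3_result = S3Result(bucket="my-result-bucket")  # hypothetical bucket

@task(result=s3_result, checkpoint=True)
def build_large_output():
    # Only a reference to the stored result travels through the API.
    return list(range(1_000_000))

with Flow("big-results", result=s3_result) as flow:
    build_large_output()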
Hassan Javeed
06/09/2020, 4:45 PM
04:15:18 UTC  INFO   prefect-cloud.Lazarus.FlowRun  Rescheduled by a Lazarus process. This is attempt 1.
04:15:45 UTC  ERROR  agent  HTTPSConnectionPool(host='172.20.0.1', port=443): Read timed out. (read timeout=None)
Darragh
06/09/2020, 5:36 PM
Kevin Weiler
06/09/2020, 5:38 PM
with Flow("toy_flow") as flow:
    a = Parameter("a")
    b = Parameter("b")
    c = Parameter("c")
    job1_task = ShellTask(name="job1", command=f"""echo {a} {b} {c}""")
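Worth noting for the snippet above: the f-string is evaluated when the flow is defined, so it interpolates the Parameter objects themselves rather than their runtime values. One way to defer building the command to runtime (a sketch; ShellTask also accepts the command when it is called inside the flow):

from prefect import Flow, Parameter, task
from prefect.tasks.shell import ShellTask

shell = ShellTask(name="job1")

@task
def build_command(a, b, c):
    # Runs at flow-run time, so the real parameter values are available here.
    return f"echo {a} {b} {c}"

with Flow("toy_flow") as flow:
    a = Parameter("a")
    b = Parameter("b")
    c = Parameter("c")
    shell(command=build_command(a, b, c))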
Dan DiPasquo
06/09/2020, 6:44 PM
Task 'run_...': 1 candidate cached states were found
11:19:32 PDT  INFO   GCSTmpFileHashResultHandler  Starting to download result .....
              INFO   GCSTmpFileHashResultHandler  Finished downloading result ....
              DEBUG  CloudTaskRunner              Task 'run_...': Handling state change from Pending to Cached
The log from the same task on subsequent flow runs neither shows the file download nor indicates that no valid cached results were used:
Task 'run_...': Starting task run...
11:23:43 PDT  DEBUG  CloudTaskRunner  Task 'run_...': 3 candidate cached states were found
11:23:43 PDT  DEBUG  CloudTaskRunner  Task 'run_...': Handling state change from Pending to Cached
11:23:43 PDT  DEBUG  CloudTaskRunner  Task 'run_...': can't set state to Running because it isn't Pending; ending run.
11:23:43 PDT  INFO   CloudTaskRunner  Task 'run_...': finished task run for task with final state: 'Cached'
Suggestions for tracing this further would be appreciated.
Barry Roszak
06/09/2020, 7:51 PM
out_a = task_a(input)
out_b = task_b.map(out_a)
out_c = task_c.map(out_b)
Right now the flow waits for task_b to finish completely and only then starts task_c. Is it possible to change that behavior and create a flow where one element of out_a flows through the full pipeline before the next element is picked up by a worker?
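If memory serves, depth-first execution of mapped pipelines arrived around Prefect 0.12 when running on a Dask executor; on older versions each .map() stage acts as a barrier. A sketch under that assumption (hypothetical task bodies):

from prefect import Flow, task
from prefect.engine.executors import DaskExecutor

@task
def task_a(n):
    return list(range(n))

@task
def task_b(item):
    return item * 2

@task
def task_c(item):
    return item + 1

with Flow("depth-first") as flow:
    out_a = task_a(5)
    out_b = task_b.map(out_a)
    out_c = task_c.map(out_b)

# With a Dask executor and a version that supports depth-first execution,
# an individual element can move from b to c without waiting for its siblings.
flow.run(executor=DaskExecutor())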
Christian
06/09/2020, 8:53 PM
asm
06/09/2020, 11:01 PM
Matthias
06/10/2020, 8:01 AM
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 3.43 GB -- Worker memory limit: 4.18 GB
All the runs complete successfully, but I have the feeling that the memory used still does not get released. I am not really sure where to start debugging. Is there a way to force a memory release? The only other option I currently see is to force a dask-worker restart after the flow run finishes, but that feels very hacky.
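Two things worth trying before anything hackier (a sketch; the scheduler address is hypothetical): ask the workers to run a garbage-collection pass via the Dask client, or restart the worker processes from the client rather than by hand.

import gc
from dask.distributed import Client

client = Client("tcp://dask-scheduler:8786")  # hypothetical scheduler address

client.run(gc.collect)  # run a GC pass on every worker; sometimes releases held memory
client.restart()        # heavier hammer: restart worker processes between flow runs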
Matias Godoy
06/10/2020, 10:14 AM
pip install prefect --upgrade
3. Run prefect server start
Is this correct? Would that be it or am I missing some steps?
Thomas Hoeck
06/10/2020, 12:43 PM
Zach
06/10/2020, 3:48 PM
Brett Naul
06/10/2020, 4:58 PM
goodsonr
06/10/2020, 8:22 PM
@task
def a():
    <do something that might succeed or fail>

@task(trigger=always_run)
def b():
    <if task a status == FAIL .. do something>
    <if task a status == SUCCESS .. skip>

with Flow(...) as flow:
    res1 = a()
    b(res1)

flow.run()
This is part of a larger flow with other tasks before and after a and b. I tried trigger=any_failed on task b, but that causes task b to fail if task a succeeds (due to the trigger not being satisfied), which is not what I want; I want task b to always show success. Again, sorry if this is obvious and it's just a newbie thing. Feel free to just point me to the right place in the docs. Thanks in advance.
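One pattern that I believe fits this (a sketch, please verify the upstream-passing behavior on your version): keep always_run on b, branch on what the upstream passed in, and raise SKIP when there is nothing to do, since a Skipped task counts as successful.

from prefect import Flow, task
from prefect.engine import signals
from prefect.triggers import always_run

@task
def a():
    ...  # might succeed or raise

@task(trigger=always_run)
def b(upstream):
    # Assumption: with always_run, a failed upstream's input arrives as the
    # exception it raised, so we can branch on that.
    if isinstance(upstream, BaseException):
        ...  # handle the failure
    else:
        raise signals.SKIP("task a succeeded; nothing to do")

with Flow("cleanup-on-failure") as flow:
    res1 = a()
    b(res1)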
Darragh
06/10/2020, 9:03 PM
"secrets": [{"name": "CREDS", "valueFrom": "arn:aws:secretsmanager:eu-west-1:11111111:secret:local/aws/credentials-abcd"}]
In the Flow, I read it like this:
creds = prefect.context.secrets.CREDS
But I keep getting the following:
AttributeError: 'dict' object has no attribute 'CREDS'
Confused face.
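The AttributeError itself points at the fix: context.secrets is a plain dict, so it needs key access rather than attribute access. A small sketch:

import prefect
from prefect.client import Secret

creds = prefect.context.secrets["CREDS"]  # dict lookup instead of .CREDS

# Or go through the Secret interface, which also checks local config / the backend:
creds = Secret("CREDS").get()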
Josh Lowe
06/11/2020, 2:12 AM
Failed to load and execute Flow's environment: TypeError("object NoneType can't be used in 'await' expression")
I'm able to run the flow over a local cluster with two workers just fine; it's only when I register a flow using the Dask Cloud Provider environment that I have issues 🤔
Sven Teresniak
06/11/2020, 7:03 AM
Sven Teresniak
06/11/2020, 10:08 AM
Sandeep Aggarwal
06/11/2020, 12:31 PM
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 668, in complete_value_catching_error
    return_type, field_nodes, info, path, result
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 733, in complete_value
    raise result
  File "/prefect-server/src/prefect_server/graphql/states.py", line 73, in set_state
    task_run_id=state_input["task_run_id"], state=state,
  File "/prefect-server/src/prefect_server/api/states.py", line 91, in set_task_run_state
    f"State update failed for task run ID {task_run_id}: provided "
graphql.error.graphql_error.GraphQLError: State update failed for task run ID 63293e14-b1d4-4d2e-ae21-e9aeb8edfade: provided a running state but associated flow run 73a41de3-adc3-4a48-9b57-9b7bdb6094f7 is not in a running state.
My workflow involves running some commands inside Docker containers. The workflows themselves aren't huge, but the Docker execution can take several seconds (it should stay under 1 minute, though). I am currently running with a couple of dask workers with limited memory, i.e. 500MB.
The workflow works fine for a small number of requests, but as I start sending multiple requests, workers start dying and I see this error in the Prefect Server logs.
This is just a testing system and the actual prod environment will have higher memory limits, but I would still like to know whether this error is expected and if there is any way to avoid or handle it.
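Not a definitive answer, but one knob that often helps when small workers die under bursty load is an explicit per-worker memory limit and less concurrency, then pointing the executor at that cluster (a sketch with hypothetical sizes):

from dask.distributed import LocalCluster
from prefect.engine.executors import DaskExecutor

# Fewer threads per worker plus an explicit memory limit makes it harder for a
# burst of task runs to push any single worker over the edge.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="2GB")
executor = DaskExecutor(address=cluster.scheduler_address)

# flow.run(executor=executor)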
jorwoods
06/11/2020, 1:15 PM
0.11.5+134.g5e4898dde
I am running on Win 10 and have verified I have the environment variable PREFECT__FLOWS__CHECKPOINTING=true
from prefect import Flow, task, unmapped, Parameter
from prefect.engine.results import LocalResult
from prefect.engine.executors import LocalDaskExecutor
from prefect.engine.cache_validators import all_parameters

lr = LocalResult(location='{flow_name}-{task_name}-{x}-{y}.pkl',
                 validators=all_parameters)

@task(log_stdout=True, checkpoint=True)
def add(x, y):
    print(f'add ran with {x} {y}')
    try:
        return sum(x) + y
    except TypeError:
        return x + y

with Flow('iterated map', result=lr) as flow:
    y = unmapped(Parameter('y', default=7))
    x = Parameter('x', default=[1, 2, 3])
    mapped_result = add.map(x, y=y)
    out = add(mapped_result, y)

flow.run(executor=LocalDaskExecutor())
Ben Davison
06/11/2020, 1:15 PM
flow.run()
do you need to have the prefect server up? As my test just seems to hang once it hits that part.
John Ramirez
06/11/2020, 2:02 PM
Jon Page
06/11/2020, 4:55 PM
boto3.Session().get_credentials().access_key
vs.
PrefectSecret("AWS_CREDENTIALS")["ACCESS_KEY"]
Pretty sure I followed these instructions: https://docs.prefect.io/core/concepts/secrets.html#default-secrets
Both values are keys, but the one in the boto3 session is not one that I recognize.
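One way to narrow that down (a diagnostic sketch): botocore records which source it resolved credentials from, and the default chain (env vars, shared credentials file, instance role) can easily hand boto3 a different identity than the one stored in the AWS_CREDENTIALS secret.

import boto3

creds = boto3.Session().get_credentials()
# .method names the source botocore used, e.g. "env",
# "shared-credentials-file", or "iam-role".
print(creds.method, creds.access_key)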
Darragh
06/11/2020, 5:34 PM
Darragh
06/11/2020, 5:34 PM
Dylan
06/11/2020, 5:38 PM
The Cancelled state follows a “best attempt” design pattern. It’s extremely difficult to kill arbitrary Python processes, especially ones that are running on shared Dask clusters or the like. Here’s the PR where we added the “cancellation lite” functionality to Prefect Server (which I believe you’re using): https://github.com/PrefectHQ/prefect/pull/2535
The Cancelled state may not stop the run immediately, but it should stop any new Task Runs from starting. Once Running Task Runs enter a Finished state, the flow run should stop.
Darragh
06/11/2020, 5:40 PM
Dylan
06/11/2020, 5:41 PM
1. Set the flow run to a Cancelled state
2. Interact with Fargate to delete the run infrastructure
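A rough sketch of those two steps from a script; hedged, since the set_flow_run_state signature is from memory for 0.11-era releases, it assumes your version ships the Cancelled state added with the cancellation-lite work, and the IDs, cluster, and task ARN below are placeholders:

import boto3
from prefect import Client
from prefect.engine.state import Cancelled

# 1. Mark the flow run Cancelled so no new task runs are submitted.
client = Client()
client.set_flow_run_state(
    flow_run_id="<flow-run-id>",
    version=1,  # current version number from a flowRun query
    state=Cancelled("Cancelled by operator"),
)

# 2. Tear down the Fargate task that is executing the run.
ecs = boto3.client("ecs")
ecs.stop_task(cluster="my-cluster", task="<task-arn>", reason="flow run cancelled")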
Marvin
06/11/2020, 5:46 PM
Darragh
06/11/2020, 5:50 PM
Dylan
06/11/2020, 5:52 PM
You may need to wait for Task Runs in a Running state to resolve on their own.
Darragh
06/11/2020, 5:54 PM
Dylan
06/11/2020, 5:57 PM
If you need a Running Task Run to stop, you’ll need to kill its execution infrastructure manually.
Darragh
06/11/2020, 5:58 PM
Dylan
06/11/2020, 5:58 PM
Once Running Task Runs enter finished states, then setting the Flow Run to Cancelled will do the trick.
Pedro Machado
06/11/2020, 8:57 PM
Dylan
06/11/2020, 8:57 PM
Pedro Machado
06/11/2020, 9:00 PM
Dylan
06/11/2020, 9:01 PM
Pedro Machado
06/11/2020, 9:02 PM
Cancelled
or is there a way to send a signal to a task that is already running to tell it to stop? This would not work if the task is hung, but if it's running as expected, it could decide that it needs to stop because the external signal was sent.
Dylan
06/12/2020, 3:12 PM
Darragh
06/12/2020, 7:28 PM
Dylan
06/12/2020, 7:28 PM