
Santiago Gonzalez

11/03/2021, 9:46 PM
Hey, I have an issue. After clicking on the manual step, the flow has been stuck for 40 minutes. Is there anything we can do to wake the process?

Anna Geller

11/03/2021, 9:48 PM
yes, there’s definitely something we can do about it 🙂 Which state is it stuck in?
Can you perhaps share a screenshot of the UI, logs, or give more information otherwise?

Santiago Gonzalez

11/03/2021, 9:50 PM
It is in the Resume state.

Anna Geller

11/03/2021, 9:58 PM
Did someone configure manual approval for this flow? Can you share a larger screenshot or tell us a bit more about your use case? Otherwise it’s hard to tell why it would get stuck in a Resume state.

Santiago Gonzalez

11/03/2021, 10:06 PM
Ok. So, the flow executed a set of commands on an EC2 instance, then it paused waiting for the manual approval, which was pending for an hour. Then it received the confirmation, and it has been stuck ever since.

Anna Geller

11/03/2021, 10:11 PM
oh, I see - so the approval did work, but the flow is now stuck in a Resume state? It could be that some of your tasks don’t have the input data required to resume. Did you configure any Result class in your flow? Can you perhaps show your flow definition?
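[For context: in Prefect 1.x, a task downstream of an approval gate can only pick up after a resume if the upstream outputs it needs were checkpointed to a Result backend. A minimal sketch of that configuration, with a hypothetical bucket name:

from prefect import task
from prefect.engine.results import S3Result

# Hypothetical bucket; checkpoint=True persists the return value so a
# resumed run can reload it instead of recomputing it.
@task(checkpoint=True, result=S3Result(bucket="my-results-bucket"))
def produce_data():
    return {"rows": 42}
]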

Santiago Gonzalez

11/03/2021, 10:13 PM
I don’t think so. I executed the same flow yesterday, and it didn’t get stuck.
I configured result storage on the tasks whose output is going to be used after the resume.

Anna Geller

11/03/2021, 10:16 PM
Can you share a minimal flow example that we could use to reproduce this? Otherwise it’s hard to tell at which step something may be going wrong.

Santiago Gonzalez

11/03/2021, 10:20 PM
batch_matching_full_execution_commands = create_batch_matching_ec2_commands(export_version, next_ekata_dv, False)
batch_matching_full_execution_commands.name = 'Create batch matching ec2 commands for full execution'
batch_matching_full_execution_commands.checkpoint = True
batch_matching_full_execution_commands.result = S3Result(bucket=S3_BUCKET, location=S3_LOCATION_PATTERN)

run_batch_matching_export_notebook = notebook_run(databricks_conn_secret=conn,
                                                  json=notebook_submit_config)
run_batch_matching_export_notebook.set_upstream(notebook_created)

instance_id = create_ec2_instance(volume_size=2000)
instance_id.set_upstream(run_batch_matching_export_notebook)
instance_id.skip_on_upstream_skip = False
instance_id.checkpoint = True
instance_id.result = S3Result(bucket=S3_BUCKET, location=S3_LOCATION_PATTERN)

# Execute for 5 minutes, so Ekata can scale up their services.
batch_matching_warm_up_execution = execute_job_in_ec2_instance(instance_id=instance_id,
                                                               commands=batch_matching_warm_up_commands,
                                                               s3_dir_prefix='batch-matching-warm-up')
batch_matching_warm_up_execution.name = 'Warm up ekata services in ec2 instance for 5 minutes'
batch_matching_warm_up_execution.set_upstream(run_batch_matching_export_notebook)
batch_matching_warm_up_execution.skip_on_upstream_skip = False

ekata_verified_their_services = has_ekata_verified_their_services()  # Manual step
ekata_verified_their_services.set_upstream(batch_matching_warm_up_execution)

# Full execution
batch_matching_full_execution = execute_job_in_ec2_instance(instance_id=instance_id,
                                                            commands=batch_matching_full_execution_commands,
                                                            s3_dir_prefix='batch-matching',
                                                            execution_timeout=60 * 60 * 24 * 7)
batch_matching_full_execution.name = 'Batch Matching Full execution in ec2 instance'
batch_matching_full_execution.set_dependencies(upstream_tasks=[batch_matching_warm_up_execution,
                                                               ekata_verified_their_services])
As you can see, result storage is configured on some of the tasks.
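[For reference: a stripped-down, self-contained version of this pattern in Prefect 1.x could look roughly like the sketch below. All names and the bucket are illustrative, not the actual flow:

from prefect import Flow, task
from prefect.engine.results import S3Result
from prefect.triggers import manual_only

@task(checkpoint=True, result=S3Result(bucket="my-results-bucket"))  # hypothetical bucket
def produce():
    return "data"

@task(trigger=manual_only)  # pauses the run here until it is approved in the UI
def approval_gate():
    pass

@task
def consume(data):
    print(data)

with Flow("manual-approval-example") as flow:
    data = produce()
    gate = approval_gate(upstream_tasks=[data])
    consume(data, upstream_tasks=[gate])

If produce’s output is not actually persisted (e.g. checkpointing is disabled in the execution environment), consume may end up waiting on input it cannot reload after the approval.]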

Zanie

11/03/2021, 10:43 PM
Do you have an agent running? What kind of run config are you using?

Santiago Gonzalez

11/03/2021, 11:19 PM
I am ending the day now. I will answer you tomorrow. Thanks for the reply.
Yes, we do. We have a Docker agent running on an EC2 instance.
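[For anyone following along: with a Docker agent, the flow would typically carry a matching DockerRun run config, along the lines of the sketch below. The image and label are hypothetical, and the agent would be started with a matching label, e.g. `prefect agent docker start --label ec2-docker`:

from prefect import Flow
from prefect.run_configs import DockerRun

with Flow("batch-matching") as flow:
    ...  # tasks as above

# Hypothetical image and label; both must match what the agent polls for.
flow.run_config = DockerRun(image="prefecthq/prefect:latest", labels=["ec2-docker"])
]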

Anna Geller

11/04/2021, 3:35 PM
@Santiago Gonzalez can you share your run configuration or your entire flow definition? Alternatively, could you build a minimal example that we could use to reproduce the issue?

Zanie

11/04/2021, 3:41 PM
Is the flow run container still running?

Santiago Gonzalez

11/04/2021, 4:13 PM
No, we cancelled it yesterday

Anna Geller

11/04/2021, 4:14 PM
Thanks for the update, let us know if you need any help with that in the future.