I currently would like to add Prefect code to my current setup: a shell script that does a spark-submit on a Python file. I can use ShellTask to call the shell script and have it registered as a flow. But how would I go about also monitoring or assigning tasks to the code inside the Python file? Basically registering the Python code as a flow too. In the end I'd have two registered flows: Flow A for the shell script and Flow B for the Python script.
01/07/2021, 2:56 PM
Hi @Javier Velez!
This sounds pretty straightforward. Let us know if you have any questions or run into any issues!
01/07/2021, 7:31 PM
hi @Dylan, how would I go setting this up?
So here are the steps I believe I will have to follow:
1. Create prefect_shell_script.py, which will call the .sh file (that does the spark-submit command) via ShellTask, or possibly run all the commands the .sh file does directly through ShellTask (basically omitting the .sh and going straight to the commands). Either way of submitting the spark-submit request will use a Python file named spark_job.py
2. At the end of prefect_shell_script.py, register all the tasks in the .py file as a flow
3. spark_job.py should itself contain Prefect code (tasks, and the code to register its flow as well)
So I can now call the registered flow to run the prefect_shell_script.py code (let's call it Flow A), but how will the flow registered for the spark_job.py file (Flow B) be fired?
Flow A does the spark-submit, but that just calls Python code, so how would Flow B be triggered if it has never been specifically requested? I assume in the UI only Flow A will appear as running, but Flow B will not.