# ask-community
a
Hey!!! What do you think about that? Is it a crazy idea? Impossible? … I am a bit lost
I can’t figure out the correct way to … separate the different tasks (which could be really large) into different Python files, using only GitHub as storage, because the use case is meant to show whether we can use GitHub Actions to do everything with Prefect, and later with Dask …
a
I think you’re going a bit too fast 🙂 It would be much easier if you approach it step by step:
1. First, focus on building your basic functionality in a flow.
2. Then learn how to register it and use it with your chosen storage and agent.
3. Then maybe look at CI/CD with GitHub Actions (we have just published this post showing an example CI/CD setup with CircleCI).
Once all that is done, you can look at how to streamline the process by modularizing some components. I think this would make troubleshooting and everything else much easier. After that, move on to the next step and add Dask to parallelize some work.
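For step 2, here is a minimal sketch of registering a flow with GitHub storage and a local agent (Prefect 1.x; the repo, path, and project names are placeholders):
```python
from prefect import task, Flow
from prefect.storage import GitHub
from prefect.run_configs import LocalRun


@task
def say_hello():
    print("hello")


with Flow("basic-flow") as flow:
    say_hello()

# placeholder repo/path/project -- replace with your own
flow.storage = GitHub(repo="your-org/your-repo", path="flows/basic_flow.py")
flow.run_config = LocalRun()
flow.register(project_name="your-project")
```
With this, the agent pulls the flow file from GitHub at runtime, so registration (e.g. from a GitHub Action) and execution (on the agent) stay decoupled.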
e.g. your flow on the left has some Python code which is not packaged into a task
create_flow_run and wait_for_flow_run are normal Prefect tasks, so you should move them out of the @task and call them inside the “with Flow” constructor. Here is an example from this post
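A minimal sketch of that pattern (Prefect 1.x; the flow and project names are placeholders):
```python
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

# both are tasks themselves, so they are called inside the Flow block,
# not wrapped inside another @task
with Flow("parent-flow") as flow:
    child_run_id = create_flow_run(
        flow_name="child-flow", project_name="your-project"
    )
    wait_for_flow_run(child_run_id, raise_final_state=True)
```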
a
Thanks so much!!! I already tested using just one file with one flow and more than one task, and it’s working well. Now the idea is to make a flow act as a watcher of an SFTP server: get the files, categorize them (because we have 3 different use cases), and commit them to GitHub to trigger Actions. Each task that processes those 3 different files is really large in number of lines, so we only need to execute the task that processes one of them. After that, the process will have to match a field (street address) in another task that asks an API for the GPS coordinates, to add that info to the dataset. Finally, that dataset will be processed with eland to store it in an Elasticsearch cluster. I am really convinced that using Prefect and Dask will be the perfect fit for our use case … I need to learn more and understand better because, apart from the support I receive from you and this channel, I can’t understand the documentation, and I feel maybe it would be better, I don’t know. Sorry about this, and thanks so much for your help, I appreciate it
sorry about my English … I make a lot of mistakes … my dictionary changes some words …
I always have to read a lot when you help me heheheh 🙂 thanks so much
a
I understand. You should definitely start by building your entire use case in Python, and once that’s working you can think about how to break it into Prefect tasks and organize them into a flow
a
hummm, I already have the code in Python
I am trying to do the second step: build a flow with tasks in different Python files
maybe I am crazy, maybe I am not crazy 🙂
😂 1
just one more question … does what I am trying make sense? I mean … is it correct from the Prefect point of view?
a
if you already have code for that, then all you need to do is package it into e.g. functions, call those functions within your tasks, and build a DAG. You can import a function into a Prefect flow as with any other Python script, e.g.:
```python
from prefect import task, Flow
from yourcustommodule import extract_logic, transform_logic, load_logic


@task
def get_list_of_files_from_sftp():
    # return the file list so it can be mapped over downstream
    return extract_logic()


@task
def transform_and_load(file):
    transform_logic(file)
    load_logic(file)


with Flow("etl") as flow:
    list_of_files = get_list_of_files_from_sftp()
    transform_and_load.map(list_of_files)
```
Does it help? LMK if you have any specific question I can help with
🙌 1
a
Yes!!! of course!!! you are helping me a lot and I appreciate it. Sorry if I am so clumsy
🙌 1
but let me share a perspective… if I use a GitHub Action to register the flow on Prefect Cloud, when I run this flow … if the flow has that import … it will be lost when it runs on the agent … if I do the run in the same container from the GitHub Action, it works well …
a
and once you want to run it in parallel, you can attach a local Dask executor:
```python
from prefect import Flow
from prefect.executors import LocalDaskExecutor

# tasks defined as in the previous example
with Flow("etl", executor=LocalDaskExecutor()) as flow:
    list_of_files = get_list_of_files_from_sftp()
    transform_and_load.map(list_of_files)
```
👀 1
The GitHub Action would only register the flow; your agent would run it
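To make that import work on the agent, one option is a sketch like this, assuming yourcustommodule (a placeholder name) lives in your repo and each agent machine has a copy of it: package it and install it into the agent’s environment so the import resolves at run time.
```python
# setup.py for the placeholder yourcustommodule package;
# run `pip install -e .` on each agent machine so that
# `from yourcustommodule import ...` works during flow runs
from setuptools import setup, find_packages

setup(
    name="yourcustommodule",
    version="0.1.0",
    packages=find_packages(),
)
```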
a
With LocalDaskExecutor, will the agent get all the Python files?
a
yes
🙌 1
a
wow!!! I didn’t know that before 🙂
so … would that be correct?
first there is a custom_module with the function to select the correct use case, and after that it runs another flow (previously registered) that needs the key “type” to do anything, and returns something allocated in a var called data_processed_wait_task?
I have 5 little VMs with a Prefect agent (local). Is that compatible with LocalDaskExecutor?
these agents are connected to the cloud
oooo, the flow of flows is working well!!! but the custom_modules don’t work …
What am I doing wrong?
k
I was just going through earlier messages. I think I answered this in the thread below. But yes, you can use the LocalDaskExecutor with local agents. I assume they are on different machines; I don’t think there is a use case where it makes sense to run multiple LocalAgents on the same machine
upvote 1
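In case it helps, a minimal sketch of configuring the executor explicitly (the scheduler and worker count are just example values):
```python
from prefect import Flow
from prefect.executors import LocalDaskExecutor

# whichever agent VM picks up the run executes it with its own
# local Dask pool; scheduler and num_workers here are examples
with Flow(
    "etl",
    executor=LocalDaskExecutor(scheduler="threads", num_workers=4),
) as flow:
    ...
```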