prefect-community
  • m

    Mark McDonald

    04/02/2020, 8:11 PM
    Hi - I'm using Cloud and Docker storage. I'm wrapping my head around "automatic" input caching, specifically as it relates to task retries. According to the docs, "Input caching is an automatic caching. Prefect will automatically apply it whenever necessary." This caching happens within our container and not on your machines, correct? Are there limits to the size of this cache (e.g. if a task takes as an input a large dataframe generated by an upstream task)?
    c
    7 replies · 2 participants
  • m

    Manuel Aristarán

    04/02/2020, 9:44 PM
    Probably a stupid question: how do I get the value of a Parameter? My use case is building a shell command:
    data_source_id = Parameter("data_source_id")
    # ...
    t = shell_task(command=f"some_script {data_source_id}")
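    (A hedged sketch of one way to do this, assuming Prefect 0.x's ShellTask, which accepts the command at runtime: the f-string above is evaluated when the flow is built, before the Parameter has a value, so the command has to be assembled inside a task. The script name and flow name are placeholders.)
    from prefect import Flow, Parameter, task
    from prefect.tasks.shell import ShellTask
    
    shell_task = ShellTask()
    
    @task
    def build_command(data_source_id):
        # Runs at flow runtime, when the Parameter's resolved value is available.
        return f"some_script {data_source_id}"
    
    with Flow("shell-command-example") as flow:
        data_source_id = Parameter("data_source_id")
        t = shell_task(command=build_command(data_source_id))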
    j
    j
    +1
    17 replies · 4 participants
  • s

    Scott Zelenka

    04/03/2020, 12:00 AM
    Are there plans to have an async executor? When mapping over a list of IO blocking tasks, it'd be nice to spawn them off in an event loop rather than threads or processes.
    c
    2 replies · 2 participants
  • k

    Kamil Okáč

    04/03/2020, 7:07 AM
    Hello again, all! I'm probably missing something, but when I try to run a simple mapping example through an agent, it won't finish. This is the flow definition:
    from prefect import Flow, task
    
    @task
    def add_ten(x):
        return x + 10
    
    with Flow('simple map') as flow:
        mapped_result = add_ten.map([1, 2])
    
    flow.register()
    flow.run_agent()
    When running the flow from the UI, the task "add_ten (Parent)" is stuck in the state "Mapped" (with the description "Preparing to submit 2 mapped tasks"). What's wrong?
    s
    j
    18 replies · 3 participants
  • d

    David Ojeda

    04/03/2020, 4:28 PM
    Hi! I am very curious and eager to use the new open-source UI. I tried the example from the blog post and now I am wondering how our code base can adapt to use it. My main confusion today is where a Dask scheduler and its related workers fit into this. We have a Kubernetes cluster with one Dask scheduler, a horizontal pod autoscaler for the workers, and several cronjobs to trigger jobs. Last time I went deep into Prefect there were no agents (or at least I don't remember them), so I guess that agents and flow configuration can replace the cronjobs on my side, but I don't see where the workers fit in now.
    a
    d
    19 replies · 3 participants
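    (A hedged sketch of how the pieces typically fit: the agent takes over the cronjobs' job of kicking off runs, while the existing Dask scheduler and workers keep executing tasks via a DaskExecutor pointed at the scheduler. The address and names below are placeholders, and the exact wiring depends on the Prefect version.)
    from prefect import Flow, task
    from prefect.engine.executors import DaskExecutor
    
    @task
    def say_hello():
        return "hello"
    
    with Flow("dask-cluster-example") as flow:
        say_hello()
    
    # Placeholder address for the existing in-cluster Dask scheduler; tasks run
    # on its workers rather than in the local process.
    flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))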
  • b

    Benjamin Filippi

    04/03/2020, 4:32 PM
    Hi, I am playing with the local UI server. Is there a way to specify to the agent the IP of the server to connect to?
    prefect backend server --help
    Usage: prefect backend [OPTIONS] API
      Switch Prefect API backend to either server or cloud
    Options:
      -h, --help  Show this message and exit.
    z
    s
    3 replies · 3 participants
  • b

    Benjamin Filippi

    04/03/2020, 4:32 PM
    The prefect backend command doesn't seem to accept an IP for a remote server
  • c

    Chris Hart

    04/03/2020, 4:45 PM
    I have started using the PostgresExecutor task and, after reading the source and trying to understand how it works, I'm not clear on transactions and exactly how lots of separate tasks get rolled into them (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/tasks/postgres/postgres.py#L174-L203).
    j
    10 replies · 2 participants
  • b

    Ben Fogelson

    04/03/2020, 6:34 PM
    Is there an easy way to run part of a flow? I.e., run up to and including task x?
    l
    m
    +1
    6 replies · 4 participants
  • d

    David Hogarty

    04/03/2020, 6:38 PM
    hi, I'm considering using prefect both for a short term need and a longer term project
  • d

    David Hogarty

    04/03/2020, 6:39 PM
    I was attempting to solve some automation problems related to standing up a hypervisor, putting VMs on it, and configuring those VMs
  • d

    David Hogarty

    04/03/2020, 6:39 PM
    I was using Ansible for this, but increasingly became frustrated with it (it seemed to be getting in the way rather than helping)
  • d

    David Hogarty

    04/03/2020, 6:40 PM
    what do people who have used prefect think about using it for automation of these kinds of tasks?
  • d

    David Hogarty

    04/03/2020, 6:40 PM
    is there anything built in for making shell and SSH execution more convenient, or would I just leverage the best Python libraries I can find for that?
    d
    n
    3 replies · 3 participants
  • d

    David Hogarty

    04/03/2020, 9:05 PM
    I have a data structure question that I couldn't clearly resolve from your docs
  • d

    David Hogarty

    04/03/2020, 9:05 PM
    do you have an idiom for running flows within flows?
    i
    b
    6 replies · 3 participants
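    (A hedged sketch of one common idiom in Prefect 0.x: a parent flow kicks off an already-registered flow through a FlowRunTask. The flow and project names are placeholders, and the availability of this task depends on the Prefect version.)
    from prefect import Flow
    from prefect.tasks.prefect import FlowRunTask
    
    # Placeholder names for a child flow already registered with the backend.
    run_child = FlowRunTask(flow_name="child-flow", project_name="examples")
    
    with Flow("parent-flow") as parent_flow:
        run_child()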
  • p

    Priya Rao

    04/03/2020, 10:05 PM
    Hi Prefect Team, we are evaluating a few workflow solutions at my workplace and have a few questions on Prefect that I think might get answered on this Slack channel.
    1. Do you have an estimate of the level of effort needed for current Airflow users to migrate to Prefect? If you have specific real-world examples, that would be great.
    2. I have read that Prefect is pretty fast and real-time. Do you have some metrics around this for comparison, or numbers to look at? Ideally with information on what kind of workflows were used to derive those metrics.
    3. How do Prefect Core users handle state management (specifically storing/managing DAG state) today? Are there any examples for us to look at?
    4. We tried searching but did not come across any video tutorials for Prefect. My team is fond of video tutorials :) so checking if there are any available on the internet already that I might not have looked at.
    Please share any links that might address the above questions. Thank you so much!
    👋 5
    k
    m
    3 replies · 3 participants
  • m

    Manuel Aristarán

    04/03/2020, 11:33 PM
    Has anyone else had issues with VSCode not picking up prefect typing information when importing the module? mypy is also complaining…
    n
    c
    +1
    4 replies · 4 participants
  • r

    Ricky

    04/04/2020, 5:15 AM
    The usage of this community seems like it would fit better with a Discourse forum, allowing tagging and searching. There are so many posts with detailed discussion, and I can't really discover relevant posts, discern between novice/expert questions, or search for questions I may also have.
    j
    5 replies · 2 participants
  • p

    Pierre CORBEL

    04/04/2020, 5:01 PM
    Hello 👋, I register my flow to Cloud and start the agent directly from a Python process. The flow is built directly into a Docker container by myself. However, at each build and new invocation, a different agent spins up in the Cloud UI. How can I tell the UI it's the same agent that has disconnected and then reconnected?
    c
    4 replies · 2 participants
  • p

    Pierre CORBEL

    04/04/2020, 5:07 PM
    And another, completely different question 🙋 I have set up the LocalDaskExecutor (via the env variable) and my flow is run by Cloud. However, tasks are run one by one. How can I achieve parallelism and run more than one task in parallel?
    c
    s
    6 replies · 3 participants
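    (For reference, a minimal sketch of where parallelism usually comes from in Prefect 0.x: the LocalDaskExecutor needs a thread- or process-based scheduler to run mapped tasks concurrently. Passing the executor to flow.run works for local runs; for Cloud-triggered runs it has to be attached to the flow's environment, and the exact wiring depends on the version.)
    from prefect import Flow, task
    from prefect.engine.executors import LocalDaskExecutor
    
    @task
    def double(x):
        return x * 2
    
    with Flow("parallel-map-example") as flow:
        double.map(list(range(10)))
    
    # scheduler="threads" (or "processes") is what enables concurrency here.
    flow.run(executor=LocalDaskExecutor(scheduler="threads"))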
  • j

    Jeremiah

    04/04/2020, 6:52 PM
    set the channel topic: Welcome to the Prefect community! Please use threads if possible so we can archive helpful conversations to our GitHub knowledge base: https://github.com/PrefectHQ/prefect/issues?q=label%3A%22Prefect+Slack+Community%22+
  • d

    David Haines

    04/05/2020, 6:24 AM
    Have observed a significant difference in overall execution time between running a flow manually and running the same flow via a local agent on the same machine (triggered via prefect cloud). Consistently 20-25 times slower for this particular flow (lots of mapped tasks). Expected behaviour?
    c
    s
    13 replies · 3 participants
  • i

    Ilay Gordon

    04/05/2020, 12:28 PM
    Hi guys, quick question - I would like to run a task conditioned on an optional Parameter. Which construct should I use? ifelse doesn't seem to fit this, since I don't have another task for the false condition. In Airflow this was solved using a DummyOperator, and it always looked somewhat redundant. Is there an idiomatic way with Prefect? UPDATE: never mind, just read about the SKIP signal 🙂
    c
    1 reply · 2 participants
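    (A minimal sketch of the SKIP-signal approach mentioned above, with the parameter name purely illustrative: raising SKIP inside the task marks it as skipped, and downstream tasks skip by default as well, whenever the optional Parameter is not supplied.)
    from prefect import Flow, Parameter, task
    from prefect.engine.signals import SKIP
    
    @task
    def process(source):
        if source is None:
            # Marks this task (and, by default, its downstream tasks) as skipped.
            raise SKIP("No source provided; skipping.")
        return f"processed {source}"
    
    with Flow("optional-parameter-example") as flow:
        source = Parameter("source", default=None)
        process(source)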
  • d

    Darren Fleetwood

    04/05/2020, 3:49 PM
    Hi all, first of all thanks for what looks like a great product so far! I'm trying to use the Core server to submit jobs to the Docker agent. I can register flows OK, and when I run them they are picked up by the agent, but they sit at 'Submitted'. Looking at the logs for the Docker containers I get this:
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: /graphql/alpha (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7739090a90>: Failed to establish a new connection: [Errno -2] Name or service not known'))
    . I'm guessing it means the container can't find the GraphQL API? I'm running under Ubuntu. I'm SSH tunnelling into a Jupyter notebook on an Azure VM and running from there - could that be making a difference? Thanks!!
    j
    n
    +1
    21 replies · 4 participants
  • k

    Kaz

    04/05/2020, 7:23 PM
    Hey guys. Big fan of the product, it's really cut down on our manual ETL workload. One issue: setting the timeout in a ShellTask doesn't seem to work for me. I have an S3 sync that occasionally hangs (working to figure that out), and as a temporary solution I was hoping to use a timeout + retry. This has solved the issue in the past. I've set the timeout to 600 (600 seconds, 10 minutes), and yet I'm getting a task that runs for 6+ hours when the shell task hangs. On Cloud (Scheduler tier), I can see that my retry and retry-delay parameters are set, but I'm not sure if there's a way to check whether a timeout has been configured on a task. It's a fairly simple task (top level, only 1 dependency, unmapped), and everything works just fine if I manually restart the task in the Cloud UI (I even see the retry count increment). I'm wondering if there are any additional inputs I need to configure other than timeout + retry in order to get the timeout to work, or if this is an issue other people have seen before? Thanks!
    c
    3 replies · 2 participants
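    (For reference, a hedged sketch of how these options are normally set on the task in Prefect 0.x; the names and values are just examples, and this only shows the wiring rather than a fix for the underlying hang.)
    from datetime import timedelta
    from prefect.tasks.shell import ShellTask
    
    # timeout is given in seconds; retry_delay is a timedelta.
    s3_sync = ShellTask(
        name="s3-sync",
        timeout=600,
        max_retries=3,
        retry_delay=timedelta(minutes=5),
    )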
  • r

    Rugzo

    04/06/2020, 2:47 PM
    Hey there, first time here. Firstly, fantastic product you have built - it seems to have massive potential, coming from someone who used to work with Airflow. The documentation has been super helpful and I would like to see more video demos in the future. My question is: what would you recommend as the best way to run Agents in a production environment, say EC2, when using Prefect Cloud?
    • Should the Agent run continuously in the background of an EC2 instance?
    • Should the Agent be embedded in the Python flow script and run as a container?
    • Should we have several Agents running, associated with flows based on labels?
    Basically, some best practices around Agents in a production setting. Thanks!
    👍 1
    s
    1 reply · 2 participants
  • r

    Richard Gu

    04/06/2020, 5:08 PM
    Hello friends, I started looking into generating some performance benchmarks with prefect+dask, and I was wondering if anyone here has done that before and could provide some pointers. Right now, I only have my laptop but am willing to figure out how to set up a proper dask cluster to do this.
    z
    1 reply · 2 participants
  • l

    liren zhang

    04/06/2020, 6:35 PM
    Hi, I am just installing Prefect on my Windows 10 PC and have received the following error. I have tried many things to solve the issue, with no luck so far. I would like to get some help on this. I was running the install:
    pip install prefect
    Building wheels for collected packages: pendulum
      Building wheel for pendulum (PEP 517) ... error
      ERROR: Command errored out with exit status 1:
       command: 'c:\users\zhang\appdata\local\programs\python\python38\python.exe' 'c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\zhang\AppData\Local\Temp\tmpjjs207ax'
           cwd: C:\Users\zhang\AppData\Local\Temp\pip-install-2pd94jc4\pendulum
      Complete output (24 lines):
      Traceback (most recent call last):
        File "setup.py", line 2, in <module>
          from setuptools import setup
      ModuleNotFoundError: No module named 'setuptools'
      Traceback (most recent call last):
        File "c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 257, in <module>
          main()
        File "c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 240, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 181, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "C:\Users\zhang\AppData\Local\Temp\pip-build-env-e3fac8fx\overlay\Lib\site-packages\poetry\core\masonry\api.py", line 57, in build_wheel
          return unicode(WheelBuilder.make_in(poetry, Path(wheel_directory)))
        File "C:\Users\zhang\AppData\Local\Temp\pip-build-env-e3fac8fx\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 56, in make_in
          wb.build()
        File "C:\Users\zhang\AppData\Local\Temp\pip-build-env-e3fac8fx\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 82, in build
          self._build(zip_file)
        File "C:\Users\zhang\AppData\Local\Temp\pip-build-env-e3fac8fx\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 102, in _build
          self._run_build_command(setup)
        File "C:\Users\zhang\AppData\Local\Temp\pip-build-env-e3fac8fx\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 130, in _run_build_command
          subprocess.check_call(
        File "c:\users\zhang\appdata\local\programs\python\python38\lib\subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['c:\\users\\zhang\\appdata\\local\\programs\\python\\python38\\python.exe', 'setup.py', 'build', '-b', 'build']' returned non-zero exit status 1.
      ----------------------------------------
      ERROR: Failed building wheel for pendulum
    Failed to build pendulum
    ERROR: Could not build wheels for pendulum which use PEP 517 and cannot be installed directly
    c
    k
    6 replies · 3 participants
  • m

    Manuel Aristarán

    04/06/2020, 6:41 PM
    Hi! Does anyone have any tips on running Docker tasks in a Flow running on a Docker agent? Mounting the host's Docker socket does not seem to be a good practice…
    z
    4 replies · 2 participants

Zachary Hughes

04/06/2020, 8:23 PM
Hi Manuel, are you running into any problems in particular? Always happy to open an issue if there's room for us to improve our Docker tasks.
m

Manuel Aristarán

04/06/2020, 9:56 PM
I’ve started using DockerStorage for my flows, which I initially developed using the local agent. They use Docker tasks, which worked fine when run locally. Now that the flows are being run in a container, they of course need access to the Docker daemon on the host.
Solution: exposing the TCP socket of the Docker daemon that runs on the host
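(A minimal sketch of that approach, with the hostname and port as assumptions: once the host daemon's TCP socket is exposed, for example on port 2375, unauthenticated and therefore only reasonable on a trusted network, code inside the agent-launched container can point a Docker client at it instead of the local Unix socket.)
import docker

# Placeholder endpoint for the host's exposed Docker daemon.
client = docker.DockerClient(base_url="tcp://host.docker.internal:2375")
print([c.name for c in client.containers.list()])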
z

Zachary Hughes

04/07/2020, 2:13 AM
Hi Manuel, sorry— stepped away at the end of the day and missed your response. But I’m glad to hear you found a solution!