prefect-community
  • g

    George Coyne

    03/26/2020, 4:12 PM
    I have a Lambda that executes flow runs, and it has been getting 403 errors since this morning. I generated a new tenant token to check whether there was an issue with my creds, but the issue persisted. I am digging deeper now. Has anything changed on the Prefect side?
    👀 1
    12 replies · 3 participants
  • j

    Jeff Brainerd

    03/26/2020, 6:43 PM
    Hi folks, quick API question: in this query I can grab a particular parameter (awesome), but I would love to be able to filter on a particular value, e.g. “company_slug=acmecorp”. Is this possible today? Thanks 🙏
    query {
      flow_run {
        name
        parameters(path: "company_slug")
      }
    }
    6 replies · 2 participants
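If a server-side filter turns out not to be available, one fallback is to filter client-side after fetching the runs. A minimal sketch, assuming the query above returns each run as a dict whose parameters field holds the value extracted by path. The helper filter_runs_by_param and the sample data are hypothetical and not part of the Prefect API:

```python
from typing import Any, Dict, List


def filter_runs_by_param(runs: List[Dict[str, Any]], value: Any) -> List[Dict[str, Any]]:
    """Keep only the runs whose extracted parameter equals `value`."""
    return [run for run in runs if run.get("parameters") == value]


# Hypothetical response shape for the query above (not real data).
sample_runs = [
    {"name": "run-a", "parameters": "acmecorp"},
    {"name": "run-b", "parameters": "globex"},
    {"name": "run-c", "parameters": None},  # run without the parameter
]

matching = filter_runs_by_param(sample_runs, "acmecorp")
print([run["name"] for run in matching])
```

This trades network transfer for simplicity, so it only suits result sets small enough to fetch in full.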
  • s

    Scott Zelenka

    03/26/2020, 8:19 PM
    I'm registering a Flow with two Schedules; my intent is to have the same Flow run twice, but with different Parameters for each run:
    schedule=Schedule(
        clocks=[
            IntervalClock(
                start_date=pendulum.datetime(2020, 1, 1, tz='UTC'),
                interval=datetime.timedelta(days=1),
                parameter_defaults=dict(
                    sfdc_listener_name='CustomerService',
                    sfdc_report_url='...'
                )
            ),
            IntervalClock(
                start_date=pendulum.datetime(2020, 1, 1, tz='UTC'),
                interval=datetime.timedelta(days=1),
                parameter_defaults=dict(
                    sfdc_listener_name='OneSupport',
                    sfdc_report_url='...'
                )
            )
        ]
    ),
    However, it seems that when this is registered with Prefect Cloud, only the first Schedule is set up. Do I need to create two different Flows to have the same logic execute with different parameters on a schedule?
    6 replies · 4 participants
  • b

    Ben Fogelson

    03/26/2020, 11:02 PM
    Is there a way to have a Parameter whose default is the value of another Parameter? Something like
    from prefect import task, Flow, Parameter
    
    @task
    def add(x, y):
        return x + y
    
    with Flow('flow') as flow:
        a = Parameter('a', default=0)
        b = Parameter('b', default=a)
        c = add(a, b)
    8 replies · 3 participants
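The snippet above passes a Parameter object as a default, which may not resolve to the runtime value of a. One possible workaround, sketched here in plain Python with no Prefect imports, is to give b a sentinel default of None and resolve the fallback in a small function that could then be wrapped with @task. The name resolve_default is hypothetical:

```python
from typing import Optional


def resolve_default(value: Optional[int], fallback: int) -> int:
    """Return `value` unless it was left unset (None), else `fallback`."""
    return fallback if value is None else value


def add(x: int, y: int) -> int:
    return x + y


# b omitted: it falls back to the value of a.
a, b = 3, None
print(add(a, resolve_default(b, a)))

# b given explicitly: the explicit value wins, even when it is 0.
a, b = 3, 0
print(add(a, resolve_default(b, a)))
```

Checking against None rather than truthiness keeps an explicit 0 from being mistaken for "unset".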
  • c

    Christopher

    03/27/2020, 8:35 AM
    Hi folks, I lead an ML and data science team, and I’m evaluating a bunch of tools in this space. Prefect looks very nice indeed, but I am having trouble understanding exactly what I get with a paid plan vs the open source core. I have looked at all the feature lists I can find, but I guess I’m missing one of those classic “feature comparison tables” that one often sees. Can someone point me in the direction of something like that?
    10 replies · 4 participants
  • d

    dherincx

    03/27/2020, 7:39 PM
    Hello everyone! I'm using a switch. Is it possible to access the results of the switch in a subsequent task? For example, I have a list of data (some of which contain latitude/longitude and others don't). My switch consists of getting lat/long for the records that are missing it, while records that do have coordinates are simply returned. How can I access the final, joined list after the switch?
    2 replies · 2 participants
  • p

    Pierre CORBEL

    03/27/2020, 10:24 PM
    Hello there 👋, I'm a new user of Prefect, coming from the Airflow world, where I happily used it for 3+ years. 🎂 I want to begin by saying that the project, the quality of the code and the quality of the documentation are outstanding 🤩 But I need some help finding the right way to use it and the good practices 🤓 I have a standard ELT flow with a big JSON file (1GB) as input. For my task to run successfully on my medium machine, I combine ijson and an iterator to read and write the file on disk chunk by chunk and not overload the memory (I can't stick a 1GB JSON dict in memory). Then I load the file directly into my DB, without passing through Python. What is the Prefect way of handling a similar use case? 🤔 Prefect encourages passing data from task to task in memory, but here I offload it to disk and only pass the path of the file between tasks. Is there a way to pass an iterator between tasks instead of a single object? One way I'm thinking of doing it in an industrialized way is maybe to share a file cache between tasks. What do you guys think about it?
    :marvin: 1
    1 reply · 2 participants
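The pattern described above (stream to disk, then hand only the file path to the next task) can be sketched without Prefect at all. This is an illustration under assumptions: the records come from a generator standing in for ijson, and extract_to_disk, read_from_disk, and fake_source are hypothetical names.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator


def extract_to_disk(records: Iterable[Dict[str, Any]], directory: str) -> str:
    """Write records one per line (NDJSON) and return the file path."""
    path = os.path.join(directory, "extract.ndjson")
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:  # only one record is held in memory at a time
            handle.write(json.dumps(record) + "\n")
    return path


def read_from_disk(path: str) -> Iterator[Dict[str, Any]]:
    """Lazily yield records back from the NDJSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield json.loads(line)


def fake_source(n: int) -> Iterator[Dict[str, Any]]:
    """Stand-in for a streaming parser such as ijson."""
    for i in range(n):
        yield {"id": i}


with tempfile.TemporaryDirectory() as workdir:
    # The path is the only value handed from the "extract" step to the next one.
    file_path = extract_to_disk(fake_source(5), workdir)
    total = sum(record["id"] for record in read_from_disk(file_path))
    print(total)
```

Passing a path rather than an iterator keeps the hand-off serializable, which matters if the two steps run in different processes.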
  • s

    Scott Zelenka

    03/27/2020, 11:16 PM
    Has anyone had success using a large Docker base image? I'm trying to convert a Selenium script into a Prefect Flow. Selenium requires a full install of Google Chrome, which makes the base image (selenium/standalone-chrome) around 700MB with a ton of layers. But when I try to build through the CLI, it gives me trouble: it essentially spun the CPU fan for an hour and eventually gave up with:
    Traceback (most recent call last):=============================>]  143.3MB/143.3MB
      File "example-selenium.py", line 473, in <module>
        parameters=dict(
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/core/flow.py", line 1419, in register
        no_url=no_url,
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/client/client.py", line 623, in register
        serialized_flow = flow.serialize(build=build)  # type: Any
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/core/flow.py", line 1228, in serialize
        storage = self.storage.build()  # type: Optional[Storage]
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/environments/storage/docker.py", line 282, in build
        self._build_image(push=push)
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/environments/storage/docker.py", line 312, in _build_image
        self.pull_image()
      File "/opt/anaconda3/envs/fastapi-async-sqlalchemy/lib/python3.7/site-packages/prefect/environments/storage/docker.py", line 520, in pull_image
        raise InterruptedError(line.get("error"))
    InterruptedError: write /var/lib/docker/tmp/GetImageBlob079036145: no space left on device
    6 replies · 4 participants
  • p

    Pierre CORBEL

    03/29/2020, 3:26 PM
    Hello there 👋, I got my first ELTC (Extract, Load, Transform, Cleanup) flow working on Prefect Core 🤩 But now I have a question: I would like to run the flow when the Python app launches, and also apply a schedule for every night at 3. So basically, I want a CronClock combined with a launch at startup. Is there an easy way to achieve this? 🧐 I think the fact that the scheduler waits for the next scheduled run raises the need for a "fire_at_start" parameter. In my case, if my schedule is set for every day at 3 o'clock, I don't want to wait X hours before my DB is loaded with data. 🕒 Does that make sense?
    2 replies · 2 participants
  • s

    Scott Zelenka

    03/29/2020, 5:18 PM
    Hi Cloud users, interested in your suggestions. I have a Flow that executes great on my local dev machine. However, some of the logic within the Flow leverages other applications (Xvfb and Google Chrome) that are also installed on my local dev machine. When I attempt to serialize the Flow into a Docker image, it doesn't run because the required application(s) aren't installed in the Docker image that .register generates. I can get the dependent application(s) to run inside a Docker image with supervisord, but that would require the Docker entrypoint to somehow trigger supervisord before it attempts to execute the Flow. I couldn't find an API in Prefect that allows for this type of functionality. Another option is to provide the dependent application(s) as a native cloud service and communicate over the network, the same way I do on my local dev machine, but that'd be overkill for this one Flow. Curious if anyone has been successful in wrapping another (non-Python) application inside a serialized Flow to deploy on Prefect Cloud.
    2 replies · 2 participants
  • c

    Chris O'Brien

    03/29/2020, 10:38 PM
    Hi Prefect Team, we have noticed that, even without caching, the results from tasks in a scheduled flow don’t fall out of scope (and therefore become eligible for garbage collection) until the 3rd run. Is there a way to stop this behaviour? With large extract functions this can mean a lot of memory being held.
    3 replies · 2 participants
  • m

    Mohit Kumar Agarwal

    03/30/2020, 11:39 AM
    Hi Prefect Team, I am new to Prefect. I am facing a very basic issue: I am unable to start the Prefect Core server on my local machine. I am able to create and run a basic Prefect flow in a Python IDE, but I am unable to start the Prefect Core server and web UI on my local machine. Are there any prerequisites I need to install to run the Prefect Core server? I don't have Docker installed on my machine.
    👍 1
    1 reply · 2 participants
  • j

    Joe Schmid

    03/30/2020, 2:03 PM
    Hi everyone, I'm sure there will be an official announcement too, but I just wanted to drop a quick congratulations to the Prefect team on getting the 0.10.0 release out, which includes the UI and database! Last night I did a quick pip install -U prefect, then prefect server start, and brought up the UI in a browser without any trouble. This is a huge milestone and one that the community will benefit from greatly. Anyone can now run the fully open source Prefect Core and have a UI to manage and monitor flows. This gives folks a greater spectrum of options for running Prefect, from fully open source and no cost with Core to commercial with additional features with Cloud. Great work to the Prefect team!
    👍 2
    :prefect: 9
    😊 7
    :marvin: 1
    🤩 5
    ❤️ 11
    1 reply · 2 participants
  • b

    Bob Colner

    03/30/2020, 4:06 PM
    seeing issues with the docs today
    3 replies · 3 participants
  • s

    Scott Zelenka

    03/30/2020, 4:08 PM
    How would you debug a problem with the serialized Flow after it's been packaged into Docker storage? I have a Flow that executes fine when running locally, but when I do a flow.register() and it gets orchestrated to run in Cloud, it fails with the result_handler. I'd like to simulate running the serialized flow within the Docker image it generated locally, to see if I can figure out what's different between that environment and my local machine. Is there a proper way to do this?
    4 replies · 3 participants
  • l

    Leo Meyerovich (Graphistry)

    03/30/2020, 5:15 PM
    Hi all! I saw that some folks here are interested in contributing their expertise to #COVID efforts, and we've been ramping up http://ProjectDomino.org on scalable social techniques for behavior change and combating misinformation, taking inspiration from techniques in public health (HIV), social media marketing, and security/fraud threat intelligence + autoresponse. We recently adopted Prefect.io as part of our orchestration layer (continuous data integration + autoresponse to alert community leaders & safety teams platforms on misinformation before the 'blast radius' gets big), and chances are, if you're here, you can have a big impact, esp. as we're getting the pipeline layers smoothed out 🙂 If relevant, just drop into the Slack channel linked above. Stay safe everyone!
    :upvote: 4
  • m

    Mike Lutz

    03/30/2020, 6:36 PM
    Prefect team, do you have extra guidance you can provide (or examples from other open core projects you can point me at) on how to understand your "using the Software, or any derivative works thereof, to make available any software-as-a-service [etc etc]" language? I'm trying to understand where "multi-user" crosses over to "excluded". I would already assume "selling something" is excluded, but what if I set it up in my company's cloud and multiple people in my company use it? That loosely feels like it might be software-as-a-service, just without money changing hands (though in a company setting I could imagine department chargebacks and the like). Is that use inappropriate? While I'm mostly asking about the contract, I would also like to know what you intended the spirit to be, i.e. even if this is allowable by the letter of the contract, is that something you intended? You all have done good work, and I want to honor what you intend too! 🙂
    3 replies · 2 participants
  • m

    Manuel Aristarán

    03/30/2020, 6:39 PM
    Congratulations and thanks for open-sourcing the Prefect UI! Question: is there a way of setting an alternative port for the postgres instance started by prefect server? I have an instance of postgres running on the default port…
    🎉 1
    6 replies · 4 participants
  • p

    Preston Marshall

    03/30/2020, 6:47 PM
    Really happy to see prefect opened up, even if it's not a real FOSS license this makes it way easier to adopt. I do wonder how you are planning on making money now though.
  • a

    Arkady Kleyner

    03/30/2020, 6:59 PM
    Hello team Prefect! We are reviewing the Snowflake task and are finding a number of limitations. Having to connect each time creates quite a bit of overhead, and we are unable to run multi-SQL statements as per https://docs.snowflake.com/en/user-guide/python-connector-api.html#
    7 replies · 4 participants
  • k

    Kamil Okáč

    03/31/2020, 9:22 AM
    Hello! I just tried local UI with "prefect server start", but http://localhost:8080/ returns just a blank page. Console says:
    vue-apollo.js:14 Uncaught TypeError: Cannot read property 'substr' of undefined
        at Module.56d7 (vue-apollo.js:14)
        at c (bootstrap:89)
        at Object.0 (app.e76c6ad2.js:1)
        at c (bootstrap:89)
        at t (bootstrap:45)
        at bootstrap:267
        at app.e76c6ad2.js:1
    ✋ 1
    ✅ 3
    14 replies · 7 participants
  • j

    John Faucett

    03/31/2020, 1:31 PM
    Hi, does anyone know how I can get the result of one task to be passed to another task further downstream with the imperative api?
    flow = Flow('foo', tasks=[t1,t2,t3, t4])
    
    flow.add_edge(t1,t2, key='x')
    flow.add_edge(t2,t3)
    flow.add_edge(t3,t4)
    
    # now t4 needs the results of t1 not t2 or t3
    8 replies · 4 participants
  • p

    Preston Marshall

    03/31/2020, 2:28 PM
    What is an unreasonable number of tasks to be mapping over? If I have a table with millions of rows and need to transform each one of them (for example), can I generate a task for each row? Or would I need to export "pages" and process each of these in a task?
    2 replies · 2 participants
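The "pages" option mentioned above can be sketched as a batching helper, so that each mapped task would receive one page rather than one row. The names pages and transform_page are hypothetical, and the page size is arbitrary; nothing here states an actual Prefect limit.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pages(rows: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `page_size` rows."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def transform_page(page: List[int]) -> List[int]:
    """Placeholder transformation applied to a whole page at once."""
    return [row * 2 for row in page]


results = [transform_page(page) for page in pages(range(10), page_size=4)]
print(results)
```

With millions of rows, the page size becomes the knob that trades per-task overhead against the memory each task needs.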
  • j

    Jeff Brainerd

    03/31/2020, 4:11 PM
    Question on the Q&A Fireside chat tomorrow: will it be recorded for those of us who cannot attend “in person”? Thanks! 🙏
    2 replies · 2 participants
  • f

    Fabian Thomas

    03/31/2020, 4:17 PM
    Hey everybody, first of all: having access to an open-source UI is great, big thank you to the team! 👍🎊 I've managed to start the server and register my flow, but trying to run a local agent using prefect agent start gives me this exception:
    prefect.utilities.exceptions.AuthorizationError: No agent API token provided.
    Any suggestions? I had already registered an agent with Prefect Cloud and an API token before.
    4 replies · 4 participants
  • b

    Benjamin Filippi

    03/31/2020, 4:19 PM
    Hi, I ran my first hello world on the local server. Success! With the new local server instance, are you planning to have the server call the local agent, rather than having the agent poll the server? Thanks
    5 replies · 2 participants
  • b

    Ben Fogelson

    03/31/2020, 4:22 PM
    Is there a way to activate a conda environment inside of a Docker image before Prefect tries to pip install prefect? And also before it tries to run the flow in that container?
    1 reply · 2 participants
  • t

    Thomas La Piana

    03/31/2020, 5:27 PM
    I would love to spin up the prefect webserver in k8s, but all of the tutorials point to using docker + docker-compose...is there any way that I can bootstrap it myself without having to run docker within docker in k8s?
    13 replies · 4 participants
  • m

    Maxime Lavoie

    03/31/2020, 5:54 PM
    Could anyone help me with this seemingly simple hurdle? I have
    with Flow("My Flow") as flow:
        my_param = Parameter("my_param", default='a default value')
        ...
    
    flow.run()
    # flow.run(parameters={"my_param": "overwritten value"})
    When I run this, I get
    raise ValueError(
    ValueError: Flow.run received the following unexpected parameters: my_param
    What am I missing?
    4 replies · 4 participants
  • d

    David N

    03/31/2020, 7:33 PM
    I'm sorry to ask, it's probably really dumb. I cloned the git repo, did pip --upgrade, ran "prefect backend server".. Copied the hello-world script; instead of flow.run(), I did flow.register(). I got this message back:
    Result Handler check: OK
    /home/ec2-user/venv/prefect/lib64/python3.7/site-packages/prefect/core/task.py:258: UserWarning: DEPRECATED: all cache_* options on a Task will be deprecated in 0.11.0, and removed in 0.12.0; the options will be moved to a Task's prefect.engine.Result object.
      UserWarning,
    Flow: <http://localhost:8080/flow/ef63906a-6e2b-4b7b-aba5-6e9303560e5f>
    But I don't see anything changing in the UI, and I don't see anything under the "Flows" link. I have this running on an EC2 machine, so I don't use "localhost", but I can reach the UI. It just looks like it knows nothing about the flow. What have I missed?
    25 replies · 5 participants
s

Scott Zelenka

03/31/2020, 8:05 PM
you still need to trigger a run after you register it. So if you navigate to that link, and click "Quick Run" or "Run" and you have an Agent running.. then it'll start to do work
This helped me understand the process better:

https://www.youtube.com/watch?v=1qyDh6CH4Fo

d

David N

03/31/2020, 8:23 PM
Thanks. I'm doing all those steps as well, just like the video. I run my python.etl script (in my case, hello.py), which ends in flow.register(), and I get back the same DEPRECATED cache_* warning and a Flow: URL response. But when I pull up the UI and try to search for the flow, like in the video, I get no result; there is nothing there like what the video shows ("Flow!" created in the UI, before he starts the agent).
j

josh

03/31/2020, 8:38 PM
@David N are you serving the Prefect server API on <http://localhost:4200>? I believe that’s what the UI is currently set to look for. In the original message you mention that you don’t use “localhost”, so I’m not sure what you’re serving on
from prefect import config
print(config.server.endpoint)
d

David N

03/31/2020, 8:46 PM
It's all running on an EC2 machine running Amazon Linux 2, so no, I access it via the local IP address, not localhost
but this is probably my issue
from prefect import config
print(config.server.endpoint)
Output:
<http://localhost:4200>
j

josh

03/31/2020, 8:48 PM
Yeah, when registering the flow it’s sending the metadata payload to that endpoint, so I’m wondering why the UI isn’t able to access it. Just to confirm, you’re running with prefect server start?
d

David N

03/31/2020, 8:50 PM
Yep, and it's still running. When I do flow.register(), I get back a "success" and flow URL like expected
and when I use the CLI, I see the flow registered
(prefect) [ec2-user@ip-10-118-33-125 dbt_workflow]$ prefect get flows
NAME        VERSION    AGE
hello-flow  3          22 minutes ago
j

josh

03/31/2020, 8:53 PM
Ah, since it looks like the Upcoming Runs tile is loading, I think it has something to do with the UI not being able to talk to that server endpoint. cc @nicholas Have you seen this before?
👀 1
n

nicholas

03/31/2020, 8:58 PM
@David N can you open up the browser console when you're on your UI (⌘ + ⌄ + i on a Mac) and send a screenshot?
d

David N

03/31/2020, 8:58 PM
standby
n

nicholas

03/31/2020, 8:58 PM
You may need to refresh if it's not populated
d

David N

03/31/2020, 8:59 PM
I'm sorry I didn't think of that
index.js:111 POST <http://localhost:4200/graphql/> net::ERR_CONNECTION_REFUSED
(anonymous) @ index.js:111
e @ Observable.js:197
value @ Observable.js:279
(anonymous) @ bundle.esm.js:13
Promise.then (async)
(anonymous) @ bundle.esm.js:12
e @ Observable.js:197
....etc....
n

nicholas

03/31/2020, 9:00 PM
Yup so the UI endpoint that you're serving isn't proxied to your ec2 instance
d

David N

03/31/2020, 9:03 PM
Any easy advice? I don't really understand how a server can't refer to itself as localhost, but I'm not a networking / VPC expert either
n

nicholas

03/31/2020, 9:05 PM
I think the quickest solution would be to modify your local machine's /etc/hosts file to point localhost:4200 to your server's endpoint
Ah, as for the server: it's not the server that's referring to itself incorrectly. You're using the UI on your local machine, and the UI runs only on the client (it's a static build), not on the server, so it is trying to communicate with the localhost of the machine on which it's running
d

David N

03/31/2020, 9:12 PM
I see, the UI, which is running in my browser, is making a call to "localhost" via JavaScript, and my laptop rejects it..
n

nicholas

03/31/2020, 9:12 PM
Exactly
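
The diagnosis above (the browser resolves localhost on the laptop, not on the EC2 host) can be checked with a small reachability probe. A minimal sketch using only the standard library. The helper can_connect is hypothetical, and port 4200 is the API port reported earlier in the thread:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Run this on the machine where the browser runs, not on the server.
    # 4200 is the API port from the thread; replace the host as needed.
    print(can_connect("localhost", 4200))
```

If this prints False on the laptop while prefect server start is running on the server, the UI in the browser has no API to reach at localhost:4200, which matches the ERR_CONNECTION_REFUSED seen in the console.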
i

ishan

04/03/2020, 3:22 PM
ty nicholas, helped me out too!
🙌 1