t

Tom Shaffner

11/17/2021, 4:53 PM
Hi all, I'm new to Prefect and trying to set up a local server on an Azure (Linux) VM inside a VNET. I'm having issues getting the server up and running in the first place. I'm in a corporate environment; unfortunately this means MANY blocked ports. If I do an NMAP scan on the VM I get the first picture below. To even see the server I have to use the expose flag, and set the ui port to 80. With some testing it seems if I set the server port to 1720 it all starts okay (
prefect server start --expose --ui-port=80 --server-port=1720
), I can at least get to the UI then, but the UI gives me a failure message about not connecting to graphql and a link to this slack (second picture below). I've tried setting the graphql port flag to 5060 or 443 or even to 1720, or using the no port mapping flag (https://docs.prefect.io/api/latest/cli/server.html), all to no avail. I'm new to this so I'm probably doing something stupid but I can't figure out what; any thoughts? In case it's related, I'm also unable to start a user agent. When I try
prefect agent local start
I get
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=1720): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1d60a34e20>: Failed to establish a new connection: [Errno 111] Connection refused'))
. I'm unclear on whether the server is automatically starting agents of its own or if I'd have to do this myself in addition to the server. I could potentially have the option to open some additional ports, but it takes several days, a long approval process that might get rejected, and I'm not even clear on which I'd need to open or why. After all, these are mostly internal ports to the same machine, right? I know that's a lot but I'm a bit at a loss here; any help?
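A quick way to narrow down a "Connection refused" like the one above is to probe the ports from the VM itself. The following is a minimal sketch, not part of Prefect: the helper name `port_is_open` is hypothetical, and the port list simply mirrors the ports mentioned in this thread.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing is listening or the port is filtered.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the thread; adjust to the flags actually passed.
    for port in (80, 1720, 4200, 8080):
        state = "open" if port_is_open("localhost", port) else "closed or filtered"
        print(f"localhost:{port} -> {state}")
```

Running it on the VM tests the local listeners; running it from another machine with the VM's IP instead of `localhost` tests the network path, which helps separate "service not listening" from "firewall in between".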
k

Kevin Kho

11/17/2021, 5:00 PM
This is a bit hard but just some thoughts. First, the server does not start agents of its own. Agents are started independently and it’s the
config.toml
that points them where to connect to. It looks like you were on the right track there. In general, I think the approach here would be to make sure the UI is working first and connecting to GraphQL. I am not sure if setting it to port 1720 will work because that might be a reserved port. I think in general you want to set your ports to ports not in use. It says it’s trying to connect to 4200 in your second image. Could you show me what your
config.toml
looks like?
t

Tom Shaffner

11/17/2021, 5:08 PM
I just checked; my config.toml, at least the one at ~/.prefect/config.toml, is blank. I've just been using flags to test so haven't set one up; should I? I have a backend.toml there which has only
backend="server"
in it. And agreed on getting the UI working first; that was my plan too. Once it's up and working I'll focus on the agents. Setting it to ports not in use would mean I should actually stop using these flags, right? If I try only
prefect server start --expose --ui-port=80
I get, among many other things, these results:
Pulling postgres ... done
Pulling hasura   ... done
Pulling graphql  ... done
Pulling apollo   ... done
Pulling towel    ... done
Pulling ui       ... done
Creating network "prefect-server" with the default driver
Creating tmp_postgres_1 ... done
Creating tmp_hasura_1   ... done
Creating tmp_graphql_1  ... done
Creating tmp_towel_1    ... done
Creating tmp_apollo_1   ... done
Creating tmp_ui_1       ... done
Attaching to tmp_postgres_1, tmp_hasura_1, tmp_graphql_1, tmp_apollo_1, tmp_towel_1, tmp_ui_1
apollo_1    | Checking GraphQL service at http://graphql:4201/health ...
apollo_1    | Checking GraphQL service at http://graphql:4201/health ...
apollo_1    | Checking GraphQL service at http://graphql:4201/health ...
postgres_1  | The files belonging to this database system will be owned by user "postgres".
postgres_1  | This user must also own the server process.
postgres_1  |
postgres_1  | The database cluster will be initialized with locale "en_US.utf8".
postgres_1  | The default database encoding has accordingly been set to "UTF8".
postgres_1  | The default text search configuration will be set to "english".
postgres_1  |
postgres_1  | Data page checksums are disabled.
postgres_1  |
postgres_1  | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1  | creating subdirectories ... ok
towel_1     | {"severity": "INFO", "name": "prefect-server.Scheduler", "message": "Scheduled 0 flow runs."}
graphql_1   |
graphql_1   | Running Alembic migrations...
graphql_1   | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
graphql_1   | INFO  [alembic.runtime.migration] Will assume transactional DDL.
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade  -> 27811b58307b, Create extensions and initial settings
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 27811b58307b -> 72e2cd3e0469, Initial database tables migration
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 72e2cd3e0469 -> c4d792bdd05e, Add flow run idempotency key
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade c4d792bdd05e -> 3398e4807bfb, Add traversal functions
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 3398e4807bfb -> b9086bd4b962, Create message table
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade b9086bd4b962 -> c1f317aa658c, Remove state_id foreign keys
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade c1f317aa658c -> 6611fd0ccc73, Simplify run state update triggers
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 6611fd0ccc73 -> 70528cee0d2b, Add agent persistence
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 70528cee0d2b -> 9cb7539b7363, Add index on agent_id
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 9cb7539b7363 -> e148cf9f1e5b, Add task run name
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade e148cf9f1e5b -> 850b76d44332, Add flow run config
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 850b76d44332 -> 24f10aeee83e, Add label column to flow runs
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 24f10aeee83e -> 3c87ad7e0b71, Add artifact api
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 3c87ad7e0b71 -> 57ac2cb01ac1, Add index for task run names
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 57ac2cb01ac1 -> 7ca57ea2fdff, Add run_config to flow runs and flow groups
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 7ca57ea2fdff -> 9116e81c6dc2, Add description to flow group table
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 9116e81c6dc2 -> 459a61bedc9e, Improve run triggers to handle same-version states
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade 459a61bedc9e -> a666a3f4e422, Add unique index for idempotency key
graphql_1   | INFO  [alembic.runtime.migration] Running upgrade a666a3f4e422 -> ac5747fb571c, Add unique constraint on edge table for task IDs
graphql_1   | Applied Hasura metadata from /prefect-server/services/hasura/migrations/metadata.yaml
graphql_1   |
graphql_1   | Database upgraded!
hasura_1    | {"type":"pg-client","timestamp":"2021-11-17T17:05:00.057+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(0)."}}
hasura_1    | {"type":"pg-client","timestamp":"2021-11-17T17:05:00.057+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(1)."}}
hasura_1    | {"type":"startup","timestamp":"2021-11-17T17:05:00.057+0000","level":"error","detail":{"kind":"catalog_migrate","info":{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.25.0.2) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}}}
hasura_1    | {"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.25.0.2) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
postgres_1  | selecting default max_connections ... 100
postgres_1  | selecting default shared_buffers ... 128MB
postgres_1  | selecting default timezone ... Etc/UTC
postgres_1  | selecting dynamic shared memory implementation ... posix
postgres_1  | creating configuration files ... ok
postgres_1  | running bootstrap script ... ok
postgres_1  | performing post-bootstrap initialization ... ok
postgres_1  | syncing data to disk ... ok
postgres_1  |
postgres_1  | Success. You can now start the database server using:
postgres_1  |
postgres_1  |     pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1  |
postgres_1  |
postgres_1  | WARNING: enabling "trust" authentication for local connections
postgres_1  | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1  | --auth-local and --auth-host, the next time you run initdb.
postgres_1  | waiting for server to start....2021-11-17 17:04:58.984 UTC [49] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2021-11-17 17:04:59.089 UTC [50] LOG:  database system was shut down at 2021-11-17 17:04:58 UTC
postgres_1  | 2021-11-17 17:04:59.107 UTC [49] LOG:  database system is ready to accept connections
postgres_1  |  done
postgres_1  | server started
postgres_1  | CREATE DATABASE
postgres_1  |
postgres_1  |
postgres_1  | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
postgres_1  |
postgres_1  | 2021-11-17 17:05:00.291 UTC [49] LOG:  received fast shutdown request
postgres_1  | waiting for server to shut down....2021-11-17 17:05:00.303 UTC [49] LOG:  aborting any active transactions
postgres_1  | 2021-11-17 17:05:00.305 UTC [49] LOG:  background worker "logical replication launcher" (PID 56) exited with exit code 1
postgres_1  | 2021-11-17 17:05:00.307 UTC [51] LOG:  shutting down
postgres_1  | 2021-11-17 17:05:00.609 UTC [49] LOG:  database system is shut down
postgres_1  |  done
postgres_1  | server stopped
postgres_1  |
postgres_1  | PostgreSQL init process complete; ready for start up.
postgres_1  |
postgres_1  | 2021-11-17 17:05:00.638 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres_1  | 2021-11-17 17:05:00.638 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres_1  | 2021-11-17 17:05:00.659 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2021-11-17 17:05:00.696 UTC [77] LOG:  database system was shut down at 2021-11-17 17:05:00 UTC
postgres_1  | 2021-11-17 17:05:00.709 UTC [1] LOG:  database system is ready to accept connections
ui_1        | Missing the PREFECT_SERVER__BASE_URL environment variable.  Using default
ui_1        | 👾👾👾 UI running at localhost:8080 👾👾👾
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: using the "epoll" event method
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: nginx/1.20.1
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: built by gcc 8.3.0 (Debian 8.3.0-6)
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: OS: Linux 5.11.0-1021-azure
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: getrlimit(RLIMIT_NOFILE): 1048576:1048576
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: start worker processes
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: start worker process 12
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: start worker process 13
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: start worker process 14
ui_1        | 2021/11/17 17:05:09 [notice] 11#11: start worker process 15
graphql_1   | {"severity": "INFO", "name": "prefect-server.GraphQL Server", "message": "Using uvicorn log level = 'debug'"}
graphql_1   | INFO:     Started server process [8]
graphql_1   | INFO:     Waiting for application startup.
graphql_1   | INFO:     Application startup complete.
graphql_1   | INFO:     Uvicorn running on http://0.0.0.0:4201 (Press CTRL+C to quit)
apollo_1    | Checking GraphQL service at http://graphql:4201/health ...
graphql_1   | INFO:     172.25.0.5:52090 - "GET /health HTTP/1.1" 200 OK
apollo_1    | {"status":"ok","version":"2021.11.09"}
apollo_1    | GraphQL service healthy!
apollo_1    |
apollo_1    | > @ serve /apollo
apollo_1    | > node dist/index.js
apollo_1    |
apollo_1    | Building schema...
graphql_1   | INFO:     172.25.0.5:52118 - "POST /graphql/ HTTP/1.1" 200 OK
apollo_1    | Building schema complete!
apollo_1    | Server ready at http://0.0.0.0:4200 🚀 (version: 2021.11.09)
apollo_1    | Sending telemetry to Prefect Technologies, Inc.: {"source":"prefect_server","type":"startup","payload":{"id":"d0f3dca5-469c-4e76-9ad7-ba50f304d71c","prefect_server_version":"2021.11.09","api_version":"0.2.0"}}
graphql_1   | INFO:     172.25.0.5:52136 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.25.0.5:52148 - "POST /graphql/ HTTP/1.1" 200 OK
WELCOME TO
_____  _____  ______ ______ ______ _____ _______    _____ ______ _______      ________ _____
|  __ \|  __ \|  ____|  ____|  ____/ ____|__   __|  / ____|  ____|  __ \ \    / /  ____|  __ \
| |__) | |__) | |__  | |__  | |__ | |       | |    | (___ | |__  | |__) \ \  / /| |__  | |__) |
|  ___/|  _  /|  __| |  __| |  __|| |       | |     \___ \|  __| |  _  / \ \/ / |  __| |  _  /
| |    | | \ \| |____| |    | |___| |____   | |     ____) | |____| | \ \  \  /  | |____| | \ \
|_|    |_|  \_\______|_|    |______\_____|  |_|    |_____/|______|_|  \_\  \/   |______|_|  \_\
Visit http://localhost:80 to get started, or check out the docs at https://docs.prefect.io
Ugh, sorry, I thought it would give a collapse option to the code paste. I'm new to Prefect so in this case, the fact that the UI is up and running and the above results seems like it should be working, but in the UI it still tells me it can't connect to the Prefect server.
k

Kevin Kho

11/17/2021, 5:13 PM
No worries. That helps a lot. So server is a bunch of services as you noticed. There is a UI, Database, API. These logs say that postgres was not able to start because it could not use port 5432. Even if the UI can render, the issue is the database can’t even spin up in this environment so there is nothing for the API to connect to and no API for the UI to connect to. Does that make sense?
t

Tom Shaffner

11/17/2021, 5:15 PM
It does make sense. Given my port limitations though, is there a way for me to get that spun up as is? Or am I required to start a firewall exception request to open that port? And if so, what's the source IP of that request? Presumably it's coming from any UI usage to the VM, right? So would that be any and all IPs as the source?
k

Kevin Kho

11/17/2021, 5:20 PM
I honestly don’t think so. The port restrictions are pretty heavy. For the database to even just spin up, it’s not an external request. It’s just localhost can’t even occupy that port. But for external requests to your VM in the future if you have agents from other devices working with your server, those would go through port 443 from the IP of the agents. Does that make sense?
t

Tom Shaffner

11/17/2021, 5:29 PM
Hmm, not sure. This makes it sound like the issue is not an external ports issue, but an internal one. I just double-checked that the Linux firewall was open for ports 5432 and 4200 (e.g.
sudo ufw allow 4200
), but if I start the server after I still get the same error. If this is internal, and this is a standard ubuntu VM, what might cause the port to be blocked like this?
At this point I'm no longer even sure this is a port issue, but I've basically followed along all the tutorials/instructions on a setup that is supposed to be easy and nothing is working here; am I doing something wrong or crazy?
k

Kevin Kho

11/17/2021, 5:49 PM
I wouldn’t know what would cause the port to be blocked. Am doubting what I said. You’re not doing something wrong since you are already using the
--expose
flag. Sorry I think I was wrong the first time around. I re-read the logs and it looks like it was able to start. Just to make sure, the error you see is that the UI can’t connect to the API?
Will ask around
a

Anna Geller

11/17/2021, 5:58 PM
@Tom Shaffner did you know btw that Prefect Cloud is free to use for 20000 tasks each month and that there is an out-of-the-box Prefect agent on Azure you could use? maybe you could get started that way and later compare whether Server or Cloud works better for you?
t

Tom Shaffner

11/17/2021, 6:59 PM
@Kevin Kho, correct, the second picture in the first post shows the error.
@Anna Geller, I do know, but storing any corporate data or data about corporate data outside the corporate environment is something I'd need to check with and get approval for in advance. We have an Azure subscription and the VNET means the infrastructure I'm using is still within the corporate environment. If I'm able to set this up and it proves useful then scaling in that direction might be the way to go one day, but I saw no point in going through a lengthy and cumbersome approval process before I even had a chance to test. Want to make sure this fits our needs first and is usable.
postgres_1  | PostgreSQL init process complete; ready for start up.
postgres_1  |
postgres_1  | 2021-11-17 19:53:52.121 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres_1  | 2021-11-17 19:53:52.121 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres_1  | 2021-11-17 19:53:52.132 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2021-11-17 19:53:52.172 UTC [76] LOG:  database system was shut down at 2021-11-17 19:53:52 UTC
postgres_1  | 2021-11-17 19:53:52.187 UTC [1] LOG:  database system is ready to accept connections
postgres_1  | 2021-11-17 19:53:56.594 UTC [84] ERROR:  duplicate key value violates unique constraint "pg_extension_name_index"
postgres_1  | 2021-11-17 19:53:56.594 UTC [84] DETAIL:  Key (extname)=(pgcrypto) already exists.
postgres_1  | 2021-11-17 19:53:56.594 UTC [84] STATEMENT:  CREATE EXTENSION IF NOT EXISTS pgcrypto SCHEMA public
ui_1        | Missing the PREFECT_SERVER__BASE_URL environment variable.  Using default
ui_1        | 👾👾👾 UI running at localhost:8080 👾👾👾
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: using the "epoll" event method
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: nginx/1.20.1
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: built by gcc 8.3.0 (Debian 8.3.0-6)
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: OS: Linux 5.11.0-1021-azure
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: getrlimit(RLIMIT_NOFILE): 1048576:1048576
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: start worker processes
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: start worker process 19
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: start worker process 20
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: start worker process 21
ui_1        | 2021/11/17 19:54:01 [notice] 18#18: start worker process 22
towel_1     | {"severity": "ERROR", "name": "prefect-server.HasuraClient", "message": "Encountered internal API exception: [Errno 113] Connect call failed ('192.168.16.3', 3000)", "exc_info": "Traceback (most recent call last):\n  File \"/prefect-server/src/prefect_server/utilities/exceptions.py\", line 87, in reraise_as_api_error\n    yield\n  File \"/prefect-server/src/prefect_server/utilities/graphql.py\", line 64, in execute\n    timeout=30,\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1385, in post\n    timeout=timeout,\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1148, in request\n    request, auth=auth, allow_redirects=allow_redirects, timeout=timeout,\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1169, in send\n    request, auth=auth, timeout=timeout, allow_redirects=allow_redirects,\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1196, in send_handling_redirects\n    request, auth=auth, timeout=timeout, history=history\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1232, in send_handling_auth\n    response = await self.send_single_request(request, timeout)\n  File \"/usr/local/lib/python3.7/site-packages/httpx/_client.py\", line 1269, in send_single_request\n    timeout=timeout.as_dict(),\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py\", line 153, in request\n    method, url, headers=headers, stream=stream, timeout=timeout\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py\", line 65, in request\n    self.socket = await self._open_socket(timeout)\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py\", line 86, in _open_socket\n    hostname, port, ssl_context, timeout\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_backends/auto.py\", line 38, in open_tcp_stream\n    return await self.backend.open_tcp_stream(hostname, 
port, ssl_context, timeout)\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_backends/asyncio.py\", line 234, in open_tcp_stream\n    stream_reader=stream_reader, stream_writer=stream_writer\n  File \"/usr/local/lib/python3.7/contextlib.py\", line 130, in __exit__\n    self.gen.throw(type, value, traceback)\n  File \"/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py\", line 12, in map_exceptions\n    raise to_exc(exc) from None\nhttpcore._exceptions.ConnectError: [Errno 113] Connect call failed ('192.168.16.3', 3000)"}
To ensure there weren't any simple corruption issues, I deleted and recreated the virtual environment and tried to initialize the server again. The error messages I'm getting changed as a result. The full list was too long to paste here, but I pasted the above log which shows the last bit that was working and then the first error message (it's long). That same error then appears to repeat, with minor variations, five more times.
k

Kevin Kho

11/17/2021, 8:02 PM
Hey sorry, didn’t get a chance yet to spin up a server and compare logs. Will surely do in a bit though.
t

Tom Shaffner

11/17/2021, 8:21 PM
Actually, I wiped my virtual environment again, wiped my docker settings back to factory defaults, and then verified that I had full permissions to all the virtual environment and to another folder where I could point the volume flags and it seemed to work! All the errors disappeared from the startup. When I log into the UI though, it still can't connect to the server. 😕
k

Kevin Kho

11/17/2021, 8:23 PM
Ok so I chatted with an engineer here. When you spin up Prefect Server, sometimes the logs will show errors but that doesn’t necessarily mean it failed to start. They can just be noisy. So the first thing we can do is check that the API is functioning properly without the UI. So what you can do after doing
prefect server start --expose
is going to the GraphQL Playground. The default is
localhost:4200
.
You will get something like this and then you can do the following query to check the API is healthy
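On a machine without a browser, the same kind of check can be scripted instead of using the Playground. This is a rough sketch under assumptions: it posts the generic GraphQL query `{ __typename }` (valid against any GraphQL server, not a Prefect-specific health query), the function names are hypothetical, and the URL passed in should be whatever host and port the API is actually exposed on.

```python
import json
import urllib.request


def build_health_request(api_url: str) -> urllib.request.Request:
    """Build a POST request carrying a trivial GraphQL query."""
    payload = json.dumps({"query": "{ __typename }"}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def api_is_healthy(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers the query without errors."""
    try:
        request = build_health_request(api_url)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a response that is not valid JSON.
        return False
    return isinstance(body, dict) and "data" in body and "errors" not in body
```

For example, `api_is_healthy("http://localhost:4200/graphql")` run on the VM checks the default API port mentioned above; a `False` result only says the request failed, not why, so the server logs are still the place to look next.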
t

Tom Shaffner

11/17/2021, 8:26 PM
Hmm, this is on a linux VM without a browser though so I have to put in the IP address and then the port. If I do that in this case, and replace the port I was using with 4200, it still brings up the same UI
Tried it multiple times; it basically redirects me back to the same UI:
And I still get this:
k

Kevin Kho

11/17/2021, 8:27 PM
Oh. Are you trying to connect from another machine? Or to localhost from the same machine?
t

Tom Shaffner

11/17/2021, 8:27 PM
Correct; that's why I needed the --expose flag, right?
I don't have a browser on that machine. No gui
k

Kevin Kho

11/17/2021, 8:28 PM
Ah ok we need to set the endpoint in your
config.toml
because it’s getting the default localhost. One sec will get that
t

Tom Shaffner

11/17/2021, 8:28 PM
Oh! Nice, okay.
k

Kevin Kho

11/17/2021, 8:29 PM
Replace the 4200 with the port your API lives on. This is on the machine connecting to server
Also on the machine with server, you may need this in the config.toml when you start it:
These are taken from this article
t

Tom Shaffner

11/17/2021, 8:36 PM
Hmm, when you say the port my API lives on, what does this refer to?
I'm starting just with the server here; no clients going yet even
k

Kevin Kho

11/17/2021, 8:41 PM
So there are two things here. The first is the
[server]
endpoint = "YOUR_MACHINES_PUBLIC_IP:4200/graphql"
in the config.toml of the machine you are connecting from. This is to register flows and communicate with the server API. You can test if this connection is good with
prefect agent local start
and it should be able to connect. The second is the UI. The UI needs to be configured to point the API of your Server (port 4200 by default). The default is the
localhost:4200/graphql
. If this is not changed, your local machine goes to the Prefect UI and then gets directed to
localhost:4200/graphql
, which is wrong. You want to direct it to the server API. To change this, you can do:
[server]
  [server.ui]
    apollo_url = "http://YOUR_MACHINES_PUBLIC_IP:4200/graphql"
in the
config.toml
file that lives on the server. Do this before you start the server so that it points to the API. When you connect to the UI from your local machine, it will talk to the right API endpoint also.
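To avoid typos when filling in the server-side setting, the `[server.ui]` fragment can be generated from the VM's address. A small sketch, assuming the key names quoted above; the function name `render_ui_config` and the example address `10.0.0.4` are hypothetical placeholders, not values from this thread.

```python
def render_ui_config(host: str, api_port: int = 4200) -> str:
    """Return a config.toml fragment pointing the UI at the server API."""
    # Same shape as the snippet above: http://<host>:<port>/graphql
    api_url = f"http://{host}:{api_port}/graphql"
    return (
        "[server]\n"
        "  [server.ui]\n"
        f'    apollo_url = "{api_url}"\n'
    )


if __name__ == "__main__":
    # 10.0.0.4 is a made-up VNET address used only for illustration.
    print(render_ui_config("10.0.0.4"))
```

The output is meant to be pasted into the `config.toml` on the server before starting it; note that the URL must be one the browser's machine can reach, since the browser is what calls the API.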
t

Tom Shaffner

11/17/2021, 8:43 PM
I'm a bit worried we're going to be back to my original problem at this point. I added both the flags to my config file, as pictured, but something isn't working because when I load the UI it not only doesn't connect, the error doesn't reflect the correct address (second picture). And in both cases, the connection IS now going across the network, right? So I'd need port 4200 opened between the VM and all possible sources for this to work?
k

Kevin Kho

11/17/2021, 8:44 PM
Yes to opening port 4200 and 8080 on the VM
Did you restart server after adding the flags?
t

Tom Shaffner

11/17/2021, 8:45 PM
Right, but not just on the VM, within the corporate network, correct? I'd need firewall exceptions to them both? Though actually I can skip 8080 because I can just redirect it to port 80 with the flag; possible to do that with 4200 too?
I did restart it; wondering if there's a typo or something? Or somewhere I need to turn on a flag to get it to use that file?
k

Kevin Kho

11/17/2021, 8:46 PM
Yes to firewall. I am not sure with the 4200 moving port.
t

Tom Shaffner

11/17/2021, 9:32 PM
Just to close this out; Kevin kindly offered to chat with me over the phone and it sounds like I will need to open the relevant ports in the firewall and try again after. Thanks again Kevin for taking the time!
@Kevin Kho, finally got ports 8080 and 4200 opened in my network and now I can connect to the UI fine. When I try to set up the graphql connection though, using the article you'd sent at https://medium.com/the-prefect-blog/prefect-server-101-deploying-to-google-cloud-platform-47354b16afe2, I'm still getting failures. My config.toml looks like the attached, as per the instructions at that link, but when I connect to the UI the server still seems to be attempting to connect to localhost and failing (second picture). Have the settings to update the IP for the graphql server changed since that article?
k

Kevin Kho

11/29/2021, 3:57 PM
It hasn’t. This recent thread has a better explanation than I had btw. Did you have this
config.toml
on the VM when you started Prefect Server? Let me re-read some stuff
So you mean going to
http://*YOUR_MACHINES_PUBLIC_IP*:8080
works right?
t

Tom Shaffner

11/29/2021, 4:03 PM
It does. And going directly to the graphql at http://MACHINE_IP:4200 works too (brings up the playground now). Though I should note perhaps, this is inside a VNET so the "public" IP isn't actually public, but since all machines attempting to access it are inside the corporate network too I figure it works as effectively public within the network.
k

Kevin Kho

11/29/2021, 4:06 PM
I think it should work if the UI is working. Just seems like it’s not configured right if it’s pointing to
localhost
instead of your IP. Just clarifying, do you have two separate
config.toml
files? One on the VM and one on local?
t

Tom Shaffner

11/29/2021, 4:09 PM
Yeah, I'd think so too, keep tweaking my config file thinking maybe there's some typo in it that's causing an issue? It's weird. I am using --expose here too, but that's needed on the VM anyway so that shouldn't make a difference, right? Yes, just one config.toml. Here's an
ls
of the folder and contents attached, in order. When you say local, what do you mean? I'm trying to run this all from the VM, and the only other connection to it is via the web browser; I wouldn't need a config.toml on the machine with the web browser, would I?
k

Kevin Kho

11/29/2021, 4:11 PM
Yes you need a `config.toml` on the machine with the web browser because the default is to use that
localhost:4200/graphql
.
t

Tom Shaffner

11/29/2021, 4:11 PM
Oops, here's the one with the ls
Hmm, okay, the local machine in that case is a Windows PC; where do I put the file? Also, that means every user will have to set up that file for the UI to work? Is there not some way I can set the server to always use that IP address regardless of the computer accessing the UI?
k

Kevin Kho

11/29/2021, 4:14 PM
It would be in the
~/.prefect
folder of the machine you are connecting from. I would say yes in general you need the
config.toml
file for every computer that will connect to server because at the very least, you’ll need to point it to the server API to register flows.
t

Tom Shaffner

11/29/2021, 4:20 PM
Hmm, the
~
folder in Windows is the user account folder in
C:\Users
, correct? So
C:\Users\USERNAME
? I just tried it, restarted the server and reloaded the UI in a fresh browser window with the
.prefect
folder added to my local user account and a
config.toml
in it; same error.
k

Kevin Kho

11/29/2021, 4:24 PM
I am not sure, but I think that is right. If you type in
prefect backend server
, this should create the home directory I think. Also if you do this in Python:
from os.path import expanduser
expanduser("~")
the output should be the home directory where your
.prefect
folder should live.
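The same lookup can be extended to print the full expected location of the file and whether it exists. A minimal sketch using `pathlib`; the helper name `prefect_config_path` is hypothetical, and the `.prefect/config.toml` location is the one discussed in this thread.

```python
from pathlib import Path


def prefect_config_path(home=None) -> Path:
    """Return the expected config.toml path under the given home directory."""
    # Path.home() resolves to C:\Users\USERNAME on Windows and to
    # /home/USERNAME on most Linux systems.
    base = Path(home) if home is not None else Path.home()
    return base / ".prefect" / "config.toml"


if __name__ == "__main__":
    path = prefect_config_path()
    print(path, "exists" if path.is_file() else "is missing")
```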
a

Anna Geller

11/29/2021, 4:26 PM
Alternatively, perhaps you can add this env variable to your Windows env variables, then restart your machine and try again? PREFECT__SERVER__ENDPOINT="YOUR_SERVER_PUBLIC_IP:4200/graphql"
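After setting the variable, it is worth confirming that a new Python session actually sees it, since Windows only passes changed environment variables to processes started afterwards. A small diagnostic sketch; the helper name is hypothetical, and the quote stripping is only there to make pasted values easier to compare (quotes copied from chat clients are sometimes curly), not something Prefect itself is claimed to do.

```python
import os

ENV_NAME = "PREFECT__SERVER__ENDPOINT"


def endpoint_from_env(environ=None):
    """Return the endpoint value visible to this process, or None if unset."""
    env = os.environ if environ is None else environ
    value = env.get(ENV_NAME)
    if value is None:
        return None
    # Strip whitespace plus straight or curly double quotes for display.
    return value.strip().strip('"\u201c\u201d')


if __name__ == "__main__":
    print(ENV_NAME, "=", endpoint_from_env())
```

If this prints `None` in a freshly opened terminal, the variable was not saved where the session can see it.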
t

Tom Shaffner

11/29/2021, 4:27 PM
Hmm, this makes me a bit concerned. Setting it up this way seems a bit odd; the code of the webpage (i.e. coming from the server) must include somewhere the default setting to check localhost for the graphql server. Is there not some way I can change that default on the server rather than needing to have every user set up a file in their user account? Otherwise, to do the above, I'd need to have every user install python locally (admin rights required), execute the above command (have to teach them how to access python and install prefect and then run the above), add the config file, and if I ever end up having to change the IP of the server I'd have to have every single user update their local file too, rather than changing it in a single location. All that somewhat obviates the value I'm going for here, which is giving less technical users access to the UI to kick off flows manually when they need an update. Why is this setting set on the user level instead of changeable from the server end?
@Anna Geller, that would certainly be easier, I'll try that. It still seems strange to me that the users need to do this rather than having a way to do it from the server though; if that's truly required I'd think this should be considered for a change.
a

Anna Geller

11/29/2021, 4:31 PM
I agree that it’s not too convenient when it comes to user management. If you want an easier and more convenient setup, I can recommend Prefect Cloud which works out of the box. I know you mentioned privacy concerns with Cloud but with the hybrid execution model your code and data remain entirely private.
t

Tom Shaffner

11/29/2021, 4:32 PM
@Anna Geller, do I need that in my system variables or is it sufficient to put it in my user environment variables?
k

Kevin Kho

11/29/2021, 4:32 PM
I think user should work.
t

Tom Shaffner

11/29/2021, 4:35 PM
@Anna Geller, if this is a setting required on the user side, wouldn't the hybrid model have the same problem? Or is there functionality in the hybrid model that the version I'm using lacks?
a

Anna Geller

11/29/2021, 4:38 PM
Yeah, I think it would need to be set in the same session from which you want to register your flows and interact with the backend API. Setting it in the system env variables would be more convenient since you would only need to set it once. No, Prefect Cloud wouldn’t have this problem since Cloud provides user management including multi-tenant environments, single sign-on and RBAC. Users would need to generate an API key and authenticate only once from their development machine using:
prefect auth login --key XXX
k

Kevin Kho

11/29/2021, 4:38 PM
No this is all handled because you just connect to Prefect Cloud and Prefect hosts the API for you by default. There is no configuration needed. Did you try configuring from this screen btw?
:upvote: 1
t

Tom Shaffner

11/29/2021, 4:43 PM
I just tried the environment variable approach, it didn't work. 😕 @Kevin Kho, I had not tried that though! Testing it now, it seems like maybe that is helping!! Still working through a full test once I've set up that way to see if it all connects.
@Anna Geller, honestly, I'm not going to initiate all the cumbersome work needed to set up that unless I can verify this all works. Right now I'm still working through these issues, which so far are a lot more complicated than the documentation indicated, and I'm still a bit concerned that something as simple as an
if
command can break the flow structure itself too (separate thread); that seems a far cry from the advertised simplicity of being able to add some decorators to my existing python code. The purpose here is a test of this system to see if it works for us, and if the setup is so complex that we'd have to buy a solution to get it working...well, that's not a dynamic I'd feel comfortable recommending. I still like the prospect here, I'm still hopeful I can get set up and start transitioning some of our current ETL processes over, but unless/until I can do that and verify they work reliably for some time I'm not going to push the company to go through the time and effort to review this from a security and data privacy perspective.
a

Anna Geller

11/29/2021, 4:55 PM
In that case, perhaps you can start with Prefect Core only? You can just
pip install prefect
and start building and executing flows locally, without the backend API configured. You can then run:
flow.run()
or with the CLI "prefect run -p flow.py" to run the flow locally without having to talk to the backend API.
t

Tom Shaffner

11/29/2021, 4:56 PM
@Anna Geller, that's what I'm trying to do, but if I want it to be on a machine that's available 24/7 it needs to be one of our Azure VMs; we don't have any other machines that are always connected to the network. And unfortunately that means I needed to open the ports (as Kevin helped me discover) and go through all this setup. Life inside the corporate IT restrictions can be complex. 😕
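Since blocked ports are the recurring problem here, a quick reachability check from a client machine can help separate a firewall problem from a configuration problem. The following is a minimal sketch using only the standard library; the host below is a placeholder to replace with the VM's address, 4200 is the GraphQL port mentioned in this thread, and the helper name `port_is_reachable` is hypothetical.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # Placeholder: replace with the server's address.
    print("GraphQL port 4200 reachable:", port_is_reachable(host, 4200, timeout=1.0))
```

A `False` result from a user's machine while the same check returns `True` on the VM itself would point at the network rules rather than at the Prefect setup. Note that this only tests that the TCP port accepts connections, not that the API behind it responds correctly.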
a

Anna Geller

11/29/2021, 4:57 PM
I see. In that case, perhaps you can SSH to that instance and install Prefect Core there without the Server complexity?
t

Tom Shaffner

11/29/2021, 4:59 PM
Hmm, maybe? Now I'm not sure I understand the distinction between Server and Core; is there somewhere I can go to understand this better? The documentation I saw, particularly https://docs.prefect.io/orchestration/server/overview.html#prefect-server-vs-prefect-cloud-which-should-i-choose, seemed to focus on Server vs. Cloud.
a

Anna Geller

11/29/2021, 5:00 PM
I absolutely understand and sympathize with that, I worked briefly in Audit before so I know what you’re talking about 🙂 Would it be possible to chat with your supervisor and check if you could do a Cloud PoC without using any sensitive data? Just building flows with demo data to test the functionality? Perhaps this would make it easy to test out all the features before going down the approval rabbit hole.
t

Tom Shaffner

11/29/2021, 5:12 PM
haha, I wish it worked like that. The review I'm concerned about is the one IT would need to do to open any communications between internal and external, regardless of direction, and to verify data privacy is maintained. I'd expect that will be a major fight and take several months so I'm not starting it unless/until I know this works for our needs. Is there documentation comparing core and server? Or is the difference between Core and Cloud discussed at https://docs.prefect.io/core/faq.html#what-is-the-difference-between-prefect-core-and-prefect-cloud basically the same thing?
k

Kevin Kho

11/29/2021, 5:13 PM
That would be the same thing. Am spinning up server on GCP to fully go through this end-to-end. Will report back.
t

Tom Shaffner

11/29/2021, 5:18 PM
@Kevin Kho, I actually think this might be working now. I had to tweak the config file a bit to get the agent connecting but that seems to be working now, and while it's clunky, dropping in an address on that page at connection time is MUCH better than needing to set up a local config file. That does seem to have me up and running for the moment, which is a big step forward! So I'm back to the other thread; the lambda function you'd suggested hasn't worked for me yet, still debugging to make sure I have it set up right. I have a question on that perhaps later, but I'll follow up on it in the other thread. For the moment, dropping the right address in the UI seems the simplest solution here; thanks so much for the help!
Incidentally, the config file seemed, in the end, to potentially be somewhat irrelevant on the server side; since everything is running on the same machine I don't think I even needed it; just that address in the web GUI.
k

Kevin Kho

11/29/2021, 5:20 PM
Oh ok that’s good to hear. Could you comment in that thread so I get a notification and can revisit there? Gotcha
👍 1