
Simon

05/22/2023, 6:20 PM
Am having trouble loading a .env file using python-dotenv within Prefect. Within `Deployment.build_from_flow`, am setting `is_schedule_active=ENV == "production"` based upon whether the env is production or not. This is determined by an import from another file, `from common.config import ENV`, and in that file is:

import os

# have also tried dotenv.find_dotenv() without success
dotenv_path = os.path.abspath(os.path.join(__file__, "../../../.env"))
if not os.path.isfile(dotenv_path):
    raise RuntimeError(dotenv_path)
This loads happily in a local ipython shell. But when run as a prefect flow, the folder structure appears to be different:
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/private/var/folders/d8/sqvg_pms3pd0xvcl2tvvxhqr0000gn/T/tmp8e6a8h47prefect/prefect_daily_internal_metric.py", line 9, in <module>
    from common.config import ENV
  File "/private/var/folders/d8/sqvg_pms3pd0xvcl2tvvxhqr0000gn/T/tmp8e6a8h47prefect/common/config.py", line 9, in <module>
    raise RuntimeError(dotenv_path)
RuntimeError: /private/var/folders/d8/sqvg_pms3pd0xvcl2tvvxhqr0000gn/T/.env
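[Editor's note: the RuntimeError path above can be reproduced by resolving the expression by hand. This sketch uses a made-up temp path (the real one is the truncated `/private/var/folders/...` directory in the traceback), since a path built relative to `__file__` follows the module wherever it is copied:]

```python
import os

# A made-up stand-in for the agent's temp checkout; the real one in the
# traceback lives under /private/var/folders/.../T/.
fake_file = "/tmp/tmp8e6a8h47prefect/common/config.py"

# os.path.join appends onto the *file* path, and abspath then collapses
# the three "..": config.py -> common -> tmp8e6a8h47prefect -> /tmp.
dotenv_path = os.path.abspath(os.path.join(fake_file, "../../../.env"))
print(dotenv_path)  # -> /tmp/.env
```

So when the agent copies the flow code into a fresh temp dir, three levels up is the temp root, where no .env file exists.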

Nate

05/22/2023, 6:32 PM
hi @Simon are you saying you're having trouble accessing your env file while creating the deployment, or while running the flow? I would think you wouldn't want the env file to be cloned when running the flow; rather, have certain values set in the env for the deployment's infrastructure, or passed in as flow run parameters

Simon

05/22/2023, 6:41 PM
The deployment gets applied without error on the server. It’s when a flow run is triggered that the error is thrown. I was hoping to avoid setting env vars in say a venv activate file on the server and instead load them from a .env file on the server.

Nate

05/22/2023, 6:46 PM
what infrastructure are you using for your deployment? each variety should have an `env` field where you can populate an arbitrary dictionary so that the env vars are directly accessible to the flow run without cloning the actual env file
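[Editor's note: a rough sketch of the approach Nate describes. The block name and env values are purely illustrative, and the Prefect calls are left as comments since they need a configured Prefect 2 installation:]

```python
import os

# Build the mapping of env vars you want every flow run to see.
# Values are read from the machine creating the deployment, so nothing
# sensitive needs to be committed to version control.
flow_run_env = {
    "ENV": os.getenv("ENV", "development"),
    "DATABASE_URL": os.getenv("DATABASE_URL", ""),
}

# With a Prefect 2 infrastructure block this dict would be attached
# roughly like so (names illustrative):
#   from prefect.infrastructure import Process
#   Process(env=flow_run_env).save("prod-process", overwrite=True)
# and later referenced at deployment time via
#   infrastructure=Process.load("prod-process")
```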

Simon

05/22/2023, 6:49 PM
For prod, four Ubuntu 22.04 machines: one running Prefect server and three running Prefect agents.

Nate

05/22/2023, 6:49 PM
sorry, I meant Infrastructure as in the prefect proper noun 🙂

Simon

05/22/2023, 6:51 PM
Ah. It will be Process infrastructure. So are you saying the env vars need to be hardcoded and in version control somewhere, rather than being loaded from the server’s actual environment?
I suspect that it is something fundamental in the move from Prefect 1 to 2 which I have yet to grasp!
This is the relevant part of the flow ‘registration’ file:
deployment = Deployment.build_from_flow(
    flow=main,
    name=PIPELINE_STANDARD_DEPLOYMENT_NAME,
    work_pool_name=PIPELINE_STANDARD_WORK_POOL_NAME,
    work_queue_name=PIPELINE_STANDARD_WORK_QUEUE_NAME,
    schedule=(
        IntervalSchedule(
            interval=timedelta(hours=24),  # Daily
            anchor_date=datetime(
                2020, 5, 1, 4, 0, tz="Europe/London"
            ),  # At 4am London time
        )
    ),
    is_schedule_active=ENV == "production",
)

if __name__ == "__main__":
    deployment.apply()
I guess my question is, can I load ENV from a .env file?
And I guess there is a follow-up question about whether the agents will have any trouble loading their .env files?

Nate

05/22/2023, 7:05 PM
hmm you shouldn't have any trouble accessing env vars at deployment creation time (while calling `build_from_flow` above) right? the issue with the trace at the beginning of this thread, I think, was the fact that you were in a temp dir, which is the default behavior when the agent submits a flow run as a new process somewhere

I'm suggesting that you should not have to:
• hardcode env values into version control
• clone your env file for a given flow run

how I would approach this using infra blocks is to create a `Process` infra block per unique environment (`prod`, `dev`, `stage`, etc.) and populate the `env` on each, so that at deployment creation time (like calling `build_from_flow` above), you can pass an additional kwarg:

infrastructure=Process.load("my-process-with-infra-specific-env-vars-set")

does that make sense?

also just a heads up, projects (currently in beta) are becoming the default way to manage deployments (replacing the infra block paradigm), so if you're just getting started, I might recommend checking them out

Simon

05/22/2023, 7:06 PM
Ha I just yesterday stripped out a bunch of our Prefect v1 projects code

Nate

05/22/2023, 7:07 PM
projects are a different proper noun in prefect 2 😅 they're basically just a place to share configuration for building potentially many deployments. there are a couple of useful files for templating in config like we've been discussing

Simon

05/22/2023, 7:15 PM
“you shouldn’t have any trouble accessing env vars at deployment creation time (while calling `build_from_flow` above) right?”
Correct. When creating the deployment, these lines in common/config.py are triggered when the import is made and execute without error:

import os
from dotenv import find_dotenv, load_dotenv

dotenv_path = os.path.abspath(os.path.join(__file__, "../../../.env"))
if not os.path.isfile(dotenv_path):
    raise RuntimeError(dotenv_path)
find_dotenv(raise_error_if_not_found=True)  # belt and braces check for this debugging

load_dotenv(dotenv_path, verbose=True)

Nate

05/22/2023, 7:19 PM
so maybe even easier than creating a separate Process infra block per execution environment would be dynamically populating the `env` field of your process infra block at deployment creation time, based on whichever environment you're trying to create a deployment for at that time:

env = {"prod_key": "prod_value"} if os.getenv("prod_indicator") else {}  # not suggesting hardcoding, just that it can be set dynamically

deployment = Deployment.build_from_flow(
    flow=main,
    name=PIPELINE_STANDARD_DEPLOYMENT_NAME+"PROD",
    work_pool_name=PIPELINE_STANDARD_WORK_POOL_NAME,
    work_queue_name=PIPELINE_STANDARD_WORK_QUEUE_NAME,
    env=env,
    schedule=(
        IntervalSchedule(
            interval=timedelta(hours=24),  # Daily
            anchor_date=datetime(
                2020, 5, 1, 4, 0, tz="Europe/London"
            ),  # At 4am London time
        )
    ),
    is_schedule_active=ENV == "production",
)

if __name__ == "__main__":
    deployment.apply()

Simon

05/22/2023, 7:26 PM
the issue with trace at the beginning of this thread I think was the fact that you were in a temp dir, which is the default behavior when the agent submits a flow run as a new process somewhere
So it is the agent which is in a temp dir, and not the server (as only the agent executes the flow run), i.e. the agent loads the file the deployment's `flow=` line has pointed it to (which in this case happens to be the same file containing the deployment code block, but no matter).
OK I think your suggestion will work well. Instead of sending copies of the .env file to the workers as well during CI/CD deployment (an ansible playbook), the env=env line will distribute whatever is in the server’s copy of the file (including secrets) to the agent at flow run preparation time?

Nate

05/22/2023, 7:28 PM
the agent process itself shouldn't be in a temp dir unless you started the process in one; I'm saying the agent would spawn new flow runs (if its deployment had Process infrastructure) in a temp dir by default
👍 1

Simon

05/22/2023, 7:32 PM
Ok I need to separately work out how to tell the agent to spawn runs in a known place as it will need to interact with the filesystem it’s running on. Don’t want to derail the issue at hand, so will have a go at finding that myself.
👍 1

Nate

05/22/2023, 7:32 PM
“the env=env line will distribute whatever is in the server’s copy of the file (including secrets) to the agent at flow run preparation time?”
whatever you pass as `env` while creating your deployment, as shown above, will become env vars directly accessible by any flow run created from that deployment

so for example, if you had a dotenv file for each execution environment, and you have dev, stg, prd, then you could do something like:
for environment in {"dev", "stg", "prd"}:
    env_dict = env_dict_from_key(environment)  # some random helper that loads your dotenv as a dict

    # call `Deployment.build_from_flow` with env_dict specified as above
    ...
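[Editor's note: `env_dict_from_key` is Nate's hypothetical helper, not a Prefect API. python-dotenv's `dotenv_values(f".env.{environment}")` does this job directly; a stdlib-only sketch of the same idea, assuming files named `.env.dev`, `.env.stg`, `.env.prd`:]

```python
from pathlib import Path

def env_dict_from_key(environment: str, base_dir: str = ".") -> dict:
    """Parse KEY=VALUE lines from e.g. `.env.prd` into a dict.

    A minimal stand-in for python-dotenv's `dotenv_values`: skips blank
    lines and `#` comments, and strips surrounding quotes from values.
    """
    env = {}
    path = Path(base_dir) / f".env.{environment}"
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env
```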

Simon

05/22/2023, 7:33 PM
ok cool. I am planning to have an entirely separate staging prefect server, but it’s nice to know it’s possible
👍 1
Thanks for the help, I very much appreciate you taking the time, and your patience

Nate

05/22/2023, 7:36 PM
Ok I need to separately work out how to tell the agent to spawn runs in a known place as it will need to interact with the filesystem it’s running on.
you can set the `working_dir` on the Process infra block (which should be better documented)
🙏 1
sure thing!