Hello, small and likely stupid question. I have so...
# ask-community
p
Hello, small and likely stupid question. I have something like this:
@task
def get_n_newest_files_for_pattern(pattern: str, path: str, n: int) -> list:
    """
    Task to get the n newest files for a given pattern.
    """
    logger = prefect.context.get("logger")
    logger.info(f"Getting {n} newest files in {path} for pattern {pattern}")
    path_files = os.listdir(path)
    files_with_path = [os.path.join(path, f) for f in path_files]
    files = [pathlib.Path(f) for f in files_with_path if re.search(pattern, f)]
    logger.info(f"Found {len(files)} files for pattern {pattern}")
    logger.debug(f"Files: {files}")
    logger.info("Sorting files by modification time")
    files.sort(key=lambda x: x.stat().st_mtime, reverse=True)
    logger.info(f"Returning the {n} newest files")
    return files[:n]
I am getting file not found errors with:
FileNotFoundError: [Errno 2] No such file or directory: '<Parameter: Path to the top level of the experiment tree>/outdata/fesom'
I thought that once it was loaded, any Parameter would behave as whatever type it is supposed to be? It is defined like this:
path = Parameter(name="Path to the top level of the experiment tree")
Once before that, I do an f-string conversion:
outdata_path = f"{path}/outdata/fesom"
Not having f-strings would be possible, but a bit annoying
z
Hey @Paul Gierz — can you show how you’re calling your task?
p
with Flow(
    "Regridded Timmean of Newest N Files for a FESOM 2D Variable (ESM Tools Layout)"
) as flow:
    # Get the experiment ID
    expid = Parameter(name="Experiment ID")
    # Get the main path of the output directory from the top of the experiment tree from the user:
    path = Parameter(name="Path to the top level of the experiment tree")
    # Get the 2D variable as a user parameter
    varname = Parameter(name="FESOM Variable Name")
    # Get the number of files to average as a user parameter
    nfiles = Parameter(name="nfiles", default=30)
    # Get the regrid size from the user:
    lat_size = Parameter("Latitude Size (e.g 1 for a 1x1 degree grid)", default=1.0)
    lon_size = Parameter("Longitude Size (e.g 1 for a 1x1 degree grid)", default=1.0)

    lons = np_arange(-180, 180, lon_size)
    lats = np_arange(-90, 90, lat_size)
    output_dir = path + "/outdata/fesom"
    pattern = finalize_pattern(varname)
    # Get all files in the output directory
    files = get_n_newest_files_for_pattern(pattern, output_dir, nfiles)
or, well, that is the fixed version. Before I had:
output_dir = f"{path}/outdata/fesom"
Needing to mix pathlib and plain string paths is not really clean, but for right now I’m just playing around and trying to learn what works and what doesn’t.
z
So here you’re doing operations on parameters within the Flow block. Any mutation needs to happen in a task, e.g. output_dir = path + "/outdata/fesom" will throw an error because path is still a Parameter type until it is sent to a task at runtime.
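To make the error message from earlier in the thread concrete, here is a minimal sketch of what an f-string does with a placeholder object at flow-definition time. StandInParameter is a hypothetical stand-in written only for this illustration; it is not Prefect's Parameter class, and its repr is modeled on the text seen in the FileNotFoundError above.

```python
class StandInParameter:
    """Hypothetical stand-in for a flow parameter (NOT Prefect's class)."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        # Modeled on the text that appears in the error message in this thread
        return f"<Parameter: {self.name}>"


path = StandInParameter("Path to the top level of the experiment tree")

# No real value exists yet, so the f-string formats the placeholder object itself
output_dir = f"{path}/outdata/fesom"
print(output_dir)
```

Under these assumptions the resulting string contains the placeholder's repr rather than a directory, which matches the path that os.listdir complained about.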
p
So any “pure Python” is not allowed, got it. I guess that is not the case if I use the declarative API?
z
In either case, no pure Python. You’re just defining the structure of your DAG.
The Orion project changes this, moves closer to pure Python (which is part of why we’re really excited about it) — but in Prefect <1.0 all of your code that works with real “values” should be in tasks.
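As a rough sketch of "work with real values inside tasks", the path building could live in its own small function. build_output_dir is a hypothetical name, not something from the thread. In Prefect <1.0 it would presumably carry the same @task decorator used on get_n_newest_files_for_pattern above; the decorator is left off here so the snippet runs without Prefect installed.

```python
def build_output_dir(path: str) -> str:
    """Build the FESOM output directory from the experiment root.

    Hypothetical helper: in a Prefect <1.0 flow this would be decorated
    with @task, so it only runs once `path` is a real string at runtime.
    """
    # f-strings are fine here because `path` is an actual value by now
    return f"{path}/outdata/fesom"


# Plain call with an illustrative, made-up path
print(build_output_dir("/work/my_experiment"))
```

Inside the Flow block the call would then read output_dir = build_output_dir(path), and the result would be passed on to get_n_newest_files_for_pattern as before.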
p
I read online that it's due in early '22. After the Christmas break I have about 8-12 weeks of work I can fill with containerisation and (if we are lucky) building our new HPC system. Do you think it's worth waiting until that is all the way ready?
z
It’ll still be under development during that timeline. Although most of the major features should be done by the end of it, I definitely can’t make any promises as we want to get things right 🙂 Do you mean that you will have other work to do for 12 weeks and could wait that long to look into orchestration?
p
Exactly, yes. By the time our new computer is actually ready for users it'll be April to mid-May. It depends a little bit on the colleague at Cray and the ongoing chip shortage…
z
I think that Orion will be a lot more powerful / easy to use, but it’s a work in progress so you’ll have to plan accordingly 🙂