Viet Nguyen

08/01/2022, 2:57 AM
Hi all, I'm trying to implement the first concept in the diagram, but I got an error. The Lambda function was assigned an IAM role with near-admin permissions, so I'm not sure how to deal with it (e.g. "Failed to create the Prefect home directory at /home/sbx_user1051/.prefect"). The full error is below. I didn't execute any Prefect flow, though; I just tried to import the prefect package in a dummy Lambda handler function. We aim to build this on serverless infrastructure using AWS services. Our actual pipeline may involve processing a very large number of NetCDF files, but on many occasions it will be just a few newly uploaded files. So my questions: how do I overcome the above error? And is the second option doable? We would use a Fargate cluster for the Dask client, but the environment where the Dask client is created needs sufficient memory (~10 GB, etc.). Many thanks.

Anna Geller

08/01/2022, 10:41 AM
can you move code blocks and text to the thread and only specify the main problem in your main message? thanks a lot in advance

Cole Murray

08/02/2022, 4:07 AM
So my questions, how to overcome the above error?
The specific error comes from trying to write to the /home directory inside a Lambda function. Lambda functions only have write access to the /tmp directory (unless you mount EFS). The Prefect message is only logged as a warning; it is the numba error that crashes the function. Based on the error, it looks like numba is being used with the cache=True flag (here indirectly, via xarray → flox → numpy_groupies, according to the traceback), and numba cannot find a writable location for its cache. You can override the default cache directory with NUMBA_CACHE_DIR: https://numba.pydata.org/numba-doc/dev/reference/envvars.html#envvar-NUMBA_CACHE_DIR
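A minimal sketch of that fix, assuming the handler module is the first thing Lambda imports: point each library's writable directory at /tmp before xarray (and therefore numba), matplotlib or prefect are imported, because they read these variables at import time. NUMBA_CACHE_DIR and MPLCONFIGDIR are the variables named in the numba docs and in the matplotlib warning in the error log; PREFECT_HOME is assumed to be the Prefect 2 setting behind the ~/.prefect warning. The helper name configure_writable_dirs and the subdirectory names are hypothetical. Setting the same variables in the Lambda function's environment configuration should work equally well.

```python
import os

# Hypothetical helper: Lambda's filesystem is read-only except /tmp, so point
# each library's cache/config directory there. This must run BEFORE importing
# numba (pulled in via xarray -> flox -> numpy_groupies), matplotlib or prefect,
# because they read these environment variables at import time.
WRITABLE_DIR_VARS = {
    "NUMBA_CACHE_DIR": "numba_cache",  # on-disk cache for @jit(cache=True) functions
    "MPLCONFIGDIR": "matplotlib",      # matplotlib config/cache directory
    "PREFECT_HOME": ".prefect",        # assumed Prefect 2 home setting (~/.prefect by default)
}


def configure_writable_dirs(base: str = "/tmp") -> dict:
    """Export writable directories under `base` and make sure they exist.

    Values already present in the environment (e.g. set in the Lambda
    console) are left untouched thanks to setdefault.
    """
    configured = {}
    for var, subdir in WRITABLE_DIR_VARS.items():
        path = os.environ.setdefault(var, os.path.join(base, subdir))
        os.makedirs(path, exist_ok=True)
        configured[var] = path
    return configured


configure_writable_dirs()

# Only now import the heavy libraries, e.g.:
# import xarray as xr
# from prefect import flow, task, get_run_logger
```

Keeping the setup at the very top of the handler module means a cold start configures the directories once, before any import can trigger the numba cache lookup.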
RE the design: I wouldn't suggest using Lambda as an agent in Prefect. Lambda isn't really designed to be long-running (and becomes more expensive past a certain point). If you're looking to stay serverless, you can explore:
• ECS with Fargate or EC2 (can run Prefect agents)
• AWS Batch (use a Prefect agent running on EC2/ECS to launch an AWS Batch job that runs your actual work. This makes Prefect primarily an orchestration tool, which may or may not meet your use case)
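For the AWS Batch option, the orchestration step only needs to submit a job and keep its ID. A hedged sketch using the shape of boto3's Batch submit_job call; the helper names, job queue, job definition and command are hypothetical placeholders, and the client is passed in so the logic can be exercised without AWS credentials:

```python
from typing import Any, Dict, List, Optional


def build_batch_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    command: List[str],
    environment: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a boto3 Batch `submit_job` call."""
    # Batch expects environment overrides as a list of {"name", "value"} pairs.
    env = [{"name": k, "value": v} for k, v in (environment or {}).items()]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command/env defined in the job definition.
        "containerOverrides": {"command": command, "environment": env},
    }


def submit_batch_job(batch_client: Any, request: Dict[str, Any]) -> str:
    """Submit the job and return its ID (usable later with describe_jobs)."""
    response = batch_client.submit_job(**request)
    return response["jobId"]
```

With boto3 this would be called as submit_batch_job(boto3.client("batch"), build_batch_job_request(...)); in Prefect 2 the call could be wrapped in a @task so the flow tracks the job while Batch does the heavy NetCDF processing.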

Viet Nguyen

08/02/2022, 4:15 AM
Hi @Cole Murray, thanks heaps for your comments. We ended up choosing EC2: Lambda has a maximum timeout of 15 minutes and a maximum of 10 GB of memory, which might not be enough for some of our data collections.
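Those two limits can be turned into a rough go/no-go check, for example to keep the "few newly uploaded files" case on Lambda and send large backfills to EC2. A minimal sketch, assuming sequential processing and per-file timing and memory figures that you would have to measure yourself; the function name and the 0.8 safety margin are hypothetical choices:

```python
# Lambda hard limits quoted above (15-minute timeout, 10 GB memory).
LAMBDA_MAX_TIMEOUT_S = 15 * 60   # 900 seconds
LAMBDA_MAX_MEMORY_MB = 10_240    # 10 GB, matching "Memory Size: 10240 MB" in the log


def fits_in_lambda(
    n_files: int,
    seconds_per_file: float,
    peak_memory_mb: float,
    safety: float = 0.8,
) -> bool:
    """Return True if a batch should stay within Lambda limits.

    Runtime is assumed to grow linearly with the number of files
    (sequential processing); `safety` keeps a margin below both limits.
    """
    estimated_runtime_s = n_files * seconds_per_file
    return (
        estimated_runtime_s <= safety * LAMBDA_MAX_TIMEOUT_S
        and peak_memory_mb <= safety * LAMBDA_MAX_MEMORY_MB
    )
```

For example, fits_in_lambda(3, 60, 2_000) is True (180 s and 2 GB are within the margins), while fits_in_lambda(500, 60, 2_000) is False because the estimated 30,000 s far exceeds the 900 s timeout.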

Anna Geller

08/02/2022, 11:06 AM
@Viet Nguyen I would still appreciate it if you could follow the rules and move the code blocks to the thread 🙂 @Cole Murray thanks so much, awesome answers as always

Viet Nguyen

08/02/2022, 11:07 AM
@Anna Geller sorry, but how do I do that? Happy to revise the post.

Anna Geller

08/02/2022, 11:08 AM
edit the post, cut the code block and paste it here into the thread
the main message should only state the problem; all details should be in the thread

Viet Nguyen

08/02/2022, 11:28 AM
Python test code:
import json
import requests
from prefect import flow, task, get_run_logger
from functools import partial
import xarray as xr
import io
import boto3
# import s3fs
import numpy as np
import time
import dask
import awswrangler
# from dask.distributed import Client, LocalCluster


# @task
# def inc(x: int) -> int:
#     return x + 1


# @flow
# def cal():
#     logger = get_run_logger()
#     result = inc(2)
#     print(f"Result: {result}")
#     logger.info("-------- COMPLETED --------")
#     # return result


def handler(event, context):
    # TODO: prefect flow
    # cal()

    return {
        'headers': {'Content-Type': 'application/json'},
        'statusCode': 200,
        'body': json.dumps({
            'message': 'My Lambda container works',
            'event': event
        })
    }
START RequestId: 694ca06b-ab6c-45b6-b52b-9b13cf8fa66f Version: $LATEST
/var/task/prefect/context.py:461: UserWarning: Failed to create the Prefect home directory at /home/sbx_user1051/.prefect
with SettingsContext(profile=profile, settings=new_settings) as ctx:
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[WARNING]	2022-08-01T02:00:29.589Z		Matplotlib created a temporary config/cache directory at /tmp/matplotlib-xy76zuy8 because the default path (/home/sbx_user1051/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[ERROR] RuntimeError: cannot cache function 'step_count': no locator available for file '/var/task/numpy_groupies/aggregate_numba.py'
Traceback (most recent call last):
  File "/var/lang/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/var/task/app.py", line 5, in <module>
    import xarray as xr
  File "/var/task/xarray/__init__.py", line 1, in <module>
    from . import testing, tutorial
  File "/var/task/xarray/testing.py", line 10, in <module>
    from xarray.core.dataarray import DataArray
  File "/var/task/xarray/core/dataarray.py", line 28, in <module>
    from ._reductions import DataArrayReductions
  File "/var/task/xarray/core/_reductions.py", line 17, in <module>
    import flox
  File "/var/task/flox/__init__.py", line 4, in <module>
    from .aggregations import Aggregation  # noqa
  File "/var/task/flox/aggregations.py", line 7, in <module>
    import numpy_groupies as npg
  File "/var/task/numpy_groupies/__init__.py", line 42, in <module>
    from .aggregate_numba import aggregate as aggregate_nb, step_count, step_indices
  File "/var/task/numpy_groupies/aggregate_numba.py", line 459, in <module>
    def step_count(group_idx):
  File "/var/task/numba/core/decorators.py", line 212, in wrapper
    disp.enable_caching()
  File "/var/task/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/var/task/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/var/task/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
END RequestId: 694ca06b-ab6c-45b6-b52b-9b13cf8fa66f
REPORT RequestId: 694ca06b-ab6c-45b6-b52b-9b13cf8fa66f	Duration: 35527.19 ms	Billed Duration: 35528 ms	Memory Size: 10240 MB	Max Memory Used: 303 MB	
Unknown application error occurred
error log
@Anna Geller thanks Anna, done!

Anna Geller

08/02/2022, 11:32 AM
I appreciate it - looks great!

Christopher Boyd

10/11/2022, 4:36 PM
By default, only /tmp is writable in Lambda, so something like this works:
filepath = '/tmp/' + key
If you need to persist results with matplotlib, you'll need to force its output path to /tmp as well.
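One caveat with '/tmp/' + key: S3 keys often contain "/" prefixes (e.g. data/2022/file.nc), and writing to /tmp/data/2022/file.nc would fail unless the parent directories exist. A hedged sketch with a hypothetical helper, local_tmp_path, that mirrors the key under /tmp, creates the parent directories, and rejects keys that would escape /tmp:

```python
import os


def local_tmp_path(key: str, base: str = "/tmp") -> str:
    """Map an object key to a writable local path under `base`.

    Parent directories are created so callers can write to the path directly.
    """
    base = os.path.normpath(base)
    path = os.path.normpath(os.path.join(base, key.lstrip("/")))
    # Guard against keys such as "../etc/passwd" resolving outside `base`.
    if not path.startswith(base + os.sep):
        raise ValueError(f"key {key!r} resolves outside {base}")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```

With boto3 this could be used as s3.download_file(bucket, key, local_tmp_path(key)), and a matplotlib figure could be saved with fig.savefig(local_tmp_path("plots/result.png")); bucket, key and the plot name are placeholders. Remember that /tmp is ephemeral, so anything worth keeping still has to be uploaded back to S3.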