Q

09/22/2022, 5:32 PM
👋 I'm trying to use the `RemoteFileSystem` block with WebHDFS. I'm running into 2 problems:

1. When I call
`RemoteFileSystem(basepath="webhdfs://basepath", settings=params).write_path(path="subpath/filename", content=content)`
I get an exception like this:
`null for uri: https://hdfs.example.com:port/webhdfs/v1webhdfs%3A//basepath/subpath?op=MKDIRS`
Which is unsurprising, since the URL above is clearly not valid. The problem here is that `fsspec.WebHDFS`'s methods `makedirs`/`mkdir` expect a slash-prefixed path w/o scheme (in this case `/basepath/subpath`), while `prefect.RemoteFileSystem` just passes `scheme://basepath/subpathdir` and expects fsspec to do the rest.

2. A lot of `fsspec.AbstractFileSystem` methods rely on `fsspec.infer_storage_options`, most importantly `open`, which is used by `prefect.RemoteFileSystem.write_path` and `prefect.RemoteFileSystem.read_path`. `infer_storage_options` calls `urlsplit` and returns `urlsplit(path).path`, i.e. strips away scheme+netloc. Here the problem is that netloc is whatever comes after the scheme and before the first slash, e.g. for `webhdfs://home/user/project`, `home` would get stripped away erroneously. This means one would have to prepend an extra segment to basepath to be sacrificed to fsspec's implementation (e.g. `webhdfs://thisgetslost/home/user/project`).

I can subclass `RemoteFileSystem` and override `write_path` to solve both problems. But then I would, it seems, need to make this class definition available and import it before calling `Block.load`. What I don't understand is how I can make an agent use this to fetch code. Right now it just fails with `KeyError: "No class found for dispatch key 'subclassname' in registry for type 'Block'."`. Any ideas? Maybe there is a better way of doing this that doesn't involve subclassing?
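Both symptoms can be reproduced with the standard library alone. This is only a sketch: it assumes the WebHDFS client percent-encodes the path it is given and appends it to the REST endpoint (inferred from the error URL above, not checked against the fsspec source), and `to_hdfs_path` is a hypothetical helper, not part of Prefect or fsspec.

```python
from urllib.parse import quote, urlsplit

# Problem 2: urlsplit treats the first segment after "//" as the netloc,
# so keeping only .path silently drops it.
parts = urlsplit("webhdfs://home/user/project")
print(parts.netloc)  # home
print(parts.path)    # /user/project

# Problem 1 (assumed mechanism): percent-encoding the full URL and appending
# it to the endpoint produces the malformed URI seen in the exception.
endpoint = "https://hdfs.example.com:port/webhdfs/v1"
print(endpoint + quote("webhdfs://basepath/subpath"))


def to_hdfs_path(url: str) -> str:
    """Hypothetical helper: turn 'webhdfs://a/b' into '/a/b'.

    Keeps the netloc as the first path segment instead of dropping it.
    """
    parts = urlsplit(url)
    if not parts.scheme:
        # Already a plain path; just make sure it is slash-prefixed.
        return "/" + url.lstrip("/")
    return "/" + (parts.netloc + parts.path).lstrip("/")
```

A subclass overriding `write_path` could run its path through something like `to_hdfs_path` before handing it to fsspec, e.g. `to_hdfs_path("webhdfs://basepath/subpath")` gives `/basepath/subpath`.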

Zanie

09/22/2022, 5:39 PM
Hi! I'd welcome an issue / pull request to address the parsing issues with the remote file system. It seems a bit tricky though. You need the block to be imported to be usable. One way is to register your module as a plugin, e.g. https://github.com/PrefectHQ/prefect-aws/blob/main/setup.py#L30-L31
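For a custom package, the registration might look roughly like the sketch below. The package name `my_blocks` and the distribution name `my-blocks` are placeholders, and the `prefect.collections` group name is what prefect-aws appears to use in the linked lines, so it is worth checking against the Prefect version in use.

```python
# Sketch of the arguments for a setup.py of a hypothetical package
# "my_blocks" that contains the RemoteFileSystem subclass.
# In a real setup.py these would be passed as setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "my-blocks",        # placeholder distribution name
    "version": "0.1.0",
    "packages": ["my_blocks"],  # placeholder import package
    "entry_points": {
        # Assumed group name (as in prefect-aws). The right-hand side is the
        # module to import; importing it is what makes the block class known.
        "prefect.collections": ["my_blocks = my_blocks"],
    },
}
```

The package then has to be installed in the environment the agent runs in, so the entry point is visible there.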

Q

09/22/2022, 5:54 PM
Alright, I'll try it out and get back to you, thanks! Will create an issue when I'm done, hopefully will have a better understanding.
Adding an entrypoint worked. Created an issue: https://github.com/PrefectHQ/prefect/issues/6957
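For anyone checking the same setup, one way to confirm that an environment (e.g. the agent's) actually sees the entry point is to list the group with the standard library. This is a sketch; the default group name `prefect.collections` is an assumption carried over from the prefect-aws example, and `list_entry_points` is a hypothetical helper.

```python
from importlib.metadata import entry_points
from typing import List


def list_entry_points(group: str = "prefect.collections") -> List[str]:
    """Return 'name = value' strings for installed entry points in a group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a dict keyed by group name.
        selected = eps.get(group, [])
    return [f"{ep.name} = {ep.value}" for ep in selected]


print(list_entry_points())
```

If the custom package is missing from the printed list, it is probably not installed in that environment.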