Q
09/22/2022, 5:32 PM
RemoteFileSystem(basepath="webhdfs://basepath", settings=params).write_path(path="subpath/filename", content=content)
I get an exception like this:
null for uri: https://hdfs.example.com:port/webhdfs/v1webhdfs%3A//basepath/subpath?op=MKDIRS
Which is unsurprising, since the URL above is clearly not valid.
The problem here is that `fsspec.WebHDFS`'s methods `makedirs`/`mkdir` expect a slash-prefixed path w/o scheme (in this case `/basepath/subpath`), while `prefect.RemoteFileSystem` just passes `scheme://basepath/subpathdir` and expects fsspec to do the rest.
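To make the mismatch concrete, here is a minimal sketch using only the standard library. `to_absolute_path` and `API_PREFIX` are hypothetical names made up for illustration, not part of fsspec or Prefect, and the percent-encoding step is only an assumption about how the broken URI gets built; it happens to produce the same shape as the one in the exception.

```python
from urllib.parse import quote, urlsplit

API_PREFIX = "/webhdfs/v1"  # taken from the URI in the exception above


def to_absolute_path(url_or_path: str) -> str:
    """Drop the scheme and return a slash-prefixed path.

    The netloc is kept as the first path segment, so
    "webhdfs://basepath/subpath" becomes "/basepath/subpath".
    """
    parts = urlsplit(url_or_path)
    joined = f"{parts.netloc}{parts.path}"
    return "/" + joined.lstrip("/")


# Percent-encoding the full URL and appending it to the API prefix
# gives the same malformed shape as the URI in the exception.
broken = API_PREFIX + quote("webhdfs://basepath/subpath")

# Stripping the scheme first gives a path the REST endpoint can accept.
fixed = API_PREFIX + quote(to_absolute_path("webhdfs://basepath/subpath"))

print(broken)
print(fixed)
```

Something like `to_absolute_path` would have to run on the path before it reaches `makedirs`.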
2. A lot of `fsspec.AbstractFileSystem` methods rely on `fsspec.infer_storage_options`, most importantly `open`, which is used by `prefect.RemoteFileSystem.write_path` and `prefect.RemoteFileSystem.read_path`. `infer_storage_options` calls `urlsplit` and returns `urlsplit(path).path`, i.e. strips away scheme+netloc.
Here the problem is that netloc is whatever comes after the scheme and before the first slash, e.g. for `webhdfs://home/user/project`, `home` would get stripped away erroneously.
This means one would have to prepend an extra segment to basepath to be sacrificed to fsspec's implementation (e.g. `webhdfs://thisgetslost/home/user/project`).
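The netloc behaviour can be checked with the standard library alone. This sketch only calls `urllib.parse.urlsplit` and does not involve fsspec or Prefect; the variable names are just for illustration.

```python
from urllib.parse import urlsplit

# The first segment after "scheme://" is parsed as the netloc, not the path.
plain = urlsplit("webhdfs://home/user/project")
print(plain.netloc)  # home
print(plain.path)    # /user/project

# With a throwaway first segment, the real path survives intact.
padded = urlsplit("webhdfs://thisgetslost/home/user/project")
print(padded.netloc)  # thisgetslost
print(padded.path)    # /home/user/project
```

So anything that keeps only `.path` silently loses the first directory unless a sacrificial segment is put in front of it.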
I can subclass `RemoteFileSystem` and override `write_path` to solve both problems.
But then I would, it seems, need to make this class definition available and import it before calling `Block.load`.
What I don't understand is how I can make an agent use this to fetch code. Right now it just fails with `KeyError: "No class found for dispatch key 'subclassname' in registry for type 'Block'."`. Any ideas?
Maybe there is a better way of doing this that doesn't involve subclassing?
Zanie
09/22/2022, 5:39 PM
Q
09/22/2022, 5:54 PM