Danny Vilela
12/15/2021, 11:31 PM
`HdfsResult` class (a la `luigi.contrib.hdfs.target.HdfsTarget`; docs) but I’m stuck on the patterns the `prefect` codebase uses for the `location` attribute. It seems like the `Result` base class implements an interface that allows `location` to be passed both at initialization time and when calling `exists(...)` or `read(...)`. I guess my question is: why? Is it not enough to restrict the user to only pass `location` at initialization time and use that value throughout the `exists` and `read` methods?
Edit: follow-up question: why does the `prefect` codebase follow the pattern of creating a new `Result` instance during `Result.read(…)`, as opposed to updating the value (`self.value`) on the current instance?
Kevin Kho
`self`, it will not be thread safe, and with multiprocessing like on a `LocalDaskExecutor`, you may find that some files are overwritten and others are not written out because they ran very close to each other.
Taking in a value gives you the flexibility to update it and be thread safe
Kevin Kho
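A rough sketch of the thread-safety point above, not Prefect's actual implementation. `SketchResult` and the `/tmp/part-N` paths are made up for illustration, and the "read" just builds a placeholder string instead of touching storage. The idea is that `read` copies the shared instance and fills in the copy, so concurrent callers never write to the same object:

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class SketchResult:
    """Toy stand-in for a result class (hypothetical, not Prefect's code)."""

    def __init__(self, value=None, location=None):
        self.value = value
        self.location = location

    def read(self, location):
        # Work on a shallow copy so the shared instance is never mutated.
        new = copy.copy(self)
        new.location = location
        new.value = f"contents of {location}"  # placeholder for real I/O
        return new


shared = SketchResult()
locations = [f"/tmp/part-{i}" for i in range(8)]

# Several threads call read() on the same shared instance.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(shared.read, locations))

# Each caller got its own instance; the shared one is untouched.
assert [r.location for r in results] == locations
assert shared.value is None and shared.location is None
```

If `read` assigned to `self.value` and `self.location` instead, every thread would be writing to the same two attributes, and whichever ran last would win.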
Danny Vilela
12/16/2021, 12:11 AM
`DataFrame` objects, I’d want to set `new.value = spark.read.parquet(…)`. But from the other `Result.read(…)` examples, it seems like `prefect` actually wants us to serialize that data. Is that correct? If so, why?
I know there’s a `PandasSerializer` (docs), but I’m not sure of the equivalent `SparkSerializer` 🤔
Danny Vilela
12/16/2021, 12:14 AM
`Results` page (link) it says:
“In addition, you can specify a `Serializer` that transforms Python objects into bytes prior to being written to storage by a `Result`. The same `Serializer` will be used to recover the object from bytes later.”
This makes the serializer seem optional: is that correct? I’m not sure it’s sensible to try pickling a Spark DataFrame based on the distributed memory model?
Kevin Kho
Kevin Kho
Danny Vilela
12/16/2021, 12:35 AM
`Result` class enforces a serializer (i.e., `Result.__init__` defaults `self.serializer` to `None` if not provided). Would you use a `NoOpSerializer` just so that there technically “is” a serializer for the result? Is there some other prefect interface that does prefer a result having a serializer (even if it’s a no-op)?
Kevin Kho
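A minimal sketch of what a pass-through serializer could look like next to a pickling one. Both classes are hypothetical and written without importing `prefect`. The `serialize` / `deserialize` method names are an assumption based on the docs' description of a serializer that turns objects into bytes and recovers them later:

```python
import pickle


class PickleLikeSerializer:
    """Hypothetical: turns a Python object into bytes and back."""

    def serialize(self, value):
        return pickle.dumps(value)

    def deserialize(self, value):
        return pickle.loads(value)


class NoOpSerializer:
    """Hypothetical pass-through: hands the value back unchanged."""

    def serialize(self, value):
        return value

    def deserialize(self, value):
        return value


payload = {"rows": 3}
pickled = PickleLikeSerializer()
assert isinstance(pickled.serialize(payload), bytes)
assert pickled.deserialize(pickled.serialize(payload)) == payload

handle = object()  # stands in for something that should not be pickled
noop = NoOpSerializer()
assert noop.deserialize(noop.serialize(handle)) is handle
```

With a pass-through like this, the result class itself would own the actual write and read (for example via Spark's own parquet reader and writer), and the serializer slot is only filled so that calling code always has one to call.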
Danny Vilela
12/16/2021, 12:37 AM
`NoOpSerializer` 🙂 Thanks again @Kevin Kho!
Kevin Kho