Danny Vilela
12/15/2021, 11:31 PM
`HdfsResult` class (a la `luigi.contrib.hdfs.target.HdfsTarget`; docs) but I’m stuck on the patterns the prefect codebase uses for the `location` attribute. It seems like the `Result` base class implements an interface that allows `location` both at initialization time and when calling `exists(...)` or `read(...)`. I guess my question is: why? Is it not enough to restrict the user to passing `location` only at initialization time and use that value throughout the `exists` and `read` methods?
Edit: follow-up question: why does the prefect codebase follow the pattern of creating a new `Result` instance during `Result.read(…)`, as opposed to updating the value on the current instance (`self.value`)?
Kevin Kho
self, it will not be thread safe, and with multiprocessing, like on a `LocalDaskExecutor`, you may find that some files are overwritten and others are not written out because they ran very close to each other.
Taking in a value gives you the flexibility to update it and be thread safe.
Kevin Kho
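To make the thread-safety point concrete, here is a minimal sketch of the copy-on-read pattern. It does not import prefect: `SketchResult` and `FAKE_STORAGE` are hypothetical names invented for illustration, and the dict merely stands in for HDFS. The only idea taken from the discussion is that `read(location)` fills in a fresh copy instead of mutating the shared instance.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Optional

# Hypothetical in-memory "storage" standing in for HDFS; not part of prefect.
FAKE_STORAGE = {"/data/a": "payload-a", "/data/b": "payload-b"}


class SketchResult:
    """Illustrative Result-like class; an assumption, not prefect's implementation."""

    def __init__(self, value: Any = None, location: Optional[str] = None):
        self.value = value
        self.location = location

    def exists(self, location: str) -> bool:
        # The location is passed per call, so one instance can serve many paths.
        return location in FAKE_STORAGE

    def read(self, location: str) -> "SketchResult":
        # Copy first, then populate the copy; the shared instance is never mutated.
        new = copy.copy(self)
        new.location = location
        new.value = FAKE_STORAGE[location]
        return new


# One instance shared by several workers, each reading a different location.
shared = SketchResult()

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(shared.read, ["/data/a", "/data/b"]))
```

Because each call returns its own object, two workers that run close together cannot overwrite each other's `value` or `location`; the shared instance only acts as a template.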
Danny Vilela
12/16/2021, 12:11 AM
`DataFrame` objects, I’d want to set `new.value = spark.read.parquet(…)`. But from the other `Result.read(…)` examples, it seems like prefect actually wants us to serialize that data. Is that correct? If so, why?
I know there’s a `PandasSerializer` (docs), but I’m not sure of the equivalent `SparkSerializer` 🤔
Danny Vilela
12/16/2021, 12:14 AM
Results page (link) it says:
"In addition, you can specify a `Serializer` that transforms Python objects into bytes prior to being written to storage by a `Result`. The same `Serializer` will be used to recover the object from bytes later."
This makes the serializer seem optional. Is that correct? I’m not sure it’s sensible to try pickling a Spark DataFrame, given the distributed memory model.
Kevin Kho
Kevin Kho
Danny Vilela
12/16/2021, 12:35 AM
`Result` class enforces a serializer (i.e., `Result.__init__` defaults `self.serializer` to None if not provided). Would you use a `NoOpSerializer` just so that there technically “is” a serializer for the result? Is there some other prefect interface that does prefer a result having a serializer (even if it’s a no-op)?
Kevin Kho
Danny Vilela
12/16/2021, 12:37 AM
`NoOpSerializer` 🙂 Thanks again @Kevin Kho!
Kevin Kho
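For reference, a rough sketch of the no-op serializer idea discussed above. Everything here is an assumption made for illustration: `NoOpSerializer` is defined locally rather than imported from prefect, `DelegatingResult` is a hypothetical Result-like class, and `fake_read_parquet` stands in for `spark.read.parquet(…)` so the snippet runs without a Spark session. The path is made up.

```python
import copy
from typing import Any, Callable, Optional


class NoOpSerializer:
    """Hypothetical serializer that passes values through unchanged."""

    def serialize(self, value: Any) -> Any:
        return value

    def deserialize(self, value: Any) -> Any:
        return value


def fake_read_parquet(path: str) -> dict:
    # Stand-in for spark.read.parquet(path): returns a lightweight handle,
    # not the data itself, which mirrors a lazily evaluated DataFrame.
    return {"format": "parquet", "path": path}


class DelegatingResult:
    """Illustrative Result-like class that lets the engine do the I/O."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        location: Optional[str] = None,
        serializer: Optional[NoOpSerializer] = None,
    ):
        self.loader = loader
        self.location = location
        # A serializer is always present, even though it does nothing.
        self.serializer = serializer or NoOpSerializer()
        self.value = None

    def read(self, location: str) -> "DelegatingResult":
        # Return a populated copy instead of mutating the shared instance.
        new = copy.copy(self)
        new.location = location
        new.value = new.serializer.deserialize(self.loader(location))
        return new


template = DelegatingResult(loader=fake_read_parquet)
loaded = template.read("/warehouse/events")  # made-up path
```

The design choice is that nothing is pickled: the loader returns whatever handle the engine provides, and the serializer exists only to satisfy code that expects one.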