# prefect-community
p
I'm excited that targets have been added for file-based caching, but I am having trouble with a very simple example. I have a task that merely copies a file from one place to another. I want that task to only run if the file doesn't exist at the destination. The code below does not work.
Traceback (most recent call last):
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\task_runner.py", line 661, in check_target
    new_res = result.read(target.format(**prefect.context))
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\results\local_result.py", line 79, in read
    new.value = cloudpickle.loads(f.read())
EOFError: Ran out of input
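import shutil
from pathlib import Path

from prefect import Flow, Task
from prefect.engine.results import LocalResult


class Copy(Task):
    def run(self):
        # Copy the source file to the destination and return the destination path.
        src = Path(r"D:\test\dumb.txt")
        dst = Path(r"D:\test\copy_in_here\dumb.txt")
        shutil.copyfile(src, dst)
        return dst

with Flow("cp") as flow:
    dst = Path(r"D:\test\copy_in_here\dumb.txt")
    # Use the copied file's path as the task target.
    result = Copy(target=str(dst), checkpoint=True, result=LocalResult())()

flow.run()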
z
Hi @Philip Blankenau! Do you mind sharing which version of Prefect you're running? I'm having some difficulty recreating the issue.
p
0.11.5. Maybe the wrong version?
z
Very odd-- I can't recreate on a Mac, even using 0.11.5. We generally see EOFErrors when there's an issue with an improper path or empty file. It may sound silly, but do the paths you're supplying look correct for your system? If so, happy to open an issue so we can triage the issue.
p
I just deleted the dumb.txt file from copy_in_here\. After that the flow ran fine once. The second run produced a new error:
Traceback (most recent call last):
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\task_runner.py", line 661, in check_target
    new_res = result.read(target.format(**prefect.context))
  File "C:\Users\pblankenau\Anaconda3\envs\gis\lib\site-packages\prefect\engine\results\local_result.py", line 79, in read
    new.value = cloudpickle.loads(f.read())
_pickle.UnpicklingError: could not find MARK
I read somewhere in the docs that if the target already exists, then instead of running the task, the data are returned. So Prefect must be reading the file. Perhaps that's where this is going wrong?
As an aside, if that's what Prefect does, do targets only work with text data? What if I have a PNG or JPEG as a target? How would it return that data?
z
You read correctly! That's the intended behavior here. And I'm honestly not immediately sure about JPEGs and the like-- let me ask the team!
Okay, back with an answer about JPEGs. While the intended use case for targets is internal caching, you could theoretically handle this. To do so, you'd need to implement a serializer for writing the JPEG, then set that for the result class. As for not being able to find the file, do you mind trying one more thing? Could you specify your target using a template like in this example? Want to make sure targets in general are working for you, given that this appears to be a path issue. https://docs.prefect.io/core/idioms/targets.html#using-result-targets-for-efficient-caching
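A rough sketch of the serializer idea mentioned above, written without importing Prefect. The class name `RawBytesSerializer` and its `serialize`/`deserialize` methods are assumptions for illustration, not a confirmed Prefect interface; how such an object is attached to a result depends on the Prefect version in use.

```python
class RawBytesSerializer:
    """Hypothetical serializer that stores bytes unchanged (e.g. an encoded JPEG)."""

    def serialize(self, value: bytes) -> bytes:
        # Refuse anything that is not already raw bytes.
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("RawBytesSerializer only handles bytes-like values")
        return bytes(value)

    def deserialize(self, data: bytes) -> bytes:
        # Nothing to decode: the stored bytes are the value.
        return data


# Stand-in for the contents of a JPEG file (JPEG data starts with FF D8).
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8
serializer = RawBytesSerializer()
assert serializer.deserialize(serializer.serialize(fake_jpeg)) == fake_jpeg
```

The point of the pass-through design is that the file on disk stays a valid image, so other tools can still open it, while the cache can read it back without unpickling.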
p
Ah, I see. I read it was emulating some of the functionality of Luigi and I thought you could cache based on any file created by your script. You're saying that can be done with custom result classes? I tried running the flow with the target template and it ran just fine!
z
If you wanted to do something with the JPEG you created after the run, you'd likely want to write the JPEG as part of the task itself. But if you want it for caching purposes, the hurdle is making sure that JPEG is serializable/deserializable. That's why you'd need to implement a serializer for the JPEG, then set that for your result. Does that make sense?
p
Yep, makes sense. In my actual prefect flow I am writing to disk as part of the tasks and returning filepaths from tasks to pass to subsequent tasks.
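For the original goal (only copy when the destination is missing), one option is to do the existence check inside the task body and return the path either way. This is a plain-Python sketch; `copy_if_missing` is a hypothetical helper name, and the temporary directory stands in for the real `D:\test` paths.

```python
import shutil
import tempfile
from pathlib import Path


def copy_if_missing(src: Path, dst: Path) -> Path:
    """Copy src to dst unless dst already exists; return dst either way."""
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dumb.txt"
    src.write_text("original")
    dst = Path(tmp) / "copy_in_here" / "dumb.txt"

    copy_if_missing(src, dst)  # first call copies the file
    src.write_text("changed")
    copy_if_missing(src, dst)  # second call is skipped because dst exists
    assert dst.read_text() == "original"
```

Returning the path rather than the file contents keeps the value passed between tasks small and easy to serialize.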
z
Solid! And to close the loop on this, I would expect the target to respect an arbitrary path specified. Do you mind opening an issue so we can take a closer look at this?
p
Sure!
After writing up the issue (and not submitting) it seems to me that Prefect is behaving as expected. The intent of the targets is to cache Python data structures, right? Prefect is supposed to do the actual writing (serialization) of the data. In the example I provided I have a file that was not serialized by Prefect, so Prefect doesn't know how to deserialize it. Does that sound right?
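The failure mode can be reproduced without Prefect. The sketch below uses the standard-library `pickle` module as a stand-in for the `cloudpickle.loads` call in the tracebacks; `try_unpickle` and the file names are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def try_unpickle(path: Path):
    """Return ("ok", value) if the file unpickles, else ("error", exception name)."""
    try:
        return "ok", pickle.loads(path.read_bytes())
    except Exception as exc:  # pickle can raise several exception types on arbitrary bytes
        return "error", type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty.txt"
    empty.write_bytes(b"")  # an empty file, as in the first traceback

    text = Path(tmp) / "dumb.txt"
    text.write_text("this is plain text, not a pickle")

    pickled = Path(tmp) / "result.pkl"
    pickled.write_bytes(pickle.dumps({"answer": 42}))

    results = {p.name: try_unpickle(p) for p in (empty, text, pickled)}
    print(results)
```

Only the file that was written by the pickler reads back cleanly; the empty file and the plain-text file both fail during deserialization, which matches the behavior described in this thread.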
z
Ah, I may have misunderstood: I thought Prefect was failing to find the file entirely. If it's finding it but failing with serialization, you're absolutely right.
p
👍 Thanks for your help!
z
Absolutely! Any time. 👍