# prefect-community
Aaron
has anyone worked with preprocessing large videos in a data pipeline?
j
Hi Aaron and welcome to Prefect. I don't have any experience here but hoping others in the community will be able to help you.
Alexander Hirner
Hi Aaron, we do. By preprocessing you mean transcoding, splitting, basically any ffmpeg operation?
Aaron
@Alexander Hirner that's correct. Right now, converting a 14 GB mp4 into frames and a webm, and then uploading all that data to S3, is taking forever. Are there any tools in Prefect that could make this process easier?
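Roughly, the current sequential steps look like this (a sketch only; bucket names, paths and codec settings are placeholders, not the actual pipeline):
```python
import os
import subprocess

import boto3
from google.cloud import storage


def preprocess(local_mp4: str = "video.mp4") -> None:
    # 1. Download the source mp4 from GCS to the local machine
    storage.Client().bucket("source-gcs-bucket").blob("video.mp4").download_to_filename(local_mp4)

    # 2. Extract frames (here one frame per second, written as PNGs)
    os.makedirs("frames", exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", local_mp4, "-vf", "fps=1", "frames/frame_%06d.png"],
        check=True,
    )

    # 3. Transcode the whole file to webm (VP9)
    subprocess.run(
        ["ffmpeg", "-i", local_mp4, "-c:v", "libvpx-vp9", "-b:v", "0", "-crf", "30", "video.webm"],
        check=True,
    )

    # 4. Upload the results to S3 (frames would go up in a loop or via a sync)
    boto3.client("s3").upload_file("video.webm", "target-s3-bucket", "webm/video.webm")
```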
Alexander Hirner
ffmpeg usually uses all cores for transcoding. Does that step max out all your cores? Is the mp4 file already on a bucket or a distributed file system, and does uploading max out your connection bandwidth?
Aaron
It's in Google Cloud Storage, so the current process is to download it to my local machine, run everything there, then upload it to S3.
Alexander Hirner
That sounds like a great basis for parallelizing. We start ffmpeg tasks directly on signed GCS https URLs and hence save one round trip. If download is the bottleneck, the tricky part is to parallelize transcoding while keeping it in the cloud. Either you need to seek into chunks deterministically, or you pre-chunk into 15-60 min files before upload. We do the latter to avoid any gaps or overlaps. If pre-chunked, each video I/O step could be its own Prefect task (see the sketch below).
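A minimal sketch of that setup, assuming Prefect 2-style @flow/@task decorators; bucket names, chunk prefixes, codec settings and the concurrency choice are illustrative, not from this thread:
```python
import datetime
import subprocess

import boto3
from google.cloud import storage
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner


@task(retries=2)
def transcode_chunk(blob_name: str) -> str:
    # Sign the GCS object so ffmpeg can read it over https, skipping a local download
    blob = storage.Client().bucket("source-gcs-bucket").blob(blob_name)
    signed_url = blob.generate_signed_url(
        version="v4", expiration=datetime.timedelta(hours=2)
    )

    # Transcode this pre-chunked file straight from the signed URL
    out_file = blob_name.rsplit("/", 1)[-1].replace(".mp4", ".webm")
    subprocess.run(
        ["ffmpeg", "-y", "-i", signed_url, "-c:v", "libvpx-vp9", "-b:v", "0", "-crf", "30", out_file],
        check=True,
    )

    # Push the transcoded chunk to S3
    boto3.client("s3").upload_file(out_file, "target-s3-bucket", f"webm/{out_file}")
    return out_file


@flow(task_runner=ConcurrentTaskRunner())
def preprocess_video(chunk_names: list[str]) -> None:
    # One task per pre-chunked 15-60 min file
    for name in chunk_names:
        transcode_chunk.submit(name)


if __name__ == "__main__":
    preprocess_video([f"chunks/part_{i:03d}.mp4" for i in range(4)])
```
Since each ffmpeg process already saturates its cores, concurrency on a single machine mostly helps overlap network I/O with transcoding; to scale the transcoding itself you would swap the task runner for a distributed one (e.g. Dask) so chunks land on separate workers.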