has anyone worked with preprocessing large videos in a data pipeline?
06/12/2020, 6:08 PM
Hi Aaron and welcome to Prefect. I don't have any experience here but hoping others in the community will be able to help you.
06/14/2020, 8:40 AM
Hi Aaron, we do. By preprocessing you mean transcoding, splitting, basically any ffmpeg operation?
06/14/2020, 6:01 PM
@Alexander Hirner that's correct. Right now, converting a 14 GB mp4 into frames and a webm, and uploading all that data to S3, is taking forever. Are there any tools within Prefect that could make this process easier?
06/14/2020, 8:09 PM
ffmpeg usually uses all cores for transcoding. Does that step max out all your cores? Is the mp4 file already on a bucket or a distributed file system, and does uploading max out your connection bandwidth?
06/14/2020, 8:56 PM
it's in Google Cloud Storage, so the current process is to download it to my local machine, run everything here, then upload it to S3
06/15/2020, 9:57 AM
That sounds like a great basis for parallelizing. We start ffmpeg tasks directly on signed GCS HTTPS URLs and thereby save one roundtrip.
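A minimal sketch of that idea: ffmpeg can read HTTPS inputs directly, so you can point it at a signed GCS URL instead of downloading first. The codec flags here are illustrative defaults, and the signed URL is assumed to come from elsewhere (e.g. google-cloud-storage's `blob.generate_signed_url`):

```python
import subprocess

def build_transcode_cmd(signed_url, out_path):
    """Build an ffmpeg command that reads the remote mp4 over HTTPS
    and transcodes it to a VP9 webm, skipping the local download."""
    return [
        "ffmpeg",
        "-i", signed_url,           # ffmpeg reads HTTPS inputs directly
        "-c:v", "libvpx-vp9",       # VP9 video codec for webm output
        "-b:v", "0", "-crf", "30",  # constant-quality rate control
        out_path,
    ]

def transcode(signed_url, out_path):
    # Runs ffmpeg; raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_transcode_cmd(signed_url, out_path), check=True)
```

The same pattern works for frame extraction or any other ffmpeg operation; only the output flags change.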
If download is a bottleneck, the tricky part would be to parallelize transcoding and keep it in the cloud. Either you need to seek into chunks deterministically, or pre-chunk into 15-60 min files before upload. We do the latter to avoid any gaps and overlaps.
If pre-chunked, each video I/O operation could be its own Prefect task.
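A sketch of how that mapping could look: the chunk arithmetic below is plain Python, and the commented-out wiring assumes the Prefect 0.x API (`prefect.task` / `Flow` with `task.map`), which was current at the time of this thread:

```python
def chunk_offsets(duration_s, chunk_s=1800.0):
    """Return (start, length) pairs covering the video with no gaps
    or overlaps, e.g. 30-minute chunks by default."""
    offsets = []
    start = 0.0
    while start < duration_s:
        offsets.append((start, min(chunk_s, duration_s - start)))
        start += chunk_s
    return offsets

# With Prefect 0.x, each chunk becomes one mapped task (illustrative):
#
#   from prefect import task, Flow
#
#   @task
#   def process_chunk(offset):
#       start, length = offset
#       ...  # run ffmpeg with -ss/-t on this chunk, upload result to S3
#
#   with Flow("video-preprocess") as flow:
#       process_chunk.map(chunk_offsets(duration_s=3 * 3600))
```

Mapped tasks then run in parallel under whichever executor the flow is configured with.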