Hi, I'm a Prefect noob, so please bear with me. I'm having a strange problem decoding byte strings. I'm using Prefect to query my database, which holds text in RTF format (this is an old database!) stored as latin-1 encoded byte strings. The query is performed with `pd.read_sql()`, so the text ends up in a column of a pandas DataFrame. I then decode the strings and use `striprtf` to convert them to plain text. I have never had a problem with this step in Jupyter notebooks or on multiple machines, but when I run it in Prefect, I get a `UnicodeDecodeError` for a portion of the text despite using `text.decode(encoding='latin-1', errors='replace')`. I've tried using `chardet` but have had no luck. Thanks in advance for the help.
Jinho Chung
10/18/2021, 1:28 AM
So no one wastes time - this is a problem with the conversion from rtf to text (