# prefect-community
m
What's the best way to pass an object containing credentials? I'm making an ETL to get stuff out of Google Sheets, and I'm using the `gspread` package, which has you do everything from method calls to an authentication object. So like you go `gc = gspread.service_account(filename=<filename>)` and point it at a special credentials file, then everything is through that. Should I just pass it to Secrets? Is there any risk of sensitive credentials being cached somewhere if I define the `gc` object within the Flow itself?
n
Hi @matta! Secret tasks are exactly what you want to use for this; they're explicitly configured to never store their contents anywhere.
j
Hi @matta - I know a few folks who would love to see a Google Sheets task in the task library; if you come up with something you can share, please do!
upvote 1
m
Sweet, thanks, messing with that right now - if I find something reliable I'll hit you up! Might need some guidance for a proper Pull Request though.
@nicholas Thanks, took me a sec but I made a little version that works.
import pathlib
from typing import Any, Union

import gspread
from prefect.tasks.secrets.base import SecretBase

class AuthenticateGsheets(SecretBase):
    def __init__(self, credentials_filename: Union[str, pathlib.Path], **kwargs: Any):
        self.credentials_filename = credentials_filename
        super().__init__(**kwargs)

    def run(self) -> gspread.client.Client:
        return gspread.service_account(filename=self.credentials_filename)
👍 1
🚀 1
j
Of course! We're happy to help you get it over the finish line
m
Just made a first draft. Not sure what to do next to get it into ship shape. https://github.com/mattalhonte/prefect/tree/gsheets/src/prefect/tasks/gsheets
Also not quite sure how to write tests for this, since it needs to hit Google Sheets.
j
Awesome! This is great
m
So is there a way to "wrap" all an object's methods with Prefect tasks? This has a bunch of little ones: https://gspread.readthedocs.io/en/latest/user-guide.html
Or maybe I should make a sort of "macro"? Something like
@task
def gsheet_macro(worksheet: gspread.models.Worksheet, fn: Callable):
    return fn(worksheet)

gsheet_macro(gc, lambda x: x.find("Dough"))
@nicholas @Jeremiah
j
Hey @matta, sorry for the delay! That pattern might work if the `worksheet` object is picklable, but connection clients often aren’t.
Instead, you might have to create / instantiate the object in each task, then call the appropriate method. You could use a helper function to replace that boilerplate code inside the task, but I’d suggest not passing the gsheet client to the task directly.
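As a rough illustration of the pickling concern, here is a minimal sketch using only the standard library. `FakeClient` and `is_picklable` are hypothetical names invented for this example; it does not test `gspread` itself, only the general point that an object holding a live resource (here a lock) usually cannot be serialized, while the plain values needed to rebuild it can.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that holds a live resource."""

    def __init__(self, credentials_filename: str):
        self.credentials_filename = credentials_filename
        # Locks, sockets, and open sessions generally cannot be pickled.
        self._lock = threading.Lock()


def is_picklable(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# The client object itself does not survive serialization...
client_ok = is_picklable(FakeClient("service_account.json"))

# ...but the plain values needed to rebuild it inside a task do.
params_ok = is_picklable(
    {"credentials_filename": "service_account.json", "sheet_key": "abc123"}
)
```

Passing only the filename and sheet key between tasks, and building the client inside each task, sidesteps the problem regardless of whether a particular client happens to be picklable.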
m
Right on, thanks! So something like
@task
def gsheet_macro(
    fn: Callable,  # moved first: a parameter without a default cannot follow defaulted ones
    credentials_filename: Union[str, pathlib.Path] = None,
    sheet_key: str = None,
    worksheet_name: str = None,
):
    # Build the client inside the task instead of passing it in
    client = AuthenticateGsheets(credentials_filename).run()
    google_sheet = client.open_by_key(sheet_key)
    worksheet = google_sheet.worksheet(worksheet_name)
    return fn(worksheet)
Is it okay if I'm explicitly calling `.run()` inside another task? Does that mess with Prefect in some way?
j
Generally speaking, it should be fine! But I think the setup might not be quite what you want - I’d remove the `@task` decorator at the top and instead make this just a function that returns tasks
actually never mind
I think for this pattern you’ll want a factory function which itself returns a `@task`-decorated function. In your current setup, where `gsheet_macro` is decorated, `fn` is going to be an input to your task (which seems unnecessary). In addition, does `AuthenticateGsheets` still need to be a task? Maybe it can just be a helper function now?
def gsheet_helper(fn):
    @task
    def inner(credentials, sheet_key, name):
        client = ...
        return fn(...)
    return inner
^ something like that
But again, that’s just to cut down on the boilerplate of needing to instantiate and authenticate a client each time just to call one of its methods, there are many many ways you could approach this
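One possible way to flesh out the factory idea, written so it runs without Prefect or Google credentials. Everything here is an assumption for illustration: `task` is a no-op stand-in for Prefect's decorator, and `FakeWorksheet` and `open_worksheet` are hypothetical placeholders for the real `gspread` authentication and lookup calls shown earlier in the thread.

```python
from typing import Any, Callable


def task(fn: Callable) -> Callable:
    """No-op stand-in for Prefect's @task decorator (assumption, so this runs anywhere)."""
    return fn


class FakeWorksheet:
    """Hypothetical worksheet exposing a find() method, like the one used earlier."""

    def find(self, query: str) -> str:
        return f"cell containing {query!r}"


def open_worksheet(
    credentials_filename: str, sheet_key: str, worksheet_name: str
) -> FakeWorksheet:
    """Hypothetical helper; a real version would authenticate and open the sheet."""
    return FakeWorksheet()


def gsheet_helper(fn: Callable[[Any], Any]) -> Callable:
    """Factory: capture fn in a closure and return a task that applies it."""

    @task
    def inner(credentials_filename: str, sheet_key: str, worksheet_name: str) -> Any:
        # The worksheet is created inside the task, never passed between tasks.
        worksheet = open_worksheet(credentials_filename, sheet_key, worksheet_name)
        return fn(worksheet)

    return inner


# fn is fixed when the task is built, so it is not a task input.
find_dough = gsheet_helper(lambda ws: ws.find("Dough"))
result = find_dough("service_account.json", "sheet-key", "Sheet1")
```

With the real decorator in place of the stand-in, `find_dough` would be added to a flow like any other task, and only plain strings would flow in as inputs.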
m
`AuthenticateGsheets` is a task right now to take advantage of subclassing `SecretBase` as the "Prefectonic" way of handling authenticator objects. Not married to it if there's a better way!
j
I see. Mainly that’s to help people ensure their credentials don’t accidentally get serialized as the results of a task, so I think there won’t be any issue with converting it to a function. HOWEVER, if you want to leave it a task, calling `.run()` should be totally fine!
m
Ah, okay, I wasn't sure what got cached by Prefect
I think I'll leave it around for usage with the `WriteGsheetRow` and `ReadGsheetRow` tasks
Cool, thanks!
So once I've got that (with Docstrings and everything), what should I do to get it ready for the Pull Request? Not sure how to write Tests for it since it'd need to be hitting a Google Sheet.
j
Great! You can write tests for the non-google sheets parts. Have you used mocks to write tests for third party APIs before? If not it’s ok, you can open a PR with tests for what you can
In fact @Laura Lorenz (she/her) is working out a plan for the new `contrib/` folder, and this would be a great candidate - once it has full tests (from you, Prefect, or anyone in the community) it can graduate to the core!
m
I haven't, actually. And cool!
Excited 🙂
Maybe I'll give it a try, though! Is there a library you'd recommend for mocking tests?
l
Hi @matta, as Jeremiah said you can skip the tests and PR it into `/src/prefect/contrib/tasks/{optionally your package name}/google_sheets.py` (or whatever you called your module). If you want to go for tests, you can look at https://github.com/PrefectHQ/prefect/blob/master/tests/tasks/gcp/test_gcs_upload_download.py for some inspiration: the tests in the `TestsBuckets` and `TestBlob` classes show how to use unittest’s MagicMock and pytest.monkeypatch so that your tests don’t actually have to communicate with the Google Sheets API!
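For anyone new to mocking, a minimal sketch of the idea using only `unittest.mock.MagicMock`. The `read_row` function is a hypothetical example written for this sketch, not code from the PR; it assumes a gspread-style client whose call chain is `open_by_key(...).worksheet(...).row_values(...)`.

```python
from unittest.mock import MagicMock


def read_row(client, sheet_key: str, worksheet_name: str, row: int) -> list:
    """Hypothetical function under test: read one row through a gspread-style client."""
    worksheet = client.open_by_key(sheet_key).worksheet(worksheet_name)
    return worksheet.row_values(row)


def test_read_row_uses_client_without_network():
    # MagicMock creates attributes on demand, so the call chain can be configured directly.
    client = MagicMock()
    worksheet = client.open_by_key.return_value.worksheet.return_value
    worksheet.row_values.return_value = ["Dough", "2"]

    assert read_row(client, "sheet-key", "Sheet1", 1) == ["Dough", "2"]

    # Verify the code asked for the right sheet, worksheet, and row.
    client.open_by_key.assert_called_once_with("sheet-key")
    client.open_by_key.return_value.worksheet.assert_called_once_with("Sheet1")
    worksheet.row_values.assert_called_once_with(1)


test_read_row_uses_client_without_network()
```

When the code under test builds its own client, pytest's `monkeypatch.setattr` can be used to swap the function that creates the client for one returning a mock like this, so the test never touches the network.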
m
Awesome, thanks!
I'll try to clean it up a little bit and submit it tomorrow!
😄 2
Just submitted the PR!
😍 1
🚀 2
n
thanks @matta!