
Lucas Brum

08/25/2022, 5:11 PM
I need an easy way to log in to a Google account in Prefect, but I don't want to use Selenium. Can I use Playwright in Prefect?

Ryan Peden

08/25/2022, 6:49 PM
Hi Lucas, in your previous message it looked like you were trying to sign in to Gmail and get the text of individual emails. Instead of scripting a web browser using Selenium or Playwright, might it be easier to use the Python client for the Gmail API to accomplish your goals? I only ask because using Selenium or Playwright to extract data from Gmail's HTML will be tricky since it is a very complex JavaScript SPA. Having said that, Playwright should run inside a Prefect task as long as you don't try to return any non-serializable Playwright objects like Pages.

Lucas Brum

08/25/2022, 6:58 PM
I need to log in to my Google account in order to get a token that the Ads API forces me to obtain manually once a week. (Sorry for my English, it is not my native language.)

Ryan Peden

08/25/2022, 7:48 PM
Your English is excellent, Lucas. I'd also look into using a refresh token to obtain an updated token for the Ads API, but if that is not an option, then Playwright might help. I can't speak to the specifics of how to use Playwright to get your token, but I can confirm that it runs happily in Prefect tasks, as the following example demonstrates:
import asyncio

from prefect import flow, task, get_run_logger
from playwright.async_api import async_playwright

@task
async def get_page_title(url: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        page_text = await page.inner_text("title")
        return page_text

@flow(name="Page downloader")
async def main():
    logger = get_run_logger()
    text_one = await get_page_title('http://whatsmyuseragent.org/')
    logger.info(f"Page title: {text_one}")
    text_two = await get_page_title('https://www.noaa.gov')
    logger.info(f"Page title: {text_two}")
    
if __name__ == '__main__':
    asyncio.run(main())
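On the refresh-token suggestion above: a minimal sketch of exchanging a refresh token for a new access token at Google's OAuth 2.0 token endpoint, using only the standard library. It assumes you already have an OAuth client ID, client secret, and refresh token for your Ads API project. The function names `build_refresh_request` and `fetch_access_token` are made up for illustration, and whether this removes the weekly manual step depends on how your OAuth client is configured.

```python
import json
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(
    client_id: str, client_secret: str, refresh_token: str
) -> urllib.request.Request:
    """Build the POST request that trades a refresh token for an access token."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Send the request and return the new access token (hypothetical helper)."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

`fetch_access_token` returns a plain string, so it could be wrapped in a Prefect task without running into the serialization issue mentioned earlier.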

Lucas Brum

08/25/2022, 8:17 PM
I tried running it locally with "browser = p.chromium.launch(headless=False)" and it worked perfectly, but Prefect forces me to use "browser = p.chromium.launch(headless=True)", and with headless=True it does not find the HTML elements. With headless=False in Prefect I get this:
Error during execution of task: Error("\n╔════════════════════════════════════════════════════════════════════════════════════════════════╗\n║ Looks like you launched a headed browser without having a XServer running. ║\n║ Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright. ║\n║ ║\n║ <3 Playwright Team ║\n╚════════════════════════════════════════════════════════════════════════════════════════════════╝\n=========================== logs ===========================\n<launching> /root/.cache/ms-playwright/chromium-1019/chrome-linux/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --no-sandbox --user-data-dir=/tmp/playwright_chromiumdev_profile-cnysrE --remote-debugging-pipe --no-startup-window\n<launched> pid=3636\n[pid=3636][err] [3636:3636:0825/200216.120549:ERROR:ozone_platform_x11.cc(240)] Missing X server or $DISPLAY\n[pid=3636][err] [3636:3636:0825/200216.122184:ERROR:env.cc(255)] The platform failed to initialize. Exiting.\n============================================================")
If I use headless=True, I get this:
State Message:
Error during execution of task: TimeoutError('Timeout 30000ms exceeded.\n=========================== logs ===========================\nwaiting for selector "input[name=\'identifier\']"\n============================================================')
And if I run with headless=False on my machine, it finds the element. Is that because it doesn't render the DOM? Is there another way to access the page in this case, or is it impossible to do this in Prefect?

Ryan Peden

08/25/2022, 8:35 PM
It might be because Google thinks you are a bot if using a headless browser. The DOM should still be there, but Google might not be rendering the HTML you are expecting. Are you running your Prefect tasks in a Docker container?
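One way to check what the headless browser actually received is to save a screenshot and the page HTML whenever the selector wait fails. The helper below is a rough sketch, not a tested fix for the Google login page. `wait_or_dump` and the `debug` output directory are made-up names, and `page` is assumed to be a Playwright async `Page`.

```python
from pathlib import Path


async def wait_or_dump(page, selector: str, out_dir: str = "debug", timeout_ms: int = 30000):
    """Wait for a selector; if the wait fails, save what the browser saw and re-raise."""
    try:
        return await page.wait_for_selector(selector, timeout=timeout_ms)
    except Exception:
        target = Path(out_dir)
        target.mkdir(parents=True, exist_ok=True)
        # Screenshot of the page as rendered in the headless browser.
        await page.screenshot(path=str(target / "page.png"), full_page=True)
        # Raw HTML, to compare against what the headed browser shows locally.
        (target / "page.html").write_text(await page.content(), encoding="utf-8")
        raise
```

Inside the task, `await wait_or_dump(page, "input[name='identifier']")` would replace the plain selector wait. In a Docker container the files are written inside the container, so `out_dir` would need to point at a mounted volume to inspect them afterwards.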

Lucas Brum

08/29/2022, 4:57 PM
Yes