# prefect-community
a
Hi Prefect Experts, I am facing an issue when invoking a script in a
GitLab repo
from my prefect-server. Can someone help me figure out what the mistake in my code is? The error:
Failed to load and execute Flow's environment: GitlabGetError('404 Project Not Found')
f.storage = GitLab(host="https://hakko.sekai.dev",
                   repo="sekai-backend/spark-runner",
                   path="kikai/charts/kikai/DailySummariesJob/kikai_dailySummaries.py",
                   ref="dev-prefect")
URI to the script -
https://hakko.sekai.dev/sekai-backend/spark-runner/blob/dev-prefect/kikai/charts/kikai/DailySummariesJob/kikai_dailySummaries.py
j
You likely need to provide an access token. If you're using prefect cloud you'd normally create a cloud secret, then specify the name via the
access_token_secret
kwarg to `GitLab`: https://docs.prefect.io/orchestration/flow_config/storage.html#gitlab
a
Hi @Jim Crist-Harif, let me try it out and get back to you, thanks
Hi @Jim Crist-Harif, now I get this error
Failed to load and execute Flow's environment: ValueError('Local Secret "XXXXXXXXX" was not found.')
In the docs it says
# name of personal access token secret
(but I passed the access token itself). Where should I specify (configure) the
*name* of the personal access token
?
j
You want to specify the name of the secret, not the value.
So you'd set a secret
GITLAB_ACCESS_TOKEN
somewhere (either in cloud, or as a local secret), then pass in
access_token_secret="GITLAB_ACCESS_TOKEN"
to
GitLab
storage.
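To illustrate the name-versus-value distinction, here is a minimal sketch in plain Python. It does not use Prefect itself: `LOCAL_SECRETS` and `lookup_secret` are hypothetical stand-ins for a secret store, and the token value is a placeholder. It only shows why passing the token value where a secret name is expected produces a "was not found" error like the one above.

```python
# Hypothetical stand-in for a secret store: maps secret NAMES to secret VALUES.
# The value below is a placeholder, not a real token.
LOCAL_SECRETS = {"GITLAB_ACCESS_TOKEN": "example-token-value"}


def lookup_secret(name, secrets=LOCAL_SECRETS):
    """Return the value stored under `name`, mimicking a lookup by secret name."""
    if name not in secrets:
        raise ValueError(f'Local Secret "{name}" was not found.')
    return secrets[name]


# Correct: pass the NAME of the secret; the store resolves it to the value.
token = lookup_secret("GITLAB_ACCESS_TOKEN")

# Incorrect: passing the token VALUE makes the lookup treat it as a name.
try:
    lookup_secret("example-token-value")
except ValueError as exc:
    error_message = str(exc)

print(token)
print(error_message)
```

The same idea applies to the `access_token_secret` kwarg: it is expected to hold the key, and the value is resolved from wherever the secret is configured.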
a
Thank you @Jim Crist-Harif, it's a
prefect-server
that runs on a Kubernetes cluster. In that case, should I configure it on the agent, or is there another place?
j
Configuring it on the agent would be the most straightforward way.
a
Thanks @Jim Crist-Harif, I got past the token issue. Later I hit this error:
Failed to load and execute Flow's environment: ValueError('No flow found in file.')
so I tried to register the flow.
The command I ran:
prefect register flow -f flow.py -n flow_dailySummaries
j
Does the file specified at
https://hakko.sekai.dev/sekai-backend/spark-runner/blob/dev-prefect/kikai/charts/kikai/DailySummariesJob/kikai_dailySummaries.py
contain a flow? Prefect will look for
prefect.Flow
objects in the file that match the name of the flow provided.
a
But it gave me this error -
TypeError: 'project_name' is a required field when registering a flow.
@Jim Crist-Harif Yes, the above snapshot is the exact file
j
Hmmm, that looks correct.
For registration, you do need to specify a project name (we should update that doc). You can do that via the
--project
flag in the CLI.
You don't need to specify the
--name
flag, that will be inferred automatically from the flow.
I see you specified a
ref
in the
GitLab
storage - is the file up to date on that branch?
a
@Jim Crist-Harif Yes, I copied the file from that branch (only that branch has the flow script).
@Jim Crist-Harif It's creating some pods in the Error state, and when I looked into the logs they say:
No flow found in file.
<string>:16: UserWarning: Attempting to call `flow.register` during execution of flow file will lead to unexpected results.
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 90, in flow_run
    raise exc
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 67, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.7/site-packages/prefect/storage/gitlab.py", line 106, in get_flow
    file_contents=contents.decode(), flow_name=flow_name
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/storage.py", line 98, in extract_flow_from_file
    raise ValueError("No flow found in file.")
ValueError: No flow found in file.
j
That error will happen if it fails to find a flow with the specified name in the file. Is your flow named
flow_dailySummaries
as registered in cloud? Or does the file in cloud not match that name?
The name of the flow in the file needs to match the name of the flow that exists in cloud.
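A rough sketch of the lookup described here, under stated assumptions: the `Flow` class below is a hypothetical stand-in for `prefect.Flow`, and `find_flow` is an illustrative helper, not Prefect's actual implementation. It shows that a file can define a flow and still produce "No flow found in file." when the requested name differs.

```python
class Flow:
    """Hypothetical stand-in for prefect.Flow; only the name matters here."""

    def __init__(self, name):
        self.name = name


def find_flow(file_contents, flow_name):
    """Execute the file contents and return the Flow whose name matches."""
    namespace = {"Flow": Flow}
    exec(file_contents, namespace)
    for value in namespace.values():
        if isinstance(value, Flow) and value.name == flow_name:
            return value
    # Reached both when the file has no flows and when none has this name.
    raise ValueError("No flow found in file.")


FILE_CONTENTS = 'f = Flow("flow_dailySummaries")\n'

# The requested name matches the flow defined in the file.
found = find_flow(FILE_CONTENTS, "flow_dailySummaries")

# The file still defines a flow, but under another name, so the lookup fails.
try:
    find_flow(FILE_CONTENTS, "a_different_name")
except ValueError as exc:
    mismatch_error = str(exc)

print(found.name)
print(mismatch_error)
```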
a
It matches... actually, I copied it locally from the GitLab code
@Jim Crist-Harif
so it's matching
j
And that's definitely the name of the flow that was registered with cloud? I see the code you sent has a commented-out block with a different name, which is why I'm asking.
The traceback you've sent has shown the flow was loaded properly, but it's not being found because the name doesn't match what it thinks it is. That's definitely the issue. I'm trying to step back and figure out why the names aren't matching now.
a
@Jim Crist-Harif, that commented-out code came from the file I copied from. I can assure you the flow name in "prefect-server" and in the local file used to register are the same.
j
Hmmm, ok. We'd need more information to debug further. In lieu of a reproducible example (I suspect this is something to do with your setup, as this code runs fine for other users), would you feel comfortable editing your local prefect install and rerunning to collect some logs? If you use a local agent, you should be able to edit the local prefect install and the flow runner will pick that up appropriately. What we'd want is a log of
contents
and
flow_name
from here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/storage/gitlab.py#L105. Adding
self.logger.info("File contents: %s", contents.decode())
self.logger.info("Flow name: %s", flow_name)
before that line should get the info we want.
If that's not something you feel up for, then we'd need to find another way to debug further.
a
Thanks @Jim Crist-Harif - I can do that, will let you know the outcome
Additionally, the flow name was registered when I executed the local flow.py with the same flow name (please see the flow name).
Hi @Jim Crist-Harif, I'm not quite sure whether I did what you expected. This is the code snippet where I added loggers in my local file; let me know if this is different from what you expected.
import prefect
from prefect.tasks.shell import ShellTask
from prefect import task, Flow, Task
from prefect.run_configs import LocalRun


tasking = ShellTask(helper_script="ls -l",return_all=True,log_stdout=True,log_stderr=True)

#with Flow("flow_punctualityDailySummaries",schedule=schedule)
with Flow("flow_dailySummaries") as f:
    def run(self):
        self.logger.info("File contents: %s", contents.decode())
        self.logger.info("Flow name: %s", f)
    execute_command = tasking(command='./deploy-dev.sh ')

f.run_config = LocalRun()
f.register(project_name="project_dailySummaries")
j
Sorry, yeah, this is not what I was asking for.
I was asking if you could edit your prefect install itself, not your flow code.
I suspect that might be too tricky though, so we may need another method to help you debug.
a
well, prefect was installed via helm
j
That's your prefect server, I'm talking about your prefect (the python library) install.
a
@Jim Crist-Harif should i update ~/.prefect/config.toml ?
j
No, let me write up a test script that will do what we want.
a
Thanks @Jim Crist-Harif
j
Ok, I believe this should run (but haven't tested it, since I don't have a gitlab account).
from urllib.parse import quote_plus

import prefect
from prefect.utilities.graphql import with_args
from prefect.serialization.storage import StorageSchema
from gitlab import Gitlab

# Put your flow id here. This should be a UUID:
FLOW_ID = "..."

GITLAB_HOST = "https://hakko.sekai.dev"

# Your gitlab access token goes here
GITLAB_ACCESS_TOKEN = "..."


# Attempt to load the flow
client = prefect.Client()
info = client.graphql(
    {"query": {with_args("flow", {"where": {"id": {"_eq": FLOW_ID}}}): {"name", "storage"}}}
)
flow_name = info.data.flow[0].name
flow_storage = info.data.flow[0].storage

print(f"Flow name: {flow_name}")
print(f"Flow storage: {flow_storage}")

# Attempt to load the flow from storage
storage = StorageSchema().load(flow_storage)
print(f"Storage type: {storage}")
try:
    storage.get_flow(flow_name)
except Exception as exc:
    print(f"Failed to load flow: {exc}")


gitlab = Gitlab(GITLAB_HOST, private_token=GITLAB_ACCESS_TOKEN)
print(f"Accessing repo {storage.repo}")
project = gitlab.projects.get(quote_plus(storage.repo))
flow_location = storage.flows[flow_name]
print(f"Loading flow from {flow_location} at ref {storage.ref}")
contents = project.files.get(file_path=flow_location, ref=storage.ref)
print(f"Flow contents: {contents.decode()}")
There are 3 variables at the top that you'll want to edit:
• `FLOW_ID`: the UUID for your flow. You can find this by navigating to the flow page and expanding the details tab.
• `GITLAB_HOST`: the address of your GitLab host. I believe I've entered this correctly from your code above.
• `GITLAB_ACCESS_TOKEN`: the private access token for your GitLab account (the actual token value, not the name of the secret like before).
Running this should output some more info about what's going on.
a
storage = StorageSchema().load(flow_storage)
print(f"Storage type: {storage}")
try:
    storage.get_flow(flow_name)
except Exception as exc:
    print(f"Failed to load flow: {exc}")
gitlab = GitLab(host="https://hakko.sekai.dev",
                repo="sekai-backend/spark-runner",
                ref="dev-prefect",
                access_token_secret=GITLAB_ACCESS_TOKEN)

print(f"Accessing repo {storage.repo}")
Gives an error:
Storage type: <Storage: Local>
Traceback (most recent call last):
  File "prefecTest.py", line 32, in <module>
    print(f"Accessing repo {storage.repo}")
AttributeError: 'Local' object has no attribute 'repo'
@Jim Crist-Harif
while invoking this line -
print(f"Accessing repo {storage.repo}")
j
That means the flow was registered with local storage, not
GitLab
storage. You can see that in the first line of the output (
<Storage: Local>
). So the flow you ran the script for either isn't the same one as the one you were executing before, or was re-registered with a different storage type.
Also note that you don't need to ping me by name with every comment, I'll see your replies either way.
a
Sorry for tagging your name every time I ping you. I've run this locally; is that what was expected?
Not sure whether I understood you correctly.
j
You should run the script locally, but fill in the
FLOW_ID
parameter with the flow id of the flow that was failing above. From the output you've given, it looks like the flow you ran this for is using local storage (not gitlab storage), which doesn't match your flow from before. So either that flow changed, or you have the wrong flow id.
I suspect you need to re-register that flow.
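One way to catch this mismatch earlier is to check the serialized storage before touching GitLab-specific fields. The sketch below is plain Python under stated assumptions: `repo_from_storage` is a hypothetical helper, the GitLab dict is abbreviated from the "Flow storage" output printed elsewhere in this thread, and the `{"type": "Local"}` shape for local storage is assumed rather than confirmed.

```python
# Abbreviated from the serialized GitLab storage printed elsewhere in this thread.
GITLAB_STORAGE = {
    "type": "GitLab",
    "repo": "sekai-backend/spark-runner",
    "ref": "dev-prefect",
}

# Assumed shape of serialized local storage; only the "type" key is used here.
LOCAL_STORAGE = {"type": "Local"}


def repo_from_storage(flow_storage):
    """Return the repo of a serialized GitLab storage, or fail with a clear message."""
    storage_type = flow_storage.get("type")
    if storage_type != "GitLab":
        raise RuntimeError(
            f"Flow is registered with {storage_type} storage, not GitLab; "
            "check the flow id or re-register the flow."
        )
    return flow_storage["repo"]


repo = repo_from_storage(GITLAB_STORAGE)

try:
    repo_from_storage(LOCAL_STORAGE)
except RuntimeError as exc:
    local_error = str(exc)

print(repo)
print(local_error)
```

A check like this turns the `AttributeError` on `storage.repo` into a message that points at the likely cause.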
a
Thanks, Jim, for being patient with me. If you look at my GitLab flow, it is calling LocalRun.
Sorry, I might need to specify the storage as GitLab.
j
That's the
run_config
. Your flow script there doesn't set
flow.storage
to
GitLab
though, so it will use
Local
storage by default.
a
Hi Jim, I reran with some modifications, and now I get this error:
Failed to load flow: Unable to import gitlab, please ensure you have installed the gitlab extra
The following is the whole output:
Flow name: KubeTestFlow
Flow storage: {'ref': 'dev-prefect', 'host': 'https://hakko.sekai.dev', 'path': 'kikai/charts/kikai/DailySummariesJob/kikai_dailySummaries.py', 'repo': 'sekai-backend/spark-runner', 'type': 'GitLab', 'flows': {'KubeTestFlow': 'kikai/charts/kikai/DailySummariesJob/kikai_dailySummaries.py'}, 'secrets': [], '__version__': '0.14.7', 'access_token_secret': 'GITLAB_ACCESS_TOKEN'}
Storage type: <Storage: GitLab>
Failed to load flow: Unable to import gitlab, please ensure you have installed the gitlab extra
Accessing repo sekai-backend/spark-runner
Traceback (most recent call last):
  File "prefecTest.py", line 33, in <module>
    project = gitlab.projects.get(quote_plus(storage.repo))
AttributeError: 'GitLab' object has no attribute 'projects'
How can I install the gitlab extra on the Kubernetes pod?
This is probably the root cause of
ValueError('No flow found in file.')
j
• You shouldn't be running the script I sent you in a k8s pod; this is for debugging locally.
• It looks like you modified the script I sent to switch
from gitlab import Gitlab
to
from prefect.storage import GitLab
. The original import was correct. You'll need to install the prefect gitlab extra, as the exception said:
pip install prefect[gitlab]
a
Thanks, I know
pip install prefect[gitlab]
would install gitlab locally, but how can I install it on the Kubernetes pod? I installed prefect-server using the Helm chart. Do I have to re-install prefect-server?
j
You'd only need
gitlab
in the pod running your actual flow. By default the
prefecthq/prefect
images already have that installed, and from the error messages you see in the pods, this is not your issue (the original error message you sent had no issues importing gitlab).
Can you install
gitlab
locally and run the script I sent you to get some more debug logs on the flow?
a
Thanks @Jim Crist-Harif, actually I had already installed the gitlab extra when I saw the message, but I get the same message after the installation too.
Then I updated the script to use
from gitlab import Gitlab
, but then I hit the error below
j
That means you don't have
gitlab
installed locally. The
from gitlab import Gitlab
import is the correct one.
Try
pip install python-gitlab
a
seems a bad day for me
j
Why the sudo? You really don't want to be mixing the system python (which your OS uses) and the python your personal projects use. Mixing pip, the system python, and sudo is a recipe for packages that aren't available at the user level.
I suspect that
pip3
and
python3
are using different python installations.
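A quick way to check this suspicion is to ask the interpreter itself where it lives and whether it can see the `gitlab` module. This is a minimal standard-library sketch; `describe_environment` is a hypothetical helper name, not part of Prefect or python-gitlab.

```python
import importlib.util
import sys


def describe_environment(module_name="gitlab"):
    """Report which interpreter is running and whether it can import a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "installed": spec is not None,
        # Path of the module file if found, otherwise None.
        "location": getattr(spec, "origin", None),
    }


report = describe_environment()
print(report)
```

Run it with the same `python3` used for the debug script. If `installed` is false, installing with `python3 -m pip install python-gitlab` (rather than a bare `pip3`) makes pip target that same interpreter.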
a
Thanks @Jim Crist-Harif, I was able to figure it out with your help!
j
Glad to hear it
a
Hi Jim, my company needs me to migrate Spark batch jobs to Prefect.io (schedule them via Prefect.io). As you know, I have managed to set up prefect-server on our Kubernetes cluster. Those Spark batch jobs are already configured to deploy via Helm to the Kubernetes cluster. What is the easiest way to schedule them via Prefect.io on the Kubernetes cluster?
j
Please open a new question in prefect-community, as this is unrelated to the thread above.
a
As of now, I am able to deploy just by executing deploy-dev.sh on my minikube...
Alright.