Hello - We have a NodeJS script that we wish to e...
# prefect-community
t
Hello - We have a NodeJS script that we wish to execute as a task. The entire code (and all the NodeJS dependencies) is located on a Docker image on ECR. We’re trying out an experimental alternative to running a Kubernetes namespaced job, which would be to run the image itself as the image of the entire flow (and have the shell command that runs the NodeJS script as one of the tasks of the flow). The Docker image uses a base image of
node:12-alpine
(which doesn’t seem to have
pip
and possibly not
python
either) i tried to add:
RUN apk update
RUN apk add py-pip
RUN pip install prefect[github,aws,kubernetes,snowflake]
first steps finished fine, but the prefect installation seems to be taking forever (over 25 minutes already) and also has a lot of weird warning messages like:
Collecting snowflake-connector-python>=1.8.2
  Downloading snowflake_connector_python-1.8.7-py2.py3-none-any.whl (168 kB)
  Downloading snowflake_connector_python-1.8.6-py2.py3-none-any.whl (161 kB)
  Downloading snowflake_connector_python-1.8.5-py2.py3-none-any.whl (159 kB)
  Downloading snowflake_connector_python-1.8.4-py2.py3-none-any.whl (161 kB)
  Downloading snowflake_connector_python-1.8.3-py2.py3-none-any.whl (158 kB)
  Downloading snowflake_connector_python-1.8.2-py2.py3-none-any.whl (157 kB)
INFO: pip is looking at multiple versions of six to determine which version is compatible with other requirements. This could take a while.
or :
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: <https://pip.pypa.io/surveys/backtracking>
any ideas?
k
Hey Tom, this is because the dependency resolver of
pip
is so aggressive in the newer versions. You can use an older version like we pin to in CI/CD here or you can have a stricter
requirements.txt
for your project
😅 1
t
what would i put in such a
requirements.txt
? i don’t have any dependencies of my own, its all due to prefect
k
You can pin the versions of the guys like
snowflake-connector-python
so that they don’t search all the versions
You can also try
pip install ... --use-deprecated=legacy-resolver
t
hmm i see 🤔 seems really weird to me, it’s approaching 40 minutes for the process. i still feel something about it is very wrong, not sure that the resolver being new fully explains this
k
This is explained a lot more here.
t
is what i’m generally trying to do even making sense? the main reason i opted to try to not utilise the
RunNamespacedJob
option is because it’s easy to interact with the input and output of the process when it runs “locally” to the flow (and the idea of an agent that runs a job that runs a job makes our devops raise their eyebrow)
k
Yeah but it’s inevitable that tasks in the task library will have conflicting requirements. This is why the task library in Prefect 2.0 is decentralized into multiple PyPI packages
t
hmm 🤔 ok, not sure what you mean by multiple PyPI packages but i guess i will learn that when we check out 2.0
i was asking in general if it makes sense to use a “local” (to the flow) image to run a NodeJS script, otherwise i will basically have to add code to the script that pulls the input from somewhere and pushes it to some destination (and i am reluctant to do that)
k
Like you need to install:
pip install prefect-aws
pip install prefect-github
etc.
t
aha, i see
k
Yes I think the structure makes sense to get all the dependencies in there
t
Step 6/18 : RUN pip3 install numpy
 ---> Running in e1296f0a6382
Collecting numpy
  Downloading numpy-1.23.0.tar.gz (10.7 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Building wheels for collected packages: numpy
  Building wheel for numpy (PEP 517): started
  Building wheel for numpy (PEP 517): still running...
  Building wheel for numpy (PEP 517): still running...
[SYSTEM] 
 Message             Failed to build image: honeybook/ds-lead-enrichment:2022                                           
 Caused by           Cannot read property 'toString' of undefined                                                       
 Documentation Link  <https://codefresh.io/docs/docs/codefresh-yaml/steps>                                                
 Name                TypeError
i don’t even understand how this is possible 😞
oh nevermind, looks like an unrelated issue with our CI/CD platform
ok, it worked! the only issue now is that i can’t get the shell task to stream its output anywhere? any ideas? this is how i configured it:
from prefect.tasks.shell import ShellTask

task = ShellTask(helper_script="cd /usr/src/app",
                 log_stderr=True,
                 return_all=True,
                 stream_output="DEBUG",
                 log_stdout=True)
but all we get is:
Jul 6, 2022 @ 12:21:54.332	[2022-07-06 09:21:54+0000] INFO - prefect.CloudTaskRunner | Task 'ShellTask': Finished task run for task with final state: 'Failed'

Jul 6, 2022 @ 12:21:54.173	[2022-07-06 09:21:54+0000] ERROR - prefect.ShellTask | Command failed with exit code 1

Jul 6, 2022 @ 12:21:54.173	[2022-07-06 09:21:54+0000] INFO - prefect.CloudTaskRunner | FAIL signal raised: FAIL('Command failed with exit code 1')

Jul 6, 2022 @ 12:21:46.440	[2022-07-06 09:21:46+0000] INFO - prefect.CloudTaskRunner | Task 'ShellTask': Starting task run...
@Kevin Kho
a
can you see the error logs in ~/.npm/_logs? perhaps you can find the latest file and read the logs that way? for ShellTask, all we can help with is what's described in this docstring; there is nothing else to configure, Tom. If this still doesn't report your shell task logs, it's either a bug or the Python subprocess can't get those logs: https://github.com/PrefectHQ/prefect/blob/f3c8b8150897997372339a4690a2de055fd40717/src/prefect/tasks/shell.py#L12
btw for the future @Tom Klein it would be great if you could move all your code blocks to the thread, the main message should only state the problem. Thanks
t
@Anna Geller oh sure, sorry about that!
The shell task seems to return the output properly, but we expected it to also stream it to stdout, so that our own logging system can pick it up
👍 1
That doesn't seem to happen 🤔
@Anna Geller i see these lines in the code:
if self.stream_output:
    self.logger.log(level=self.stream_output, msg=line)
do i need to somehow provide my own logger from outside the task?
or do all prefect tasks use the default logger by default?
a
they use Prefect logger which is initialized when the task gets initialized
t
this logger writes to stdout i presume? just trying to figure out what could be going on.. it’s a simple
npm run
and when i run it myself on EC2 i always get output streamed out just fine… 🤔
a
if nothing else works, check and perhaps even manually read the logs from your npm home dir and log them manually in a separate subsequent task
t
yea, the problem isn’t just printing the logs (the shell task does return it, so we can theoretically just print when it finishes), the problem is these are very long-running tasks (hours possibly) and we need to be able to monitor them live - even before they finish - in some cases
a
gotcha, perhaps you can modify it in the node application? not much we can do about it on the ShellTask. cc @nicholas if you have some idea on how to get real-time log updates from a NodeJS application triggered via Prefect 1.0 ShellTask
👀 1
n
I’ve used the Naked Python CLI for running node scripts before but I think it might have the same problem where stdout is collected after the function has run. I think to make this work you’ll need to get clever with capturing stdout in the node application somewhere that your flow can access or break down the node application into more debuggable chunks
🙌 1
🙏 1
t
but why does it work when i run it myself on EC2 (or on my own PC)? is it because it’s being run as a subprocess? the node script itself constantly pushes out stuff to
stdout
via a popular logging module (
winston
), but essentially this could have been replaced with a
console.log
and it would have made no difference
wait, i think i misunderstood: the logging level provided to the
stream_output
argument (of the ShellTask) determines the log level in the UPPER logger that we wish all of these lines to be “reported as” -- for some reason i got confused and thought i need to set it to
debug
so that it captures ALL log levels from the UNDERLYING log output from the subprocess. perhaps this is the explanation (since i guess the default prefect logger has a level above
debug
?)
yep! that was the issue 😄
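A minimal sketch of the behaviour described above, using only the standard `logging` module rather than Prefect itself. The logger name `shell_task_demo`, the helper `stream_lines`, and the INFO default level are assumptions made for illustration: each output line is re-emitted at the level given by `stream_output`, so lines re-emitted at DEBUG are dropped by a logger whose level is INFO.

```python
import io
import logging

# Stand-in for the task logger; this is not Prefect's real logger.
stream = io.StringIO()
logger = logging.getLogger("shell_task_demo")
logger.setLevel(logging.INFO)  # assumed default level, which is above DEBUG
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


def stream_lines(lines, level_name):
    """Re-emit each subprocess output line at the requested level (hypothetical helper)."""
    level = getattr(logging, level_name)
    for line in lines:
        logger.log(level, line)


stream_lines(["debug-level line"], "DEBUG")  # dropped: DEBUG is below INFO
stream_lines(["info-level line"], "INFO")    # emitted

emitted = stream.getvalue().splitlines()
print(emitted)
```

Under these assumptions, setting `stream_output` to a level at or above the logger's own level (or raising the logger's verbosity) is what makes the lines appear.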
k
I was just reading this thread and I had no idea. Glad you figured it out!
🙏 1
t
ok and just one more continuation question: is it possible for it to be streamed to the
stdout
but NOT captured by the prefect logger?
(we have a logging system that captures all stdout from all our k8s jobs anywhere and records them)
k
You can do something like this or turn off sending logs to Cloud?
t
oh wait, i had
log_stdout=True
turned on, maybe that’s the issue? hmm, nope, still writes to Prefect logger… isn’t there a way to provide the
ShellTask
with a custom logger (like the one you gave in your example, only one that writes nothing instead of some things)? basically i want it to write to stdout of the subprocess (as it does anyway), and for that to be forwarded to stdout (of the general flow process), but without appearing in the prefect logs (i’m not sure if what i’m asking makes sense)
k
For that, I think you’d need to make your own shell task that accepts it. Or you could do a subflow that doesn’t send logs to Prefect Cloud
t
how would a different subflow help? i still need the stuff to be on the (top-level) process’s stdout so it can be picked up by our external log collector. what i’m trying to say is, it seems there’s a 1:1 between “sending data to stdout from a flow” and “writing to the prefect logger”, in that one cannot happen without the other right now. am i understanding it correctly? ideally, there would be a distinction between things i want to appear in the prefect logger (e.g. high-level stuff related to tasks being carried out), while still allowing some inner subprocesses to write to
stdout
so that they can be picked by an external log-capture mechanism
k
On a subflow, you could do:
export PREFECT__CLOUD__SEND_FLOW_RUN_LOGS=false
and you won’t get any logs. I think you can also try
return_all=False
?
t
based on the ShellTask code:
for raw_line in iter(sub_process.stdout.readline, b""):
    line = raw_line.decode("utf-8").rstrip()

    if self.return_all:
        lines.append(line)

    if self.stream_output:
        self.logger.log(level=self.stream_output, msg=line)
it seems like
return_all
only determines what the final task returns as a result, not whether it’s being streamed “upwards” or not, and it seems like the only way to make sure it gets to
stdout
is through the logger — i just wish there was a way to make sure it ends up in stdout but without appearing in the Prefect UI, that’s all
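One way to get that behaviour is the "make your own shell task" route suggested earlier in the thread. The following is only a sketch under stated assumptions: `run_and_forward` is a hypothetical helper, not part of Prefect, that mirrors the read loop quoted above but writes each line straight to stdout instead of to a logger. It assumes a POSIX shell, and it could be called from inside a custom task's `run` method.

```python
import subprocess
import sys


def run_and_forward(command, out=None):
    """Run a shell command and forward each output line to `out` as it arrives.

    Hypothetical helper: lines bypass any logger, so they only reach the
    process's stdout (or whatever stream is passed in).
    """
    out = out if out is not None else sys.stdout
    lines = []
    with subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
    ) as proc:
        for raw_line in iter(proc.stdout.readline, b""):
            line = raw_line.decode("utf-8").rstrip()
            lines.append(line)
            out.write(line + "\n")
            out.flush()  # flush per line so long-running jobs can be watched live
    if proc.returncode != 0:
        raise RuntimeError(f"Command failed with exit code {proc.returncode}")
    return lines
```

Flushing after every line is the design choice that matters for hours-long jobs: an external log collector sees output while the command is still running, not only after it exits.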
the Q is - with
PREFECT__CLOUD__SEND_FLOW_RUN_LOGS
will it still end up in the top-level process’s
stdout
or not
k
I think it will end up in the stdout. It’s just the
CloudHandler
that is turned off
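A rough illustration of that idea with the standard `logging` module only. `FakeCloudHandler` is a hypothetical stand-in, not Prefect's `CloudHandler`, and the `enabled` flag only imitates the effect of the environment variable above: when one handler on a logger is switched off, the other handlers still receive every record.

```python
import io
import logging


class FakeCloudHandler(logging.Handler):
    """Hypothetical stand-in for a handler that ships records to a backend."""

    def __init__(self, enabled):
        super().__init__()
        self.enabled = enabled
        self.sent = []

    def emit(self, record):
        # When disabled, the record is silently skipped by this handler only.
        if self.enabled:
            self.sent.append(record.getMessage())


console = io.StringIO()  # stands in for the process's stdout
cloud = FakeCloudHandler(enabled=False)

logger = logging.getLogger("two_handler_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(console))
logger.addHandler(cloud)

logger.info("npm run: step 1 complete")
console_lines = console.getvalue().splitlines()
print(console_lines, cloud.sent)
```

Whether the real flow process behaves this way still needs to be confirmed by trying it, as the thread goes on to say.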
t
i see 🤔 worth a try, thanks