Hello We have a NodeJS script that we wish to execute as a t Prefect Community #ask-community

Tom Klein

07/05/2022, 2:27 PM

Hello - We have a NodeJS script that we wish to execute as a task. The entire code (and all the NodeJS dependencies) is located on a docker image on ECR. We’re trying out an experimental alternative to running a Kubernetes namespaced job: running the image itself as the image of the entire flow, with a shell command that runs the NodeJS script as one of the tasks of the flow. The docker image uses a docker base image of node:12-alpine (which doesn’t seem to have pip, and possibly not python either). i tried to add:

RUN apk update
RUN apk add py-pip
RUN pip install "prefect[github,aws,kubernetes,snowflake]"

first steps finished fine, but the prefect installation seems to be taking forever (over 25 minutes already) and also has a lot of weird warning messages like:

Collecting snowflake-connector-python>=1.8.2                                                                                                                     
  Downloading snowflake_connector_python-1.8.7-py2.py3-none-any.whl (168 kB)                                                                                     
  Downloading snowflake_connector_python-1.8.6-py2.py3-none-any.whl (161 kB)                                                                                     
  Downloading snowflake_connector_python-1.8.5-py2.py3-none-any.whl (159 kB)                                                                                     
  Downloading snowflake_connector_python-1.8.4-py2.py3-none-any.whl (161 kB)                                                                                     
  Downloading snowflake_connector_python-1.8.3-py2.py3-none-any.whl (158 kB)                                                                                     
  Downloading snowflake_connector_python-1.8.2-py2.py3-none-any.whl (157 kB)                                                                                     
INFO: pip is looking at multiple versions of six to determine which version is compatible with other requirements. This could take a while.

or :

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: <https://pip.pypa.io/surveys/backtracking>

any ideas?

✅ 1

Kevin Kho

07/05/2022, 2:32 PM

Hey Tom, this is because the dependency resolver of pip is so aggressive in the newer versions. You can use an older version like we pin to in CI/CD here or you can have a stricter requirements.txt for your project

😅 1

Tom Klein

07/05/2022, 2:36 PM

what would i put in such a requirements.txt? i don’t have any dependencies of my own, it’s all due to prefect

Kevin Kho

07/05/2022, 2:38 PM

You can pin the versions of packages like snowflake-connector-python so that pip doesn’t search through all the versions
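
One way to get such pins, sketched here as an assumption rather than anything suggested in the thread, is to list the versions from an environment where the install has already succeeded (for example one built with the legacy resolver) and paste them into requirements.txt. The helper name `pinned_requirements` is made up for illustration; it only uses the standard library `importlib.metadata`.

```python
from importlib import metadata
from typing import List


def pinned_requirements() -> List[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output into requirements.txt to reuse the exact versions.
    print("\n".join(pinned_requirements()))
```

With every version pinned, pip has a single candidate per package and has nothing to backtrack over.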

Kevin Kho

07/05/2022, 2:39 PM

You can also try

pip install ... --use-deprecated=legacy-resolver

Tom Klein

07/05/2022, 2:40 PM

hmm i see 🤔 seems really weird to me, it’s approaching 40 minutes for the process. i still feel something about it is very wrong, not sure it just being new fully explains this

Kevin Kho

07/05/2022, 2:42 PM

This is explained a lot more here.

Tom Klein

07/05/2022, 2:42 PM

is what i’m generally trying to do even making sense? the main reason i opted to try to not utilise the RunNamespacedJob option is because it’s easy to interact with the input and output of the process when it runs “locally” to the flow (and the idea of an agent that runs a job that runs a job makes our devops raise their eyebrow)

Kevin Kho

07/05/2022, 2:50 PM

Yeah but it’s inevitable that tasks in the task library will have conflicting requirements. This is why the task library in Prefect 2.0 is decentralized into multiple PyPI packages

Tom Klein

07/05/2022, 2:52 PM

hmm 🤔 ok, not sure what you mean by multiple PyPI packages but i guess i will learn that when we check out 2.0

Tom Klein

07/05/2022, 2:53 PM

i was asking in general if it makes sense to use a “local” (to the flow) image to run a NodeJS script, otherwise i will basically have to add code to the script that pulls the input from somewhere and pushes the output to some destination (and i am reluctant to do that)

Kevin Kho

07/05/2022, 2:53 PM

Like you need to install:

pip install prefect-aws
pip install prefect-github

etc.

Tom Klein

07/05/2022, 2:53 PM

aha, i see

Kevin Kho

07/05/2022, 2:54 PM

Yes I think the structure makes sense to get all the dependencies in there

Tom Klein

07/05/2022, 2:55 PM

Step 6/18 : RUN pip3 install numpy
 ---> Running in e1296f0a6382
Collecting numpy
  Downloading numpy-1.23.0.tar.gz (10.7 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Building wheels for collected packages: numpy
  Building wheel for numpy (PEP 517): started
  Building wheel for numpy (PEP 517): still running...
  Building wheel for numpy (PEP 517): still running...
[SYSTEM] 
 Message             Failed to build image: honeybook/ds-lead-enrichment:2022                                           
 Caused by           Cannot read property 'toString' of undefined                                                       
 Documentation Link  <https://codefresh.io/docs/docs/codefresh-yaml/steps>                                                
 Name                TypeError

i don’t even understand how this is possible 😞

Tom Klein

07/05/2022, 2:56 PM

oh nevermind, looks like an unrelated issue with our CI/CD platform

Tom Klein

07/06/2022, 9:29 AM

ok, it worked! the only issue now is that i can’t get the shell task to stream its output anywhere? any ideas? this is how i configured it:

task = ShellTask(helper_script="cd /usr/src/app",
                 log_stderr=True,
                 return_all=True,
                 stream_output="DEBUG",
                 log_stdout=True)

but all we get is:

Jul 6, 2022 @ 12:21:54.332	[2022-07-06 09:21:54+0000] INFO - prefect.CloudTaskRunner | Task 'ShellTask': Finished task run for task with final state: 'Failed'

Jul 6, 2022 @ 12:21:54.173	[2022-07-06 09:21:54+0000] ERROR - prefect.ShellTask | Command failed with exit code 1

Jul 6, 2022 @ 12:21:54.173	[2022-07-06 09:21:54+0000] INFO - prefect.CloudTaskRunner | FAIL signal raised: FAIL('Command failed with exit code 1')

Jul 6, 2022 @ 12:21:46.440	[2022-07-06 09:21:46+0000] INFO - prefect.CloudTaskRunner | Task 'ShellTask': Starting task run...

@Kevin Kho

Anna Geller

07/06/2022, 11:15 AM

can you see the error logs in ~/.npm/_logs? perhaps you can find the latest file and read the logs this way? for ShellTask, all we can help with is what's described in this docstring: https://github.com/PrefectHQ/prefect/blob/f3c8b8150897997372339a4690a2de055fd40717/src/prefect/tasks/shell.py#L12 There is nothing else to configure, Tom. If this still doesn't report your shell task logs, it's either a bug or the Python subprocess can't get those logs

Anna Geller

07/06/2022, 11:16 AM

btw for the future @Tom Klein it would be great if you could move all your code blocks to the thread, the main message should only state the problem. Thanks

Tom Klein

07/06/2022, 11:21 AM

@Anna Geller oh sure, sorry about that!

Tom Klein

07/06/2022, 11:23 AM

The shell task seems to return the output properly, but we expected it to also stream it to stdout, so that our own logging system can pick it up

👍 1

Tom Klein

07/06/2022, 11:23 AM

That doesn't seem to happen 🤔

Tom Klein

07/06/2022, 12:07 PM

@Anna Geller i see these lines in the code:

if self.stream_output:
    self.logger.log(level=self.stream_output, msg=line)

Tom Klein

07/06/2022, 12:08 PM

do i need to somehow provide my own logger from outside the task?

Tom Klein

07/06/2022, 12:08 PM

or do all prefect tasks use the default logger by default?

Anna Geller

07/06/2022, 12:15 PM

they use Prefect logger which is initialized when the task gets initialized

Tom Klein

07/06/2022, 12:35 PM

this logger writes to stdout i presume? just trying to figure out what could be going on.. it’s a simple npm run and when i run it myself on EC2 i always get output streamed out just fine… 🤔

Anna Geller

07/06/2022, 12:43 PM

if nothing else works, check and perhaps even manually read the logs from your npm home dir and log them manually in a separate subsequent task
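
A minimal sketch of that fallback, assuming the npm log files end in `.log` and live in the `~/.npm/_logs` directory mentioned above. The function name `read_latest_log` is hypothetical; in a flow, its return value would be logged by a task that runs after the shell task.

```python
from pathlib import Path
from typing import Optional


def read_latest_log(log_dir: Path) -> Optional[str]:
    """Return the contents of the most recently modified *.log file, or None."""
    if not log_dir.is_dir():
        return None
    logs = sorted(log_dir.glob("*.log"), key=lambda p: p.stat().st_mtime)
    if not logs:
        return None
    return logs[-1].read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    text = read_latest_log(Path.home() / ".npm" / "_logs")
    print(text if text is not None else "no npm log files found")
```

This only helps after the command has finished, so it does not give live output for long-running commands.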

Tom Klein

07/06/2022, 12:45 PM

yea, the problem isn’t just printing the logs (the shell task does return it, so we can theoretically just print when it finishes), the problem is these are very long-running tasks (hours possibly) and we need to be able to monitor them live - even before they finish - in some cases

Anna Geller

07/06/2022, 1:55 PM

gotcha, perhaps you can modify it on the node application? not much we can do about it on the ShellTask. cc @nicholas in case you have some idea on how to get real-time log updates from a NodeJS application triggered via Prefect 1.0 ShellTask

👀 1

nicholas

07/06/2022, 2:11 PM

I’ve used the Naked Python CLI for running node scripts before but I think it might have the same problem where stdout is collected after the function has run. I think to make this work you’ll need to get clever with capturing stdout in the node application somewhere that your flow can access or break down the node application into more debuggable chunks

🙌 1

🙏 1

Tom Klein

07/06/2022, 2:30 PM

but why does it work when i run it myself on EC2 (or on my own PC)? is it because it’s being run as a subprocess? the node script itself constantly pushes out stuff to stdout via a popular logging module (winston), but essentially this could have been replaced with a console.log and it would have made no difference

Tom Klein

07/06/2022, 2:36 PM

wait, i think i misunderstood: the logging level provided to the stream_output argument (of the ShellTask) determines the log level in the UPPER logger that we wish all of these lines to be “reported as”. for some reason i got confused and thought i needed to set it to debug so that it captures ALL log levels from the UNDERLYING log output from the subprocess. perhaps this is the explanation (since i guess the default prefect logger has a level above debug)

Tom Klein

07/06/2022, 2:40 PM

yep! that was the issue 😄
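
The effect can be reproduced with the standard library alone. This is a sketch, not Prefect code: the logger name `demo.shell_task` is made up, and the INFO level stands in for the default Prefect logger level that Tom guessed at above. Passing a level of INFO or higher as stream_output is what lets the lines through.

```python
import io
import logging

# Stand-in logger set to INFO (assumed to mirror the default level).
buffer = io.StringIO()
logger = logging.getLogger("demo.shell_task")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))

line = "output line from the subprocess"

# Like stream_output="DEBUG": the record is below the logger's level, so it is dropped.
logger.log(logging.DEBUG, line)
dropped = buffer.getvalue()

# Like stream_output="INFO": the record passes the level check and is emitted.
logger.log(logging.INFO, line)
emitted = buffer.getvalue()
```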

Kevin Kho

07/06/2022, 2:41 PM

I was just reading this thread and I had no idea. Glad you figured it out!

🙏 1

Tom Klein

07/06/2022, 2:44 PM

ok and just one more continuation question: is it possible for it to be streamed to the stdout but NOT captured by the prefect logger?

Tom Klein

07/06/2022, 2:45 PM

(we have a logging system that captures all stdout from all our k8s jobs anywhere and records them)

Kevin Kho

07/06/2022, 2:46 PM

You can do something like this or turn off sending logs to Cloud?

Tom Klein

07/06/2022, 2:47 PM

oh wait, i had log_stdout=True turned on, maybe that’s the issue. hmm, nope, still writes to Prefect logger… isn’t there a way to provide the ShellTask with a custom logger (like the one you gave in your example, only one that writes nothing instead of some things)? basically i want it to write to the stdout of the subprocess (as it does anyway), and for that to be forwarded to the stdout of the general flow process, but without appearing in the prefect logs (i’m not sure if what i’m asking makes sense)

Kevin Kho

07/06/2022, 3:02 PM

For that, I think you’d need to make your own shell task to accept it. Or you could do a subflow that doesn’t send logs to Prefect Cloud

Tom Klein

07/06/2022, 3:03 PM

how would a different subflow help? i still need the stuff to be on the (top-level) process’s stdout so it can be picked up by our external log collector. what i’m trying to say is it seems there’s a 1:1 between “sending data to stdout from a flow” and “writing to the prefect logger”, in that one cannot happen without the other right now? am i understanding it correctly? ideally, there would be a distinction between things i want to appear in the prefect logger (e.g. high-level stuff related to tasks being carried out) and the output of inner subprocesses, which should still be able to write to stdout so that it can be picked up by an external log-capture mechanism

Kevin Kho

07/06/2022, 3:06 PM

On a subflow, you could do:

export PREFECT__CLOUD__SEND_FLOW_RUN_LOGS=false

and you won’t get any logs. I think you can also try return_all=False

Tom Klein

07/06/2022, 3:08 PM

based on the ShellTask code:

for raw_line in iter(sub_process.stdout.readline, b""):
    line = raw_line.decode("utf-8").rstrip()

    if self.return_all:
        lines.append(line)

    if self.stream_output:
        self.logger.log(level=self.stream_output, msg=line)

it seems like return_all only determines what the final task returns as a result, not whether it’s being streamed “upwards” or not, and it seems like the only way to make sure it gets to stdout is through the logger. i just wish there was a way to make sure it ends up in stdout but without appearing in the Prefect UI, that’s all
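
A rough sketch of what the core of such a "own shell task" could look like: the same read loop as in the excerpt above, but writing each line to a stream instead of to self.logger. The name `run_and_stream` and its `out` parameter are hypothetical, and wrapping it in a Prefect task is left out. It also assumes the task's stdout is not itself redirected into the Prefect logger (for instance by log_stdout=True).

```python
import subprocess
import sys
from typing import List, Optional, TextIO


def run_and_stream(command: List[str], out: Optional[TextIO] = None) -> List[str]:
    """Run command, echo each output line to `out` as it arrives, return all lines."""
    out = out or sys.stdout
    lines = []
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        for raw_line in iter(proc.stdout.readline, b""):
            line = raw_line.decode("utf-8", errors="replace").rstrip()
            lines.append(line)
            out.write(line + "\n")
            out.flush()  # flush per line so an external collector sees it live
    if proc.returncode != 0:
        raise RuntimeError(f"Command failed with exit code {proc.returncode}")
    return lines
```

For example, `run_and_stream(["npm", "run", "some-script"])` would print the script's output line by line while it runs, where `some-script` is a placeholder for the actual npm script name.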

Tom Klein

07/06/2022, 3:09 PM

the Q is: with PREFECT__CLOUD__SEND_FLOW_RUN_LOGS=false, will it still end up in the top-level process’s stdout or not?

Kevin Kho

07/06/2022, 3:10 PM

I think it will end up in the stdout. It’s just the CloudHandler that is turned off
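
The general mechanism behind that expectation can be shown with standard library logging, where each handler on a logger is an independent sink. This is only an analogy under assumptions: the two in-memory streams stand in for the process's stdout and for a remote sink, the logger name `demo.flow` is made up, and the thread does not confirm how Prefect behaves with the setting.

```python
import io
import logging

console = io.StringIO()  # stands in for the process's stdout
cloud = io.StringIO()    # stands in for a remote log sink

logger = logging.getLogger("demo.flow")
logger.setLevel(logging.INFO)
logger.propagate = False
console_handler = logging.StreamHandler(console)
cloud_handler = logging.StreamHandler(cloud)
logger.addHandler(console_handler)
logger.addHandler(cloud_handler)

logger.info("both sinks receive this")

# Removing one handler leaves the other one untouched.
logger.removeHandler(cloud_handler)
logger.info("only the console receives this")
```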

Tom Klein

07/06/2022, 3:10 PM

i see 🤔 worth a try, thanks
