# ask-community
Hello again community. I am attempting to test out the prefect-ray capability. I am able to successfully submit a job to my ray cluster using the following code:
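import ray
from prefect import flow, task
from prefect_ray import RayTaskRunner
from prefect_ray.context import remote_options
from ray.train import CheckpointConfig, RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer


# num_workers, use_gpu, and train_loop_per_worker are defined elsewhere (not shown).
@task
def train_model():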
    # Define configurations.
    train_loop_config = {"num_epochs": 20, "lr": 0.01, "batch_size": 32}
    scaling_config = ScalingConfig(num_workers=num_workers, use_gpu=use_gpu)
    run_config = RunConfig(checkpoint_config=CheckpointConfig(num_to_keep=1))

    # Define datasets.
    train_dataset = ray.data.from_items(
        [{"input": [x], "label": [2 * x + 1]} for x in range(2000)]
    )
    datasets = {"train": train_dataset}

    # Initialize the Trainer.
    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        train_loop_config=train_loop_config,
        scaling_config=scaling_config,
        run_config=run_config,
        datasets=datasets
    )

    # Train the model.
    result = trainer.fit()

    # Inspect the results.
    final_loss = result.metrics["loss"]
    return final_loss



@flow(
    task_runner=RayTaskRunner(
        address="ray://raycluster-kuberay-head-svc.kuberay.svc.cluster.local:10001",
        init_kwargs={
            "runtime_env": {
                "pip": ["prefect-ray", "torch", "torchvision", "boto3", "botocore"]
            }
        },
    )
)
def training_pipeline():
    # Equivalent to setting @ray.remote(num_cpus=4, num_gpus=1) on the task.
    with remote_options(num_cpus=4, num_gpus=1):
        train_model.submit()
The job is submitted and I can see it in the dashboard, but it never actually makes progress. It shows as running indefinitely until I cancel it from the Prefect dashboard, and the worker output contains only two lines. I am running the latest versions of prefect, prefect-ray, and ray. Should I not be using the latest versions of ray/prefect-ray? My guess is that Python 3.11 simply has not been tested.
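Since the question comes down to versions, here is a minimal sketch for collecting them on the machine that runs the flow. The helper name `collect_versions` and the package list are my own choices for illustration, not part of prefect or ray. Running the same script on the Ray head node (an assumption about how the cluster can be reached) would make it possible to compare the two sides, since Ray Client connections over `ray://` generally expect the client and the cluster to run matching Ray and Python versions.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return a mapping of package name to installed version string."""
    # Always record the interpreter version alongside the packages.
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is absent from this environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Hypothetical package list; adjust to whatever the flow depends on.
    for name, ver in collect_versions(["prefect", "prefect-ray", "ray", "torch"]).items():
        print(f"{name}: {ver}")
```

If the output differs between the client and the head node, that would be worth including in the report alongside the two worker log lines.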