yellowdog_ray.raydog package
Module contents
Build a Ray cluster using YellowDog.
- class yellowdog_ray.raydog.builder.RayDogCluster(yd_application_key_id: str, yd_application_key_secret: str, cluster_name: str, cluster_namespace: str, head_node_compute_requirement_template_id: str, head_node_ray_start_script: str, yd_platform_api_url: str = 'https://api.yellowdog.ai', cluster_tag: str | None = None, head_node_images_id: str | None = None, head_node_userdata: str | None = None, head_node_instance_tags: dict[str, str] | None = None, head_node_metrics_enabled: bool | None = None, head_node_capture_taskoutput: bool = False, enable_observability: bool = False, observability_node_compute_requirement_template_id: str | None = None, observability_node_instance_tags: dict[str, str] | None = None, observability_node_images_id: str | None = None, observability_node_userdata: str | None = None, observability_node_metrics_enabled: bool | None = None, observability_node_start_script: str | None = None, observability_node_capture_taskoutput: bool = False, cluster_lifetime: timedelta | None = None)[source]
Bases: object
A class representing a Ray cluster managed by YellowDog.
- __init__(yd_application_key_id: str, yd_application_key_secret: str, cluster_name: str, cluster_namespace: str, head_node_compute_requirement_template_id: str, head_node_ray_start_script: str, yd_platform_api_url: str = 'https://api.yellowdog.ai', cluster_tag: str | None = None, head_node_images_id: str | None = None, head_node_userdata: str | None = None, head_node_instance_tags: dict[str, str] | None = None, head_node_metrics_enabled: bool | None = None, head_node_capture_taskoutput: bool = False, enable_observability: bool = False, observability_node_compute_requirement_template_id: str | None = None, observability_node_instance_tags: dict[str, str] | None = None, observability_node_images_id: str | None = None, observability_node_userdata: str | None = None, observability_node_metrics_enabled: bool | None = None, observability_node_start_script: str | None = None, observability_node_capture_taskoutput: bool = False, cluster_lifetime: timedelta | None = None)[source]
Initialise the properties of the RayDog cluster and the Ray head node. Optionally set the properties of an observability node.
- Parameters:
yd_application_key_id – the key ID of the YellowDog application for connecting to the YellowDog platform.
yd_application_key_secret – the key secret of the YellowDog application.
cluster_name – a name for the cluster; the name must be unique to the YellowDog account and is used as the basis for the work requirement and worker pool names, and the worker tags.
cluster_namespace – the YellowDog namespace to use for the cluster.
head_node_compute_requirement_template_id – the YellowDog compute requirement template ID for the head node.
head_node_ray_start_script – the Bash script for starting the Ray head node processes.
yd_platform_api_url – the URL of the YellowDog platform API.
cluster_tag – an optional tag to use for the YellowDog work requirement and worker pool(s).
head_node_images_id – the images ID to use for the head node (if required).
head_node_userdata – optional userdata for use when the head node instance is provisioned.
head_node_instance_tags – optional instance tags to use for the head node instance.
head_node_metrics_enabled – whether to enable metrics collection for the head node.
head_node_capture_taskoutput – whether to capture the console output of the head node task.
enable_observability – whether to enable observability node support.
observability_node_compute_requirement_template_id – the compute requirement template to use for the observability node.
observability_node_instance_tags – optional instance tags to use for the observability node instance.
observability_node_images_id – the images ID to use for the observability node (if required).
observability_node_userdata – optional userdata for use when the observability node instance is provisioned.
observability_node_metrics_enabled – whether to enable metrics collection for the observability node.
observability_node_start_script – the Bash script for starting the observability node processes.
observability_node_capture_taskoutput – whether to capture the console output of the observability node task.
cluster_lifetime – an optional lifetime for the Ray cluster; the cluster is shut down when it expires.
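As an illustration of the required and optional arguments above, the following is a minimal sketch that assembles the constructor keyword arguments from environment variables. The variable names (YD_KEY, YD_SECRET, YD_HEAD_TEMPLATE_ID), the cluster name and namespace, and the one-line start script are all placeholders chosen for this example, not values defined by the package. yd_platform_api_url is omitted so that its documented default applies.

```python
import os
from datetime import timedelta


def raydog_cluster_kwargs(env=None):
    """Assemble keyword arguments for RayDogCluster from a mapping of settings."""
    env = os.environ if env is None else env
    return {
        # Hypothetical environment variable names; use whatever your setup provides.
        "yd_application_key_id": env["YD_KEY"],
        "yd_application_key_secret": env["YD_SECRET"],
        "head_node_compute_requirement_template_id": env["YD_HEAD_TEMPLATE_ID"],
        # Placeholder names; cluster_name must be unique to the YellowDog account.
        "cluster_name": "raydog-example",
        "cluster_namespace": "raydog-examples",
        # Placeholder script; a real head node script will usually do more setup.
        "head_node_ray_start_script": "#!/usr/bin/env bash\nray start --head --block\n",
        # Shut the cluster down automatically after two hours.
        "cluster_lifetime": timedelta(hours=2),
    }


def make_cluster(env=None):
    """Create the cluster object (requires the yellowdog_ray package)."""
    # Imported lazily so raydog_cluster_kwargs() is usable without the package.
    from yellowdog_ray.raydog.builder import RayDogCluster

    return RayDogCluster(**raydog_cluster_kwargs(env))
```

Keeping the argument assembly separate from the constructor call makes it possible to inspect or log the settings (minus the secret) before any resources are requested.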
- add_worker_pool(worker_node_compute_requirement_template_id: str, worker_node_task_script: str, worker_pool_node_count: int, worker_pool_internal_name: str | None = None, worker_node_images_id: str | None = None, worker_node_userdata: str | None = None, worker_node_instance_tags: dict[str, str] | None = None, worker_node_metrics_enabled: bool | None = None, worker_node_capture_taskoutput: bool = False) str | None [source]
Add a worker pool and task group that will provide Ray worker nodes.
- Parameters:
worker_node_compute_requirement_template_id – the YellowDog compute requirement template ID to use for the worker nodes in this worker pool.
worker_node_task_script – the Bash script for starting the Ray worker nodes in this worker pool.
worker_pool_node_count – the number of Ray worker nodes to create in this worker pool. Must be > 0.
worker_pool_internal_name – an optional internal name that can be used to look up the worker_node_worker_pool_object. Must be unique to the cluster.
worker_node_images_id – the images ID to use with the compute requirement template, if required.
worker_node_userdata – optional userdata for use when the worker node instances are provisioned.
worker_node_instance_tags – optional instance tags to apply to the worker node instances.
worker_node_metrics_enabled – whether to enable metrics collection for the worker nodes.
worker_node_capture_taskoutput – whether to capture the console output of the worker node tasks.
- Returns:
the worker pool ID if a worker pool was created, or None if the pool will be created later using the build() method.
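The constraints above (a node count greater than zero, and internal names unique to the cluster) can be checked before any call is made. The sketch below is one way to do that; PoolSpec and add_worker_pools are hypothetical helpers written for this example, and cluster is assumed to be a RayDogCluster instance.

```python
from dataclasses import dataclass


@dataclass
class PoolSpec:
    """Hypothetical container for the settings of one worker pool."""

    template_id: str
    node_count: int
    internal_name: str


def add_worker_pools(cluster, specs, task_script):
    """Add one worker pool per spec; return {internal_name: pool ID or None}."""
    names = [spec.internal_name for spec in specs]
    if len(set(names)) != len(names):
        raise ValueError("worker pool internal names must be unique")
    for spec in specs:
        if spec.node_count <= 0:
            raise ValueError(f"{spec.internal_name}: node count must be > 0")

    results = {}
    for spec in specs:
        # The result is None when the pool will be created later by build().
        results[spec.internal_name] = cluster.add_worker_pool(
            worker_node_compute_requirement_template_id=spec.template_id,
            worker_node_task_script=task_script,
            worker_pool_node_count=spec.node_count,
            worker_pool_internal_name=spec.internal_name,
        )
    return results
```

Validating every spec before the first call means a bad entry cannot leave the cluster with only some of the requested pools.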
- build(head_node_build_timeout: timedelta | None = None) -> (str, str | None)[source]
Build the cluster. This method will block until the Ray head node is ready, and optionally also the observability node.
Note that Ray worker nodes will still be in the process of configuring and joining the cluster after this method returns.
- Parameters:
head_node_build_timeout – an optional timeout for building the head node; if the timeout expires before the head node task is executing, a TimeoutError exception will be raised.
- Returns:
a tuple containing the private IP address of the head node, and the public IP address of the head node (or None).
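A minimal sketch of calling build() with a timeout and turning the returned tuple into a connection address follows. Both helper functions are hypothetical. The address format assumes that the head node start script leaves the Ray Client server on its default port (10001) and that the public IP, when present, is the one reachable from the caller; neither assumption is stated by this package. Calling shut_down() after a timeout is a choice made for this example, to avoid leaving resources running.

```python
from datetime import timedelta


def build_or_shut_down(cluster, timeout=timedelta(minutes=10)):
    """Build the cluster; if the head node is not ready in time, shut down."""
    try:
        private_ip, public_ip = cluster.build(head_node_build_timeout=timeout)
    except TimeoutError:
        # Assumption for this sketch: release whatever was provisioned.
        cluster.shut_down()
        raise
    return private_ip, public_ip


def ray_client_address(private_ip, public_ip, port=10001):
    """Format a Ray Client address, preferring the public IP if there is one."""
    host = public_ip if public_ip is not None else private_ip
    return f"ray://{host}:{port}"
```

Because worker nodes may still be joining when build() returns, code that depends on a minimum number of workers should check for them separately rather than rely on build() alone.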
- remove_worker_pool(worker_pool_id: str)[source]
Terminate the compute requirement associated with a worker pool.
- Parameters:
worker_pool_id – the ID of the worker pool to remove.
- remove_worker_pool_by_internal_name(internal_name: str)[source]
Remove a worker pool by its internal name. Raises an exception if the worker pool is not found.
- Parameters:
internal_name – the internal name of the worker pool to remove.
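Since removal by internal name raises when the pool is not found, one option is to check the worker_pool_internal_names property first. The helper below is a hypothetical sketch of that approach; cluster is assumed to be a RayDogCluster instance.

```python
def remove_pools_by_name(cluster, names):
    """Remove the named pools that exist; return (removed, missing) name lists."""
    current = set(cluster.worker_pool_internal_names)
    removed, missing = [], []
    for name in names:
        if name in current:
            cluster.remove_worker_pool_by_internal_name(name)
            removed.append(name)
        else:
            # Skip unknown names instead of triggering the not-found exception.
            missing.append(name)
    return removed, missing
```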
- shut_down()[source]
Shut down the Ray cluster by cancelling the work requirement, including aborting all its tasks, and shutting down all remaining worker pools.
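To make sure shut_down() runs even when the work using the cluster fails, the build and shutdown calls can be paired in a context manager. This is a sketch, not part of the package: running_cluster is a hypothetical helper, and it assumes that calling shut_down() is safe after a failed build().

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(cluster, **build_kwargs):
    """Build the cluster on entry and always shut it down on exit."""
    try:
        # Yields the (private IP, public IP or None) tuple returned by build().
        yield cluster.build(**build_kwargs)
    finally:
        cluster.shut_down()
```

Usage would look like `with running_cluster(cluster) as (private_ip, public_ip): ...`, with the Ray workload placed inside the block.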
- property worker_pool_ids: list[str]
Generate the current list of worker pool IDs.
- property worker_pool_internal_names: list[str]
Generate the current list of worker pool internal names.
- class yellowdog_ray.raydog.builder.RayDogClusterProxy(yd_application_key_id: str, yd_application_key_secret: str, yd_platform_api_url: str = 'https://api.yellowdog.ai')[source]
Bases: object
A proxy for a RayDog cluster, allowing saved cluster state to be imported, and the cluster to be shut down.
- __init__(yd_application_key_id: str, yd_application_key_secret: str, yd_platform_api_url: str = 'https://api.yellowdog.ai')[source]
Initialise a proxy for a RayDog cluster, allowing the cluster to be shut down based on minimal saved cluster state.
- Parameters:
yd_application_key_id – the key ID of the YellowDog application for connecting to the YellowDog platform.
yd_application_key_secret – the key secret of the YellowDog application.
yd_platform_api_url – the URL of the YellowDog platform API.
- load_saved_state_from_json(cluster_state: str)[source]
Load the cluster state from a JSON string.
- Parameters:
cluster_state – the state of a RayDog cluster as a JSON string.
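A sketch of restoring saved state from a file follows. How the JSON string is produced and stored is not covered in this section, so the file path and the load_state_file helper are assumptions made for the example; proxy is assumed to be a RayDogClusterProxy instance. The JSON is parsed once before being handed over so that a corrupted file fails early with a clear error.

```python
import json
from pathlib import Path


def load_state_file(proxy, path):
    """Read saved cluster state from a file and load it into the proxy."""
    text = Path(path).read_text(encoding="utf-8")
    # Raises json.JSONDecodeError (a ValueError subclass) on malformed input.
    json.loads(text)
    proxy.load_saved_state_from_json(text)
    return text
```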