Overview
RAIC's Supercomputers provide direct access to high-performance bare metal GPU resources with no virtualisation overhead. You can dynamically provision a cluster of one or more GPU nodes, with optional InfiniBand interconnect and shared storage, to deliver the computational power required for demanding AI/ML workloads.
Supercomputers give you full control over physical hardware such as GPUs in a private, isolated environment.
Key Benefits
- Direct Hardware Access: Raw GPU and CPU performance with zero virtualisation overhead.
- Scalable Node Architecture: Start with a single node and scale up to multi-node clusters, with individual and batch-level node management actions.
- High-Performance Interconnect: Optional GPU-Direct networking via InfiniBand for low-latency, high-bandwidth inter-node communication.
- Flexible Provisioning: Choose between on-demand and reserved capacity to match your budget and workload patterns.
- Private Cloud Security: Dedicated, isolated resources with no shared tenancy, ideal for compliance-sensitive workloads.
- Full Customisation: Configure OS images, init scripts, SSH access, and persistent storage volumes to suit your exact requirements.
Ideal Use Cases
Supercomputers are ideal for organisations and users who require:
- A dedicated environment for compliance-sensitive ML/AI workloads.
- Large-scale distributed model training across multiple GPU nodes.
- Full customisation and control over their computing infrastructure.
- Elimination of the noisy-neighbour issues common in shared environments.
- The ability to handle large-scale, performance-critical tasks such as data analytics, scientific computations, and complex algorithm processing without compromise.
Key Concepts
- Supercomputer
A Supercomputer is a cluster composed of one or more bare metal GPU nodes provisioned under a single logical resource. All nodes within a Supercomputer share the same configuration (GPU type, OS image, SSH key) and can optionally be interconnected via InfiniBand for high-performance inter-node communication.
- Node
A node is an individual bare metal server within a Supercomputer. Each node provides dedicated access to physical GPUs, CPUs, memory, and local storage. Nodes can be managed independently or in batches using actions such as soft reboot, hard reboot, and reinstall.
- GPU Types
Supercomputers currently support the following GPU options:
| GPU Type | Description |
|---|---|
| NVIDIA H100SXM 80GB | NVIDIA H100 Tensor Core GPU (SXM form factor) — optimised for large-scale training and inference workloads. |
| NVIDIA H200SXM 141GB | NVIDIA H200 Tensor Core GPU (SXM form factor) — next-generation HBM3e memory for larger model support and higher throughput. |
Additional GPU types may be made available in future releases.
- Provisioning Models
  - On-Demand: Provision Supercomputer nodes instantly and pay for the duration of usage. Ideal for variable workloads, experimentation, and short-duration training runs.
  - Reserved: Commit to a defined term for lower per-hour pricing. Ideal for sustained workloads, production training pipelines, and predictable capacity planning.
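
The trade-off between the two models can be sketched as a break-even calculation. This is only an illustration: the function names are hypothetical, the rates are placeholders rather than RAIC prices, and it assumes a reserved commitment is billed for every hour of the term whether or not the nodes are busy.

```python
def break_even_hours(on_demand_rate: float, reserved_rate: float, term_hours: float) -> float:
    """Return the hours of use within the term above which reserved is cheaper.

    Assumption (not a documented billing rule): reserved capacity is paid
    for the full term, while on-demand is paid only for hours actually used.
    """
    if on_demand_rate <= 0 or reserved_rate <= 0 or term_hours <= 0:
        raise ValueError("rates and term must be positive")
    # Reserved total = reserved_rate * term_hours
    # On-demand total = on_demand_rate * used_hours
    # Setting the two equal and solving for used_hours gives the threshold.
    return reserved_rate * term_hours / on_demand_rate


def cheaper_model(on_demand_rate: float, reserved_rate: float,
                  term_hours: float, expected_hours: float) -> str:
    """Pick the cheaper model for an expected number of busy hours."""
    threshold = break_even_hours(on_demand_rate, reserved_rate, term_hours)
    return "reserved" if expected_hours > threshold else "on-demand"


if __name__ == "__main__":
    # Made-up figures purely for illustration: 10.0 vs 6.0 per node-hour, 720-hour term.
    print(break_even_hours(10.0, 6.0, 720))
    print(cheaper_model(10.0, 6.0, 720, expected_hours=500))
```

Under these assumptions, the higher your expected utilisation over the term, the more likely the reserved model is to cost less.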
- InfiniBand (GPU-Direct Networking)
InfiniBand is a high-bandwidth, low-latency networking fabric that enables GPU-Direct RDMA communication between nodes. When enabled, data moves directly between GPUs on different nodes without being staged through the CPU, which reduces the communication overhead of distributed training. InfiniBand is optional and is selected when you create the Supercomputer.
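
Once a cluster is running, you may want to confirm that a node actually sees its InfiniBand adapters. The sketch below is one way to do that, assuming a Linux node with the InfiniBand kernel drivers loaded, where adapters appear under the standard sysfs path `/sys/class/infiniband`. The function name `active_ports` is illustrative and not part of any RAIC tooling.

```python
from pathlib import Path


def active_ports(sysfs_root: str = "/sys/class/infiniband") -> dict[str, list[str]]:
    """Map each InfiniBand device name to the list of its ACTIVE port numbers.

    Returns an empty dict when the sysfs directory does not exist, for
    example on a node provisioned without InfiniBand.
    """
    root = Path(sysfs_root)
    result: dict[str, list[str]] = {}
    if not root.is_dir():
        return result
    for device in sorted(root.iterdir()):
        ports_dir = device / "ports"
        active: list[str] = []
        if ports_dir.is_dir():
            for port in sorted(ports_dir.iterdir()):
                state_file = port / "state"
                # The kernel reports port state as text such as "4: ACTIVE".
                if state_file.is_file() and "ACTIVE" in state_file.read_text():
                    active.append(port.name)
        result[device.name] = active
    return result


if __name__ == "__main__":
    for name, ports in active_ports().items():
        print(name, "active ports:", ports or "none")
```

An empty result on a cluster created with InfiniBand enabled would suggest checking the drivers or the OS image before starting a multi-node job.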
- Volume Mount
A persistent block storage volume (see Volumes Guide) can optionally be attached to your Supercomputer at creation time. You select one of your existing volumes and specify a mount path to make it accessible across the cluster.
- Init Script
A custom initialisation script that runs automatically on each node during provisioning. Init scripts allow you to pre-install packages, configure environment variables, or perform any other setup tasks required before your workloads begin.
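
As an illustration of what such a script might contain, the sketch below renders a small shell script that installs packages and exports environment variables. It rests on several assumptions not stated in this guide: that the init script is executed by bash with root privileges, and that the OS image uses `apt-get`. The helper `render_init_script` and the file name `cluster_env.sh` are hypothetical.

```python
import re
import shlex

_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_init_script(packages: list[str], env: dict[str, str]) -> str:
    """Build the text of a bash init script from packages and environment variables."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    if packages:
        # Assumes a Debian/Ubuntu-style image; adjust for other package managers.
        quoted = " ".join(shlex.quote(pkg) for pkg in packages)
        lines.append("apt-get update")
        lines.append(f"DEBIAN_FRONTEND=noninteractive apt-get install -y {quoted}")
    for name, value in env.items():
        if not _ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        export_line = f"export {name}={shlex.quote(value)}"
        # Append to a profile.d file so login shells pick the variable up.
        lines.append(f"echo {shlex.quote(export_line)} >> /etc/profile.d/cluster_env.sh")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_init_script(["htop"], {"NCCL_DEBUG": "INFO"}))
```

The printed text can then be pasted into the init script field during creation. Keeping the script short and idempotent makes it safer to rerun after a node reinstall.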
- SSH Key
An SSH public key is required when you create a Supercomputer. The key is used to authenticate SSH access to every node in the cluster, so make sure you provide a valid public key during the creation workflow.
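
Before pasting a key into the creation workflow, a quick structural check can catch truncated or mis-copied keys. The sketch below is not RAIC's own validation. It only checks the OpenSSH public key layout, in which the Base64 blob starts with a length-prefixed copy of the key type. It does not verify the key cryptographically, and it assumes a plain `type base64 [comment]` line with no leading options.

```python
import base64
import binascii
import struct


def is_valid_openssh_public_key(line: str) -> bool:
    """Return True if the line looks like a well-formed OpenSSH public key."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        return False
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The blob begins with a 4-byte big-endian length, then the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    if name_len == 0 or len(embedded) != name_len:
        return False
    # The embedded type must match the textual prefix, e.g. "ssh-ed25519".
    return embedded == key_type.encode("utf-8")


if __name__ == "__main__":
    with open("id_ed25519.pub") as handle:  # hypothetical key file name
        print(is_valid_openssh_public_key(handle.read()))
```

A `False` result usually means the key was cut off when copied, or that a private key or fingerprint was supplied instead of the public key.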