
How to deploy language models with Red Hat OpenShift AI

September 10, 2025
Alicia Cao
Related topics: Artificial intelligence
Related products: Red Hat AI, Red Hat OpenShift AI


    Red Hat OpenShift AI provides a comprehensive platform for managing the entire data science lifecycle, from data collection to model deployment. In this guide, we will walk through the console and deploy a Llama language model using OpenShift AI's easy-to-navigate interface and powerful infrastructure capabilities, including GPU acceleration, automatic resource scaling, and distributed computing support.

    Watch a full video demo here:

    Getting started with the OpenShift AI console

    The OpenShift AI console is your central hub for managing data science projects. The side tabs on the homepage, shown in Figure 1, give you access to:

    • Data science projects: All your current projects/namespaces within the cluster.
    • Models: All of your current model deployments within the cluster.
    • Applications: Integrated tools for data science.
    • Resources: Documentation and learning tutorials to get you started.
    • Settings: Direct access to configuration options.
    Figure 1: The OpenShift AI console homepage.

    Looking within your project dashboard

    From the OpenShift AI console, navigate to Data science projects on the side tab to access a list of all of your projects within your cluster. Once you click on a project, you transition from the cluster-wide view to a project-specific dashboard that provides a focused view of resources within a single namespace. This project-scoped view is shown in Figure 2.

    Figure 2: The OpenShift AI project dashboard, showing the status of your workbenches and models.

    This project-scoped console allows you to manage:

    • Active workbenches: Containerized environments for working with models, pipelines, and storage.
    • Model deployments: Status tracking for successful and failed deployments.
    • Storage connections: Integration with persistent storage including OpenShift Data Foundation (ODF) for block, file, and object storage, as well as cloud storage like Amazon Web Services (AWS) S3 buckets.
    • External connections: For example, connections to images on different registries as well as credentials for a database.

    Deploying a Llama model: step-by-step

    Let's dive into the deployment process, starting with the essential GPU setup.

    Prerequisites: GPU node setup

    There are several model serving technologies that OpenShift AI supports, including TGIS, Hugging Face TGI, and NVIDIA NIM. For this demo, we chose vLLM because we are deploying a Llama model; vLLM has partnered with Meta to support Llama models.

    To start deploying with vLLM, make sure that you have GPU resources available and running. There are several ways you can deploy GPUs on your OpenShift cluster, each with different advantages depending on your infrastructure and requirements.

    One commonly used approach is the NVIDIA GPU Operator, which provides automated driver management and simplified GPU resource discovery across your cluster. However, in this example we will use MachineSets to provision GPU-enabled nodes, which gives us direct control over the underlying compute instances and allows us to integrate GPU provisioning seamlessly with OpenShift's native cluster scaling and node lifecycle management capabilities.

    1. Access the OpenShift console: Navigate to Compute → MachineSets.
    2. Provision GPU node: Adjust the desired count for your GPU MachineSet.
    3. Wait for readiness: Allow approximately 20 minutes for the complete setup.
    4. Verify status: Check the Nodes tab and filter for GPU workers by clicking Roles. The node's Ready status appears in green (Figure 3).
    Figure 3: The Nodes tab in the OpenShift console, showing the nodes in your cluster.
    5. Confirm drivers: Click into the node name and select Pods on the top tab. You will see the screen in Figure 4. Search for driver in the pods to ensure the driver daemon set shows Ready and Running. (The same checks can also be run from the CLI, as sketched after Figure 4.)
    Figure 4: Viewing the Pods tab for a particular node.
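
    If you prefer the command line, the same provisioning and checks can be done with oc. This is a minimal sketch, assuming cluster-admin access, a GPU MachineSet named gpu-machineset (a placeholder for your MachineSet's name), and the NVIDIA GPU Operator installed in the nvidia-gpu-operator namespace for driver management:

    # Scale the GPU MachineSet up by one node (MachineSet name is a placeholder)
    oc scale machineset gpu-machineset --replicas=1 -n openshift-machine-api

    # Watch the new machine and node come up; provisioning can take around 20 minutes
    oc get machines -n openshift-machine-api -w
    oc get nodes

    # Confirm the node advertises GPU capacity and the driver daemon set pods are Running
    oc describe node <gpu-node-name> | grep nvidia.com/gpu
    oc get pods -n nvidia-gpu-operator | grep driver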

    Establishing model connections

    Now that we have confirmed our GPU instance is ready and running, we can create our connection to a ModelCar container image. A ModelCar image is an OCI-compliant container that packages a machine learning model with its runtime environment and dependencies for consistent deployment across different platforms.

    Going back to your project view in OpenShift AI, select the Connections tab at the top and create a new connection by pressing the button outlined in red in Figure 5. Then select the connection type you want. OpenShift AI supports three connection types for accessing model images:

    • OCI-compliant registry: For proprietary images requiring authentication.
    • S3 compatible object storage: For cloud storage solutions.
    • URI: For publicly available resources (we will use this for our demo).
    Figure 5: Creating an external connection using a URI connection type.

    For our Llama model demonstration, we're using a publicly available container image from the Quay.io image registry. We will be using the Llama 3.2 language model with 3 billion parameters, fine-tuned for following instructions and using 8-bit floating-point precision for reduced memory usage. To create this connection in your project, enter the following URI, as shown in Figure 5:

    oci://quay.io/jharmison/models:redhatai--llama-3_2-3b-instruct-fp8-modelcar
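
    If you want to confirm the image is reachable before creating the connection, you can inspect it from a terminal. This is an optional sanity check that assumes skopeo is installed locally; it is not required by OpenShift AI:

    # Optional: inspect the ModelCar image metadata to confirm the tag exists and is pullable
    skopeo inspect docker://quay.io/jharmison/models:redhatai--llama-3_2-3b-instruct-fp8-modelcar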

    Model deployment configuration

    Now we can deploy the Llama model! Navigate to the project you want to deploy your model in. You can either click the Deploy model button in the overview section or go into the Models tab at the top of your project dashboard. After you click Deploy model, you should see something like Figure 6.

    Figure 6: The top section of the model deployment form.

    The deployment form contains several configuration sections, shown in Figures 6 and 7. Fill out the initial fields as follows:

    1. Access models tab: Click Deploy model to begin configuration.
    2. Name your deployment: Choose a descriptive name for easy identification.
    3. Select serving runtime: Choose vLLM NVIDIA GPU ServingRuntime for KServe.
    4. Deployment mode: Select Standard for automatic route and ingress setup.
    5. Server size: Choose appropriate resources. Here, we selected Medium.
    6. Accelerator: Specify the GPU you provisioned earlier.
    7. Model route and token authentication: Check both boxes to enable external route access and require token authentication.
    8. Connection: Select the established connection that we just created.
    9. Click Deploy. You can watch the deployment come up from the CLI, as sketched after Figure 7.
    Figure 7: The remaining sections of the model deployment form.
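
    Behind the scenes, the console creates a KServe InferenceService for your deployment. You can watch it and the model server pod come up from the CLI; this is a minimal sketch, assuming your project is named my-llm-project (a placeholder):

    # Check the InferenceService created for the deployment (project name is a placeholder)
    oc get inferenceservice -n my-llm-project

    # Watch the vLLM model server pod start; large models can take several minutes to load
    oc get pods -n my-llm-project -w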

    Testing your deployment

    Before diving into external access, let's first confirm functionality through internal testing.

    Internal testing

    Once deployed, verify functionality directly within the OpenShift console. Navigate to Pods, as shown in Figure 8. 

    Figure 8: Locating your Llama model server in the Pods tab.

    Select your project using the project drop-down menu outlined in red in Figure 8 and locate your currently running Llama model server:

    1. Navigate to Workloads > Pods on the left side tabs.
    2. Locate your running Llama model server. You should see it when you filter by your project using the top drop-down menu, as shown in Figure 8.
    3. Access the pod terminal.
    4. Execute a curl command to test internal communication (or run the same check from the oc CLI, as sketched below).
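
    The same internal check can be run from the oc CLI instead of the console terminal. A minimal sketch, again assuming the placeholder project name my-llm-project:

    # Find the running model server pod in your project (project name is a placeholder)
    oc get pods -n my-llm-project

    # Open a shell in the pod, then run the curl command below from inside it
    oc rsh -n my-llm-project <llama-model-server-pod-name>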

    The vLLM runtime uses OpenAI's API format, making integration straightforward. Learn more in the OpenAI documentation. The following is an example command that we used to test within the demo: 

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Hello"},
          {"role": "assistant", "content": "Hello! How can I help you?"},
          {"role": "user", "content": "What is 2 plus 2?"}
        ]
      }'

    If the request is successful, you should see output like Figure 9.

    Figure 9: Successful curl request output in the pod terminal.

    Testing external access

    For external testing, use the authentication token and the external endpoint in your curl command. Going back to your model deployments within your project view in OpenShift AI, you can find both by expanding the deployment with the drop-down button (Figure 10), which shows the internal and external endpoint details:

    1. Copy the authentication token from the deployment dashboard.
    2. Create an environment variable with your token within a terminal outside of the pod.
    3. Modify your curl command to use the external endpoint with proper authentication headers, as sketched after Figure 10.
    Figure 10: Accessing endpoint details for a deployed model from the Models tab in your project view.
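
    As a concrete sketch of these steps, the external call might look like the following. The route and token values are placeholders for the details shown in Figure 10. Depending on your vLLM version, the request may also need a "model" field; the /v1/models endpoint lists the served model name:

    # Placeholders: substitute the route and token from your deployment details
    export ENDPOINT='https://<your-model-route>'
    export TOKEN='<your-authentication-token>'

    # Optional: list the served model name (useful if the request requires a "model" field)
    curl -k -H "Authorization: Bearer $TOKEN" "$ENDPOINT/v1/models"

    # -k skips TLS verification; drop it if your route uses a trusted certificate
    curl -k -X POST "$ENDPOINT/v1/chat/completions" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [
          {"role": "user", "content": "What is 2 plus 2?"}
        ]
      }'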

    Web interface integration

    For a more user-friendly experience, integrate with Open WebUI as follows:

    1. Create a YAML configuration file with your external endpoint and token.
    2. Use Helm to install Open WebUI in your OpenShift environment (a rough sketch follows Figure 11).
    3. Access the clean web interface instead of manual curl commands. See Figure 11.
    Figure 11: An example user interaction in the Open WebUI chat interface.
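
    As a rough sketch of step 2, the Helm install might look like the following. The repository URL and chart name assume the community Open WebUI Helm chart, and openwebui-values.yaml is the file from step 1 (typically pointing OPENAI_API_BASE_URL at your external endpoint's /v1 path and OPENAI_API_KEY at your token); check the chart's documentation for the exact value keys supported by your chart version:

    # Repo URL and chart name assume the community Open WebUI Helm chart
    helm repo add open-webui https://helm.openwebui.com/
    helm repo update

    # Install Open WebUI into its own project, using the values file from step 1
    helm install open-webui open-webui/open-webui \
      -n open-webui --create-namespace \
      -f openwebui-values.yaml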

    Key benefits and takeaways

    Red Hat OpenShift AI simplifies the entire process of deploying and managing language models by providing:

    • Integrated infrastructure: GPU provisioning and management handled within OpenShift.
    • Flexible connectivity: Multiple options for accessing model images and data sources.
    • Security built-in: Token-based authentication and network isolation.
    • Scalable architecture: Easy adjustment of resources based on demand.

    This demo showcases just one of the many features available in OpenShift AI. The platform's comprehensive approach to the data science lifecycle makes it a valuable tool for organizations looking to deploy AI solutions at scale while maintaining security and operational efficiency.
