
Deploy Llama 3.2 Model on Azure using AI Studio

Published: September 21, 2024

7 minutes to read

The landscape of artificial intelligence is rapidly evolving, with large language models (LLMs) like OpenAI o1, Llama 3 and Mistral pushing the boundaries of what’s possible in natural language processing (NLP). Azure AI Services now supports a variety of open-source LLMs, providing developers with the tools to deploy and scale AI applications seamlessly.

In this blog post, we’ll walk you through deploying the Llama 3.2 model on Azure using Azure AI Studio. The steps outlined here can also be adapted for other supported open-source LLMs, allowing you to leverage the power of these models within Azure’s robust infrastructure.

Exploring the Model Catalog

Azure AI Studio provides a comprehensive Model Catalog where you can browse and select models for deployment. Note that the set of supported models in the Model Catalog varies by Azure region.

To start, create a new Azure AI Studio hub and a new project in the East US region. In your project, go to the Model catalog tab under Get started. You'll see a variety of open-source models. Click the All filters option, and the Filter by panel will display a wide range of Collections such as Meta, Mistral, NVIDIA, and Hugging Face.

Model Catalog

Then select Meta in the Collections filter; you will see about 44 models, including Llama-3.2 and Llama-2.

Meta Models
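As a side note, you can also enumerate the collection programmatically with the azure-ai-ml Python SDK. Here is a minimal sketch, assuming the Meta collection is hosted in the azureml-meta registry (an assumption worth confirming in the portal):

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Connect to a model registry rather than a workspace;
# 'azureml-meta' is an assumed registry name for the Meta collection.
registry_client = MLClient(
    credential=DefaultAzureCredential(),
    registry_name="azureml-meta",
)

for model in registry_client.models.list():
    print(model.name)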

For this tutorial, we'll choose Llama-3.2-1B. Selecting it opens a new page with the model's details and a description.

Llama Model Info

Deploy Llama Model

Now, let’s dive into deploying the Meta Llama model on Azure. On the model’s Overview page, click the Deploy button.

The Deploy model popup will appear. Choose a virtual machine size and provide a unique Endpoint name for your deployment. Click Deploy to start the deployment process.

Llama Model Deployment
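If you prefer code over the portal, the same deployment can be sketched with the azure-ai-ml Python SDK. Treat this as a rough sketch rather than the exact flow AI Studio runs: the model asset path, VM size, and resource names below are assumptions you should confirm against the model's catalog page and the Deploy popup.

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Hypothetical identifiers; substitute your subscription, resource group,
# and AI Studio project (workspace) names.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<PROJECT_NAME>",
)

# Create the endpoint with key-based authentication.
endpoint = ManagedOnlineEndpoint(name="llama-3-2-1b-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the model from the catalog registry. The asset path and the
# instance type below are assumptions -- check the catalog page for the
# exact model ID and the Deploy popup for supported VM sizes.
deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint.name,
    model="azureml://registries/azureml-meta/models/Llama-3.2-1B/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()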

The deployment may take some time, depending on the model size and resources. Navigate to the Deployments tab in your project to monitor the progress.

Llama Model Deploying

Once the deployment completes, the Provisioning state will change to Succeeded. You can retrieve the model's endpoint details in the Details tab.

Llama Model Deployed
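The same details can be polled programmatically. A short sketch, reusing the ml_client from the deployment sketch above:

# Reuses the MLClient instance from the deployment sketch.
endpoint = ml_client.online_endpoints.get(name="<ENDPOINT_NAME>")
print("Provisioning state:", endpoint.provisioning_state)
print("Scoring URI:", endpoint.scoring_uri)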

Test it out

With the model deployed, let's test it to ensure it's working correctly. We'll use a short Python script to send a request to the deployed model. Save the following code in a file named test.py.

import urllib.request
import json

question = "the weather is fine today."
print('Question:', question)

# Payload structure expected by this deployment's scoring endpoint
data = {
  "input_data": [question]
}

body = json.dumps(data).encode('utf-8')

url = '<YOUR_ENDPOINT_URL>'
api_key = '<YOUR_API_KEY>'
if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + api_key,
}

req = urllib.request.Request(url, body, headers)
response = urllib.request.urlopen(req)

answer = response.read()
print('Answer:', answer)

Remember to update the placeholders:

  • Replace <YOUR_ENDPOINT_URL> with the actual endpoint URL from your deployment details.
  • Replace <YOUR_API_KEY> with your deployment's API key.

Open a terminal and run:

python test.py

Here’s an example of the response you might receive. This indicates that the model is generating a continuation of the input text, showcasing its language generation capabilities.

PS D:\blog> python test.py
Question: the weather is fine today.
Answer: b'["the weather is fine today. I don\'t mind if it gets a little wet. A real rain would be"]'
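The raw answer comes back as a JSON-encoded byte string. A small follow-up snippet, appended to test.py, unwraps it into plain text:

# 'answer' holds the raw bytes returned by response.read() above.
decoded = json.loads(answer.decode('utf-8'))
print('Generated text:', decoded[0])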

Endpoint Details

You can find the components of your deployment endpoint in the Consume tab:

  • REST Endpoint: The URL where your model is hosted. This is used to send requests to the model.
  • API Key: A security token required to authenticate requests to the endpoint (see the sketch below for fetching it programmatically).
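If you'd rather not copy the key from the portal, it can be fetched with the SDK. A minimal sketch, again reusing the ml_client from the deployment sketch:

# Retrieve the auth keys for a key-authenticated endpoint.
keys = ml_client.online_endpoints.get_keys(name="<ENDPOINT_NAME>")
print("Primary key:", keys.primary_key)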

Note: the sample code in the Consume tab might not be fully suitable for the LLM you deployed; you might need to revise the payload structure.

Llama Model Endpoint

The model's Swagger endpoint looks like the following.

Llama Model Swagger
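Managed online endpoints typically serve their OpenAPI definition from /swagger.json on the same host as the scoring URL, so you can inspect the expected request schema yourself. A hedged sketch, assuming key authentication and a scoring URL ending in /score:

import json
import urllib.request

# Assumption: the Swagger document lives at /swagger.json next to /score.
swagger_url = '<YOUR_ENDPOINT_URL>'.replace('/score', '/swagger.json')
req = urllib.request.Request(
    swagger_url,
    headers={'Authorization': 'Bearer ' + '<YOUR_API_KEY>'},
)
with urllib.request.urlopen(req) as resp:
    spec = json.loads(resp.read())

# Print the request paths and their schemas.
print(json.dumps(spec.get('paths', {}), indent=2))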

Test & Playground

Azure AI Studio offers a Test & Playground feature to interact with your model directly from the browser.

Llama Model Test

You can also access the model in the Chat Playground.

Llama Model Test

Note: Certain models might not work directly in the Playground due to their payload structure. If you get a 400 error, try calling the API directly with the correct payload for the model.
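When debugging such errors from code, the response body usually says what the endpoint rejected. Here is a small sketch of how the request in test.py could be wrapped to surface that detail:

import urllib.error

# Wraps the urlopen call from test.py to show the server's error message.
try:
    response = urllib.request.urlopen(req)
    print('Answer:', response.read())
except urllib.error.HTTPError as error:
    # A 400 usually means the payload structure does not match what the
    # model's scoring script expects.
    print('Request failed with status:', error.code)
    print(error.read().decode('utf-8', errors='ignore'))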

Monitoring and Logs

The Monitoring tab provides performance statistics for your endpoint.

Llama Model Monitoring

Use the Logs tab to inspect deployment and request logs.

Llama Model Log

Conclusion

Congratulations! You’ve successfully deployed the Llama 3.2 model on Azure using Azure AI Studio. This setup allows you to harness the power of advanced language models within the scalable and secure environment of Azure. Whether you’re developing conversational AI, content generation tools, or other NLP applications, Azure AI Services provides a unified platform to deploy and manage your AI solutions efficiently.