DeepSeek-R1 mannequin now accessible in Amazon Bedrock Market and Amazon SageMaker JumpStart

Immediately, we’re saying that DeepSeek AI’s first-generation frontier mannequin, DeepSeek-R1, is offered by means of Amazon SageMaker JumpStart and Amazon Bedrock Market to deploy for inference. Now you can use DeepSeek-R1 to construct, experiment, and responsibly scale your generative AI concepts on AWS.

On this publish, we display how one can get began with DeepSeek-R1 on Amazon Bedrock and SageMaker JumpStart.

Overview of DeepSeek-R1

DeepSeek-R1 is a big language mannequin (LLM) developed by DeepSeek-AI that makes use of reinforcement studying to reinforce reasoning capabilities by means of a multi-stage coaching course of from a DeepSeek-V3-Base basis. A key distinguishing characteristic is its reinforcement studying (RL) step, which was used to refine the mannequin’s responses past the usual pre-training and fine-tuning course of. By incorporating RL, DeepSeek-R1 can adapt extra successfully to person suggestions and aims, in the end enhancing each relevance and readability. As well as, DeepSeek-R1 employs a chain-of-thought (CoT) method, which means it’s geared up to interrupt down advanced queries and motive by means of them in a step-by-step method. This guided reasoning course of permits the mannequin to supply extra correct, clear, and detailed solutions. This mannequin combines RL-based fine-tuning with CoT capabilities, aiming to generate structured responses whereas specializing in interpretability and person interplay. With its wide-ranging capabilities DeepSeek-R1 has captured the trade’s consideration as a flexible text-generation mannequin that may be built-in into numerous workflows equivalent to brokers, logical reasoning and information interpretation duties

DeepSeek-R1 makes use of a Combination of Specialists (MoE) structure and is 671 billion parameters in dimension. The MoE structure permits activation of 37 billion parameters, enabling environment friendly inference by routing queries to probably the most related skilled “clusters.” This method permits the mannequin to focus on totally different drawback domains whereas sustaining general effectivity. DeepSeek-R1 requires not less than 800 GB of HBM reminiscence in FP8 format for inference. On this publish, we are going to use an ml.p5e.48xlarge occasion to deploy the mannequin. ml.p5e.48xlarge comes with 8 Nvidia H200 GPUs offering 1128 GB of GPU reminiscence.

You possibly can deploy DeepSeek-R1 mannequin both by means of SageMaker JumpStart or Bedrock Market. As a result of DeepSeek-R1 is an rising mannequin, we advocate deploying this mannequin with guardrails in place. On this weblog, we are going to use Amazon Bedrock Guardrails to introduce safeguards, forestall dangerous content material, and consider fashions towards key security standards. On the time of scripting this weblog, for DeepSeek-R1 deployments on SageMaker JumpStart and Bedrock Market, Bedrock Guardrails helps solely the ApplyGuardrail API. You possibly can create a number of guardrails tailor-made to totally different use instances and apply them to the DeepSeek-R1 mannequin, enhancing person experiences and standardizing security controls throughout your generative AI functions.

Conditions

To deploy the DeepSeek-R1 mannequin, you want entry to an ml.p5e occasion. To test in case you have quotas for P5e, open the Service Quotas console and beneath AWS Providers, select Amazon SageMaker, and make sure you’re utilizing ml.p5e.48xlarge for endpoint utilization. Just be sure you have not less than one ml.P5e.48xlarge occasion within the AWS Area you might be deploying. To request a restrict improve, create a restrict improve request and attain out to your account staff.

As a result of you may be deploying this mannequin with Amazon Bedrock Guardrails, be sure you have the right AWS Identification and Entry Administration (IAM) permissions to make use of Amazon Bedrock Guardrails. For directions, see Arrange permissions to make use of guardrails for content material filtering.

Implementing guardrails with the ApplyGuardrail API

Amazon Bedrock Guardrails lets you introduce safeguards, forestall dangerous content material, and consider fashions towards key security standards. You possibly can implement security measures for the DeepSeek-R1 mannequin utilizing the Amazon Bedrock ApplyGuardrail API. This lets you apply guardrails to guage person inputs and mannequin responses deployed on Amazon Bedrock Market and SageMaker JumpStart. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API. For the instance code to create the guardrail, see the GitHub repo.

The overall circulation entails the next steps: First, the system receives an enter for the mannequin. This enter is then processed by means of the ApplyGuardrail API. If the enter passes the guardrail test, it’s despatched to the mannequin for inference. After receiving the mannequin’s output, one other guardrail test is utilized. If the output passes this last test, it’s returned as the ultimate outcome. Nevertheless, if both the enter or output is intervened by the guardrail, a message is returned indicating the character of the intervention and whether or not it occurred on the enter or output stage. The examples showcased within the following sections display inference utilizing this API.

Deploy DeepSeek-R1 in Amazon Bedrock Market

Amazon Bedrock Market provides you entry to over 100 fashionable, rising, and specialised basis fashions (FMs) by means of Amazon Bedrock. To entry DeepSeek-R1 in Amazon Bedrock, full the next steps:

On the Amazon Bedrock console, select Mannequin catalog beneath Basis fashions within the navigation pane.
On the time of scripting this publish, you should utilize the InvokeModel API to invoke the mannequin. It doesn’t assist Converse APIs and different Amazon Bedrock tooling.
Filter for DeepSeek as a supplier and select the DeepSeek-R1 mannequin.

The mannequin element web page gives important details about the mannequin’s capabilities, pricing construction, and implementation pointers. You will discover detailed utilization directions, together with pattern API calls and code snippets for integration. The mannequin helps numerous textual content era duties, together with content material creation, code era, and query answering, utilizing its reinforcement studying optimization and CoT reasoning capabilities.
The web page additionally consists of deployment choices and licensing data that will help you get began with DeepSeek-R1 in your functions.
To start utilizing DeepSeek-R1, select Deploy.

You may be prompted to configure the deployment particulars for DeepSeek-R1. The mannequin ID will likely be pre-populated.
For Endpoint title, enter an endpoint title (between 1–50 alphanumeric characters).
For Variety of cases, enter numerous cases (between 1–100).
For Occasion kind, select your occasion kind. For optimum efficiency with DeepSeek-R1, a GPU-based occasion kind like ml.p5e.48xlarge is beneficial.
Optionally, you’ll be able to configure superior safety and infrastructure settings, together with digital non-public cloud (VPC) networking, service function permissions, and encryption settings. For many use instances, the default settings will work nicely. Nevertheless, for manufacturing deployments, you would possibly wish to evaluate these settings to align together with your group’s safety and compliance necessities.
Select Deploy to start utilizing the mannequin.

When the deployment is full, you’ll be able to take a look at DeepSeek-R1’s capabilities straight within the Amazon Bedrock playground.
Select Open in playground to entry an interactive interface the place you’ll be able to experiment with totally different prompts and alter mannequin parameters like temperature and most size.
When utilizing R1 with Bedrock’s InvokeModel and Playground Console, use DeepSeek’s chat template for optimum outcomes. For instance, <｜start▁of▁sentence｜><｜Person｜>content material for inference<｜Assistant｜> .

This is a wonderful solution to discover the mannequin’s reasoning and textual content era skills earlier than integrating it into your functions. The playground gives rapid suggestions, serving to you perceive how the mannequin responds to numerous inputs and letting you fine-tune your prompts for optimum outcomes.

You possibly can rapidly take a look at the mannequin within the playground by means of the UI. Nevertheless, to invoke the deployed mannequin programmatically with any Amazon Bedrock APIs, you must get the endpoint ARN.

Run inference utilizing guardrails with the deployed DeepSeek-R1 endpoint

The next code instance demonstrates how one can carry out inference utilizing a deployed DeepSeek-R1 mannequin by means of Amazon Bedrock utilizing the invoke_model and ApplyGuardrail API. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API. For the instance code to create the guardrail, see the GitHub repo. After you could have created the guardrail, use the next code to implement guardrails. The script initializes the bedrock_runtime shopper, configures inference parameters, and sends a request to generate textual content primarily based on a person immediate.

import boto3
import json

# Initialize Bedrock shopper
bedrock_runtime = boto3.shopper("bedrock-runtime")

# Configuration
MODEL_ID = "your-model-id"  # Bedrock mannequin ID
GUARDRAIL_ID = "your-guardrail-id"
GUARDRAIL_VERSION = "your-guardrail-version"

def invoke_with_guardrails(immediate, max_tokens=1000, temperature=0.6, top_p=0.9):
    """
    Invoke Bedrock mannequin with enter and output guardrails
    """
    # Apply enter guardrails
    input_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        supply="INPUT",
        content material=[{"text": {"text": prompt}}]
    )
    
    if input_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Enter blocked: {input_guardrail['outputs'][0]['text']}"

    # Put together mannequin enter
    request_body = {
        "inputs": f"""You're an AI assistant. Do because the person asks.
### Instruction: {immediate}
### Response: <suppose>""",
        "parameters": {
            "max_new_tokens": max_tokens,
            "top_p": top_p,
            "temperature": temperature
        }
    }

    # Invoke mannequin
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        physique=json.dumps(request_body)
    )
    
    # Parse mannequin response
    model_output = json.hundreds(response['body'].learn())['generated_text']

    # Apply output guardrails
    output_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        supply="OUTPUT",
        content material=[{"text": {"text": model_output}}]
    )

    if output_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Output blocked: {output_guardrail['outputs'][0]['text']}"
    
    return model_output

# Instance utilization
if __name__ == "__main__":
    immediate = "What's 1+1?"
    outcome = invoke_with_guardrails(immediate)
    print(outcome)

Deploy DeepSeek-R1 with SageMaker JumpStart

SageMaker JumpStart is a machine studying (ML) hub with FMs, built-in algorithms, and prebuilt ML options which you can deploy with only a few clicks. With SageMaker JumpStart, you’ll be able to customise pre-trained fashions to your use case, together with your information, and deploy them into manufacturing utilizing both the UI or SDK.

Deploying DeepSeek-R1 mannequin by means of SageMaker JumpStart provides two handy approaches: utilizing the intuitive SageMaker JumpStart UI or implementing programmatically by means of the SageMaker Python SDK. Let’s discover each strategies that will help you select the method that most closely fits your wants.

Deploy DeepSeek-R1 by means of SageMaker JumpStart UI

Full the next steps to deploy DeepSeek-R1 utilizing SageMaker JumpStart:

On the SageMaker console, select Studio within the navigation pane.
First-time customers will likely be prompted to create a site.
On the SageMaker Studio console, select JumpStart within the navigation pane.

The mannequin browser shows accessible fashions, with particulars just like the supplier title and mannequin capabilities.
Seek for DeepSeek-R1 to view the DeepSeek-R1 mannequin card.
Every mannequin card reveals key data, together with:
- Mannequin title
- Supplier title
- Job class (for instance, Textual content Technology)
- Bedrock Prepared badge (if relevant), indicating that this mannequin may be registered with Amazon Bedrock, permitting you to make use of Amazon Bedrock APIs to invoke the mannequin
Select the mannequin card to view the mannequin particulars web page.

The mannequin particulars web page consists of the next data:
- The mannequin title and supplier data
- Deploy button to deploy the mannequin
- About and Notebooks tabs with detailed data
The About tab consists of necessary particulars, equivalent to:
- Mannequin description
- License data
- Technical specs
- Utilization pointers
Earlier than you deploy the mannequin, it’s beneficial to evaluate the mannequin particulars and license phrases to substantiate compatibility together with your use case.
Select Deploy to proceed with deployment.
For Endpoint title, use the routinely generated title or create a customized one.
For Occasion kind¸ select an occasion kind (default: ml.p5e.48xlarge).
For Preliminary occasion depend, enter the variety of cases (default: 1).
Deciding on acceptable occasion varieties and counts is essential for value and efficiency optimization. Monitor your deployment to regulate these settings as wanted.Below Inference kind, Actual-time inference is chosen by default. That is optimized for sustained visitors and low latency.
Overview all configurations for accuracy. For this mannequin, we strongly advocate adhering to SageMaker JumpStart default settings and ensuring that community isolation stays in place.
Select Deploy to deploy the mannequin.

The deployment course of can take a number of minutes to finish.

When deployment is full, your endpoint standing will change to InService. At this level, the mannequin is able to settle for inference requests by means of the endpoint. You possibly can monitor the deployment progress on the SageMaker console Endpoints web page, which is able to show related metrics and standing data. When the deployment is full, you’ll be able to invoke the mannequin utilizing a SageMaker runtime shopper and combine it together with your functions.

Deploy DeepSeek-R1 utilizing the SageMaker Python SDK

To get began with DeepSeek-R1 utilizing the SageMaker Python SDK, you’ll need to put in the SageMaker Python SDK and be sure you have the required AWS permissions and atmosphere setup. The next is a step-by-step code instance that demonstrates how one can deploy and use DeepSeek-R1 for inference programmatically. The code for deploying the mannequin is supplied within the Github right here . You possibly can clone the pocket book and run from SageMaker Studio.

!pip set up --force-reinstall --no-cache-dir sagemaker==2.235.2

from sagemaker.serve.builder.model_builder import ModelBuilder 
from sagemaker.serve.builder.schema_builder import SchemaBuilder 
from sagemaker.jumpstart.mannequin import ModelAccessConfig 
from sagemaker.session import Session 
import logging 

sagemaker_session = Session()
 
artifacts_bucket_name = sagemaker_session.default_bucket() 
execution_role_arn = sagemaker_session.get_caller_identity_arn()
 
js_model_id = "deepseek-llm-r1"

gpu_instance_type = "ml.p5e.48xlarge"
 
response = "Hey, I am a language mannequin, and I am right here that will help you together with your English."

 sample_input = {
 "inputs": "Hey, I am a language mannequin,",
 "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
 }
  
 sample_output = [{"generated_text": response}]
  
 schema_builder = SchemaBuilder(sample_input, sample_output)
  
 model_builder = ModelBuilder( 
 mannequin=js_model_id, 
 schema_builder=schema_builder, 
 sagemaker_session=sagemaker_session, 
 role_arn=execution_role_arn, 
 log_level=logging.ERROR ) 
 
 mannequin= model_builder.construct() 
 predictor = mannequin.deploy(model_access_configs={js_model_id:ModelAccessConfig(accept_eula=True)}, accept_eula=True) 
 
 
 predictor.predict(sample_input)

You possibly can run further requests towards the predictor:

new_input = {
    "inputs": "What's Amazon doing in Generative AI?",
    "parameters": {"max_new_tokens": 64, "top_p": 0.8, "temperature": 0.7},
}

prediction = predictor.predict(new_input)
print(prediction)

Implement guardrails and run inference together with your SageMaker JumpStart predictor

Just like Amazon Bedrock, you may also use the ApplyGuardrail API together with your SageMaker JumpStart predictor. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API, and implement it as proven within the following code:

import boto3
import json
bedrock_runtime = boto3.shopper('bedrock-runtime')
sagemaker_runtime = boto3.shopper('sagemaker-runtime')

# Add your guardrail identifier and model created from Bedrock Console or AWSCLI
guardrail_id = "" # Your Guardrail ID
guardrail_version = "" # Your Guardrail Model
endpoint_name = "" # Endpoint Title

immediate = "What's 1+1 equal?"

# Apply guardrail to enter earlier than sending to mannequin
input_guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    supply="INPUT",
    content material=[{ "text": { "text": prompt }}]
)

# If enter guardrail passes, proceed with mannequin inference
if input_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
    # Put together the enter for the SageMaker endpoint
    template = f"""You're an AI assistant. Do because the person asks.
### Instruction: {immediate}
### Response: <suppose>"""
    
    input_payload = {
        "inputs": template,
        "parameters": {
            "max_new_tokens": 1000,
            "top_p": 0.9,
            "temperature": 0.6
        }
    }
    
    # Convert the payload to JSON string
    input_payload_json = json.dumps(input_payload)
    
    # Invoke the SageMaker endpoint
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=input_payload_json
    )
    
    # Get the response from the mannequin
    model_response = json.hundreds(response['Body'].learn().decode())
    
    # Apply guardrail to output
    output_guardrail_response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        supply="OUTPUT",
        content material=[{ "text": { "text": model_response['generated_text'] }}]
    )
    
    # Examine if output passes guardrails
    if output_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
        print(model_response['generated_text'])
    else:
        print("Output blocked: ", output_guardrail_response['outputs'][0]['text'])
else:
    print("Enter blocked: ", input_guardrail_response['outputs'][0]['text'])

Clear up

To keep away from undesirable costs, full the steps on this part to scrub up your assets.

Delete the Amazon Bedrock Market deployment

If you happen to deployed the mannequin utilizing Amazon Bedrock Market, full the next steps:

On the Amazon Bedrock console, beneath Basis fashions within the navigation pane, select Market deployments.
Within the Managed deployments part, find the endpoint you wish to delete.
Choose the endpoint, and on the Actions menu, select Delete.
Confirm the endpoint particulars to be sure you’re deleting the right deployment:
1. Endpoint title
2. Mannequin title
3. Endpoint standing
Select Delete to delete the endpoint.
Within the deletion affirmation dialog, evaluate the warning message, enter affirm, and select Delete to completely take away the endpoint.

Delete the SageMaker JumpStart predictor

The SageMaker JumpStart mannequin you deployed will incur prices in the event you depart it working. Use the next code to delete the endpoint if you wish to cease incurring costs. For extra particulars, see Delete Endpoints and Sources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

On this publish, we explored how one can entry and deploy the DeepSeek-R1 mannequin utilizing Bedrock Market and SageMaker JumpStart. Go to SageMaker JumpStart in SageMaker Studio or Amazon Bedrock Market now to get began. For extra data, check with Use Amazon Bedrock tooling with Amazon SageMaker JumpStart fashions, SageMaker JumpStart pretrained fashions, Amazon SageMaker JumpStart Basis Fashions, Amazon Bedrock Market, and Getting began with Amazon SageMaker JumpStart.

In regards to the Authors

Vivek Gangasani is a Lead Specialist Options Architect for Inference at AWS. He helps rising generative AI corporations construct revolutionary options utilizing AWS companies and accelerated compute. At the moment, he’s targeted on creating methods for fine-tuning and optimizing the inference efficiency of enormous language fashions. In his free time, Vivek enjoys mountaineering, watching films, and attempting totally different cuisines.

Niithiyn Vijeaswaran is a Generative AI Specialist Options Architect with the Third-Social gathering Mannequin Science staff at AWS. His space of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s diploma in Pc Science and Bioinformatics.

Jonathan Evans is a Specialist Options Architect engaged on generative AI with the Third-Social gathering Mannequin Science staff at AWS.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker’s machine studying and generative AI hub. She is captivated with constructing options that assist prospects speed up their AI journey and unlock enterprise worth.