Meta SAM 2.1 is now obtainable in Amazon SageMaker JumpStart

This weblog submit is co-written with George Orlin from Meta.

Right this moment, we’re excited to announce that Meta’s Phase Something Mannequin (SAM) 2.1 imaginative and prescient segmentation mannequin is publicly obtainable by Amazon SageMaker JumpStart to deploy and run inference. Meta SAM 2.1 gives state-of-the-art video and picture segmentation capabilities in a single mannequin. This cutting-edge mannequin helps long-context processing, complicated segmentation situations, and fine-grained evaluation, making it preferrred for automating processes for numerous industries equivalent to medical imaging in healthcare, satellite tv for pc imagery for atmosphere monitoring, and object segmentation for autonomous methods. Meta SAM 2.1 is properly fitted to zero-shot object segmentation and correct object detection based mostly on easy prompts equivalent to level coordinates and bounding containers in a body for video monitoring and picture masking.

This mannequin was predominantly skilled on AWS, and AWS may even be the primary cloud supplier to make it obtainable to prospects. On this submit, we stroll by the right way to uncover and deploy the Meta SAM 2.1 mannequin utilizing SageMaker JumpStart.

Meta SAM 2.1 overview

Meta SAM 2.1 is a state-of-the-art imaginative and prescient segmentation mannequin designed for high-performance laptop imaginative and prescient duties, enabling superior object detection and segmentation workflows. Constructing upon its predecessor, model 2.1 introduces enhanced segmentation accuracy, strong generalization throughout numerous datasets, and scalability for production-grade functions. These options allow AI researchers and builders in laptop imaginative and prescient, picture processing, and data-driven analysis to enhance duties that require detailed evaluation segmentation throughout a number of fields.

Meta SAM 2.1 has a streamlined structure that’s optimized for integration with fashionable model-serving frameworks like TorchServe and might be deployed on Amazon SageMaker AI to energy real-time or batch inference pipelines. Meta SAM 2.1 empowers organizations to realize exact segmentation outcomes in vision-centric workflows with minimal configuration and most effectivity.

Meta SAM 2.1 affords a number of variants—Tiny, Small, Base Plus, and Massive—obtainable now on SageMaker JumpStart, balancing mannequin dimension, pace, and segmentation efficiency to cater to numerous software wants.

SageMaker JumpStart overview

SageMaker JumpStart affords entry to a broad number of publicly obtainable basis fashions (FMs). These pre-trained fashions function highly effective beginning factors that may be deeply custom-made to handle particular use instances. Now you can use state-of-the-art mannequin architectures, equivalent to language fashions, laptop imaginative and prescient fashions, and extra, with out having to construct them from scratch.

With SageMaker JumpStart, you’ll be able to deploy fashions in a safe atmosphere. Fashions hosted on JumpStart might be provisioned on devoted SageMaker Inference situations, together with AWS Trainium and AWS Inferentia based mostly situations, and are remoted inside your digital non-public cloud (VPC). This enforces knowledge safety and compliance, as a result of the fashions function below your individual VPC controls, moderately than in a shared public atmosphere. After deploying an FM, you’ll be able to additional customise and fine-tune it utilizing the in depth capabilities of SageMaker AI, together with SageMaker Inference for deploying fashions and container logs for improved observability. With SageMaker AI, you’ll be able to streamline your entire mannequin deployment course of.

Conditions

Be sure you have the next stipulations to deploy Meta SAM 2.1 and run inference:

An AWS account that may comprise all of your AWS assets.
An AWS Id and Entry Administration (IAM) position to entry SageMaker AI. To be taught extra about how IAM works with SageMaker AI, discuss with Id and Entry Administration for Amazon SageMaker AI.
Entry to Amazon SageMaker Studio or a SageMaker pocket book occasion or an interactive improvement atmosphere (IDE) equivalent to PyCharm or Visible Studio Code. We suggest utilizing SageMaker Studio for simple deployment and inference.
Entry to accelerated situations (GPUs) for internet hosting the mannequin.

Uncover Meta SAM 2.1 in SageMaker JumpStart

SageMaker JumpStart gives FMs by two main interfaces: SageMaker Studio and the SageMaker Python SDK. This gives a number of choices to find and use a whole lot of fashions on your particular use case.

SageMaker Studio is a complete IDE that gives a unified, web-based interface for performing all elements of the machine studying (ML) improvement lifecycle. From getting ready knowledge to constructing, coaching, and deploying fashions, SageMaker Studio gives purpose-built instruments to streamline your entire course of. In SageMaker Studio, you’ll be able to entry SageMaker JumpStart to find and discover the in depth catalog of FMs obtainable for deployment to inference capabilities on SageMaker Inference.

You possibly can entry the SageMaker JumpStart UI by both Amazon SageMaker Unified Studio or SageMaker Studio. To deploy Meta SAM 2.1 utilizing the SageMaker JumpStart UI, full the next steps:

In SageMaker Unified Studio, on the Construct menu, select JumpStart fashions.

Should you’re already on the SageMaker Studio console, select JumpStart within the navigation pane.

You’ll be prompted to create a mission, after which you’ll be able to start deployment.

Alternatively, you should use the SageMaker Python SDK to programmatically entry and use SageMaker JumpStart fashions. This method permits for higher flexibility and integration with current AI/ML workflows and pipelines. By offering a number of entry factors, SageMaker JumpStart helps you seamlessly incorporate pre-trained fashions into your AI/ML improvement efforts, no matter your most well-liked interface or workflow.

Deploy Meta SAM 2.1 for inference utilizing SageMaker JumpStart

On the SageMaker JumpStart touchdown web page, you’ll be able to uncover the general public pre-trained fashions provided by SageMaker AI. You possibly can select the Meta mannequin supplier tab to find the Meta fashions obtainable.

Should you’re utilizing SageMaker Studio and don’t see the SAM 2.1 fashions, replace your SageMaker Studio model by shutting down and restarting. For extra details about model updates, discuss with Shut down and Replace Studio Traditional Apps.

You possibly can select the mannequin card to view particulars in regards to the mannequin equivalent to license, knowledge used to coach, and the right way to use. You can too discover two buttons, Deploy and Open Pocket book, which allow you to use the mannequin.

While you select Deploy, you have to be prompted to the following display to decide on an endpoint title and occasion kind to provoke deployment.

Upon defining your endpoint settings, you’ll be able to proceed to the following step to make use of the mannequin.

Deploy Meta SAM 2.1 imaginative and prescient segmentation mannequin for inference utilizing the Python SDK

While you select Deploy, mannequin deployment will begin. Alternatively, you’ll be able to deploy by the instance pocket book by selecting Open Pocket book. The pocket book gives end-to-end steerage on the right way to deploy the mannequin for inference and clear up assets.

To deploy utilizing a pocket book, you begin by deciding on an acceptable mannequin, specified by the model_id. You possibly can deploy any of the chosen fashions on SageMaker AI.

You possibly can deploy a Meta SAM 2.1 imaginative and prescient segmentation mannequin utilizing SageMaker JumpStart with the next SageMaker Python SDK code:

from sagemaker.jumpstart.mannequin import JumpStartModel 
mannequin = JumpStartModel(model_id = "meta-vs-sam-2-1-hiera-tiny") 
predictor = mannequin.deploy()

This deploys the mannequin on SageMaker AI with default configurations, together with default occasion kind and default VPC configurations. You possibly can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you’ll be able to run inference in opposition to the deployed endpoint by the SageMaker predictor. There are three duties which can be obtainable with this endpoint: automated masks generator, picture predictor, and video predictor. We offer a code snippet for every later on this submit. To make use of the predictor, a sure payload schema must be adopted. The endpoint has sticky classes enabled, so to begin inference, it’s good to ship a start_session payload:

def start_session(asset_type, asset_path):

    asset_base64 = None
    
     with open(image_path, 'rb') as f:
            asset_base64 = base64.b64encode(f.learn()).decode('utf-8')
    
    response = predictor.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
                    "kind": "start_session",
                    "input_type": asset_type,
                    "path": asset_base64 
                }),
        SessionId="NEW_SESSION",
    )
    
    session_id = response.headers.get("x-amzn-sagemaker-new-session-id")
    
    return session_id

The start_session invocation wants an enter media kind of both picture or video and the base64 encoded knowledge of the media. This can launch a session with an occasion of the mannequin and cargo the media to be segmented.

To shut a session, ship a close_session invocation:

def close_session(session_id):
    response = predictor.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
                    "kind": "close_session",
                    "session_id": session_id
                }),
        SessionId=session_id,
    )
    
    session_id = response.headers.get("x-amzn-sagemaker-closed-session-id")
    
    return session_id

If x-amzn-sagemaker-closed-session-id exists as a header, then the session has been efficiently closed.

To proceed a session and retrieve the session ID of the present session, the response header can have the x-amzn-sagemaker-session-id key with the present session ID for any operation that isn’t start_session or close_session. Operations that aren’t start_session or close_session must be invoked with a response stream. That is as a result of dimension of the ensuing payload being bigger than what SageMaker real-time endpoints can return.

This can be a primary instance of interacting with the SAM 2.1 SageMaker JumpStart endpoint with sticky classes. The next examples for every of the duties reference these operations with out repeating them. The returned knowledge is of mime kind JSONL. For extra full examples, discuss with the instance notebooks for Meta SAM 2.1 on SageMaker Jumpstart.

Really useful situations and benchmarks

The next desk lists all of the Meta SAM 2.1 fashions obtainable in SageMaker JumpStart together with the model_id, default occasion varieties, and most variety of complete tokens (sum of variety of enter tokens and variety of generated tokens) supported for every of those fashions. For elevated context size, you’ll be able to modify the default occasion kind within the SageMaker JumpStart UI.

Mannequin Title	Mannequin ID	Default Occasion Sort	Supported Occasion Varieties
Meta SAM 2.1 Tiny	meta-vs-sam-2-1-hiera-tiny	ml.g6.24xlarge (5.5 MB complete picture or video dimension)	ml.g5.24xlarge ml.g5.48xlarge ml.g6.24xlarge ml.g6.48xlarge ml.p4d.24xlarge ml.p4de.24xlarge
Meta SAM 2.1 Small	meta-vs-sam-2-1-hiera-small	ml.g6.24xlarge (5.5 MB complete picture or video dimension)	ml.g5.24xlarge ml.g5.48xlarge ml.g6.24xlarge ml.g6.48xlarge ml.p4d.24xlarge ml.p4de.24xlarge
Meta SAM 2.1 Base Plus	meta-vs-sam-2-1-hiera-base-plus	ml.g6.24xlarge (5.5 MB complete picture or video dimension)	ml.g5.24xlarge ml.g5.48xlarge ml.g6.24xlarge ml.g6.48xlarge ml.p4d.24xlarge ml.p4de.24xlarge
Meta SAM 2.1 Massive	meta-vs-sam-2-1-hiera-large	ml.g6.24xlarge (5.5 MB complete picture or video dimension)	ml.g5.24xlarge ml.g5.48xlarge ml.g6.24xlarge ml.g6.48xlarge ml.p4d.24xlarge ml.p4de.24xlarge

Meta SAM 2.1 use instances: Inference and immediate examples

After you deploy the mannequin utilizing SageMaker JumpStart, it’s best to be capable to see a reference Jupyter pocket book that references the parser and helper features wanted to start utilizing Meta SAM 2.1. After you comply with these cells within the pocket book, you have to be prepared to start utilizing the mannequin’s imaginative and prescient segmentation capabilities.

Meta SAM 2.1 affords assist for 3 completely different duties (automated masks generator, picture predictor, video predictor) to generate masks for numerous objects in photographs, together with object monitoring in movies. Within the following examples, we reveal the right way to use the automated masks generator and picture predictor on a JPG of a truck. This truck.jpg file is saved within the jumpstart-cache-prod bucket; you’ll be able to entry it with the next code:

s3_bucket = f"jumpstart-cache-prod-{area}"
key_prefix = "inference-notebook-assets"

def download_from_s3(key_filenames):
    for key_filename in key_filenames:
        s3.download_file(s3_bucket, f"{key_prefix}/{key_filename}", key_filename)
        
truck_jpg = "truck.jpg"

#Obtain photographs.
download_from_s3(key_filenames=[truck_jpg])
show(Picture(filename=truck_jpg))

After you’ve gotten your picture and it’s encoded, you’ll be able to create masks for objects within the picture. To be used instances the place you wish to generate masks for each object within the picture, you should use the automated masks generator process.

Automated masks generator

The automated masks generator is nice for AI researchers for laptop imaginative and prescient duties and functions equivalent to medical imaging and diagnostics to routinely section areas of curiosity like tumors or particular organs to supply extra correct diagnostic assist. Moreover, the automated masks generator might be notably helpful within the autonomous car area, during which it might section out components in a digicam like pedestrians, autos, and different objects. Let’s use the automated masks generator to generate masks for all of the objects in truck.jpg.

The next code is the immediate to generate masks on your base64 encoded picture:

# Begin session
session_id = start_session("picture", truck_jpg)
    
# Generate and visualize masks with primary parameters
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
            "kind": "generate_automatic_masks",
            "session_id": session_id,
            "points_per_side": 32,
            "min_mask_region_area": 100
        }),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )
    
# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

masks = parser.get_responses()

# Finish session
end_session(session_id)

We obtain the next output (parsed and visualized).

Picture predictor

Moreover, you’ll be able to select which objects within the offered picture you wish to create a masks for by including factors inside that object for Meta SAM 2.1 to create. A use case for the picture predictor might be invaluable for duties associated to design and modeling by automating processes that usually require guide efforts. For instance, the picture predictor can automate turning 2D photographs into 3D fashions by analyzing 2D photographs of blueprints, sketches, or flooring plans and producing preliminary 3D fashions. That is one among many examples of how the picture predictor can act as a bridge between 2D and 3D development throughout many alternative duties. We use the next picture with the factors that we used to immediate Meta SAM 2.1 for masking the article.

The next code is used to immediate Meta SAM 2.1 and plot the coordinates:

# Begin session
session_id = start_session("picture", truck_jpg)

factors = [
            {"type": "point", "coordinates": [500, 375], "label": 1},
            {"kind": "level", "coordinates": [1125, 625], "label": 1}
         ]
    
# Add a number of factors
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
            "kind": "add_points",
            "session_id": session_id,
            "factors": [p["coordinates"] for p in factors],
            "labels": [p["label"] for p in factors],
            "clear_old_points": clear_old_point,
        }),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )

# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

# Intermediate Response
masks = parser.get_responses()
    
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
            "kind": "predict",
            "session_id": session_id,
            "multimask_output": True,
            "return_logits": True
        }),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )

# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

masks = parser.get_responses()

# Finish session
end_session(session_id)

We obtain the next output (parsed and visualized).

Video predictor

We now reveal the right way to immediate Meta SAM 2.1 for object monitoring on video. One use case could be for ergonomic knowledge assortment and coaching functions. You should use the video predictor to investigate the motion and posture of people in actual time, serving as a strategy to scale back harm and enhance efficiency by setting alarms for unhealthy posture or actions. Let’s begin by accessing the basketball-layup.mp4 file [1] from the jumpstart-cache-prod S3 bucket outlined within the following code:

basketball_mp4 = "basketball-layup.mp4"

#Obtain video
download_from_s3(key_filenames=[basketball_mp4])
show(Video(filename=basketball_mp4))

Video:

The next code reveals how one can arrange the immediate format to trace objects within the video. The primary object will use coordinates to trace and never monitor, and the second object will monitor one coordinate.

# Begin session
session_id = start_session("video", basketball_mp4)

# Object 1
prompts1 = [
        {"type": "point", "coordinates": [1478, 649], "label": 1},
        {"kind": "level", "coordinates": [1433, 689], "label": 0},
    ]
    
# Extract factors and labels
factors = []
labels = []
for immediate in prompts1:
    if immediate["type"] == "level":
        factors.append(immediate["coordinates"])
        labels.append(immediate["label"])

request = {
        "kind": "add_points",
        "session_id": session_id,
        "frame_index": 0,
        "object_id": 1,
        "factors": factors,
        "labels": labels,
        "clear_old_points": True,
    }
    
# Add a number of factors
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps(request),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )

# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

# Intermediate Response
masks = parser.get_responses()

# Object 2
prompts2 = [{"type": "point", "coordinates": [1433, 689], "label": 1}]

# Extract factors and labels
factors = []
labels = []
for immediate in prompts2:
    if immediate["type"] == "level":
        factors.append(immediate["coordinates"])
        labels.append(immediate["label"])

request = {
        "kind": "add_points",
        "session_id": session_id,
        "frame_index": 0,
        "object_id": 2,
        "factors": factors,
        "labels": labels,
        "clear_old_points": True,
    }
    
# Add a number of factors
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps(request),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )

# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

# Intermediate Response
masks = parser.get_responses()
    
response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=json.dumps({
            "kind": "propagate_in_video",
            "session_id": session_id,
            "start_frame_index": 0,
        }),
        SessionId=session_id,
        Settle for="software/jsonlines"
    )

# Parse response stream
parser = StreamParser()
for occasion in response['Body']:
    parser.write(occasion)

masks = parser.get_responses()

# Finish session
end_session(session_id)

We obtain the next output (parsed and visualized).

Video:

Right here we will see that Meta SAM 2.1 Tiny was capable of efficiently monitor the objects based mostly off the coordinates that had been offered in immediate.

Clear up

To keep away from incurring pointless prices, once you’re performed, delete the SageMaker AI endpoints utilizing the next code:

predictor.delete_model()
predictor.delete_endpoint()

Alternatively, to make use of the SageMaker AI console, full the next steps:

On the SageMaker AI console, below Inference within the navigation pane, select
Seek for the embedding and textual content technology endpoints.
On the endpoint particulars web page, select Delete.
Select Delete once more to verify.

Conclusion

On this submit, we explored how SageMaker JumpStart empowers knowledge scientists and ML engineers to find, entry, and deploy a variety of pre-trained FMs for inference, together with Meta’s most superior and succesful fashions thus far. Get began with SageMaker JumpStart and Meta SAM 2.1 fashions right this moment. For extra details about SageMaker JumpStart, see SageMaker JumpStart pretrained fashions and Getting began with Amazon SageMaker JumpStart.

Sources:

[1] Erčulj F, Štrumbelj E (2015) Basketball Shot Varieties and Shot Success in Completely different Ranges of Aggressive Basketball. PLOS ONE 10(6): e0128885. https://doi.org/10.1371/journal.pone.0128885

In regards to the Authors

Marco Punio is a Sr. Specialist Options Architect targeted on generative AI technique, utilized AI options, and conducting analysis to assist prospects hyper-scale on AWS. As a member of the third Celebration Mannequin Supplier Utilized Sciences Options Structure staff at AWS, he’s a World Lead for the Meta – AWS Partnership and technical technique. Primarily based in Seattle, WA, Marco enjoys writing, studying, exercising, and constructing functions in his free time.

Deepak Rupakula is a Principal GTM lead within the specialists group at AWS. He focuses on creating GTM technique for giant language fashions like Meta throughout AWS companies like Amazon Bedrock and Amazon SageMaker AI. With over 15 years of expertise within the tech business, his expertise consists of management roles in product administration, buyer success, and analytics.

Harish Rao is a Senior Options Architect at AWS, specializing in large-scale distributed AI coaching and inference. He empowers prospects to harness the ability of AI to drive innovation and remedy complicated challenges. Outdoors of labor, Harish embraces an energetic way of life, having fun with the tranquility of climbing, the depth of racquetball, and the psychological readability of mindfulness practices.

Baladithya Balamurugan is a Options Architect at AWS targeted on ML deployments for inference and utilizing AWS Neuron to speed up coaching and inference. He works with prospects to allow and speed up their ML deployments on companies equivalent to Amazon SageMaker AI and Amazon EC2. Primarily based in San Francisco, Baladithya enjoys tinkering, creating functions, and constructing his homelab in his free time.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker AI’s machine studying and generative AI hub. She is enthusiastic about constructing options that assist prospects speed up their AI journey and unlock enterprise worth.

Naman Nandan is a software program improvement engineer at AWS, specializing in enabling large-scale AI/ML inference workloads on Amazon SageMaker AI utilizing TorchServe, a mission collectively developed by AWS and Meta. In his free time, he enjoys enjoying tennis and occurring hikes.