This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.
Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels its innovation, which uses artificial intelligence (AI) and machine learning (ML) to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world’s leading business and productivity apps, including AWS.
Cisco’s Webex AI (WxAI) team plays a vital role in enhancing these products with AI-driven features and functionality, using large language models (LLMs) to improve user productivity and experiences. In the past year, the team has increasingly focused on building AI capabilities powered by LLMs to improve productivity and experience for users. Notably, the team’s work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing (NLP), and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to contain hundreds of gigabytes of data, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.
This post highlights how Cisco implemented new functionality and migrated existing workloads to Amazon SageMaker inference components for their industry-specific contact center use cases. By integrating generative AI, they can now analyze call transcripts to better understand customer pain points and improve agent productivity. Cisco has also implemented conversational AI experiences, including chatbots and virtual agents that can generate human-like responses, to automate personalized communications based on customer context. Additionally, they are using generative AI to extract key call drivers, optimize agent workflows, and gain deeper insights into customer sentiment. Cisco’s adoption of SageMaker Inference has enabled them to streamline their contact center operations and deliver more satisfying, personalized interactions that address customer needs.
In this post, we discuss the following:
- Cisco’s business use cases and outcomes
- How Cisco accelerated the use of generative AI powered by LLMs for their contact center use cases with the help of SageMaker Inference
- Cisco’s generative AI inference architecture, which is built on a robust and secure foundation using services and features such as SageMaker Inference, Amazon Bedrock, Kubernetes, Prometheus, Grafana, and more
- How Cisco uses an LLM router and auto scaling to route requests to the appropriate LLMs for different tasks while simultaneously scaling their models for resiliency and performance efficiency
- How the solutions in this post impacted Cisco’s business roadmap and strategic partnership with AWS
- How Cisco helped SageMaker Inference build new capabilities to deploy generative AI applications at scale
Enhancing collaboration and customer engagement with generative AI: Webex’s AI-powered solutions
In this section, we discuss Cisco’s AI-powered use cases.
Meeting summaries and insights
For Webex Meetings, the platform uses generative AI to automatically summarize meeting recordings and transcripts. This extracts the key takeaways and action items, helping distributed teams stay informed even if they missed a live session. The AI-generated summaries provide a concise overview of important discussions and decisions, allowing employees to quickly get up to speed. Beyond summaries, Webex’s generative AI capabilities also surface intelligent insights from meeting content. This includes identifying action items, highlighting important decisions, and generating personalized meeting notes and to-do lists for each participant. These insights help make meetings more productive and keep attendees accountable.
Enhancing contact center experiences
Webex is also applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, and automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.
Webex customers realize positive outcomes with generative AI
Webex’s adoption of generative AI is driving tangible benefits for customers. Clients using the platform’s AI-powered meeting summaries and insights have reported productivity gains. Webex customers using the platform’s generative AI for contact centers have handled hundreds of thousands of calls with improved customer satisfaction and reduced handle times, enabling more natural, empathetic conversations between agents and clients. Webex’s strategic integration of generative AI is empowering users to work smarter and deliver exceptional experiences.
For more details on how Webex is harnessing generative AI to enhance collaboration and customer engagement, see Webex | Exceptional Experiences for Every Interaction on the Webex blog.
Using SageMaker Inference to optimize resources for Cisco
Cisco’s WxAI team is dedicated to delivering advanced collaboration experiences powered by cutting-edge ML. The team develops a comprehensive suite of AI and ML features for the Webex ecosystem, including audio intelligence capabilities like noise removal and speaker voice optimization, language intelligence for transcription and translation, and video intelligence features like virtual backgrounds. At the forefront of WxAI’s innovations is the AI-powered Webex Assistant, a virtual assistant that provides voice-activated control and seamless meeting support in multiple languages. To build these sophisticated capabilities, WxAI uses LLMs, which can contain up to hundreds of gigabytes of training data.
Initially, WxAI embedded LLMs directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Serving the resource-intensive LLMs from within the applications required provisioning substantial compute resources, which slowed down processes like allocating resources and starting applications. This inefficiency hampered WxAI’s ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio. To address these challenges, the WxAI team turned to SageMaker Inference, a fully managed AI inference service that allows models to be deployed and scaled independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication features.
“The applications and the models work and scale fundamentally differently, with entirely different cost considerations; by separating them rather than lumping them together, it’s much simpler to solve issues independently.”
– Travis Mehlinger, Principal Engineer at Cisco.
This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.
Solution overview: Improving efficiency and reducing costs by migrating to SageMaker Inference
To address the scalability and resource utilization challenges of embedding LLMs directly into their applications, the WxAI team migrated to SageMaker Inference. By taking advantage of this fully managed service for deploying LLMs, Cisco unlocked significant performance and cost-optimization opportunities. Key benefits include the ability to deploy multiple LLMs behind a single endpoint for faster scaling and improved response latencies, as well as cost savings. Additionally, the WxAI team implemented an LLM proxy to simplify access to LLMs for Webex teams, enable centralized data collection, and reduce operational overhead. With SageMaker Inference, Cisco can efficiently manage and scale their LLM deployments, harnessing the power of generative AI across the Webex portfolio while maintaining optimal performance, scalability, and cost-effectiveness.
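Hosting multiple models behind one endpoint is the inference components pattern in SageMaker. The following is a minimal sketch of that pattern using boto3; the endpoint, model, and role names are hypothetical placeholders, and the compute requirements would need tuning for a real model:

```python
import boto3

sm = boto3.client("sagemaker")

# One endpoint config sized for the shared instance fleet; individual models
# are attached later as inference components rather than baked into the config.
sm.create_endpoint_config(
    EndpointConfigName="wx-llm-endpoint-config",  # hypothetical name
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(EndpointName="wx-llm-endpoint",
                   EndpointConfigName="wx-llm-endpoint-config")

# Attach a model to the shared endpoint as an inference component, reserving
# a slice of the instance's accelerators and memory for it.
sm.create_inference_component(
    InferenceComponentName="flan-t5-call-driver",
    EndpointName="wx-llm-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "flan-t5-call-driver-model",  # a previously created SageMaker model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 16384,
        },
    },
    RuntimeConfig={"CopyCount": 1},
)
```

Because each component reserves its own slice of accelerators and memory, models can be added, removed, and scaled on the shared fleet independently.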
The following diagram illustrates the WxAI architecture on AWS.
The architecture is built on a robust and secure AWS foundation:
- The architecture uses AWS services like Application Load Balancer, AWS WAF, and EKS clusters for seamless ingress, threat mitigation, and containerized workload management.
- The LLM proxy (a microservice deployed on an EKS pod as part of the Service VPC) simplifies the integration of LLMs for Webex teams, providing a streamlined interface and reducing operational overhead. The LLM proxy supports LLM deployments on SageMaker Inference, Amazon Bedrock, or other LLM providers; a minimal sketch of this routing pattern follows this list.
- The architecture uses SageMaker Inference for optimized model deployment, auto scaling, and request routing.
- The system uses Loki for logging, Amazon Managed Service for Prometheus for metrics, and Grafana for unified visualization, integrated with Cisco SSO.
- The Data VPC houses the data layer components, including Amazon ElastiCache for caching and Amazon Relational Database Service (Amazon RDS) for database services, providing efficient data access and management.
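Cisco hasn’t published the proxy’s implementation, but the core routing idea can be sketched in a few lines of Python. In this hypothetical sketch, the routing table, endpoint names, and model IDs are stand-ins, and the request/response formats assume a Hugging Face-style text generation container on SageMaker and the Claude text-completion API on Amazon Bedrock:

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")
bedrock = boto3.client("bedrock-runtime")

# Hypothetical routing table: each logical task maps to a provider and target.
ROUTES = {
    "call-driver": {"provider": "sagemaker", "target": "flan-t5-call-driver"},
    "topic-label": {"provider": "sagemaker", "target": "llama2-13b-chat"},
    "chat": {"provider": "bedrock", "target": "anthropic.claude-v2"},
}


def invoke(task: str, prompt: str) -> str:
    """Route a request to the backend configured for this task."""
    route = ROUTES[task]
    if route["provider"] == "sagemaker":
        resp = smr.invoke_endpoint(
            EndpointName=route["target"],
            ContentType="application/json",
            Body=json.dumps({"inputs": prompt}),
        )
        # Assumes a Hugging Face-style container response format.
        return json.loads(resp["Body"].read())[0]["generated_text"]
    # Bedrock request bodies are model-specific; Claude's text API shown here.
    resp = bedrock.invoke_model(
        modelId=route["target"],
        body=json.dumps({
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": 256,
        }),
    )
    return json.loads(resp["body"].read())["completion"]
```

A proxy like this also provides one natural place to add logging, metrics, and prompt/response capture for centralized data collection.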
Use case overview: Contact center topic analytics
A key focus area for the WxAI team is enhancing the capabilities of the Webex Contact Center platform. A typical Webex Contact Center installation has hundreds of agents handling many interactions through channels like phone calls and digital channels. Webex’s AI-powered Topic Analytics feature identifies the key reasons customers are calling by analyzing aggregated historical interactions and clustering them into meaningful topic categories, as shown in the following screenshot. The contact center administrator can then use these insights to optimize operations, improve agent performance, and ultimately deliver a more satisfying customer experience.
The Topic Analytics feature is powered by a pipeline of three models: a call driver extraction model, a topic clustering model, and a topic labeling model, as illustrated in the following diagram.
The model details are as follows:
- Call driver extraction – This generative model summarizes the primary reason or intent (referred to as the call driver) behind a customer’s call. Accurate automatic tagging of calls with call drivers helps contact center supervisors and administrators quickly understand the primary reason for any historical call. A key consideration when solving this problem was selecting the right model to balance quality and operational costs. The WxAI team chose the FLAN-T5 model on SageMaker Inference and instruction fine-tuned it for extracting call drivers from call transcripts. FLAN-T5 is a powerful text-to-text transfer transformer model that performs a variety of natural language understanding and generation tasks. This workload had a global footprint, deployed in the us-east-2, eu-west-2, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, and ca-central-1 AWS Regions.
- Topic clustering – Although automatically tagging every contact center interaction with its call driver is a useful feature in itself, analyzing these call drivers in aggregate over a large batch of calls can uncover even more interesting trends and insights. The topic clustering model achieves this by clustering all the individually extracted call drivers from a large batch of calls into distinct topic clusters. It does this by creating a semantic embedding for each call driver and applying an unsupervised hierarchical clustering technique to the vector embeddings. This results in distinct and coherent topic clusters in which semantically similar call drivers are grouped together.
- Topic labeling – The topic labeling model is a generative model that creates a descriptive title to serve as the label for each topic cluster. Several LLMs were prompt-tuned and evaluated in a few-shot setting to choose the best model for the label generation task. Ultimately, Llama2-13b-chat, with its ability to better capture the contextual nuances and semantics of natural language conversation, was chosen for its accuracy, performance, and cost-effectiveness. Deployed on SageMaker inference components using GPU instance types like g4dn and g5, it maintained relatively low operating costs compared to other LLMs. A sketch of the full three-stage pipeline follows this list.
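To make the pipeline concrete, the following is a minimal sketch of the three stages under stated assumptions: the endpoint names and prompts are hypothetical, the MiniLM embedding model is a stand-in (the post doesn’t specify which embedding model is used), and the response parsing assumes a Hugging Face-style text generation container:

```python
import json

import boto3
from scipy.cluster.hierarchy import fcluster, linkage
from sentence_transformers import SentenceTransformer

smr = boto3.client("sagemaker-runtime")


def extract_call_driver(transcript: str) -> str:
    """Stage 1: an instruction-tuned FLAN-T5 model summarizes the call intent."""
    prompt = f"Summarize the primary reason for the customer's call:\n{transcript}"
    resp = smr.invoke_endpoint(
        EndpointName="flan-t5-call-driver",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 32}}),
    )
    return json.loads(resp["Body"].read())[0]["generated_text"]


def cluster_call_drivers(drivers: list[str], threshold: float = 0.5) -> list[int]:
    """Stage 2: embed each call driver, then group semantically similar ones
    with unsupervised hierarchical (agglomerative) clustering."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
    embeddings = embedder.encode(drivers, normalize_embeddings=True)
    tree = linkage(embeddings, method="average", metric="cosine")
    return list(fcluster(tree, t=threshold, criterion="distance"))


def label_topic(cluster_drivers: list[str]) -> str:
    """Stage 3: a chat LLM generates a short descriptive label for a cluster."""
    bullets = "\n".join(f"- {d}" for d in cluster_drivers[:20])
    prompt = f"Write a short topic label for these call drivers:\n{bullets}\nLabel:"
    resp = smr.invoke_endpoint(
        EndpointName="llama2-13b-chat",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 16}}),
    )
    return json.loads(resp["Body"].read())[0]["generated_text"].strip()


transcripts = ["...call transcript 1...", "...call transcript 2..."]  # batch input
drivers = [extract_call_driver(t) for t in transcripts]
labels = cluster_call_drivers(drivers)
topics = {c: label_topic([d for d, l in zip(drivers, labels) if l == c])
          for c in set(labels)}
```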
This solution also used the auto scaling capabilities of SageMaker to dynamically adjust the number of instances, from a desired minimum of 1 instance up to a maximum of 30. This approach provides efficient resource utilization while maintaining high throughput, allowing the WxAI platform to handle batch jobs overnight and scale to hundreds of inferences per minute during peak hours. By deploying the model on SageMaker Inference with auto scaling, the WxAI team was able to deliver reliable and accurate responses to customer interactions for their Topic Analytics use case.
By accurately pinpointing the call driver, the system can suggest appropriate actions, resources, and next steps to the agent, streamlining the customer support process and leading to more personalized and accurate responses to customer questions.
To handle fluctuating demand and optimize resource utilization, the WxAI team implemented auto scaling for their SageMaker Inference endpoints, configuring them to scale between a minimum and maximum instance count based on GPU utilization; a sketch of this configuration follows. Additionally, the LLM proxy routed requests between the different LLMs deployed on SageMaker Inference. The proxy abstracts the complexities of communicating with various LLM providers and enables centralized data collection and analysis, leading to enhanced generative AI workflows, optimized latency, and personalized use case implementations.
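SageMaker endpoint auto scaling is configured through Application Auto Scaling. The following sketch registers the 1-to-30 instance bounds described earlier and attaches a target tracking policy on GPU utilization; the endpoint and variant names are hypothetical, and the target value would need tuning for a real workload:

```python
import boto3

aas = boto3.client("application-autoscaling")

resource_id = "endpoint/wx-llm-endpoint/variant/AllTraffic"  # hypothetical

# Register the endpoint variant as a scalable target with the 1-30 bounds.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=30,
)

# Track average GPU utilization (published by SageMaker to CloudWatch) and
# add or remove instances to hold it near the target.
aas.put_scaling_policy(
    PolicyName="gpu-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # percent GPU utilization; needs tuning
        "CustomizedMetricSpecification": {
            "MetricName": "GPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": "wx-llm-endpoint"},
                {"Name": "VariantName", "Value": "AllTraffic"},
            ],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```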
Benefits
Through the strategic adoption of AWS AI services, Cisco’s WxAI team has realized significant benefits, enabling them to build cutting-edge, AI-powered collaboration capabilities more rapidly and cost-effectively:
- Improved development and deployment cycle time – By decoupling models from applications, the team has streamlined processes like bug fixes, integration testing, and feature rollouts across environments, accelerating their overall development velocity.
- Simplified engineering and delivery – The clear separation of concerns between the lean application layer and the resource-intensive model layer has simplified engineering and delivery, allowing the team to focus on innovation rather than infrastructure complexities.
- Reduced costs – By using fully managed services like SageMaker Inference, the team has offloaded infrastructure management overhead. Capabilities like asynchronous inference and multi-model endpoints have enabled significant cost optimization without compromising performance or availability (see the sketch after this list).
- Scalability and performance – Services like SageMaker Inference and Amazon Bedrock, combined with technologies like NVIDIA Triton Inference Server on SageMaker, have empowered the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding use cases.
- Accelerated innovation – The partnership with AWS has given the WxAI team access to cutting-edge AI services and expertise, enabling them to rapidly prototype and deploy innovative capabilities like the AI-powered Webex Assistant and advanced contact center AI features.
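As a concrete example of one cost lever from the list above, asynchronous inference queues requests and writes results to Amazon S3, so long-running jobs don’t hold connections open and idle capacity can scale down. The following is a minimal sketch; the config, model name, and bucket path are hypothetical:

```python
import boto3

sm = boto3.client("sagemaker")

# An async endpoint config queues requests and writes results to S3, which
# lets long-running LLM jobs finish without client timeouts and allows the
# endpoint to scale down when the queue is empty.
sm.create_endpoint_config(
    EndpointConfigName="wx-llm-async-config",  # hypothetical name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "flan-t5-call-driver-model",  # hypothetical model
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {"S3OutputPath": "s3://example-bucket/async-results/"},
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)
```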
Cisco’s contributions to SageMaker Inference: Enhancing generative AI inference capabilities
Building on the success of their strategic migration to SageMaker Inference, Cisco has partnered with the SageMaker Inference team to build and enhance key generative AI capabilities within the SageMaker platform. Since the early days of generative AI, Cisco has provided the SageMaker Inference team with valuable input and expertise, enabling the introduction of several new features and optimizations:
- Cost and performance optimizations for generative AI inference – Cisco helped the SageMaker Inference team develop innovative techniques to optimize accelerator utilization, enabling SageMaker Inference to reduce foundation model (FM) deployment costs by 50% on average and latency by 20% on average with inference components. This delivers significant cost savings and performance improvements for customers running generative AI workloads on SageMaker.
- Scaling improvements for generative AI inference – Cisco’s expertise in distributed systems and auto scaling also helped the SageMaker team develop advanced capabilities to better handle the scaling requirements of generative AI models. These improvements reduce auto scaling times by up to 40% and make auto scaling detection six times faster, so customers can rapidly scale their generative AI workloads on SageMaker to meet spikes in demand without compromising performance.
- Streamlined generative AI model deployment for inference – Recognizing the need for simplified generative AI model deployment, Cisco collaborated with AWS to introduce the ability to deploy open source LLMs and FMs with just a few clicks. This removes the complexity traditionally associated with deploying these advanced models, empowering more customers to harness the power of generative AI.
- Simplified inference deployment for Kubernetes customers – Cisco’s deep expertise in Kubernetes and container technologies helped the SageMaker team develop new Kubernetes Operator-based inference capabilities. These innovations make it straightforward for customers running applications on Kubernetes to deploy and manage generative AI models, reducing LLM deployment costs by 50% on average.
- Using NVIDIA Triton Inference Server for generative AI – Cisco worked with AWS to integrate NVIDIA Triton Inference Server, a high-performance model serving container managed by SageMaker, to power generative AI inference on SageMaker Inference. This enabled the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding generative AI use cases.
- Packaging generative AI models more efficiently – To further simplify the generative AI model lifecycle, Cisco worked with AWS to enhance SageMaker’s capabilities for packaging LLMs and FMs for deployment. These improvements make it straightforward to prepare and deploy generative AI models, accelerating their adoption and integration.
- Improved documentation for generative AI – Recognizing the importance of comprehensive documentation for the growing generative AI ecosystem, Cisco collaborated with the AWS team to enhance the SageMaker documentation. This includes detailed guides, best practices, and reference materials tailored specifically to generative AI use cases, helping customers quickly ramp up their generative AI initiatives on the SageMaker platform.
By closely partnering with the SageMaker Inference team, Cisco has played a pivotal role in driving the rapid evolution of generative AI inference capabilities in SageMaker. The features and optimizations introduced through this collaboration are empowering AWS customers to unlock the transformative potential of generative AI with greater ease, cost-effectiveness, and performance.
“Our partnership with the SageMaker Inference product team goes back to the early days of generative AI, and we believe the features we have built in collaboration, from cost optimizations to high-performance model deployment, will broadly help other enterprises rapidly adopt and scale generative AI workloads on SageMaker, unlocking new frontiers of innovation and business transformation.”
– Travis Mehlinger, Principal Engineer at Cisco.
Conclusion
By using AWS services like SageMaker Inference and Amazon Bedrock for generative AI, Cisco’s WxAI team has been able to optimize their AI/ML infrastructure and build and deploy AI-powered solutions more efficiently, reliably, and cost-effectively. This strategic approach has unlocked significant benefits for Cisco in deploying and scaling its generative AI capabilities for the Webex platform. Cisco’s own journey with generative AI, as showcased in this post, offers valuable lessons and insights for other users of SageMaker Inference.
Recognizing the impact of generative AI, Cisco has played a vital role in shaping the future of these capabilities within SageMaker Inference. By providing valuable insights and hands-on collaboration, Cisco has helped AWS develop a range of powerful features that are making generative AI more accessible and scalable for organizations. From optimizing infrastructure costs and performance to streamlining model deployment and scaling, Cisco’s contributions have been instrumental in enhancing the SageMaker Inference service.
Moving forward, the Cisco-AWS partnership aims to drive further advancements in areas like conversational and generative AI inference. As generative AI adoption accelerates across industries, Cisco’s Webex platform is designed to scale and streamline user experiences through the use cases discussed in this post and beyond. You can expect ongoing innovation from this collaboration as Cisco and the SageMaker Inference team continue to push the boundaries of what’s possible in AI.
For more information on Webex Contact Center’s Topic Analytics feature and related AI capabilities, refer to The Webex Advantage: Navigating Customer Experience in the Age of AI on the Webex blog.
About the Authors
Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI group, where he helps teams develop and operate cloud-centered AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.
Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven solutions for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Ravi Thakur is a Senior Solutions Architect at AWS, based in Charlotte, NC. He specializes in solving complex business challenges using distributed, cloud-centered, and well-architected patterns. Ravi’s expertise includes microservices, containerization, AI/ML, and generative AI. He empowers AWS strategic customers on their digital transformation journeys, delivering bottom-line benefits. In his spare time, Ravi enjoys motorcycle rides, family time, reading, movies, and traveling.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.
Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and above all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.