As the volume and complexity of data handled by organizations grow, traditional rules-based approaches to analyzing that data alone are no longer viable. Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale. Furthermore, the democratization of AI and ML through AWS and AWS Partner solutions is accelerating their adoption across all industries.

For example, a health-tech company may be looking to improve patient care by predicting the probability that an elderly patient will become hospitalized, based on analysis of both clinical and non-clinical data. This would allow them to intervene early, personalize the delivery of care, and make the most efficient use of existing resources, such as hospital bed capacity and nursing staff.

AWS offers the broadest and deepest set of AI and ML services and supporting infrastructure, such as Amazon SageMaker and Amazon Bedrock, to help you at every stage of your AI/ML adoption journey, including adoption of generative AI. Splunk, an AWS Partner, offers a unified security and observability platform built for speed and scale.

As the variety and volume of data increase, it is vital to understand how the data can be harnessed at scale using the complementary capabilities of the two platforms. For organizations looking beyond the use of out-of-the-box Splunk AI/ML features, this post explores how Amazon SageMaker Canvas, a no-code ML development service, can be used in conjunction with data collected in Splunk to drive actionable insights. We also demonstrate how to use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and help you build better ML models.
Use case overview
In this example, a health-tech company offering remote patient monitoring is collecting operational data from wearables using Splunk. These device metrics and logs are ingested into and stored in a Splunk index, a repository of incoming data. Within Splunk, this data is used to fulfill context-specific security and observability use cases by Splunk users, such as monitoring the security posture and uptime of devices and performing proactive maintenance of the fleet.

Separately, the company uses AWS data services, such as Amazon Simple Storage Service (Amazon S3), to store data related to patients, such as patient information, device ownership details, and clinical telemetry data obtained from the wearables. This may include exports from customer relationship management (CRM), configuration management database (CMDB), and electronic health record (EHR) systems. In this example, the company has access to an extract of patient information and hospital admission records that reside in an S3 bucket.

The following table illustrates the different data explored in this example use case.
| Description | Feature Name | Storage | Example Source |
| --- | --- | --- | --- |
| Age of patient |  | AWS | EHR |
| Units of alcohol consumed by patient each week |  | AWS | EHR |
| Tobacco usage by patient per week |  | AWS | EHR |
| Average systolic blood pressure of patient |  | AWS | Wearables |
| Average diastolic blood pressure of patient |  | AWS | Wearables |
| Average resting heart rate of patient |  | AWS | Wearables |
| Patient admission record |  | AWS | EHR |
| Number of days the device has been active over a period |  | Splunk | Wearables |
| Average end of day battery level over a period |  | Splunk | Wearables |
This post describes an approach with two key components:

- The two data sources are stored alongside one another using a common AWS data engineering pipeline. Data is presented to the personas that need access using a unified interface.
- An ML model to predict hospital admissions (`admitted`) is developed using the combined dataset and SageMaker Canvas. Professionals with no background in ML are empowered to analyze the data using no-code tooling.

The solution allows custom ML models to be developed from a broader variety of clinical and non-clinical data sources, to cater for different real-life scenarios. For example, it could be used to answer questions such as: "If patients tend to keep their wearables turned off and there is no clinical telemetry data available, can the likelihood that they will be hospitalized still be accurately predicted?"
AWS data engineering pipeline
The adaptable approach detailed in this post starts with an automated data engineering pipeline that makes data stored in Splunk available to a wide range of personas, including business intelligence (BI) analysts, data scientists, and ML practitioners, through a SQL interface. This is achieved by using the pipeline to transfer data from a Splunk index into an S3 bucket, where it will be cataloged.
The approach is shown in the following diagram.

Figure 1: Architecture overview of the data engineering pipeline
The automated AWS data pipeline consists of the following steps:
- Data from wearables is stored in a Splunk index where it can be queried by users, such as security operations center (SOC) analysts, using the Splunk Search Processing Language (SPL). Splunk's out-of-the-box AI/ML capabilities, such as the Splunk Machine Learning Toolkit (Splunk MLTK) and purpose-built models for security and observability use cases (for example, for anomaly detection and forecasting), can be applied inside the Splunk platform. Using these Splunk ML features allows you to derive contextualized insights quickly without the need for additional AWS infrastructure or skills.
- Some organizations may look to develop custom, differentiated ML models, or want to build AI-enabled applications using AWS services for their specific use cases. To facilitate this, an automated data engineering pipeline is built using AWS Step Functions. The Step Functions state machine is configured with an AWS Lambda function to retrieve data from the Splunk index using the Splunk Enterprise SDK for Python. The SPL query requested through this REST API call is scoped to only retrieve the data of interest.
  - Lambda supports container images. This solution uses a Lambda function that runs a Docker container image. This allows larger data manipulation libraries, such as pandas and PyArrow, to be included in the deployment package.
  - If a large volume of data is being exported, the code may need to run for longer than the maximum possible duration, or require more memory than Lambda functions support. If so, Step Functions can be configured to directly run a container task on Amazon Elastic Container Service (Amazon ECS).
- For authentication and authorization, the Splunk bearer token is securely retrieved from AWS Secrets Manager by the Lambda function before it calls the Splunk `/search` REST API endpoint. This bearer authentication token lets users access the REST endpoint using an authenticated identity.
- Data retrieved by the Lambda function is transformed (if required) and uploaded to the designated S3 bucket alongside the other datasets. The data is partitioned and compressed, and stored in the storage- and performance-optimized Apache Parquet file format.
- As its final step, the Step Functions state machine runs an AWS Glue crawler to infer the schema of the Splunk data residing in the S3 bucket, and catalogs it for wider consumption as tables using the AWS Glue Data Catalog.
- Wearables data exported from Splunk is now available to users and applications through the Data Catalog as a table. Analytics tooling such as Amazon Athena can now be used to query the data using SQL.
- As the data stored in your AWS environment grows, it is important to have centralized governance in place. AWS Lake Formation allows you to simplify permissions management and data sharing to maintain security and compliance.
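To make the retrieval step concrete, the following sketch shows the shape of the export call the Lambda function could make against Splunk's REST API. It uses only the Python standard library rather than the Splunk Enterprise SDK for Python that the solution actually uses, and the Splunk host, index, and field names are illustrative placeholders, not values from the real pipeline.

```python
# Minimal sketch of the Lambda function's Splunk export step (illustrative only).
import json
import urllib.parse
import urllib.request

def build_export_request(base_url: str, spl_query: str, token: str):
    """Build the POST request for Splunk's blocking export endpoint."""
    url = f"{base_url}/services/search/jobs/export"
    headers = {
        # In the real solution the bearer token is fetched from AWS Secrets Manager.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({
        "search": spl_query,
        "output_mode": "json",  # one JSON object per line in the response
    }).encode("utf-8")
    return url, headers, body

def run_export(base_url: str, spl_query: str, token: str):
    """Run the export and parse the newline-delimited JSON results."""
    url, headers, body = build_export_request(base_url, spl_query, token)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:  # network call; requires a live Splunk host
        return [json.loads(line) for line in resp if line.strip()]

# Scope the SPL to only the fields of interest, as the post recommends.
SPL = "search index=wearables_ops | fields user_id, num_days_device_active, battery_level"
```

The results returned by `run_export` would then be written to the S3 bucket as Parquet (for example, with pandas and PyArrow inside the container image).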
An AWS Serverless Application Model (AWS SAM) template is provided to deploy all the AWS resources required by this solution. This template can be found in the accompanying GitHub repository.
Refer to the README file for the required prerequisites, deployment steps, and the process to test the data engineering pipeline solution.
AWS AI/ML analytics workflow
After the data engineering pipeline's Step Functions state machine successfully completes and the wearables data from Splunk is available alongside the patient healthcare data using Athena, we use an example approach based on SageMaker Canvas to drive actionable insights.
SageMaker Canvas is a no-code visual interface that empowers you to prepare data and build and deploy highly accurate ML models, streamlining the end-to-end ML lifecycle in a unified environment. You can prepare and transform data through point-and-click interactions and natural language, powered by Amazon SageMaker Data Wrangler. You can also tap into the power of automated machine learning (AutoML) and automatically build custom ML models for regression, classification, time series forecasting, natural language processing, and computer vision, supported by Amazon SageMaker Autopilot.
In this example, we use the service to classify whether a patient is likely to be admitted to a hospital over the next 30 days based on the combined dataset.
The approach is shown in the following diagram.

Figure 2: Architecture overview of ML development
The solution consists of the following steps:
- An AWS Glue crawler crawls the data stored in the S3 bucket. The Data Catalog exposes the data found in the folder structure as tables.
- Athena provides a query engine that allows people and applications to interact with the tables using SQL.
- SageMaker Canvas uses Athena as a data source, allowing the data stored in the tables to be used for ML model development.
Solution overview
SageMaker Canvas allows you to build a custom ML model using a dataset that you have imported. In the following sections, we demonstrate how to create, explore, and transform a sample dataset, use natural language to query the data, check for data quality, create additional steps for the data flow, and build, test, and deploy an ML model.
Prerequisites
Before proceeding, refer to Getting started with using Amazon SageMaker Canvas to make sure you have the required prerequisites in place. In particular, validate that the AWS Identity and Access Management (IAM) role your SageMaker domain is using has a policy attached with sufficient permissions to access the Athena, AWS Glue, and Amazon S3 resources.
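As an illustration of the kind of access involved (not a complete or least-privilege policy; the Getting started guide linked above is authoritative), the attached policy needs to allow actions of roughly this shape:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": "*"
    }
  ]
}
```

In practice, scope `Resource` down to the specific Athena workgroup, Data Catalog databases, and S3 buckets used by the solution.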
Create the dataset
SageMaker Canvas supports Athena as a data source. The data from wearables and the patient healthcare data residing in your S3 bucket are accessed using Athena and the Data Catalog. This allows the tabular data to be directly imported into SageMaker Canvas to start your ML development.
To create your dataset, complete the following steps:
- On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
- On the Import and prepare dropdown menu, choose Tabular as the dataset type to indicate that the imported data consists of rows and columns.

Figure 3: Importing tabular data using SageMaker Data Wrangler
- For Select a data source, choose Athena.

On this page, you will see your Data Catalog database and tables listed, named `patient_data` and `splunk_ops_data`.

- Join (inner join) the tables together using the `user_id` and `id` fields to create one overarching dataset that can be used during ML model development.
- Under Import settings, enter `unprocessed_data` for Dataset name.
- Choose Import to complete the process.

Figure 4: Joining data using SageMaker Data Wrangler
The combined dataset is now available to explore and transform using SageMaker Data Wrangler.
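Behind this no-code step, the inner join configured above is equivalent to an Athena query of roughly this shape. The database name `healthcare_db` is an illustrative placeholder; only the table names and the `id`/`user_id` join keys come from this walkthrough.

```sql
-- Illustrative equivalent of the inner join configured in SageMaker Canvas
SELECT p.*, s.*
FROM "healthcare_db"."patient_data" AS p
INNER JOIN "healthcare_db"."splunk_ops_data" AS s
    ON p.id = s.user_id;
```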
Explore and transform the dataset
SageMaker Data Wrangler allows you to transform and analyze the source dataset through data flows while still maintaining a no-code approach.
The previous step automatically created a data flow in the SageMaker Canvas console, which we have renamed to `data_prep_data_flow.flow`. Additionally, two steps were automatically generated, as listed in the following table.
| Step | Name | Description |
| --- | --- | --- |
| 1 | Athena Source | Sets the … |
| 2 | Data types | Sets column types of … |
Before we create additional transform steps, let's explore two SageMaker Canvas features that can help us focus on the right actions.
Use natural language to query the data
SageMaker Data Wrangler also provides generative AI capabilities, called Chat for data prep, powered by a large language model (LLM). This feature allows you to explore your data using natural language, without any background in ML or SQL. Additionally, any contextualized recommendations returned by the generative AI model can be introduced directly back into the data flow without writing any code.
In this section, we present some example prompts to demonstrate this in action. These examples were chosen to illustrate the art of the possible. We recommend that you experiment with different prompts to achieve the best results for your particular use cases.
Example 1: Identify Splunk default fields
In this first example, we want to know whether there are Splunk default fields that we could potentially exclude from our dataset prior to ML model development.
- In SageMaker Data Wrangler, open your data flow.
- Choose Step 2 Data types, and choose Chat for data prep.
- In the Chat for data prep pane, you can enter prompts in natural language to explore and transform the data. For example:
In this example, the generative AI LLM has correctly identified Splunk default fields that could be safely dropped from the dataset.
- Choose Add to steps to add this identified transformation to the data flow.

Figure 5: Using SageMaker Data Wrangler's Chat for data prep to identify Splunk's default fields
Example 2: Identify additional columns that could be dropped
We now want to identify any further columns that could be dropped, without being too specific about what we are looking for. We want the LLM to make suggestions based on the data, and to provide us with its rationale. For example:
In addition to the Splunk default fields identified earlier, the generative AI model is now proposing the removal of columns such as `timestamp`, `punct`, `id`, `index`, and `linecount` that do not appear to be conducive to ML model development.

Figure 6: Using SageMaker Data Wrangler's Chat for data prep to identify additional fields that can be dropped
Example 3: Calculate the average age column in the dataset
You can also use the generative AI model to perform Text2SQL tasks, in which you simply ask questions of the data using natural language. This is useful if you want to validate the content of the dataset.
In this example, we want to know what the average patient age value is across the dataset:
By expanding View code, you can see what SQL statements the LLM has constructed using its Text2SQL capabilities. This gives you full visibility into how the results are being returned.

Figure 7: Using SageMaker Data Wrangler's Chat for data prep to run SQL statements
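For a question like this one, the statement revealed under View code would be a simple aggregation along the following lines. The column name `age` is an assumption about the dataset's schema, not a value confirmed by the walkthrough.

```sql
-- Illustrative Text2SQL output for "what is the average patient age?"
SELECT AVG(age) AS average_patient_age
FROM unprocessed_data;
```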
Check for data quality
SageMaker Canvas also provides exploratory data analysis (EDA) capabilities that allow you to gain deeper insights into the data prior to the ML model build step. With EDA, you can generate visualizations and analyses to validate whether you have the right data, and whether your ML model build is likely to yield results that are aligned to your organization's expectations.
Example 1: Create a Data Quality and Insights Report
Complete the following steps to create a Data Quality and Insights Report:
- While in the data flow step, choose the Analyses tab.
- For Analysis type, choose Data Quality and Insights Report.
- For Target column, choose `admitted`.
- For Problem type, select Classification.
This performs an analysis of the data that you have and provides information such as the number of missing values and outliers.

Figure 8: Running SageMaker Data Wrangler's Data Quality and Insights Report
Refer to Get Insights On Data and Data Quality for details on how to interpret the results of this report.
Example 2: Create a Quick Model
In this second example, choose Quick Model for Analysis type, and for Target column, choose `admitted`. The Quick Model estimates the expected predictive quality of the model.
By running the analysis, the estimated F1 score (a measure of predictive performance) of the model and the feature importance scores are displayed.

Figure 9: Running SageMaker Data Wrangler's Quick Model feature to assess the potential accuracy of the model
SageMaker Canvas supports many other analysis types. By reviewing these analyses ahead of your ML model build, you can continue to engineer the data and features to gain sufficient confidence that the ML model will meet your business objectives.
Create additional steps in the data flow
In this example, we have decided to update our `data_prep_data_flow.flow` data flow to implement additional transforms. The following table summarizes these steps.
| Step | Transform | Description |
| --- | --- | --- |
| 3 | Chat for data prep | Removes the Splunk default fields identified. |
| 4 | Chat for data prep | Removes additional fields identified as being unhelpful to ML model development. |
| 5 | Group by | Groups the rows by `user_id` and calculates an average … |
| 6 | Drop column (manage columns) | Drops remaining columns that are unnecessary for our ML development, such as columns with high cardinality (for example, …) |
| 7 | Parse column as type | Converts numerical value types, for example from … |
| 8 | Parse column as type | Converts additional columns that need to be parsed (each column requires a separate step). |
| 9 | Drop duplicates (manage rows) | Drops duplicate rows to avoid overfitting. |
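Outside of SageMaker Canvas, steps 5, 7, and 9 above correspond to familiar dataframe operations. The following pandas sketch mirrors them on a few hypothetical rows; all column names and values here are illustrative, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical unprocessed rows after the join (illustrative column names/values).
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "age": [81, 81, 67],
    "avg_resting_heart_rate": [88.0, 90.0, 72.0],
    "battery_level": ["55", "61", "70"],  # numeric value stored as a string
    "admitted": [1, 1, 0],
})

# Step 5: group the rows by user_id and average the telemetry readings.
grouped = (
    df.groupby(["user_id", "age", "admitted"], as_index=False)
      .agg(avg_resting_heart_rate=("avg_resting_heart_rate", "mean"),
           battery_level=("battery_level", "first"))
)

# Step 7: parse string-typed numeric columns into numeric types.
grouped["battery_level"] = grouped["battery_level"].astype(float)

# Step 9: drop duplicate rows to avoid overfitting.
grouped = grouped.drop_duplicates()
```

After these steps, each patient contributes a single row of numeric features, which is the shape the ML model build expects.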
To create a new transform, view the data flow, then choose Add transform on the last step.

Figure 10: Using SageMaker Data Wrangler to add a transform to a data flow
Choose Add transform, and proceed to choose a transform type and its configuration.

Figure 11: Using SageMaker Data Wrangler to add a transform to a data flow
The following screenshot shows our newly updated end-to-end data flow, featuring multiple steps. In this example, we ran the analyses at the end of the data flow.

Figure 12: Showing the end-to-end SageMaker Canvas Data Wrangler data flow
If you want to incorporate this data flow into a productionized ML workflow, SageMaker Canvas can create a Jupyter notebook that exports your data flow to Amazon SageMaker Pipelines.
Develop the ML model
To get started with ML model development, complete the following steps:
- Choose Create model directly from the last step of the data flow.

Figure 13: Creating a model from the SageMaker Data Wrangler data flow
- For Dataset name, enter a name for your transformed dataset (for example, `processed_data`).
- Choose Export.

Figure 14: Naming the exported dataset to be used by the model in SageMaker Data Wrangler
This step will automatically create a new dataset.
- After the dataset has been created successfully, choose Create model to begin the ML model creation.

Figure 15: Creating the model in SageMaker Data Wrangler
- For Model name, enter a name for the model (for example, `my_healthcare_model`).
- For Problem type, select Predictive analysis.
- Choose Create.

Figure 16: Naming the model in SageMaker Canvas and selecting the predictive analysis type
You are now ready to progress through the Build, Analyze, Predict, and Deploy stages to develop and operationalize the ML model using SageMaker Canvas.
- On the Build tab, for Target column, choose the column you want to predict (`admitted`).
- Choose Quick build to build the model.
The Quick build option has a shorter build time, but the Standard build option generally achieves higher accuracy.

Figure 17: Selecting the target column to predict in SageMaker Canvas
After a few minutes, on the Analyze tab, you will be able to view the accuracy of the model, together with column impact, scoring, and other advanced metrics. For example, we can see that a feature from the wearables data captured in Splunk, `average_num_days_device_active`, has a strong impact on whether the patient is likely to be admitted or not, along with their age. As such, the health-tech company could proactively reach out to elderly patients who tend to keep their wearables turned off, to minimize the risk of their hospitalization.

Figure 18: Showing the results from the model Quick build in SageMaker Canvas
If you are happy with the results from the Quick build, repeat the process with a Standard build to make sure you have an ML model with higher accuracy that can be deployed.
Test the ML model
Our ML model has now been built. If you're satisfied with its accuracy, you can make predictions with it using net new data on the Predict tab. Predictions can be performed either in batch (on a list of patients) or for a single entry (one patient).
Experiment with different values and choose Update prediction. The ML model will respond with a prediction for the new values that you have entered.
In this example, the ML model has identified a 64.5% probability that this particular patient will be admitted to hospital in the next 30 days. The health-tech company will likely want to prioritize the care of this patient.

Figure 19: Showing the results from a single prediction using the model in SageMaker Canvas
Deploy the ML model
It is now possible for the health-tech company to build applications that can use this ML model to make predictions. ML models developed in SageMaker Canvas can be operationalized using a broader set of SageMaker services.
To deploy the ML model, complete the following steps:
- On the Deploy tab, choose Create Deployment.
- Specify the Deployment name, Instance type, and Instance count.
- Choose Deploy to make the ML model available as a SageMaker endpoint.
In this example, we reduced the instance type to ml.m5.4xlarge and the instance count to 1 before deployment.

Figure 20: Deploying the model using SageMaker Canvas
At any time, you can directly test the endpoint from SageMaker Canvas on the Test deployment tab of the deployed endpoint, listed under Operations on the SageMaker Canvas console.
Refer to the Amazon SageMaker Canvas Developer Guide for detailed steps to take your ML model development through its full development lifecycle and build applications that can consume the ML model to make predictions.
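Once deployed, applications can also call the endpoint programmatically through the SageMaker Runtime API. The following sketch shows one minimal way to do that from Python with boto3; the endpoint name, feature columns, and CSV content type here are assumptions that must match your deployed model's actual schema.

```python
# Illustrative client for a SageMaker real-time endpoint (assumed CSV input).
import csv
import io

def build_payload(record: dict, columns: list) -> str:
    """Serialize one patient record as the CSV line the endpoint expects."""
    buf = io.StringIO()
    csv.writer(buf).writerow(record[c] for c in columns)
    return buf.getvalue().strip()

def predict(endpoint_name: str, payload: str) -> str:
    import boto3  # deferred so the sketch can be read without AWS credentials configured
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return resp["Body"].read().decode("utf-8")

# Example (requires the deployed endpoint and valid credentials):
# payload = build_payload({"age": 82, "avg_resting_heart_rate": 88},
#                         ["age", "avg_resting_heart_rate"])
# print(predict("my-healthcare-endpoint", payload))
```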
Clean up
Refer to the instructions in the README file to clean up the resources provisioned for the AWS data engineering pipeline solution.
SageMaker Canvas bills you for the duration of the session, and we recommend logging out of SageMaker Canvas when you are not using it. Refer to Logging out of Amazon SageMaker Canvas for more details. Additionally, if you deployed a SageMaker endpoint, make sure you have deleted it.
Conclusion
This post explored a no-code approach involving SageMaker Canvas that can drive actionable insights from data stored across both the Splunk and AWS platforms using AI/ML techniques. We also demonstrated how you can use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and build ML models that are aligned with your business's expectations.
Learn more about AI on Splunk and ML on AWS.
About the Authors
Alan Peaty is a Senior Partner Solutions Architect, helping Global Systems Integrators (GSIs), Global Independent Software Vendors (GISVs), and their customers adopt AWS services. Prior to joining AWS, Alan worked as an architect at systems integrators such as IBM, Capita, and CGI. Outside of work, Alan is a keen runner who loves to hit the muddy trails of the English countryside, and is an IoT enthusiast.
Brett Roberts is the Global Partner Technical Manager for AWS at Splunk, leading the technical strategy to help customers better secure and monitor their critical AWS environments and applications using Splunk. Brett was a member of the Splunk Trust and holds several Splunk and AWS certifications. Additionally, he co-hosts a community podcast and blog called Big Data Beard, exploring trends and technologies in the analytics and AI space.
Arnaud Lauer is a Principal Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how best to use AWS technologies to translate business needs into solutions. He brings more than 18 years of experience in delivering and architecting digital transformation projects across a range of industries, including public sector, energy, and consumer goods.