Generative AI is revolutionizing enterprise automation, enabling AI systems to understand context, make decisions, and act independently. Generative AI foundation models (FMs), with their ability to understand context and make decisions, are becoming powerful partners in solving sophisticated business problems. At AWS, we’re using the power of models in Amazon Bedrock to drive automation of complex processes that have traditionally been challenging to streamline.
In this post, we focus on one such complex workflow: document processing. This serves as an example of how generative AI can streamline operations that involve diverse data types and formats.
Challenges with document processing
Document processing typically involves handling three main categories of documents:
- Structured – For example, forms with fixed fields
- Semi-structured – Documents that have a predictable set of information but can vary in layout or presentation
- Unstructured – For example, paragraphs of text or notes
Traditionally, processing these varied document types has been a pain point for many organizations. Rule-based systems or specialized machine learning (ML) models often struggle with the variability of real-world documents, especially when dealing with semi-structured and unstructured data.
We demonstrate how generative AI along with external tool use offers a more flexible and adaptable solution to this challenge. Through a practical use case of processing a patient health package at a doctor’s office, you will see how this technology can extract and synthesize information from all three document types, potentially improving data accuracy and operational efficiency.
Solution overview
This intelligent document processing solution uses Amazon Bedrock FMs to orchestrate a sophisticated workflow for handling multi-page healthcare documents with mixed content types. The solution uses the FMs’ tool use capabilities, accessed through the Amazon Bedrock Converse API. This enables the FMs to not just process text, but to actively engage with various external tools and APIs to perform complex document analysis tasks.
The solution employs a strategic multi-model approach, optimizing for both performance and cost by selecting the most appropriate model for each task:
- Anthropic’s Claude 3 Haiku – Serves as the workflow orchestrator due to its low latency and cost-effectiveness. This model’s strong reasoning and tool use abilities make it well suited for the following:
  - Coordinating the overall document processing pipeline
  - Making routing decisions for different document types
  - Invoking appropriate processing functions
  - Managing the workflow state
- Anthropic’s Claude 3.5 Sonnet (v2) – Used for its advanced reasoning capabilities and notably strong visual processing abilities, particularly excelling at interpreting charts and graphs. Its key strengths include:
  - Interpreting complex document layouts and structure
  - Extracting text from tables and forms
  - Processing medical charts and handwritten notes
  - Converting unstructured visual information into structured data
Through the Amazon Bedrock Converse API’s standardized tool use (function calling) interface, these models can work together seamlessly to invoke document processing functions, call external APIs for data validation, trigger storage operations, and execute content transformation tasks. The API serves as the foundation for this intelligent workflow, providing a unified interface for model communication while maintaining conversation state throughout the processing pipeline. The API’s standardized approach to tool definition and function calling provides consistent interaction patterns across different processing stages. For more details on how tool use works, refer to The complete tool use workflow.
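To make that round trip concrete, the following sketch shows the two halves of a single tool use exchange: pulling the model’s tool requests out of a converse() response, and packaging a tool’s output as the toolResult message sent back on the next call. The helper names are ours; the dictionary shapes follow the Converse API’s toolUse/toolResult content blocks.

```python
# Hypothetical helpers illustrating one Converse API tool-use round trip.

def extract_tool_requests(response):
    """Collect the toolUse blocks from a converse() response."""
    content = response["output"]["message"]["content"]
    return [block["toolUse"] for block in content if "toolUse" in block]

def build_tool_result_message(tool_use_id, result):
    """Wrap a tool's output as the user-role toolResult message that is
    sent back to the model on the next converse() call."""
    return {
        "role": "user",
        "content": [{
            "toolResult": {
                "toolUseId": tool_use_id,
                "content": [{"json": result}],
            }
        }],
    }
```

The orchestrating model keeps requesting tools until it has enough information to produce a final answer, at which point the response contains plain text instead of toolUse blocks.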
The solution incorporates Amazon Bedrock Guardrails to implement robust content filtering policies and sensitive information detection, making sure that personal health information (PHI) and personally identifiable information (PII) data is appropriately protected through automated detection and masking capabilities while maintaining industry standard compliance throughout the document processing workflow.
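When guardrails are enabled, the converse() call accepts an optional guardrailConfig parameter. A minimal sketch of wiring that in follows; the helper function and the DRAFT version default are our assumptions, while the key names come from the Converse API:

```python
def guardrail_kwargs(guardrail_id, guardrail_version="DRAFT", enabled=True):
    """Build the optional guardrailConfig kwarg for bedrock.converse().
    Returns an empty dict when guardrails are disabled, so the result can
    be splatted unconditionally into the call."""
    if not enabled:
        return {}
    return {
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        }
    }

# Usage sketch (guardrail ID is illustrative):
#   self.bedrock.converse(modelId=..., messages=..., **guardrail_kwargs("gr-123"))
```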
Prerequisites
You need the following prerequisites before you can proceed with this solution. For this post, we use the us-west-2 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.
Use case and dataset
For our example use case, we examine a patient intake process at a healthcare institution. The workflow processes a patient health information package containing three distinct document types:
- Structured document – A new patient intake form with standardized fields for personal information, medical history, and current symptoms. This form follows a consistent format with clearly defined fields and check boxes, making it an ideal example of a structured document.
- Semi-structured document – A health insurance card that contains essential coverage information. Although insurance cards generally contain similar information (policy number, group ID, coverage dates), they come from different providers with varying layouts and formats, exhibiting the semi-structured nature of these documents.
- Unstructured document – A handwritten doctor’s note from an initial consultation, containing free-form observations, preliminary diagnoses, and treatment recommendations. This represents the most challenging category of unstructured documents, where information isn’t confined to any predetermined format or structure.
The example document can be downloaded from the following GitHub repo.
This healthcare use case is particularly relevant because it encompasses common challenges in document processing: the need for high accuracy, compliance with healthcare data privacy requirements, and the ability to handle multiple document formats within a single workflow. The variety of documents in this patient package demonstrates how a modern intelligent document processing solution must be flexible enough to handle different levels of document structure while maintaining consistency and accuracy in data extraction.
The following diagram illustrates the solution workflow.
This self-orchestrated workflow demonstrates how modern generative AI solutions can balance capability, performance, and cost-effectiveness in transforming traditional document processing workflows in healthcare settings.
Deploy the solution
- Create an Amazon SageMaker domain. For instructions, see Use quick setup for Amazon SageMaker AI.
- Launch SageMaker Studio, then create and launch a JupyterLab space. For instructions, see Create a space.
- Create a guardrail. Focus on adding sensitive information filters that would mask PII or PHI.
- Clone the code from the GitHub repository:
git clone https://github.com/aws-samples/anthropic-on-aws.git
- Change the directory to the root of the cloned repository:
cd medical-idp
- Install dependencies:
pip install -r requirements.txt
- Update setup.sh with the guardrail ID you created in Step 3. Then set the ENV variable:
source setup.sh
- Finally, start the Streamlit application:
streamlit run streamlit_app.py
Now you’re ready to explore the intelligent document processing workflow using Amazon Bedrock.
Technical implementation
The solution is built around the Amazon Bedrock Converse API and tool use framework, with Anthropic’s Claude 3 Haiku serving as the primary orchestrator. When a document is uploaded through the Streamlit interface, Haiku analyzes the request and determines the sequence of tools needed by consulting the tool definitions in ToolConfig. These definitions include tools for the following:
- Document processing pipeline – Handles initial PDF processing and classification
- Document notes processing – Extracts information from medical notes
- New patient information processing – Processes patient intake forms
- Insurance form processing – Handles insurance card information
The following code is an example tool definition for extracting consultation notes. Here, extract_consultation_notes represents the name of the function that the orchestration workflow will call, and document_paths defines the schema of the input parameter that will be passed to the function. The FM will contextually extract the information from the document and pass it to the method. A similar toolspec will be defined for each step. Refer to the GitHub repo for the full toolspec definition.
{
    "toolSpec": {
        "name": "extract_consultation_notes",
        "description": "Extract diagnostics information from a doctor's consultation notes. Along with the extraction include the full transcript in a <transcript> node",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "document_paths": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Paths to the files that were classified as DOC_NOTES"
                    }
                },
                "required": ["document_paths"]
            }
        }
    }
}
When a PDF document is uploaded through the Streamlit interface, it’s temporarily saved and passed to the FileProcessor class along with the tool specification and a user prompt:
prompt = ("1. Extract 2. save and 3. summarize the information from the patient information package located at " + tmp_file + ". " +
          "The package might contain various types of documents including insurance cards. Extract and save information from all documents provided. "
          "Perform any preprocessing or classification of the file provided prior to the extraction." +
          "Set the enable_guardrails parameter to " + str(enable_guardrails) + ". " +
          "At the end, list all the tools that you had access to. Give an explanation on why each tool was used and if you are not using a tool, explain why it was not used as well." +
          "Think step by step.")
processor.process_file(prompt=prompt,
                       toolspecs=toolspecs,
                       ...
The BedrockUtils class manages the conversation with Anthropic’s Claude 3 Haiku through the Amazon Bedrock Converse API. It maintains the conversation state and handles the tool use workflow:
# From bedrockutility.py
def invoke_bedrock(self, message_list, system_message=[], tool_list=[],
                   temperature=0, maxTokens=2048, guardrail_config=None):
    response = self.bedrock.converse(
        modelId=self.model_id,
        messages=message_list,
        system=system_message,
        inferenceConfig={
            "maxTokens": maxTokens,
            "temperature": temperature
        },
        **({"toolConfig": {"tools": tool_list}} if tool_list else {})
    )
When the processor receives a document, it initiates a conversation loop with Anthropic’s Claude 3 Haiku, which analyzes the document and determines which tools to use based on the content. The model acts as an intelligent orchestrator, making decisions about the following:
- Which document processing tools to invoke
- The sequence of processing steps
- How to handle different document types within the same package
- When to summarize and complete the processing
This orchestration is managed through a continuous conversation loop that processes tool requests and their results until the entire document package has been processed.
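That loop can be sketched as follows. This is not the repository’s implementation: converse_fn stands in for a call like invoke_bedrock, and tools maps tool names to local Python functions, so the control flow can be followed (and tested) without AWS access.

```python
def run_tool_loop(converse_fn, tools, messages, max_turns=10):
    """Keep invoking the model until it stops requesting tools."""
    for _ in range(max_turns):
        response = converse_fn(messages)
        message = response["output"]["message"]
        messages.append(message)
        if response.get("stopReason") != "tool_use":
            return message  # model is done; this is the final answer
        # Execute every requested tool and feed the results back
        results = []
        for block in message["content"]:
            if "toolUse" in block:
                request = block["toolUse"]
                output = tools[request["name"]](**request["input"])
                results.append({"toolResult": {
                    "toolUseId": request["toolUseId"],
                    "content": [{"json": output}],
                }})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Tool loop did not finish within max_turns")
```

The max_turns cap is a practical safeguard against a model that keeps requesting tools indefinitely.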
The first key decision in the workflow is initiating the document classification process. Through the DocumentClassifier class, the solution uses Anthropic’s Claude 3.5 Sonnet to analyze and categorize each page of the uploaded document into three main types: intake forms, insurance cards, and doctor’s notes:
# from document_classifier.py
class DocumentClassifier:
    def __init__(self, file_handler):
        self.sonnet_3_5_bedrock_utils = BedrockUtils(
            model_id=ModelIDs.anthropic_claude_3_5_sonnet
        )

    def categorize_document(self, file_paths):
        # Convert documents to binary format for model processing
        binary_data_array = []
        for file_path in file_paths:
            binary_data, media_type = self.file_handler.get_binary_for_file(file_path)
            binary_data_array.append((binary_data[0], media_type))

        # Prepare message for classification
        message_content = [
            {"image": {"format": media_type, "source": {"bytes": data}}}
            for data, media_type in binary_data_array
        ]

        # Create classification request
        message_list = [{
            "role": 'user',
            "content": [
                *message_content,
                {"text": "What types of document is in this image?"}
            ]
        }]

        # Define system message for classification
        system_message = [{
            "text": '''You are a medical document processing agent.
                       Categorize images as: INTAKE_FORM, INSURANCE_CARD, or DOC_NOTES'''
        }]

        # Get classification from model
        response = self.sonnet_3_5_bedrock_utils.invoke_bedrock(
            message_list=message_list,
            system_message=system_message
        )
        return [response['output']['message']]
Based on the classification results, the FM determines the next tool to be invoked. The tool’s description and input schema define exactly what information needs to be extracted. Following the previous example, let’s assume the next page to be processed is a consultation note. The workflow will invoke the extract_consultation_notes function. This function processes documents to extract detailed medical information. Similar to the classification process discussed earlier, it first converts the documents to a binary format suitable for model processing. The key to accurate extraction lies in how the images and system message are combined:
def extract_info(self, file_paths):
    # Convert documents to binary data
    # This follows the same pattern as in the classification function
    message_content = [
        {"image": {"format": media_type, "source": {"bytes": data}}}
        for data, media_type in binary_data_array
    ]

    message_list = [{
        "role": 'user',
        "content": [
            *message_content,  # Include the processed document images
            {"text": '''Extract all information from this file
                        If you find a visualization
                            - Provide a detailed description in natural language
                            - Use domain specific language for the description
                     '''}
        ]
    }]

    system_message = [{
        "text": '''You are a medical consultation agent with expertise in diagnosing and treating various health conditions.
                   You have a deep understanding of human anatomy, physiology, and medical knowledge across different specialties.
                   During the consultation, you review the patient's medical records, test results, and documentation provided.
                   You analyze this information objectively and make associations between the data and potential diagnoses.
                   Associate a confidence score to each extracted information. This should reflect how confident the model is that the extracted value matched the requested entity.
                '''}
    ]

    response = self.bedrock_utils.invoke_bedrock(
        message_list=message_list,
        system_message=system_message
    )
    return [response['output']['message']]
The system message serves three crucial purposes:
- Establish medical domain expertise for accurate interpretation.
- Provide guidelines for handling different types of information (text and visualizations).
- Provide a self-scored confidence. Although this isn’t an independent grading mechanism, the score is directionally indicative of how confident the model is in its own extraction.
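One way to act on that self-scored confidence downstream is to route low-confidence fields to human review. A small illustrative sketch follows; the {"value", "confidence"} field shape is our assumption, not the repository’s output format:

```python
def partition_by_confidence(fields, threshold=0.8):
    """Split extracted fields into auto-accepted values and a needs-review
    bucket, based on the model's self-reported confidence.
    `fields` maps a field name to {"value": ..., "confidence": float}."""
    accepted, needs_review = {}, {}
    for name, item in fields.items():
        bucket = accepted if item["confidence"] >= threshold else needs_review
        bucket[name] = item["value"]
    return accepted, needs_review
```

Because the score is self-reported rather than independently calibrated, the threshold should be tuned against a sample of manually verified extractions.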
Following the same pattern, the FM will use the other tools in the toolspec definition to save and summarize the results.
A unique advantage of using a multi-modal FM for the extraction task is its ability to develop a deep understanding of the text it’s extracting. For example, the following code is an abstract of the data schema we’re requesting as input to the save_consultation_notes function. Refer to the code in constants.py for the full definition. The model needs to not only extract a transcript, but also understand it in order to derive such structured data from an unstructured document. This significantly reduces the postprocessing effort required for the data to be consumed by a downstream application.
"consultation": {
    "type": "object",
    "properties": {
        "date": {"type": "string"},
        "concern": {
            "type": "object",
            "properties": {
                "primaryComplaint": {
                    "type": "string",
                    "description": "Primary medical complaint of the patient. Only capture the medical condition. no timelines"
                },
                "duration": {"type": "number"},
                "durationUnit": {"type": "string", "enum": ["days", "weeks", "months", "years"]},
                "associatedSymptoms": {
                    "type": "object",
                    "additionalProperties": {
                        "type": "boolean"
                    },
                    "description": "Key-value pairs of symptoms and their presence (true) or absence (false)"
                },
                "absentSymptoms": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["primaryComplaint", "duration", "durationUnit"]
        }
The documents contain a treasure trove of personally identifiable information (PII) and personal health information (PHI). To redact this information, you can pass enable_guardrails as true. This will use the guardrail you set up earlier as part of the information extraction process and mask information identified as PII or PHI.
processor.process_file(prompt=prompt,
                       enable_guardrails=True,
                       toolspecs=toolspecs,
                       …
)
Finally, cross-document validation is crucial for maintaining data accuracy and compliance in healthcare settings. Although the current implementation performs basic consistency checks through the summary prompt, organizations can extend the framework by implementing a dedicated validation tool that integrates with their specific business rules and compliance requirements. Such a tool could perform sophisticated validation logic like insurance policy verification, appointment date consistency checks, or any other domain-specific validation requirements, providing complete data integrity across the document package.
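As one illustration of what such a dedicated validation tool might look like, the sketch below cross-checks a few fields across the three extracted documents. The field names and rules here are hypothetical; real checks would follow your extraction schema and compliance requirements.

```python
def validate_package(intake_form, insurance_card, doctor_notes):
    """Return a list of human-readable consistency issues found across the
    extracted documents (an empty list means the package looks consistent).
    All three arguments are dicts of extracted fields; the keys used below
    are illustrative, not the repository's actual schema."""
    issues = []
    if intake_form.get("patientName") != insurance_card.get("memberName"):
        issues.append("Patient name on intake form does not match insurance card")
    if not insurance_card.get("policyNumber"):
        issues.append("Insurance policy number is missing")
    # ISO-8601 date strings compare correctly as plain strings
    if doctor_notes.get("date", "") < intake_form.get("date", ""):
        issues.append("Consultation note is dated before the intake form")
    return issues
```

Registered as another tool in the toolspec, a function like this lets the orchestrating model surface inconsistencies in its final summary instead of silently passing them downstream.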
Future considerations
As Amazon Bedrock continues to evolve, several powerful features can be integrated into this document processing workflow to enhance its enterprise readiness, performance, and cost-efficiency. Let’s explore how these advanced capabilities can take this solution to the next level:
- Inference profiles in Amazon Bedrock define a model and its associated Regions for routing invocation requests, enabling various tasks such as usage tracking, cost monitoring, and cross-Region inference. These profiles help users track metrics through Amazon CloudWatch logs, monitor costs with cost allocation tags, and increase throughput by distributing requests across multiple Regions.
- Prompt caching can help when you have workloads with long and repeated contexts that are frequently reused for multiple queries. Instead of reprocessing the entire context for each document, the workflow can reuse cached prompts, which is especially beneficial when using the same image across different tooling workflows. With support for multiple cache checkpoints, this feature can significantly reduce processing time and inference costs while maintaining the workflow’s intelligent orchestration capabilities.
- Intelligent prompt routing can dynamically select the most appropriate model for each task based on performance and cost requirements. Rather than explicitly assigning Anthropic’s Claude 3 Haiku for orchestration and Anthropic’s Claude 3.5 Sonnet for document analysis, the workflow can use intelligent routing to automatically choose the optimal model within the Anthropic family for each request. This approach simplifies model management while providing cost-effective processing of diverse document types, from simple structured forms to complex handwritten notes, all through a single endpoint.
Conclusion
This intelligent document processing solution demonstrates the power of combining Amazon Bedrock FMs with tool use capabilities to create sophisticated, self-orchestrating workflows. By using Anthropic’s Claude 3 Haiku for orchestration and Anthropic’s Claude 3.5 Sonnet for complex visual tasks, the solution effectively handles structured, semi-structured, and unstructured documents while maintaining high accuracy and compliance standards.
Key benefits of this approach include:
- Reduced manual processing through intelligent automation
- Improved accuracy through specialized model selection
- Built-in compliance with guardrails for sensitive data
- Flexible architecture that adapts to various document types
- Cost-effective processing through strategic model usage
As organizations continue to digitize their operations, solutions like this showcase how generative AI can transform traditional document processing workflows. The combination of powerful FMs in Amazon Bedrock and the tool use framework provides a robust foundation for building intelligent, scalable document processing solutions across industries.
For more information about Amazon Bedrock and its capabilities, visit the Amazon Bedrock User Guide.
About the Author
Raju Rangan is a Senior Solutions Architect at AWS. He works with government-sponsored entities, helping them build AI/ML solutions using AWS. When not tinkering with cloud solutions, you’ll catch him hanging out with family or smashing birdies in a lively game of badminton with friends.