Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business

Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.

Customers appreciate that Amazon Q Business securely connects to over 40 data sources. While using their data source, they want better visibility into the document processing lifecycle during data source sync jobs. They want to know the status of each document they attempted to crawl and index, and they want to be able to troubleshoot why certain documents were not returned with the expected answers. Additionally, they want access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.

We’re pleased to announce a new feature now available in Amazon Q Business that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up an Amazon Q Business application. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, so critical sync job details are available on demand when troubleshooting.

Lifecycle of a document in a data source sync run job

In this section, we examine the lifecycle of a document within a data source sync in Amazon Q Business. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. During crawling, the connector connects to the data source and extracts the documents that meet the sync scope defined in the data source configuration. These documents are then synced to Amazon Q Business during the syncing stage. Finally, indexing makes the synced documents searchable within the Amazon Q Business environment.

The following diagram shows a flowchart of a sync run job.

Crawling stage

The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Q index to determine whether a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.

If the document is unmodified, it’s marked as UNMODIFIED and skipped in the rest of the stages. If any document fails in the crawling stage, for example due to throttling errors, broken content, or because the document size is too big, that document is marked as failed in the sync run history report with the CrawlStatus as FAILED. If the document was skipped due to any validation errors, its CrawlStatus is marked as SKIPPED. These documents are not sent forward to the next stage. All successful documents are marked as SUCCESS and are sent forward.

We also capture the ACLs and metadata of each document in this stage so that we can add them to the sync run history report.

Syncing stage

During the syncing stage, the document is sent to Amazon Q Business ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Q Business runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it’s marked as SKIPPED and other documents are sent forward.

Indexing stage

In this step, Amazon Q Business parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it’s marked as SUCCESS.

After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.
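The stage gating described in this section can be sketched in a few lines of Python. This is only an illustration of the flow, not the service's implementation: the `StageStatuses` class and `run_lifecycle` function are hypothetical names, and representing an unreached stage as `None` is an assumption (the actual report may record such stages differently).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStatuses:
    """Per-stage statuses for one document (hypothetical container)."""
    crawl_status: str
    sync_status: Optional[str] = None   # None = stage not reached (assumption)
    index_status: Optional[str] = None  # None = stage not reached (assumption)


def run_lifecycle(crawl_outcome: str,
                  sync_outcome: str = "SUCCESS",
                  index_outcome: str = "SUCCESS") -> StageStatuses:
    """Record a status per stage, stopping at the first stage that does not succeed."""
    statuses = StageStatuses(crawl_status=crawl_outcome)
    if crawl_outcome != "SUCCESS":
        # FAILED and SKIPPED documents are not sent forward to syncing.
        return statuses
    statuses.sync_status = sync_outcome
    if sync_outcome != "SUCCESS":
        # Documents that fail validation or are skipped never reach indexing.
        return statuses
    statuses.index_status = index_outcome
    return statuses
```

For example, `run_lifecycle("SUCCESS", "FAILED")` records a crawl and a sync status but leaves the index status unset, because the document never reached the indexing stage.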

Key features and benefits of document-level reports

The following are the key features and benefits of the new document-level report in Amazon Q Business applications:

Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.
Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Q Business CloudWatch log group, containing the document-level report.
Comprehensive document information – The document-level report includes the following information for each document.
Document ID – This is the document ID that is inherited directly from the data source or mapped by the customer in the data source field mappings.
Document title – The title of the document is taken from the data source or mapped by the customer in the data source field mappings.
Consolidated document status (SUCCESS, FAILED, or SKIPPED) – This is the final consolidated status of the document. If the document was successfully processed in all stages, the value is SUCCESS. If the document failed or was skipped in any of the stages, the value is FAILED or SKIPPED.
Error message (if the document failed) – This field contains the error message with which a document failed. If a document was skipped due to throttling errors or any internal errors, this is shown in the error message field.
Crawl status – This field denotes whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
Sync status – This field denotes whether the document was sent for syncing successfully. This correlates to the syncing-indexing state in the data source sync.
Index status – This field denotes whether the document was successfully persisted in the index.
ACLs – This field contains a list of document-level permissions that were crawled from the data source. The details of each element in the list are:
- Global name: This is the email or username of the user. This field is mapped across multiple data sources. For example, if a user has three data sources (Confluence, SharePoint, and Gmail) with the local user IDs confluence_user, sharepoint_user, and gmail_user respectively, and their email address user@email.com is the globalName in the ACL for all of them, then Amazon Q Business understands that all of these local user IDs map to the same global name.
- Name: This is the local unique ID of the user, which is assigned by the data source.
- Type: This field indicates the principal type. This can be either USER or GROUP.
- Is Federated: This is a boolean flag that indicates whether the group is of INDEX level (true) or DATASOURCE level (false).
- Access: This field indicates whether the user has access allowed or denied explicitly. Values can be either ALLOWED or DENIED.
- Data source ID: This is the data source ID. For federated groups (INDEX level), this field will be null.
Metadata – This field contains the metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings as well as extra metadata fields added by the connector.
Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This hashed value enables the Amazon Q Business team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
Timestamp – The timestamp indicates when the document status was logged in CloudWatch.
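Two of the fields above lend themselves to a short illustration: how the consolidated status follows from the per-stage statuses, and how the global name ties together local user IDs from different data sources. The following Python sketch is written under stated assumptions: the function names are hypothetical, the ACL entries are modeled as plain dictionaries, and the `name` key is assumed for illustration (only `globalName` is named in the text above).

```python
from collections import defaultdict


def consolidated_status(crawl_status, sync_status, index_status):
    """Return SUCCESS only if every stage succeeded; otherwise return the
    first FAILED or SKIPPED status encountered, in stage order."""
    stages = (crawl_status, sync_status, index_status)
    for status in stages:
        if status in ("FAILED", "SKIPPED"):
            return status
    if all(status == "SUCCESS" for status in stages):
        return "SUCCESS"
    raise ValueError(f"incomplete or unknown stage statuses: {stages}")


def local_ids_by_global_name(acl_entries):
    """Group local principal IDs under the global name they map to.

    Each entry is assumed to be a dict with 'globalName' and 'name' keys.
    """
    grouped = defaultdict(set)
    for entry in acl_entries:
        grouped[entry["globalName"]].add(entry["name"])
    return {global_name: sorted(names) for global_name, names in grouped.items()}


# Example mirroring the Confluence, SharePoint, and Gmail case above.
example_acl = [
    {"globalName": "user@email.com", "name": "confluence_user"},
    {"globalName": "user@email.com", "name": "sharepoint_user"},
    {"globalName": "user@email.com", "name": "gmail_user"},
]
```

Calling `local_ids_by_global_name(example_acl)` groups all three local IDs under `user@email.com`, which is the relationship the global name is meant to express.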

In the following sections, we explore different use cases for the logging feature.

Troubleshoot “Sorry, I could not find relevant information” with the new logging feature

The new document-level logging feature in Amazon Q Business can help troubleshoot common issues related to the “Sorry, I could not find relevant information to complete your request” response.

Let’s explore an example scenario. A mutual funds manager uses Amazon Q Business chat for knowledge retrieval and insights extraction across their enterprise data stores. When the fund manager asks, “What is the CAGR of the multi-asset fund?” in the Amazon Q chat, they receive the “Sorry, I could not find relevant information to complete your request” response.

As the administrator managing their Amazon Q Business application, you can troubleshoot the issue using the following approach with the new logging feature. First, determine whether the multi-asset fund document was successfully indexed in the Amazon Q Business application. Next, verify that the fund manager’s user account has the required permission to read the information from the multi-asset fund document. Amazon Q Business enforces the document permissions configured in its data source, and you can use this new feature to verify that the document ACL settings are synced in the Amazon Q Business application index.

You can use the following CloudWatch Logs Insights query to check the document ACL settings:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1

This query filters on the per-document-level logging stream SYNC_RUN_HISTORY_REPORT and displays the document title, its status, and its associated ACL settings. By verifying the document indexing and permissions, you can identify and resolve potential issues that may be causing the “Sorry, I could not find relevant information” response.
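If you prefer to run this check from a script, the following Python sketch shows one way to do it. It is a minimal example under stated assumptions: `build_acl_query` and `run_insights_query` are hypothetical helper names, the log group name is a placeholder you must supply, and the client is passed in as a parameter so the helper can be exercised without AWS access. The `start_query` and `get_query_results` calls are the CloudWatch Logs operations exposed by a boto3 `logs` client.

```python
import time


def build_acl_query(document_title: str) -> str:
    """Build the Logs Insights query shown above for one document title."""
    return (
        "filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'\n"
        f'and DocumentTitle = "{document_title}"\n'
        "| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl\n"
        "| sort @timestamp desc\n"
        "| limit 1"
    )


def run_insights_query(logs_client, log_group_name, query,
                       lookback_seconds=7 * 24 * 3600,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it completes.

    Returns one dict per result row, mapping field name to value.
    """
    end_time = int(time.time())  # the API expects epoch seconds
    started = logs_client.start_query(
        logGroupName=log_group_name,
        startTime=end_time - lookback_seconds,
        endTime=end_time,
        queryString=query,
    )
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=started["queryId"])
        status = response["status"]
        if status == "Complete":
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete within the polling budget")
```

With boto3 installed and credentials configured, you could pass `boto3.client("logs")` as `logs_client` together with the name of your Amazon Q Business application log group. The document title is interpolated into the query as-is, so titles containing double quotes would need escaping first.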

The following screenshot shows an example result.

Determine the optimal boosting period for recent documents using document-level reporting

When it comes to generating accurate answers, you may want to fine-tune the way Amazon Q prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Q Business to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.

You can now use the per-document-level report to obtain the _last_updated_at metadata field for your documents, which can help you determine the appropriate boosting period. To do this, use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach helps make sure your chat responses are generated using the most recent and relevant information, enhancing the overall accuracy and effectiveness of your Amazon Q Business implementation.
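One way to turn the retrieved timestamps into a candidate boosting period is to look at how old the documents typically are. The following Python sketch is a heuristic of our own for illustration, not AWS guidance: it assumes the dateValue strings are ISO 8601 timestamps (for example, 2024-05-31T00:00:00Z), the function names are hypothetical, and using the median document age as the suggested period is an assumption you should adapt to your own content.

```python
from datetime import datetime, timezone
from statistics import median


def document_ages_in_days(last_updated_values, now):
    """Convert ISO 8601 timestamps into document ages, in days, relative to now."""
    ages = []
    for value in last_updated_values:
        # Normalize a trailing 'Z' so older Python versions can parse it.
        updated = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if updated.tzinfo is None:
            updated = updated.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
        ages.append((now - updated).total_seconds() / 86400)
    return ages


def suggest_boosting_period_days(last_updated_values, now=None):
    """Suggest a boosting period: the median document age, at least one day."""
    now = now or datetime.now(timezone.utc)
    ages = document_ages_in_days(last_updated_values, now)
    if not ages:
        raise ValueError("no _last_updated_at values supplied")
    return max(1, round(median(ages)))
```

A median keeps a handful of very old documents from stretching the period; if you would rather cover most documents, a higher percentile of the ages is a reasonable alternative.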

The following screenshot shows an example result.

Common document indexing observability and troubleshooting methods

In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.

List all successfully indexed documents from a data source

To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result.

List all successfully indexed documents from a data source sync job

To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all documents that failed to index from a data source sync job

To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.
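When a sync job produces many failures, it can help to group the failed documents by error message before investigating them one by one. The following Python sketch assumes the query results have already been converted into a list of dictionaries keyed by field name (for example, by a script that calls the CloudWatch Logs API); `count_failures_by_error` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
from collections import Counter


def count_failures_by_error(rows):
    """Count failed documents per error message, most frequent first.

    Each row is a dict of field name to value; rows without an ErrorMsg
    value are counted under a placeholder label.
    """
    counts = Counter(
        row.get("ErrorMsg") or "(no error message)" for row in rows
    )
    return counts.most_common()


# Made-up rows in the shape produced by the failed-documents query above.
sample_rows = [
    {"DocumentTitle": "a.pdf", "ErrorMsg": "Throttled"},
    {"DocumentTitle": "b.pdf", "ErrorMsg": "Throttled"},
    {"DocumentTitle": "c.pdf", "ErrorMsg": "Document too large"},
    {"DocumentTitle": "d.pdf"},
]
```

Sorting by frequency surfaces systemic causes, such as throttling, ahead of one-off document problems.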

List all documents that contain a particular user name ACL permission from an Amazon Q Business application

To retrieve a list of documents that have a specific user’s ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.

List the ACL of an indexed document from a data source sync job

To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.

List metadata of an indexed document from a data source sync job

To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.

Conclusion

The newly introduced document-level report in Amazon Q Business provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.

The document-level report is stored in a dedicated log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Q Business application CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall document sync status, error messages (if any), ACLs, and metadata retrieved from the data sources. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves the ability to troubleshoot issues related to document ingestion, access control, and metadata relevance, and provides better visibility into the documents synced with an Amazon Q index.

To get started with Amazon Q Business, explore the Getting started guide. To learn more about data source connectors and best practices, see Configuring Amazon Q Business data source connectors.

About the authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), bringing two decades of experience in creating impactful solutions for business-critical workloads. He is passionate about technology and loves working with customers to build well-architected solutions, focusing on the financial services industry, AI/ML, security, and data technologies.

Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.