This post is co-written with Aurélien Capdecomme and Bertrand d'Aure from 20 Minutes.
With 19 million monthly readers, 20 Minutes is a major player in the French media landscape. The media organization delivers useful, relevant, and accessible information to an audience that consists primarily of young and active urban readers. Every month, nearly 8.3 million 25–49-year-olds choose 20 Minutes to stay informed. Established in 2002, 20 Minutes consistently reaches more than a third (39 percent) of the French population each month across print, web, and mobile platforms.
As 20 Minutes' technology team, we are responsible for developing and operating the organization's web and mobile offerings and for driving innovative technology initiatives. For several years, we have been actively using machine learning and artificial intelligence (AI) to improve our digital publishing workflow and to deliver a relevant and personalized experience to our readers. With the advent of generative AI, and especially large language models (LLMs), we have now adopted an AI by design strategy, evaluating the application of AI for every new technology product we develop.
One of our key goals is to provide our journalists with a best-in-class digital publishing experience. Our newsroom journalists work on news stories using Storm, our custom in-house digital editing experience. Storm serves as the front end for Nova, our serverless content management system (CMS). These applications are a focal point for our generative AI efforts.
In 2023, we identified several challenges where we see the potential for generative AI to have a positive impact. These include new tools for newsroom journalists, ways to increase audience engagement, and a new way to ensure advertisers can confidently assess the brand safety of our content. To implement these use cases, we rely on Amazon Bedrock.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon Web Services (AWS) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
This blog post outlines several use cases where we are using generative AI to address digital publishing challenges. We dive into the technical aspects of our implementation and explain our decision to choose Amazon Bedrock as our foundation model provider.
Identifying challenges and use cases
Right this moment’s fast-paced information surroundings presents each challenges and alternatives for digital publishers. At 20 Minutes, a key purpose of our know-how crew is to develop new instruments for our journalists that automate repetitive duties, enhance the standard of reporting, and permit us to achieve a wider viewers. Primarily based on this purpose, we now have recognized three challenges and corresponding use instances the place generative AI can have a optimistic affect.
The primary use case is to make use of automation to reduce the repetitive guide duties that journalists carry out as a part of the digital publishing course of. The core work of growing a information story revolves round researching, writing, and modifying the article. Nevertheless, when the article is full, supporting info and metadata should be outlined, equivalent to an article abstract, classes, tags, and associated articles.
Whereas these duties can really feel like a chore, they’re crucial to SEO (search engine marketing) and subsequently the viewers attain of the article. If we are able to automate a few of these repetitive duties, this use case has the potential to unencumber time for our newsroom to concentrate on core journalistic work whereas rising the attain of our content material.
The second use case is how we republish information company dispatches at 20 Minutes. Like most information retailers, 20 Minutes subscribes to information companies, such because the Agence France-Presse (AFP) and others, that publish a feed of stories dispatches overlaying nationwide and worldwide information. 20 Minutes journalists choose tales related to our viewers and rewrite, edit, and increase on them to suit the editorial requirements and distinctive tone our readership is used to. Rewriting these dispatches can be needed for search engine marketing, as search engines like google rank duplicate content material low. As a result of this course of follows a repeatable sample, we determined to construct an AI-based device to simplify the republishing course of and scale back the time spent on it.
The third and last use case we recognized is to enhance transparency across the model security of our revealed content material. As a digital writer, 20 Minutes is dedicated to offering a brand-safe surroundings for potential advertisers. Content material will be categorized as brand-safe or not brand-safe primarily based on its appropriateness for promoting and monetization. Relying on the advertiser and model, various kinds of content material could be thought of acceptable. For instance, some advertisers won’t need their model to seem subsequent to information content material about delicate matters equivalent to navy conflicts, whereas others won’t need to seem subsequent to content material about medicine and alcohol.
Organizations such because the Interactive Promoting Bureau (IAB) and the International Alliance for Accountable Media (GARM) have developed complete tips and frameworks for classifying the model security of content material. Primarily based on these tips, knowledge suppliers such because the IAB and others conduct automated model security assessments of digital publishers by repeatedly crawling web sites equivalent to 20minutes.fr and calculating a model security rating.
Nevertheless, this model security rating is site-wide and doesn’t break down the model security of particular person information articles. Given the reasoning capabilities of LLMs, we determined to develop an automatic per-article model security evaluation primarily based on industry-standard tips to offer advertisers with a real-time, granular view of the model security of 20 Minutes content material.
Our technical solution
At 20 Minutes, we’ve been utilizing AWS since 2017, and we goal to construct on prime of serverless providers at any time when attainable.
The digital publishing frontend software Storm is a single-page software constructed utilizing React and Materials Design and deployed utilizing Amazon Easy Storage Service (Amazon S3) and Amazon CloudFront. Our CMS backend Nova is carried out utilizing Amazon API Gateway and a number of other AWS Lambda features. Amazon DynamoDB serves as the first database for 20 Minutes articles. New articles and modifications to present articles are captured utilizing DynamoDB Streams, which invokes processing logic in AWS Step Capabilities and feeds our search service primarily based on Amazon OpenSearch.
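The glue between DynamoDB Streams and the downstream processing can be a small piece of Lambda code. The following is a minimal sketch of that pattern rather than our exact implementation: a handler that deserializes stream records and starts a Step Functions execution for each new or updated article. The environment variable, state machine, and attribute names are placeholders.

```python
import json
import os

import boto3
from boto3.dynamodb.types import TypeDeserializer

sfn = boto3.client("stepfunctions")
deserializer = TypeDeserializer()

# Placeholder: ARN of the article-processing state machine.
STATE_MACHINE_ARN = os.environ["ARTICLE_STATE_MACHINE_ARN"]


def handler(event, context):
    """Triggered by DynamoDB Streams for inserts and updates to the articles table."""
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        # Convert the DynamoDB-typed image into a plain Python dict.
        article = {
            key: deserializer.deserialize(value)
            for key, value in record["dynamodb"]["NewImage"].items()
        }
        # Hand the article off to Step Functions for enrichment and search indexing.
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({"articleId": article["id"]}, default=str),
        )
```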
We integrate Amazon Bedrock using AWS PrivateLink, which allows us to create a private connection between our Amazon Virtual Private Cloud (VPC) and Amazon Bedrock without traversing the public internet.
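Setting this up essentially means creating an interface VPC endpoint for the Bedrock runtime API. The following is a rough sketch of that one-time setup using boto3; the region, VPC, subnet, and security group IDs are placeholders, and in practice this configuration would typically live in infrastructure as code rather than a script.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # example region

# Interface endpoint for the Bedrock runtime API, so inference calls stay inside the VPC.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",               # placeholder
    ServiceName="com.amazonaws.eu-west-1.bedrock-runtime",
    SubnetIds=["subnet-0123456789abcdef0"],       # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],    # placeholder
    PrivateDnsEnabled=True,  # resolve the default Bedrock endpoint to the private interface
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```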
When working on articles in Storm, journalists have access to several AI tools implemented using Amazon Bedrock. Storm is a block-based editor that allows journalists to combine multiple blocks of content, such as title, lede, text, image, social media quotes, and more, into a complete article. With Amazon Bedrock, journalists can use AI to generate an article summary suggestion block and place it directly into the article. We use a single-shot prompt with the full article text in context to generate the summary.
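To illustrate what such a single-shot summary call can look like, here is a minimal sketch using Anthropic's Claude on Amazon Bedrock. The model ID, prompt wording, and token limit are illustrative assumptions, not our production values.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")


def suggest_summary(article_text: str) -> str:
    """Single-shot prompt: the full article text is passed in context, with no examples."""
    prompt = (
        "Voici un article de presse. Rédige un résumé concis de deux ou trois phrases, "
        "en français, sans ajouter d'informations qui ne figurent pas dans l'article.\n\n"
        f"<article>\n{article_text}\n</article>"
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]
```

The suggested summary is then rendered as a block in Storm, where the journalist can edit or discard it.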
Storm CMS also gives journalists suggestions for article metadata. This includes recommendations for appropriate categories, tags, and even in-text links. These references to other 20 Minutes content are critical to increasing audience engagement, because search engines rank content with relevant internal and external links higher.
To implement this, we use a combination of Amazon Comprehend and Amazon Bedrock to extract the most relevant terms from an article's text, and then perform a search against our internal taxonomic database in OpenSearch. Based on the results, Storm offers several suggestions of terms that should be linked to other articles or topics, which users can accept or reject.
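The first half of that pipeline, extracting candidate terms with Amazon Comprehend, could look roughly like the sketch below. The index and field names for the taxonomy lookup in OpenSearch are assumptions for illustration.

```python
import boto3

comprehend = boto3.client("comprehend")


def candidate_terms(article_text: str, max_terms: int = 10) -> list[str]:
    """Extract the most relevant key phrases from French article text."""
    response = comprehend.detect_key_phrases(
        Text=article_text[:5000],  # truncate very long articles to stay within request limits
        LanguageCode="fr",
    )
    phrases = sorted(response["KeyPhrases"], key=lambda p: p["Score"], reverse=True)
    return [p["Text"] for p in phrases[:max_terms]]


def taxonomy_query(term: str) -> dict:
    """Build an OpenSearch query against the internal taxonomy (index and field names are illustrative)."""
    return {
        "query": {"match": {"label": {"query": term, "fuzziness": "AUTO"}}},
        "size": 3,
    }
```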
News dispatches become available in Storm as soon as we receive them from our partners such as AFP. Journalists can browse the dispatches and select them for republication on 20minutes.fr. Every dispatch is manually reworked by our journalists before publication. To do so, journalists first invoke a rewrite of the article by an LLM using Amazon Bedrock. For this, we use a low-temperature single-shot prompt that instructs the LLM not to reinterpret the article during the rewrite, and to keep the word count and structure as similar as possible. The rewritten article is then manually edited by a journalist in Storm like any other article.
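A sketch of such a rewrite call is shown below, using a low temperature so the model stays close to the source dispatch. The model ID and prompt wording are illustrative rather than our exact production prompt; this example uses the Bedrock Converse API, which exposes the temperature setting directly.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")


def rewrite_dispatch(dispatch_text: str) -> str:
    """Low-temperature, single-shot rewrite that preserves facts, length, and structure."""
    instruction = (
        "Réécris la dépêche suivante avec d'autres formulations, sans la réinterpréter, "
        "sans ajouter ni retirer d'informations, et en conservant une longueur et une "
        "structure aussi proches que possible de l'original.\n\n"
        f"<depeche>\n{dispatch_text}\n</depeche>"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": instruction}]}],
        inferenceConfig={"temperature": 0.1, "maxTokens": 4000},
    )
    return response["output"]["message"]["content"][0]["text"]
```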
To implement our new brand safety feature, we process every new article published on 20minutes.fr. Currently, we use a single-shot prompt that includes both the article text and the IAB brand safety guidelines in context to get a sentiment assessment from the LLM. We then parse the response, store the sentiment, and make it publicly available for each article so that it can be accessed by ad servers.
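Conceptually, the per-article assessment looks like the sketch below: the article and the guidelines go into one prompt, the model is asked for a structured verdict, and the parsed result is stored. The prompt, the expected JSON shape, and the DynamoDB table name are assumptions for illustration, not our exact implementation.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
table = boto3.resource("dynamodb").Table("brand-safety-scores")  # illustrative table name


def assess_brand_safety(article_id: str, article_text: str, iab_guidelines: str) -> dict:
    """Single-shot prompt combining the article and the IAB guidelines, returning a parsed verdict."""
    prompt = (
        "You are assessing the brand safety of a news article according to the guidelines below.\n"
        f"<guidelines>\n{iab_guidelines}\n</guidelines>\n"
        f"<article>\n{article_text}\n</article>\n"
        'Answer only with JSON of the form {"category": "...", "brand_safe": true|false, "rationale": "..."}.'
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    # Parse the model output (assumed to be bare JSON) and persist it per article for ad servers.
    verdict = json.loads(json.loads(response["body"].read())["content"][0]["text"])
    table.put_item(Item={"articleId": article_id, **verdict})
    return verdict
```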
Lessons learned and outlook
When we started working on generative AI use cases at 20 Minutes, we were surprised at how quickly we were able to iterate on features and get them into production. Thanks to the unified Amazon Bedrock API, it is easy to switch between models for experimentation and to find the best model for each use case.
For the use cases described above, we use Anthropic's Claude in Amazon Bedrock as our primary LLM because of its overall high quality and, in particular, its quality in understanding French prompts and generating French completions. Because 20 Minutes content is almost exclusively French, these multilingual capabilities are key for us. We have found that careful prompt engineering is a key success factor, and we closely follow Anthropic's prompt engineering resources to maximize completion quality.
Even without relying on approaches like fine-tuning or Retrieval Augmented Generation (RAG) so far, we can implement use cases that deliver real value to our journalists. Based on data collected from our newsroom journalists, our AI tools save them an average of eight minutes per article. With around 160 pieces of content published daily, that adds up to more than 20 hours of newsroom time every day that can now be spent reporting the news to our readers, rather than performing repetitive manual tasks.
The success of these use cases depends not only on technical efforts, but also on close collaboration between our product, engineering, newsroom, marketing, and legal teams. Together, representatives from these roles make up our AI Committee, which establishes clear policies and frameworks to ensure the transparent and responsible use of AI at 20 Minutes. For example, every use of AI is discussed and approved by this committee, and all AI-generated content must undergo human validation before being published.
We believe that generative AI is still in its infancy when it comes to digital publishing, and we look forward to bringing more innovative use cases to our platform this year. We are currently working on deploying fine-tuned LLMs using Amazon Bedrock to accurately match the tone and voice of our publication and to further improve our brand safety assessment capabilities. We also plan to use Bedrock models to tag our existing image library and provide automated suggestions for article images.
Why Amazon Bedrock?
Based on our evaluation of several generative AI model providers and our experience implementing the use cases described above, we selected Amazon Bedrock as our primary provider for all our foundation model needs. The key reasons that influenced this decision were:
- Choice of models: The market for generative AI is evolving rapidly, and the AWS approach of working with multiple leading model providers ensures that we have access to a large and growing set of foundation models through a single API.
- Inference performance: Amazon Bedrock delivers low-latency, high-throughput inference. With on-demand and provisioned throughput, the service can consistently meet all of our capacity needs.
- Private model access: We use AWS PrivateLink to establish a private connection to Amazon Bedrock endpoints without traversing the public internet, ensuring that we maintain full control over the data we send for inference.
- Integration with AWS services: Amazon Bedrock is tightly integrated with AWS services such as AWS Identity and Access Management (IAM) and the AWS Software Development Kit (AWS SDK). As a result, we were able to quickly integrate Bedrock into our existing architecture without having to adopt any new tools or conventions.
Conclusion and outlook
In this blog post, we described how 20 Minutes is using generative AI on Amazon Bedrock to empower our journalists in the newsroom, reach a broader audience, and make brand safety transparent to our advertisers. With these use cases, we are using generative AI to bring more value to our journalists today, and we have built a foundation for promising new AI use cases in the future.
To learn more about Amazon Bedrock, start with Amazon Bedrock Resources for documentation, blog posts, and more customer success stories.
About the authors
Aurélien Capdecomme is the Chief Technology Officer at 20 Minutes, where he leads the IT development and infrastructure teams. With over 20 years of experience in building efficient and cost-optimized architectures, he has a strong focus on serverless strategy, scalable applications, and AI initiatives. He has implemented innovation and digital transformation strategies at 20 Minutes, overseeing the complete migration of digital services to the cloud.
Bertrand d'Aure is a software developer at 20 Minutes. An engineer by training, he designs and implements the backend of 20 Minutes applications, with a focus on the software used by journalists to create their stories. Among other things, he is responsible for adding generative AI features to the software to simplify the authoring process.
Dr. Pascal Vogel is a Solutions Architect at Amazon Web Services. He collaborates with enterprise customers across EMEA to build cloud-native solutions with a focus on serverless and generative AI. As a cloud enthusiast, Pascal loves learning new technologies and connecting with like-minded customers who want to make a difference in their cloud journey.