Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customers’ feedback, we’ve combined the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler inside SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data, and building and deploying ML models.
By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can one-click create a model in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).
This post demonstrates how you can bring your existing SageMaker Data Wrangler flows (the instructions created when building data transformations) from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.
Solution overview
The high-level steps are as follows:
- Open a terminal in SageMaker Studio and copy the flow files to Amazon S3.
- Import the flow files into SageMaker Canvas from Amazon S3.
Prerequisites
In this example, we use a folder called data-wrangler-classic-flows as a staging folder for migrating flow files to Amazon S3. It isn’t necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate relevant SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows, as seen in the left pane. One of these files, titanic.flow, is opened and visible in the right pane.
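The consolidation described above can also be scripted. The following is a minimal sketch, not part of the original walkthrough: the function name stage_flow_files is hypothetical, and it copies files rather than moving them so the originals stay in place. It assumes flow files are identified by the .flow extension, as with titanic.flow.

```python
import shutil
from pathlib import Path


def stage_flow_files(source_root: Path, staging_dir: Path) -> list[Path]:
    """Copy every *.flow file under source_root into staging_dir.

    Hypothetical helper for illustration; returns the staged file paths.
    """
    source_root = source_root.resolve()
    staging_dir = staging_dir.resolve()
    staging_dir.mkdir(parents=True, exist_ok=True)

    staged = []
    # sorted() materializes the listing before any file is copied.
    for flow in sorted(source_root.rglob("*.flow")):
        # Skip files that already live inside the staging folder.
        if staging_dir in flow.parents:
            continue
        target = staging_dir / flow.name
        if target.exists():
            # Two flows in different folders may share a name; do not overwrite.
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(flow, target)
        staged.append(target)
    return staged


# Example call (paths are assumptions about your Studio Classic home folder):
# stage_flow_files(Path.home(), Path.home() / "data-wrangler-classic-flows")
```

Raising on a name clash is a deliberate choice here: flattening several folders into one staging folder can otherwise silently drop a flow.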
Copy flow files to Amazon S3
To copy the flow files to Amazon S3, complete the following steps:
- To open a new terminal in SageMaker Studio Classic, on the File menu, choose Terminal.
- With a new terminal open, you can supply the following commands to copy your flow files to the Amazon S3 location of your choosing (replacing NNNNNNNNNNNN with your AWS account number):
The following screenshot shows an example of what the Amazon S3 sync process should look like. You will get a confirmation after all files are uploaded. You can adjust the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd) command, and all flow files in your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.
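As an illustration only, the same copy can be sketched in Python with boto3 instead of terminal commands. This is an alternative under stated assumptions, not the command sequence from this post: the helper names plan_uploads and upload_flows are hypothetical, the bucket name in the example call is a placeholder, and the upload needs boto3 plus AWS credentials with write access to the bucket.

```python
from pathlib import Path


def plan_uploads(folder: Path, prefix: str) -> list[tuple[Path, str]]:
    """Pair each *.flow file under folder with the S3 object key it would get."""
    prefix = prefix.strip("/")
    plan = []
    for flow in sorted(folder.rglob("*.flow")):
        # Keep the path relative to the folder so subfolders are preserved.
        relative = flow.relative_to(folder).as_posix()
        key = f"{prefix}/{relative}" if prefix else relative
        plan.append((flow, key))
    return plan


def upload_flows(folder: Path, bucket: str, prefix: str) -> int:
    """Upload the planned files and return how many were sent."""
    import boto3  # imported here so plan_uploads works without boto3 installed

    s3 = boto3.client("s3")
    plan = plan_uploads(folder, prefix)
    for flow, key in plan:
        s3.upload_file(str(flow), bucket, key)
        print(f"upload: {flow} to s3://{bucket}/{key}")
    return len(plan)


# Example call (bucket name is a placeholder, not taken from this post):
# upload_flows(Path("data-wrangler-classic-flows"),
#              "my-bucket-NNNNNNNNNNNN", "data-wrangler-classic-flows")
```

Separating the planning step from the upload makes it possible to review which files and keys are involved before anything is written to Amazon S3.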
After you upload the files to Amazon S3, you can validate that they’ve been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.
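The same check can be done programmatically. The sketch below is an assumption-laden alternative to the console: missing_flows and list_keys are hypothetical helper names, and list_keys needs boto3 and AWS credentials with read access to the bucket.

```python
def missing_flows(local_names: list[str], s3_keys: list[str]) -> list[str]:
    """Return local flow file names that have no matching object key in S3."""
    # Compare on the final path segment of each key, ignoring non-flow objects.
    uploaded = {key.rsplit("/", 1)[-1] for key in s3_keys if key.endswith(".flow")}
    return sorted(name for name in local_names if name not in uploaded)


def list_keys(bucket: str, prefix: str) -> list[str]:
    """List all object keys under a prefix, following pagination."""
    import boto3  # imported here so missing_flows works without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

An empty result from missing_flows means every local flow file name has a counterpart under the prefix.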
Import Data Wrangler flow files into SageMaker Canvas
To import the flow files into SageMaker Canvas, complete the following steps:
- On the SageMaker Studio console, choose Data Wrangler in the navigation pane.
- Choose Import data flows.
- For Select a data source, choose Amazon S3.
- For Enter S3 endpoint, enter the Amazon S3 location you used earlier to copy files from SageMaker Studio to Amazon S3, then choose Go. You can also navigate to the Amazon S3 location using the browser below.
- Select the flow files to import, then choose Import.
After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.
Use SageMaker Canvas for data transformation with SageMaker Data Wrangler
Choose one of the flows (for this example, we choose titanic.flow) to launch the SageMaker Data Wrangler transformation.
Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).
When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.
Alternate migration method
This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Additionally, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.
Clean up
When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.
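Deleting the intermediate files in Amazon S3 can be sketched as follows. This is a hedged illustration, not a step from the original post: delete_batches and delete_flows are hypothetical helper names, only keys ending in .flow are targeted, and the deletion is irreversible on an unversioned bucket, so confirm the import into SageMaker Canvas succeeded first. The batch size reflects the limit of 1,000 keys per delete_objects request.

```python
def delete_batches(keys: list[str], size: int = 1000) -> list[list[dict]]:
    """Group *.flow keys into batches shaped for the S3 delete_objects call."""
    flow_keys = [key for key in keys if key.endswith(".flow")]
    return [
        [{"Key": key} for key in flow_keys[start : start + size]]
        for start in range(0, len(flow_keys), size)
    ]


def delete_flows(bucket: str, keys: list[str]) -> None:
    """Delete the flow objects; needs boto3 and delete permission on the bucket."""
    import boto3  # imported here so delete_batches works without boto3 installed

    s3 = boto3.client("s3")
    for batch in delete_batches(keys):
        s3.delete_objects(Bucket=bucket, Delete={"Objects": batch})
```

Filtering on the .flow suffix keeps the cleanup from touching unrelated objects that share the same prefix.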
You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.
Conclusion
Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.
Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!
About the Authors
Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.
Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He is dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.
Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since a very young age, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.