
Many Amazon Web Services (AWS) customers need a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is an increasingly popular way to store and analyze data because it lets companies collect many data types from a wide variety of sources and keep that data, both structured and unstructured, in a single centralized repository.

The AWS Cloud provides many of the building blocks required to help customers implement a secure, flexible, and cost-effective data lake. These include AWS managed services that help ingest, store, find, process, and analyze both structured and unstructured data. To support our customers as they build data lakes, AWS offers the data lake solution, which is an automated reference implementation that deploys a highly available, cost-effective data lake architecture on the AWS Cloud along with a user-friendly console for searching and requesting datasets.

The following figures illustrate sample simplified AWS data lake architectures.

This example focuses on real-time predictive analytics on AWS. The application writes to a DynamoDB table, and the write generates a stream event through DynamoDB Streams. The event triggers a Lambda function that requests a prediction from the Amazon Machine Learning real-time API. The prediction is stored back in DynamoDB, where the application can read it. A factory or warehouse is a useful illustration: sensor data tracked in DynamoDB is used to anticipate potential safety or productivity issues. Predicting these issues may require more than monitoring for a specific condition or set of conditions; in that case, Amazon Machine Learning can be used to find patterns in the data and predict catastrophes before they occur.
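The Lambda step above can be sketched in Python as follows. This is a minimal, non-authoritative sketch, assuming the table's stream uses the NEW_IMAGE (or NEW_AND_OLD_IMAGES) view type so each record carries a `NewImage`. The attribute names (`sensor_id`, `reading_time`, `temperature`), the separate predictions table, and the environment variable names are hypothetical. The prediction is written to a separate table because writing it back to the source table would emit a new stream event and re-trigger the function. The real calls used are the boto3 `machinelearning` client's `predict(MLModelId=..., Record=..., PredictEndpoint=...)` and the DynamoDB resource's `Table.put_item(Item=...)`. The clients are passed in so the logic can be exercised without AWS.

```python
"""Sketch: DynamoDB stream -> Lambda -> Amazon ML real-time predict -> DynamoDB.

Hypothetical names (not from the source): sensor_id / reading_time key
attributes, the predictions table, and the ML_* / PREDICTIONS_TABLE env vars.
"""
import os
from typing import Any, Callable, Dict

SCALAR_TYPES = ("S", "N", "BOOL")  # typed DynamoDB scalars this sketch handles


def image_to_record(image: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Flatten a typed stream image, e.g. {"N": "72.5"}, into string values,
    because Amazon ML's Predict call expects every record value as a string."""
    record = {}
    for name, attr in image.items():
        for type_tag in SCALAR_TYPES:
            if type_tag in attr:
                record[name] = str(attr[type_tag])
    return record


def make_handler(ml_client, predictions_table, model_id: str,
                 endpoint_url: str) -> Callable[..., Dict[str, int]]:
    """Build a stream handler around injected clients (real or fake)."""
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, int]:
        written = 0
        for rec in event.get("Records", []):
            # Only inserts and updates carry a new sensor reading.
            if rec.get("eventName") not in ("INSERT", "MODIFY"):
                continue
            image = rec.get("dynamodb", {}).get("NewImage")
            if not image:
                continue
            # Assumes the record matches the model's data schema.
            features = image_to_record(image)
            response = ml_client.predict(
                MLModelId=model_id,
                Record=features,
                PredictEndpoint=endpoint_url,
            )
            prediction = response["Prediction"]
            # Store only the (string) label; boto3 rejects Python floats.
            predictions_table.put_item(Item={
                "sensor_id": features["sensor_id"],
                "reading_time": features["reading_time"],
                "predicted_label": prediction.get("predictedLabel"),
            })
            written += 1
        return {"written": written}
    return handler


_handler = None


def lambda_handler(event, context):
    """AWS Lambda entry point; builds the real clients once per container."""
    global _handler
    if _handler is None:
        import boto3  # imported lazily so the logic above runs without boto3
        _handler = make_handler(
            boto3.client("machinelearning"),
            boto3.resource("dynamodb").Table(os.environ["PREDICTIONS_TABLE"]),
            os.environ["ML_MODEL_ID"],
            os.environ["ML_PREDICT_ENDPOINT"],
        )
    return _handler(event, context)
```

Injecting the clients keeps the prediction loop independent of AWS credentials, so it can be checked locally with stand-in objects before deploying it behind a DynamoDB stream trigger.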

AWS Sample Data Platform Architectures
