Amazon SageMaker Feature Store

Robert Ayub, Technology, 18 July 2023

Amazon SageMaker Feature Store is a fully managed, purpose-built repository for storing, sharing, and managing features for machine learning (ML) models. Features are the input variables a model uses during training and inference. Because the same features are reused by multiple teams, their quality is critical to model accuracy. In addition, when features used to train models offline in batch must also be available for real-time inference, it is hard to keep the offline and online copies synchronized. SageMaker Feature Store provides a secure, unified store to process, standardize, and use features at scale across the ML lifecycle.

Ingestion - You can ingest data from various sources into the feature store. Some of the sources are:

Application logs
Service logs
Clickstreams
Sensors
Tabular data from S3
Amazon Redshift
AWS Lake Formation
Snowflake
Databricks Delta Lake
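
As a rough sketch of what ingesting a single row can look like: the Feature Store runtime API (put_record on the boto3 "sagemaker-featurestore-runtime" client) takes a record as a list of name/value pairs, with every value serialized as a string. The helper below only builds that payload. The feature group name "customers-demo", the column names, and the helper itself are made up for illustration, and the actual call is left commented out because it needs AWS credentials and an existing feature group.

```python
from datetime import datetime, timezone


def to_feature_record(row):
    """Convert a plain dict into the list-of-pairs shape used by put_record.

    Every value is serialized with str(). None values are skipped, so a
    missing feature is simply left out of the record.
    """
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None
    ]


# Hypothetical row, e.g. parsed from a clickstream event or a CSV file in S3.
row = {
    "customer_id": "c-001",
    "product_views_7d": 12,
    "last_channel": None,  # unknown, so it is omitted from the record
    "event_time": datetime(2023, 7, 18, tzinfo=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    ),
}
record = to_feature_record(row)

# With credentials and an existing feature group, the write would look like:
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(FeatureGroupName="customers-demo", Record=record)
```

Whatever the upstream source is (logs, sensors, a warehouse export), each row ends up in this same shape before it is written.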

Processing - With feature processing, you specify the data source and the feature transformation functions (e.g. a count of product views or time-window aggregates), and SageMaker Feature Store transforms the data into ML features at ingestion time. With Amazon SageMaker Data Wrangler, you can publish features directly into SageMaker Feature Store.
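
To make the "count of product views" transformation concrete, here is a minimal plain-Python sketch of a time-window aggregate. It is not the Feature Store feature processing API. The event tuples, the function name, and the 7-day window are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def product_views_in_window(events, as_of, window):
    """Count 'view' events per customer in the window (as_of - window, as_of]."""
    start = as_of - window
    counts = Counter()
    for customer_id, event_type, ts in events:
        if event_type == "view" and start < ts <= as_of:
            counts[customer_id] += 1
    return dict(counts)


# Hypothetical raw events: (customer_id, event_type, timestamp).
events = [
    ("c-001", "view", datetime(2023, 7, 10, 9, 0)),      # older than 7 days
    ("c-001", "view", datetime(2023, 7, 15, 9, 0)),
    ("c-001", "purchase", datetime(2023, 7, 16, 9, 0)),  # not a view
    ("c-002", "view", datetime(2023, 7, 17, 9, 0)),
    ("c-001", "view", datetime(2023, 7, 18, 8, 0)),
]
views_7d = product_views_in_window(
    events, as_of=datetime(2023, 7, 18, 12, 0), window=timedelta(days=7)
)
```

The resulting per-customer counts are the kind of value that would be stored as a feature such as product_views_7d.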

The Store - Feature Store tags and indexes feature groups so that they are easily discoverable through the visual interface of SageMaker Studio.

The Catalog - The catalog lets teams discover existing features they can confidently reuse, avoiding duplicate pipelines. SageMaker Feature Store uses the AWS Glue Data Catalog by default but allows you to use a different catalog if desired. You can also query features using familiar SQL with Amazon Athena or another query tool of your choice.
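
The kind of SQL involved can be sketched locally. In the example below, Python's built-in sqlite3 module stands in for Athena, and the table customers_offline with its columns is hypothetical. A real offline store table is registered in the data catalog and carries additional bookkeeping columns, and Athena's SQL dialect differs in places. The query returns the most recent feature values per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_offline ("
    "customer_id TEXT, product_views_7d INTEGER, event_time TEXT)"
)
conn.executemany(
    "INSERT INTO customers_offline VALUES (?, ?, ?)",
    [
        ("c-001", 5, "2023-07-10T00:00:00Z"),
        ("c-001", 12, "2023-07-18T00:00:00Z"),
        ("c-002", 3, "2023-07-17T00:00:00Z"),
    ],
)

# Latest row per customer: join each row against that customer's max event_time.
# ISO-8601 timestamps in one format and time zone sort correctly as strings.
latest_sql = """
SELECT f.customer_id, f.product_views_7d
FROM customers_offline AS f
JOIN (
    SELECT customer_id, MAX(event_time) AS latest
    FROM customers_offline
    GROUP BY customer_id
) AS m
  ON f.customer_id = m.customer_id AND f.event_time = m.latest
ORDER BY f.customer_id
"""
latest_rows = conn.execute(latest_sql).fetchall()
conn.close()
```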

Consistency - Training often uses the complete dataset and can take hours, while inference needs to happen in milliseconds and usually uses a subset of the data. When the offline and online stores are used together, SageMaker Feature Store keeps them in sync, which is critical because model accuracy can suffer if they diverge.

Supports offline storage for training
Supports online storage for real-time inferences
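
One way to picture the two stores is a single write path that feeds both: the offline store keeps the full history for training, while the online store keeps only the latest record per identifier for low-latency lookups. The class below is a toy mental model under that assumption, not the service's implementation. The class name, method names, and tie-breaking rule are invented for illustration.

```python
class ToyFeatureStore:
    """Toy model of one feature group with an online and an offline store."""

    def __init__(self):
        self.online = {}   # record id -> (event_time, features), latest only
        self.offline = []  # append-only history of every write

    def put_record(self, record_id, event_time, features):
        # Every write is appended to the offline history.
        self.offline.append((record_id, event_time, dict(features)))
        # The online store keeps only the record with the newest event time.
        # Letting equal timestamps overwrite is an arbitrary choice in this toy.
        current = self.online.get(record_id)
        if current is None or event_time >= current[0]:
            self.online[record_id] = (event_time, dict(features))

    def get_record(self, record_id):
        entry = self.online.get(record_id)
        return None if entry is None else entry[1]


store = ToyFeatureStore()
store.put_record("c-001", "2023-07-18T00:00:00Z", {"product_views_7d": 12})
# A late-arriving, older event: kept in history, but it does not replace
# the newer value served online.
store.put_record("c-001", "2023-07-10T00:00:00Z", {"product_views_7d": 5})
```

Because both stores are fed by the same writes, the value served at inference time is always one that also appears in the training history.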

Tracking - It is important to know how features were built and which models and endpoints are using them. SageMaker Feature Store allows data scientists to track their features in Amazon SageMaker Studio with SageMaker Lineage. SageMaker Lineage lets you track scheduled pipeline executions, visualize upstream lineage to trace features back to their data sources, and view feature processing code, all in one environment.

Time Travel - You may need to train models with the exact set of feature values from a specific time in the past, without the risk of including data from beyond that time (also referred to as feature leakage), for example patient medical data as it stood before a diagnosis. Point-in-time queries retrieve the state of each feature at the historical time of interest.
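
The idea behind a point-in-time lookup can be sketched in a few lines of plain Python: for a given cutoff, take the most recent value recorded at or before it and ignore everything later. The patient history, dates, and function name below are hypothetical. In practice this is usually done as a query over the offline store rather than in application code.

```python
from bisect import bisect_right


def feature_as_of(history, cutoff):
    """Return the latest value whose event time is <= cutoff, or None.

    `history` is a list of (event_time, value) pairs sorted by event_time.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, cutoff)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical lab-result feature for one patient. ISO dates sort as strings.
history = [
    ("2023-01-05", 5.4),
    ("2023-03-12", 6.1),
    ("2023-06-30", 7.8),  # recorded after the diagnosis below
]
diagnosis_date = "2023-04-01"
value_before_diagnosis = feature_as_of(history, diagnosis_date)
```

Here the June value is excluded, so a model trained on the result never sees information from after the diagnosis.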

MLOps - Feature stores manage datasets and feature pipelines, speeding up data science tasks and eliminating the duplicate work of creating the same features multiple times.

Security & Compliance - To support security and compliance needs, you may need granular control over how shared ML features are accessed. These needs often go beyond table and column-level access control to individual row-level access control. For example, you may want to let account representatives see rows from a sales table for only their accounts and mask the prefix of sensitive data like credit card numbers. SageMaker Feature Store together with AWS Lake Formation can be used to implement fine-grained access controls to protect feature store data and grant access based on role.
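
The effect of the row-level filter and masking described above can be illustrated with a small sketch. With Lake Formation these rules are declared as permissions and data filters rather than written as application code, so the functions, field names, and sample rows below are purely illustrative assumptions.

```python
def mask_card_number(card_number, visible=4):
    """Replace all but the last `visible` digits with '*'."""
    digits = card_number.replace(" ", "").replace("-", "")
    if len(digits) <= visible:
        return digits
    return "*" * (len(digits) - visible) + digits[-visible:]


def rows_for_representative(rows, rep_id):
    """Keep only the rep's own accounts and mask the card number in each row."""
    return [
        {**row, "card_number": mask_card_number(row["card_number"])}
        for row in rows
        if row["account_rep"] == rep_id
    ]


# Hypothetical sales table with test card numbers.
sales = [
    {"account": "acme", "account_rep": "rep-1", "card_number": "4111 1111 1111 1111"},
    {"account": "globex", "account_rep": "rep-2", "card_number": "5500-0000-0000-0004"},
]
visible_to_rep1 = rows_for_representative(sales, "rep-1")
```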

Practical Implementations

Students enrolling in any AI-related course at Carnegie Training Institute have access to practical, working implementation guidelines.

Sources

Amazon SageMaker Feature Store

Robert Ayub

Kenya

+254 718 758 221

robert@ayub.co.ke
