AWS Glue 5.1 Now Generally Available with Performance Improvements and Expanded Data Lake Capabilities
AWS Glue 5.1 is now generally available, bringing enhancements to performance, security, and support for open data formats like Apache Iceberg, Hudi, and Delta Lake. This release simplifies data integration workflows with upgrades to core engines and expanded AWS Lake Formation integration. https://aws.amazon.com/about-aws/whats-new/2023/11/aws-glue-5-1-generally-available/
Core Engine and Library Updates
AWS Glue is a fully managed, serverless data integration service that helps users discover, prepare, and integrate data from various sources. The 5.1 release includes notable updates to the underlying engines:
* Apache Spark: Upgraded to version 3.5.6, delivering performance and security improvements. https://spark.apache.org/
* Python: Updated to version 3.11. https://www.python.org/
* Scala: Updated to version 2.12.18. https://www.scala-lang.org/
These engine upgrades are complemented by updated support for popular open table format libraries:
* Apache Hudi: Version 1.0.2. https://hudi.apache.org/
* Apache Iceberg: Version 1.10.0. https://iceberg.apache.org/
* Delta Lake: Version 3.3.2. https://delta.io/
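As a rough illustration of how a job might target this release, the sketch below builds the keyword arguments for boto3's `glue.create_job` call without actually calling AWS. The job name, role ARN, and script location are hypothetical placeholders. It also assumes that the `GlueVersion` string for this release is "5.1" and that the `--datalake-formats` job parameter selects the table format libraries as it does in earlier Glue versions.

```python
def build_job_request(name, role_arn, script_location, formats):
    """Build keyword arguments for boto3's glue.create_job (not called here)."""
    allowed = {"hudi", "iceberg", "delta"}
    unknown = set(formats) - allowed
    if unknown:
        raise ValueError(f"unsupported table formats: {sorted(unknown)}")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "5.1",  # assumption: version string for this release
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated list of open table formats to load
            "--datalake-formats": ",".join(formats),
        },
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# Hypothetical values, for illustration only
request = build_job_request(
    name="example-glue-51-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
    formats=["iceberg", "delta"],
)
print(request["GlueVersion"], request["DefaultArguments"])
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("glue").create_job(**request)`.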
Enhanced Apache Iceberg Support
AWS Glue 5.1 introduces support for Apache Iceberg format version 3.0, unlocking new capabilities for data lake management:
* Default Column Values: Allows specifying default values for columns, improving data quality and simplifying schema evolution.
* Deletion Vectors: Enables efficient deletion of data in merge-on-read tables, optimizing query performance and storage costs.
* Multi-Argument Transforms: Allows partition and sort transforms that take more than one source column, giving greater flexibility in how table data is laid out.
* Row Lineage Tracking: Offers improved data governance and auditability by tracking the history of data changes.
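To make the format version concrete, here is a minimal sketch that generates a Spark SQL statement creating an Iceberg table with the `format-version` table property set to 3 and merge-on-read write modes, the setting under which deletion vectors apply. The catalog, database, table, and column names are hypothetical. The statement only opts the table into the newer format; it does not demonstrate default column values or row lineage, whose SQL syntax is not covered here.

```python
def iceberg_v3_ddl(catalog, database, table):
    """Return Spark SQL that creates a format-version 3 Iceberg table."""
    return (
        f"CREATE TABLE {catalog}.{database}.{table} (\n"
        "  order_id BIGINT,\n"
        "  status STRING,\n"
        "  updated_at TIMESTAMP\n"
        ") USING iceberg\n"
        "TBLPROPERTIES (\n"
        "  'format-version' = '3',\n"
        # Merge-on-read records deletes separately instead of rewriting data files
        "  'write.delete.mode' = 'merge-on-read',\n"
        "  'write.update.mode' = 'merge-on-read',\n"
        "  'write.merge.mode' = 'merge-on-read'\n"
        ")"
    )


# Hypothetical identifiers, for illustration only
ddl = iceberg_v3_ddl("glue_catalog", "sales", "orders")
print(ddl)
```

Inside a Glue Spark job, the string would typically be executed with `spark.sql(ddl)`, assuming the session is configured with an Iceberg catalog under the chosen name.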
Expanded AWS Lake Formation Integration
A key enhancement in AWS Glue 5.1 is the extension of AWS Lake Formation's fine-grained access control to write operations. Previously, Lake Formation access control was limited to read operations. It now also covers both Data Manipulation Language (DML) and Data Definition Language (DDL) operations for Spark DataFrames and Spark SQL. https://aws.amazon.com/lake-formation/
Furthermore, AWS Glue 5.1 adds full-table access control within Apache Spark for Apache Hudi and Delta Lake tables, providing a more thorough security posture for data assets.
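For write operations to succeed under Lake Formation, the job's principal needs matching permissions on the table. The sketch below builds a request for boto3's `lakeformation.grant_permissions` call without contacting AWS. The principal ARN, database, and table are hypothetical, and the choice of `INSERT`, `DELETE`, and `ALTER` as the permissions relevant to DML and DDL is an assumption rather than a documented mapping for Glue 5.1.

```python
# Assumption: permissions a writer role is likely to need for DML and DDL
WRITE_PERMISSIONS = ["INSERT", "DELETE", "ALTER"]


def build_grant_request(principal_arn, database, table, permissions=None):
    """Build keyword arguments for boto3's lakeformation.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions or WRITE_PERMISSIONS),
    }


# Hypothetical values, for illustration only
grant = build_grant_request(
    principal_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales",
    table="orders",
)
print(grant["Permissions"])
```

With boto3 available, the dictionary could be passed as `boto3.client("lakeformation").grant_permissions(**grant)`. A read-only role would instead be given a narrower list such as `["SELECT"]`.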
Regional Availability
AWS Glue 5.1 is currently available in the following AWS Regions:
* US East (N. Virginia)
* US East (Ohio)
* US West (Oregon)
* Europe (Ireland)
* Europe (Stockholm)
* Europe (Frankfurt)
* Europe (Spain)
* Asia Pacific (Hong Kong)
* Asia Pacific (Singapore)
* Asia Pacific (Sydney)
* Asia Pacific (Tokyo)
* Asia Pacific (Malaysia)
* Asia Pacific (Thailand)
* Asia Pacific (Mumbai)
* South America (São Paulo)
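For deployment scripts, it can help to check a target Region against this list before creating jobs. The sketch below maps the Region names above to their standard AWS Region codes. The helper name is hypothetical, and the set reflects only the list in this announcement, so it may go out of date as availability expands.

```python
# Region codes for the Regions listed in this announcement
GLUE_51_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-2": "US West (Oregon)",
    "eu-west-1": "Europe (Ireland)",
    "eu-north-1": "Europe (Stockholm)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-south-2": "Europe (Spain)",
    "ap-east-1": "Asia Pacific (Hong Kong)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ap-southeast-5": "Asia Pacific (Malaysia)",
    "ap-southeast-7": "Asia Pacific (Thailand)",
    "ap-south-1": "Asia Pacific (Mumbai)",
    "sa-east-1": "South America (São Paulo)",
}


def is_glue_51_available(region_code):
    """Return True if the Region code appears in the announcement's list."""
    return region_code.strip().lower() in GLUE_51_REGIONS


print(is_glue_51_available("eu-south-2"))
print(is_glue_51_available("us-west-1"))
```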
Conclusion
AWS Glue 5.1 represents a significant step forward in simplifying and securing data integration workflows. The combination of core engine upgrades, enhanced Apache Iceberg support, and expanded AWS Lake Formation integration provides users with a powerful and versatile toolset for building and managing modern data lakes. As data volumes and complexity continue to grow, these improvements will be crucial for organizations looking to unlock the full potential of their data.