AWS Glue Part 3: Automate Data Onboarding for Your AWS Data Lake

Choosing the right approach to populating a data lake is usually one of the first decisions architecture teams make after deciding which technology to build the data lake with. A recent trend that seems to be taking over is using Spark, since it’s fast and powerful and offers a lot of flexibility when used …

AWS Glue Part 2: ETL your data and query the result in Athena

In part one of my posts on AWS Glue, we saw how Crawlers can be used to traverse data in S3 and catalogue it in AWS Athena. Glue is a serverless service that can be used to create, schedule, and run ETL jobs. In this post we'll create an ETL job using Glue, execute …