Acquire and historicise data into Snowflake using Fivetran

Cloud has become the default choice for many organisations when they decide to build a data platform or modernise an existing one. Cloud data platforms are so ubiquitous nowadays that even those who used to emphasise imaginary terms such as "vendor lock-in" can't defend building an on-premises platform from scratch anymore. But not all … Continue reading Acquire and historicise data into Snowflake using Fivetran →

Featured

An Introduction to Automated Schema Evolution for BigQuery

Everything changes and nothing stays still. Even the source systems generating data across organisations change (shocking!!), which means the schemas of the downstream data stores need to evolve accordingly. Schema evolution refers to the ability of downstream systems such as data warehouses to adapt to changes in the structure of data … Continue reading An Introduction to Automated Schema Evolution for BigQuery →

Control IoT Devices Using Scala on Databricks (Based on ML Model Output)

Featured

A few weeks ago I gave a talk at AI Bootcamp here in Melbourne on how we can build a serverless solution on Azure that takes us one step closer to powering industrial machines with AI, using the same technology stack that is typically used to deliver IoT analytics use cases. I demoed a … Continue reading Control IoT Devices Using Scala on Databricks (Based on ML Model Output) →

Stream IoT sensor data from Azure IoT Hub into Databricks Delta Lake

Featured

IoT devices produce a lot of data very fast. Capturing data from all those devices, which could number in the millions, and managing it is the very first step in building a successful and effective IoT platform. Like any other data solution, an IoT data platform can be built on-premises or in the cloud. I'm a huge … Continue reading Stream IoT sensor data from Azure IoT Hub into Databricks Delta Lake →

AWS Glue Part 3: Automate Data Onboarding for Your AWS Data Lake

Choosing the right approach to populate a data lake is usually one of the first decisions architecture teams make after deciding on the technology to build their data lake with. A recent trend that seems to be taking over is using Spark, since it’s fast and powerful and comes with a lot of flexibility when used … Continue reading AWS Glue Part 3: Automate Data Onboarding for Your AWS Data Lake →

AWS Glue Part 2: ETL your data and query the result in Athena

In part one of my posts on AWS Glue, we saw how Crawlers can be used to traverse data in s3 and catalogue it in AWS Athena. Glue is a serverless service that can be used to create, schedule and run ETL jobs. In this post we'll create an ETL job using Glue, execute … Continue reading AWS Glue Part 2: ETL your data and query the result in Athena →

AWS Glue Part 1: Discover and Catalogue Data Stored in s3

Learn how to add a Crawler in AWS Glue for data that is stored in s3