data engineering
Jim Collins
Snowflake vs Postgres and Redshift
Decisions made early in the data architecture design process can have far-reaching consequences. Criteria such as the technology chosen, predicted data volumes, and typical usage patterns drive the design.
Snowflake is winning a lot of these comparisons, and its growth has been exceptional over the past few years. It sits in the OLAP (analytical) space rather than OLTP (transactional), and this comparison assumes datasets generally larger than 100 GB. Why would you choose Snowflake over Redshift in a typical AWS environment?
No maintenance overhead
When you go with Postgres RDS or Redshift, you have to perform maintenance to ensure your database is operational. In practice, this means a DevOps engineer has to provision and resize your cluster as it grows. This can be difficult depending on the security and team structures you have in place.
In contrast, Snowflake is a fully managed, web-based service: you load your data and start querying. There is no cluster provisioning.
Better pricing
Postgres RDS and Redshift price based on provisioned capacity: you size your cluster and pay for the machines whether they are busy or not. Snowflake adopts a different strategy here. Storage is cheap (think S3 rates), and compute is billed only while a virtual warehouse is actually running queries. This means that data sets that are rarely queried are much cheaper to maintain.
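The difference between the two pricing styles is easiest to see with some back-of-the-envelope arithmetic. All rates below (node price, storage rate, credit price) are illustrative assumptions, not published prices:

```python
# Back-of-the-envelope cost model contrasting the two pricing styles.
# All rates here are illustrative assumptions, not published prices.

def provisioned_monthly_cost(node_hourly_rate: float, nodes: int) -> float:
    """Redshift/RDS style: pay for the cluster 24/7, queried or not."""
    return node_hourly_rate * nodes * 24 * 30

def usage_based_monthly_cost(storage_tb: float, storage_rate_per_tb: float,
                             compute_hours: float, credits_per_hour: float,
                             credit_price: float) -> float:
    """Snowflake style: cheap flat storage, plus compute billed only
    while a warehouse is running queries."""
    return (storage_tb * storage_rate_per_tb
            + compute_hours * credits_per_hour * credit_price)

# A 2 TB dataset queried ~20 hours a month:
always_on = provisioned_monthly_cost(node_hourly_rate=1.0, nodes=2)
on_demand = usage_based_monthly_cost(storage_tb=2, storage_rate_per_tb=25.0,
                                     compute_hours=20, credits_per_hour=1,
                                     credit_price=3.0)
print(f"provisioned: ${always_on:.0f}/month, usage-based: ${on_demand:.0f}/month")
# → provisioned: $1440/month, usage-based: $110/month
```

For a rarely-queried dataset, the idle cluster dominates the provisioned bill, while the usage-based bill is mostly cheap storage.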
Performance
Snowflake offers more granular performance tuning than Postgres or Redshift: individual queries can be routed to larger virtual warehouses, giving them more compute at execution time. Have a slow-running but ultra-important query? Snowflake lets you tune it at a level of granularity that traditional RDBMSs and Redshift don't.
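A rough sketch of that model: Snowflake warehouses come in t-shirt sizes whose credit burn doubles with each step up (XS = 1 credit/hour, S = 2, M = 4, and so on), and any query can be pointed at any warehouse. The routing rule below is a hypothetical illustration, not a Snowflake feature:

```python
# Snowflake warehouse sizes and their per-hour credit consumption,
# which doubles at each size step.
WAREHOUSE_CREDITS = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

def pick_warehouse(query_priority: str) -> str:
    """Hypothetical routing rule: send critical queries to a bigger
    warehouse rather than resizing a shared cluster."""
    return "XLARGE" if query_priority == "critical" else "XSMALL"

wh = pick_warehouse("critical")
# The nightly finance rollup gets 16x the compute of an ad-hoc query,
# without touching anyone else's workload.
print(wh, WAREHOUSE_CREDITS[wh])  # → XLARGE 16
```

The key point is that the resourcing decision is per query, not per cluster: nobody has to resize shared infrastructure to make one workload fast.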
Quality of Life Features
Snowflake makes it very easy to share data between accounts. If someone wants to share a particular table, they can grant another account access to it through Snowflake's built-in sharing features. Doing this with a traditional RDBMS is possible but much clunkier, forcing users into low-level DB admin commands. If a data engineer doesn't need that low-level access, Snowflake's tooling is far more user-friendly.
Different Architectures under the hood
Snowflake stores tables as micro-partitions, each carrying metadata about the range of values it contains. This makes it more efficient under the hood than Postgres or even Redshift: the metadata lets the engine prune partitions that cannot match a query's filter, which improves join and scan performance and provides a host of other benefits around query tuning.
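To make the pruning idea concrete, here is a minimal sketch of how min/max metadata lets a filter skip whole partitions before any data is read. The partition layout and column names are simplified illustrations, not Snowflake's actual on-disk format:

```python
# Minimal sketch of micro-partition pruning: each partition carries
# min/max metadata for a column, so a range filter can discard
# partitions whose range cannot match, without scanning their rows.
from dataclasses import dataclass, field

@dataclass
class MicroPartition:
    min_order_date: str
    max_order_date: str
    rows: list = field(default_factory=list)

def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range overlaps [lo, hi]."""
    return [p for p in partitions
            if p.max_order_date >= lo and p.min_order_date <= hi]

parts = [
    MicroPartition("2023-01-01", "2023-03-31"),
    MicroPartition("2023-04-01", "2023-06-30"),
    MicroPartition("2023-07-01", "2023-09-30"),
]
# A filter on May 2023 only needs to scan the Q2 partition.
survivors = prune(parts, "2023-05-01", "2023-05-31")
print(len(survivors))  # → 1
```

In a real warehouse this metadata check is far cheaper than reading data, so selective queries touch only a small fraction of the table.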
Declining popularity of Redshift
BigQuery rules the roost on GCP, and Redshift is losing favour with the data engineering community, in part because of how it couples compute and storage resources. Snowflake is where to be in the AWS ecosystem for large-scale data these days.
Conclusion
The market has largely spoken on Snowflake. Its continuing rise will drive adoption, and companies continue to migrate from Postgres. When your Postgres and Redshift instances are proving difficult to size, you are continually investing in performance optimisations, and your teams are complaining about performance, it's time to move!