Andrey Kozichev
Low Code ETL Tools and Their Role in Modern Data Engineering
Low-code tools have long existed in the ETL market: Informatica, one of the oldest players in this field, has been around for more than 25 years. Yet it's only in the past five years that the industry has rapidly picked up the pace, primarily due to the widespread adoption of SaaS services by organisations of all sizes.
Adopting low-code tools and platforms for Data Management offers clear advantages: ease of adoption, a low barrier to entry for non-technical personnel, and, of course, a variety of integrations provided out of the box. These factors present a compelling case not only for organisations taking their first steps in the Data Engineering space but also for well-established Data Teams.
There is a wide range of options available to choose from, spanning CDP-focused solutions like Segment or Rudderstack to more generic Data Engineering tools such as Fivetran or Airbyte. And that's not even considering entire ecosystems of tools offered by hosting platforms like Azure or AWS.
What should you consider when adopting low-code tools?
Most of them are built to function in a diverse ecosystem of disjointed tools and platforms, centred around principles of self-service, data democratisation, and a low barrier to adoption for non-technical users.
Naturally, they shine when you need to enable many people to work in parallel, when prototyping different products and technologies, or even as an ad-hoc Swiss-army knife for data management.
But what's more interesting is to see where low code fails to deliver:
Version control: When sources and destinations are wired together through a GUI, it becomes challenging to version-control the changes. While some tools allow exporting and importing configurations, or expose APIs for managing them, a GUI-based management approach often makes it difficult to retrofit any source control.
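Where an export API does exist, even a small script that snapshots each pipeline's configuration into the repository restores a diffable history. A minimal sketch in Python; the shape of the exported configs is hypothetical, since every vendor's API returns something different:

```python
import json
from pathlib import Path


def export_configs(configs: dict, outdir: str) -> list[str]:
    """Write each pipeline config as a stable, diff-friendly JSON file.

    `configs` maps a pipeline name to the config dict returned by the
    tool's export API (endpoint and payload shape vary by vendor).
    Sorting keys and fixing indentation keeps git diffs small and
    reviewable, so each commit shows exactly what changed in the GUI.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, cfg in sorted(configs.items()):
        path = out / f"{name}.json"
        path.write_text(json.dumps(cfg, indent=2, sort_keys=True) + "\n")
        written.append(str(path))
    return written
```

Running this on a schedule (or in CI) and committing the output gives you at least an audit trail, even if the GUI remains the source of truth.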
Testing: Testing becomes quite challenging due to the lack of visibility and transparency into the underlying operations and the dependency on user actions, on top of the difficulties inherent in testing data itself.
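By contrast, code-first pipelines make it cheap to pin expectations down as explicit, repeatable checks. A minimal sketch of the idea, assuming hypothetical `customer_id` and `amount` fields:

```python
def check_rows(rows: list[dict]) -> list[str]:
    """Return human-readable failures for a batch of loaded rows.

    A tiny stand-in for the kind of explicit data test that is easy to
    keep in source control next to a code-first pipeline, and hard to
    bolt onto a GUI-driven one. Field names are illustrative only.
    """
    failures = []
    if not rows:
        failures.append("batch is empty")
    for i, row in enumerate(rows):
        if row.get("customer_id") is None:
            failures.append(f"row {i}: missing customer_id")
        if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
            failures.append(f"row {i}: invalid amount {row.get('amount')!r}")
    return failures
```

Checks like this can run in CI on sample data, which is precisely the feedback loop a GUI-only workflow struggles to provide.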
Ongoing Support and Maintenance:
It's a misconception that when using SaaS products you don't have to perform any maintenance on your side. You need to keep up with the SaaS provider continuously: there will be new features, releases, and pages of changelogs you need to read to understand the impact on your users and their pipelines.
Working with Custom or Proprietary Data Sources:
When your data comes from an internal system without an existing integration, most low-code tools allow you to write your own integration or connector. However, this often requires learning their framework, writing tests, and then maintaining it to ensure continued functionality through upgrades. If you already have a hand-cranked script or job that accomplishes the task, there might be little appetite to write a new one specifically for a new UI tool.
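For comparison, the hand-cranked route is often just a small paging loop around whatever call the internal system already exposes. A sketch, where the `fetch` callable stands in for your internal API (an HTTP request, a DB cursor, and so on):

```python
from typing import Callable, Iterator


def extract_pages(fetch: Callable[[int], list[dict]],
                  page_size: int = 100) -> Iterator[dict]:
    """Pull records from a paginated internal source until it runs dry.

    `fetch(offset)` returns up to `page_size` records starting at
    `offset`. This wrapper only handles the paging loop that a vendor
    connector framework would otherwise make you re-learn.
    """
    offset = 0
    while True:
        page = fetch(offset)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: nothing left to fetch
        offset += page_size
```

The point is not that this is better than a vendor connector, but that migrating such a script into a low-code framework is real work that the glossy integration catalogue doesn't advertise.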
Handling Advanced Scenarios:
When your data arrives in a nested JSON or another format that requires manipulations before use, incorporating such complexities into an existing low-code tool can be challenging. It may even negate the benefits of using a low-code tool if users are not self-sufficient in handling these advanced scenarios.
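A typical pre-load manipulation is flattening nested JSON into tabular column names: a few lines of code, but awkward to express in many GUIs. A minimal sketch:

```python
def flatten(record: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten a nested JSON-like dict into dotted column names,
    e.g. {"user": {"id": 1}} -> {"user.id": 1}.

    Lists and other non-dict values are kept as-is; real pipelines
    usually need extra rules for arrays, which is exactly the kind of
    complexity that is hard to push into a point-and-click tool.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat
```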
Because of this, purely UI-driven low-code tools are not the best choice when building Data Products or other centralised Data services critical for the business.
So, what is the alternative? Do you have to choose between code and low-code, or is there a golden middle ground?
In our opinion, yes, there is a compromise that can be achieved by combining the best features of low-code tools with established software development practices.
Many low-code players already recognise this, incorporating support for declarative languages into their products. Meanwhile, platforms like Azure and AWS are designed to be driven programmatically from the start, so there are no excuses if you are using one of those.
Terraform is one of the most popular DSLs used for infrastructure management. Browsing its registry for products that maintain their own providers gives a good idea of what can be reliably managed as code:
https://registry.terraform.io/browse/providers?category=data-management&tier=official%2Cpartner
Besides Azure and AWS, which are favourites among 'Lego-infrastructure' enthusiasts, Fivetran, Airbyte, and Kestra are a few that we liked in this list.
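As a flavour of the declarative approach, a Terraform sketch using the Fivetran provider might look like the following. This is illustrative only: the exact resource names and required arguments depend on the provider version, and a real connector needs source credentials and other configuration.

```hcl
# Illustrative sketch only: argument names follow the general shape of
# the Fivetran Terraform provider and may differ between versions.
resource "fivetran_group" "analytics" {
  name = "analytics"
}

resource "fivetran_connector" "orders" {
  group_id = fivetran_group.analytics.id
  service  = "postgres"

  destination_schema {
    name = "orders"
  }
}
```

With the pipeline expressed this way, version control, code review, and promotion between environments come for free from the standard Terraform workflow.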