Airbyte Data Pipelines as Code
Using a low-code data replication tool such as Airbyte gives data analytics teams immediate advantages.
Setting up new data pipelines takes minutes, and the low-code connector builder makes it easy to extract data even from APIs that have no official connector.
This makes analytics teams self-sufficient: it removes the dependency on DevOps engineers, allows for quick prototyping, and speeds up the overall development and data integration process.
However, the GUI-driven nature of Airbyte brings trade-offs: changes cannot be peer-reviewed, configuration is not versioned, new parameters are hard to test safely, and environments inevitably diverge.
This is where Terraform comes in handy. Terraform has become the de facto standard for cloud infrastructure as code, and Airbyte offers its own Terraform provider, bringing version control, peer review, and reproducible environments to your data pipelines.
The central Data Engineering Team develops and manages connectors with Terraform, allowing users to consume pre-configured connectors in their data pipelines.
End users no longer need access to database passwords, nor do they have to configure connectors themselves.
The Data Engineering Team can easily roll out new connectors to all environments and make configuration or credential changes with a simple pull request.
Airbyte servers can be easily rebuilt, or jobs can be moved into the cloud, and all connections can be reconfigured from Terraform in a matter of minutes.
Before we start with Terraform, we need to set up our instance to enable access to the API.
Airbyte runs the API on port 8006 on Docker-based instances (docker.io/airbyte/airbyte-api-server). In Kubernetes deployments it is exposed via the airbyte-api-server-svc Service.
You can connect to it via:
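For a quick reachability check, something like the following works; note that the exact health-check path and the `airbyte` namespace are assumptions and may differ between Airbyte versions and installations:

```shell
# Docker deployment: the API listens on port 8006 of the host
curl -s http://localhost:8006/health

# Kubernetes deployment: port-forward the Service first
# (the "airbyte" namespace is an assumption - adjust for your cluster)
kubectl port-forward svc/airbyte-api-server-svc -n airbyte 8006:80
curl -s http://localhost:8006/health
```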
You will need to make sure that this endpoint is reachable from wherever Terraform runs. If the instance is in the cloud and you plan to connect remotely, it is a good idea to protect the endpoint with a password and SSL certificates. In a Kubernetes environment this can be done via an Ingress controller.
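As a sketch, such an Ingress can itself be managed from Terraform with the kubernetes provider. The hostname, namespace, and secret names below are placeholders, and the basic-auth annotations shown are specific to ingress-nginx:

```hcl
# Sketch: expose the Airbyte API behind ingress-nginx with basic auth and TLS.
# Hostname, namespace, and secret names are placeholders.
resource "kubernetes_ingress_v1" "airbyte_api" {
  metadata {
    name      = "airbyte-api"
    namespace = "airbyte"
    annotations = {
      "nginx.ingress.kubernetes.io/auth-type"   = "basic"
      "nginx.ingress.kubernetes.io/auth-secret" = "airbyte-api-basic-auth"
    }
  }
  spec {
    ingress_class_name = "nginx"
    tls {
      hosts       = ["airbyte-api.example.com"]
      secret_name = "airbyte-api-tls"
    }
    rule {
      host = "airbyte-api.example.com"
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "airbyte-api-server-svc"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```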
Configuring the Terraform provider:
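A minimal provider configuration might look like the following. The version constraint and credentials are placeholders, and the exact authentication arguments depend on the provider version and how your instance is secured:

```hcl
terraform {
  required_providers {
    airbyte = {
      source  = "airbytehq/airbyte"
      version = "~> 0.4" # pin to the version you have tested
    }
  }
}

provider "airbyte" {
  # The Airbyte API endpoint exposed on port 8006.
  server_url = "http://localhost:8006/v1"

  # Placeholder credentials - read them from variables, never hard-code.
  username = var.airbyte_api_username
  password = var.airbyte_api_password
}
```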
Creating sources and destinations is straightforward.
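As a sketch, a Postgres destination could be declared like this; resource and field names vary between provider versions, and the host, names, and credentials below are placeholders:

```hcl
# Sketch: a Postgres destination managed by the central Data Engineering team.
# Field names may differ per provider version - check the provider docs.
resource "airbyte_destination_postgres" "analytics" {
  name         = "analytics-postgres"
  workspace_id = var.workspace_id

  configuration = {
    host     = "postgres.internal.example.com"
    port     = 5432
    database = "analytics"
    schema   = "public"
    username = "airbyte_loader"
    password = var.postgres_password
  }
}
```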
Example: a replication job that pulls CSV files from an Azure storage account and loads them into Postgres.
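A sketch of such a job follows. The Azure Blob source resource name and its configuration fields are assumptions based on typical provider schemas, and the destination ID is passed in as a variable; verify everything against the provider documentation for your version:

```hcl
# Sketch: CSV files in an Azure storage account replicated into Postgres.
resource "airbyte_source_azure_blob_storage" "csv_files" {
  name         = "azure-csv-files"
  workspace_id = var.workspace_id

  configuration = {
    azure_blob_storage_account_name   = "mystorageaccount"
    azure_blob_storage_account_key    = var.storage_account_key
    azure_blob_storage_container_name = "exports"
    format = {
      csv = {}
    }
  }
}

resource "airbyte_connection" "csv_to_postgres" {
  name      = "azure-csv-to-postgres"
  source_id = airbyte_source_azure_blob_storage.csv_files.source_id

  # Placeholder: the ID of a destination managed elsewhere in your config.
  destination_id = var.postgres_destination_id

  schedule = {
    schedule_type   = "cron"
    cron_expression = "0 0 2 * * ?" # daily at 02:00
  }
}
```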
What if you have already created a bunch of connections? You don't have to re-create them. Just import them into the state file and reverse-populate the configuration:
You can find the ID of the resource to import in the browser URL.
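With Terraform 1.5+ this can be done declaratively; the resource address below is illustrative, and the `<connection-id>` placeholder stays for you to fill in from the URL:

```hcl
# Declare the existing connection for import, then let Terraform
# generate the matching configuration.
import {
  to = airbyte_connection.csv_to_postgres
  id = "<connection-id>"
}
```

Running `terraform plan -generate-config-out=generated.tf` then writes a matching resource block into `generated.tf`. Alternatively, the classic `terraform import airbyte_connection.csv_to_postgres <connection-id>` imports only the state and leaves writing the configuration to you.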