Airbyte Data Pipelines as Code
Low-code data replication tools such as Airbyte give data analytics teams immediate advantages.
Setting up a new data pipeline takes minutes, and the low-code connector builder makes it easy to extract data even from APIs that have no official connector.
This makes data analytics teams self-sufficient: they no longer depend on DevOps engineers, can prototype quickly, and speed up the overall development and data integration process.
However, the GUI-driven nature of Airbyte brings trade-offs: changes cannot be peer-reviewed, configuration is not versioned, new parameters are hard to test, and environments inevitably diverge.
This is where Terraform comes in handy. Terraform has become the de facto standard for cloud infrastructure development, and Airbyte offers its own Terraform provider, which lets you manage sources, destinations and connections as code.
The central Data Engineering Team develops and manages connectors with Terraform, allowing users to consume pre-configured connectors in their data pipelines.
End users no longer need access to database passwords or need to configure connectors themselves.
The Data Engineering Team can easily roll out new connectors to all environments and make configuration or credential changes with a simple pull request.
Airbyte servers can be easily rebuilt, or jobs can be moved into the cloud, and all connections can be reconfigured from Terraform in a matter of minutes.
Before we start with Terraform, we need to set up our instance to enable access to the API.
On Docker-based instances, Airbyte runs the API on port 8006 (image docker.io/airbyte/airbyte-api-server). In a Kubernetes deployment, it is exposed via the Service airbyte-api-server-svc.
You can connect to it via:
Make sure this endpoint is reachable from wherever Terraform runs. If Airbyte runs in the cloud and you plan to connect remotely, it is a good idea to protect the endpoint with a password and TLS certificates. In a Kubernetes environment, this can be done with an Ingress controller.
Configuring the Terraform provider:
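A minimal sketch of a provider block, assuming basic authentication and an API server reachable on localhost port 8006. The variable names are hypothetical, and the provider arguments used here (server_url, username, password) should be checked against the provider release you install.

```hcl
terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      # Pin a version here once you know which release matches your Airbyte instance.
    }
  }
}

# Hypothetical variables: supply them through TF_VAR_* environment variables
# or a secrets manager rather than committing values to the repository.
variable "airbyte_username" {
  type = string
}

variable "airbyte_password" {
  type      = string
  sensitive = true
}

provider "airbyte" {
  # Assumed endpoint: the API server on port 8006, as described above.
  server_url = "http://localhost:8006/v1/"
  username   = var.airbyte_username
  password   = var.airbyte_password
}
```

Keeping the credentials in variables means the same code can target every environment, with only the variable values changing.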
Creating sources and destinations is quite easy.
Example: a replication job that pulls CSV files from an Azure storage account and pushes them into Postgres.
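The following is a sketch of such a job, not a definitive configuration. All names, hosts and variables are hypothetical, and the attribute names inside each configuration block (in particular for the Azure Blob Storage source) have changed between provider releases, so treat them as assumptions and verify them against the documentation for your provider version.

```hcl
# Hypothetical inputs; secrets are marked sensitive so they stay out of plan output.
variable "workspace_id" {
  type = string
}

variable "azure_account_key" {
  type      = string
  sensitive = true
}

variable "postgres_password" {
  type      = string
  sensitive = true
}

# Source: CSV files in an Azure storage container.
# Assumption: this attribute layout matches your provider release.
resource "airbyte_source_azure_blob_storage" "csv_files" {
  name         = "azure-csv-files"
  workspace_id = var.workspace_id

  configuration = {
    azure_blob_storage_account_name   = "examplestorageaccount"
    azure_blob_storage_account_key    = var.azure_account_key
    azure_blob_storage_container_name = "example-container"
    streams = [
      {
        name   = "csv_files"
        globs  = ["**/*.csv"]
        format = { csv_format = {} }
      }
    ]
  }
}

# Destination: a Postgres database (placeholder host and database names).
resource "airbyte_destination_postgres" "warehouse" {
  name         = "postgres-warehouse"
  workspace_id = var.workspace_id

  configuration = {
    host     = "postgres.example.internal"
    port     = 5432
    database = "analytics"
    schema   = "public"
    username = "airbyte"
    password = var.postgres_password
  }
}

# Connection: ties the source to the destination; triggered manually here.
resource "airbyte_connection" "azure_csv_to_postgres" {
  name           = "azure-csv-to-postgres"
  source_id      = airbyte_source_azure_blob_storage.csv_files.source_id
  destination_id = airbyte_destination_postgres.warehouse.destination_id

  schedule = {
    schedule_type = "manual"
  }
}
```

Because the connection references the source and destination by resource attribute, Terraform creates all three in the right order and end users never see the underlying credentials.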
What if you have already created a bunch of connections? You don't have to re-create them. Just import them into the state file and reverse-populate the configuration:
You can find the ID of the resource to import in the browser URL.
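One way to do this, assuming Terraform 1.5 or later, is a declarative import block. This may differ from the command the author originally used. The resource name existing_sync is hypothetical, and the ID is a placeholder for the value copied from the browser URL.

```hcl
# Tell Terraform to adopt an existing Airbyte connection into the state.
import {
  to = airbyte_connection.existing_sync
  # Placeholder: replace with the connection ID taken from the browser URL.
  id = "00000000-0000-0000-0000-000000000000"
}
```

Running terraform plan -generate-config-out=generated.tf then writes a matching resource block into generated.tf, provided no block for airbyte_connection.existing_sync exists yet. Review the generated file before applying, since generated configuration often needs manual cleanup.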