For a data analyst, exporting data from a PostgreSQL database is a typical task: an entire table, or the results of a query, can be written to a CSV file with the COPY ... TO command.
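For example, a couple of standard COPY invocations (the table and column names here are hypothetical; note that server-side COPY writes files on the database host and requires elevated privileges, while psql's \copy variant writes to the client machine):

```sql
-- Export a whole table to CSV with a header row
COPY users TO '/tmp/users.csv' WITH (FORMAT csv, HEADER true);

-- Export the result of a query instead of a whole table
COPY (SELECT id, email FROM users WHERE created_at > now() - interval '7 days')
  TO '/tmp/recent_users.csv' WITH (FORMAT csv, HEADER true);
```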
However, in an enterprise organisation this can become a surprisingly challenging task.
Databases are subject to much stricter access controls than most other resources, so they are usually deployed behind a firewall, and only a handful of people in DevOps or a DBA team have direct access.
Arranging VPN access and sufficient role privileges in the database is genuinely hard to achieve (unless you work in one of those teams).
Even if an analyst gets access to the database, a single SQL query is rarely enough. More often there are several queries that need to run on a schedule, with the output formatted and saved as reports, which then need to be emailed or uploaded to shared storage. It gets even more complex if the data must later be ingested into another tool for analysis.
These requirements spawn a myriad of cron jobs spread across many hosts, and scripts written in countless programming languages. Done properly, this trivial task often becomes a massive project in itself.
That’s where an iPaaS such as Kassette can be used to process Postgres data.
To demonstrate how Kassette can crunch Postgres data, let’s have a look at the example provided. To get it running, pull the source of kassette-server and fire up the docker-compose example provided:
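A minimal sketch of those steps (the repository URL is omitted here; substitute wherever you obtained the kassette-server source):

```shell
# Clone the kassette-server source and enter the directory
git clone <kassette-server-repo-url>
cd kassette-server

# Start the example stack in the background
docker compose up -d

# Verify that all containers are running
docker compose ps
```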
Several containers will be spun up:

- A Postgres database instance with multiple schemas:
  - Kassette – Kassette’s configuration and operational database
  - Warehouse – a database simulating a destination Postgres DB used for the exported data
- kassette-server – Kassette’s main component, responsible for processing, transforming and submitting the data pulled from the Source. The API runs on localhost:8088
- UI – an admin interface for configuring and controlling kassette-server. Can be accessed via http://localhost:3000
- kassette-postgres-agent – runs the pre-configured SQL “select * from service_catalogue;” against the Kassette database and submits the result to kassette-server. The schedule runs every minute.
- Grafana – a configured instance showing a simple graph of the extracted data. Can be accessed via http://localhost:3001
Now check the configured Source and Destination by accessing the UI via http://localhost:3000.
Both Source and Destination are configured as Postgres, and their schemas match. If fields are defined with different types, Kassette will attempt to convert between them. The actual table has more columns; any column not specified in the schema is ignored.
The configuration of kassette-postgres-agent is quite straightforward:
In addition to the cron schedule, it supports two modes of “tailing” tables for event-type data: by ID and by timestamp.
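Conceptually, these two modes correspond to standard watermark queries, sketched below against the example table (the column names and the placeholder parameters are assumptions, not Kassette’s actual internals):

```sql
-- ID-based tailing: fetch only rows beyond the last ID already submitted
SELECT * FROM service_catalogue
WHERE id > :last_seen_id
ORDER BY id;

-- Timestamp-based tailing: fetch only events newer than the previous run
SELECT * FROM service_catalogue
WHERE created_at > :last_seen_timestamp
ORDER BY created_at;
```

ID-based tailing suits append-only tables with a monotonically increasing key, while timestamp-based tailing works for event data where rows carry a creation time.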
After several minutes, the data extracted from the Source will start appearing in the Destination, which can be seen in Grafana at http://localhost:3001.
The example demonstrates how easy it can be to extract data from Postgres. For this proof of concept, we used the same Postgres instance as the destination. In real scenarios, the destination can be one of the SaaS tools available in the Kassette Service Catalogue, or simply a CSV file sent via email or uploaded to S3.
Create ETL jobs, run reports on a schedule, and continuously extract event logs from Postgres. Try experimenting with the data types, or change the Source database and the fields being captured. Development of Kassette is actively ongoing, so be sure to raise issues if you come across any!