For a data analyst, exporting data from a PostgreSQL database is a typical task: an entire table, or the results of a query, can be written to a CSV file with the COPY ... TO command.
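For example, a couple of standard COPY invocations (the table and column names here are hypothetical; note that server-side COPY writes files on the database host and requires elevated privileges, while psql's \copy variant writes to the client machine):

```sql
-- Export a whole table to CSV with a header row
COPY users TO '/tmp/users.csv' WITH (FORMAT csv, HEADER true);

-- Export the result of a query instead of a whole table
COPY (SELECT id, email FROM users WHERE created_at > now() - interval '7 days')
  TO '/tmp/recent_users.csv' WITH (FORMAT csv, HEADER true);
```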
However, in an enterprise organisation this can become a surprisingly challenging task.
Databases are subject to much stricter access controls than most other resources, so they are usually deployed behind a firewall, and only a handful of people in DevOps or a DBA team have direct access.
Arranging VPN access and sufficient role privileges in the database is genuinely hard to achieve (unless you work in one of those teams).
Even if an analyst gets access to the database, a single SQL query is rarely enough. More often there are several queries that need to run on a schedule, with the output formatted and saved as reports, which then need to be emailed or uploaded to shared storage. It gets even more complex if the data must later be ingested into another tool for analysis.
These requirements spawn a myriad of cron jobs spread across many hosts, and scripts written in countless programming languages. Done properly, this trivial task often becomes a massive project in itself.
That’s where an iPaaS such as Kassette can be used to process Postgres data.
To demonstrate how Kassette can crunch Postgres data, let’s have a look at the example provided. To get it running, pull the source of kassette-server and fire up the docker-compose example provided:
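A minimal sketch of those steps (the repository URL is omitted here; substitute wherever you obtained the kassette-server source):

```shell
# Clone the kassette-server source and enter the directory
git clone <kassette-server-repo-url>
cd kassette-server

# Start the example stack in the background
docker compose up -d

# Verify that all containers are running
docker compose ps
```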
Several containers will be spun up:

- A Postgres database instance with multiple schemas:
  - Kassette – Kassette’s configuration and operational database
  - Warehouse – a database simulating a destination Postgres DB used for the exported data
- kassette-server – Kassette’s main component, responsible for processing, transforming and submitting the data pulled from the Source. The API runs on localhost:8088
- UI – an admin interface for configuring and controlling kassette-server. Can be accessed via http://localhost:3000
- kassette-postgres-agent – runs the pre-configured SQL “select * from service_catalogue;” against the Kassette database and submits the result to kassette-server. The schedule runs every minute.
- Grafana – a configured instance showing a simple graph of the extracted data. Can be accessed via http://localhost:3001
Now check the configured Source and Destination by accessing the UI via http://localhost:3000.
Both Source and Destination are configured as Postgres, and their schemas match. If fields are defined with different types, Kassette will attempt to convert between them. The actual table has more columns; any column not specified in the schema is ignored.
The configuration of kassette-postgres-agent is quite straightforward:
In addition to the cron schedule, it supports two modes of “tailing” tables for event-type data: by ID and by timestamp.
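Conceptually, these two modes correspond to standard watermark queries, sketched below against the example table (the column names and the placeholder parameters are assumptions, not Kassette’s actual internals):

```sql
-- ID-based tailing: fetch only rows beyond the last ID already submitted
SELECT * FROM service_catalogue
WHERE id > :last_seen_id
ORDER BY id;

-- Timestamp-based tailing: fetch only events newer than the previous run
SELECT * FROM service_catalogue
WHERE created_at > :last_seen_timestamp
ORDER BY created_at;
```

ID-based tailing suits append-only tables with a monotonically increasing key, while timestamp-based tailing works for event data where rows carry a creation time.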
After several minutes, the data extracted from the Source will start appearing in the Destination, which can be seen in Grafana at http://localhost:3001.
The example demonstrates how easy it can be to extract data from Postgres. For this proof of concept, we used the same Postgres instance as the destination. In real scenarios, the destination can be one of the SaaS tools available in the Kassette Service Catalogue, or simply a CSV file sent via email or uploaded to S3.
Create ETL jobs, run reports on a schedule, and continuously extract event logs from Postgres. Try experimenting with the data types, or change the Source database and the fields being captured. Development of Kassette is actively ongoing, so be sure to raise issues if you come across any!