airbyte

Andrey Kozichev

Aggregating UK Public Sector Tenders with Airbyte

Working with the Public Sector often involves dealing with various Public Frameworks. One example of such a portal is Contracts Finder and Find a Tender.

Both are typical Government Tendering portals with a simple web interface and basic functionality that allows users to search and filter tenders based on their publication date, keywords, supplier or client names etc.

Contracts Finder focuses on lower-value contracts, while Find a Tender handles higher-value contracts.

Navigating through tenders is a standard process for businesses seeking to collaborate with the UK Government. Hundreds of tenders are published daily, and the task of navigating through them is not always easy.

Although tender portals provide basic search functionality by dates and keyword matching, they don't allow users to build more complex queries, create visualizations, or aggregate data. Moreover, all portals operate in isolation.

Naturally, we want more flexibility when crunching this data. We aim to analyze historical data and also monitor all new opportunities to see if there is something interesting where our help might be needed.

The task at hand is to continually extract data from both of the above portals into a single database. With this, we can use simple BI tools to build useful views of the historical data and enhance our ability to search for new opportunities emerging daily.

After some quick research, the data available from the above portals can be extracted via API in the Open Contracting Data Standard (OCDS) release package.

Data extraction from an API is a typical task for an ETL process.

Our go-to tool for Data Pipeline is Airbyte. It's an open-source product that significantly simplifies the process of moving the data.

Airbyte has more than 300 connectors available, but none of the APIs from the above tendering portals are on the list.

And it’s not a problem. With Airbyte, creating a new source API connector is super simple. A great live example of doing this is given by Airbyte CEO Michel Tricot: https://www.youtube.com/watch?v=SYEj0MAAz7I&t=772s

We spent about 20 minutes creating a simple scaffolding for a connector that will pull details of tenders from the above URLs. The most difficult part is getting the pagination right

and expressing the date in the request with a relative value using Jinja macros ( last 1 Hour "{{ (today_utc() - duration('P0DT1H')).strftime('%Y-%m-%dT%H:%M:%SZ') }}" )

The source of sample connectors can be found in our Git uk-gov-tenders

The rest is even simpler: creating a source and destination. For destination, we will use a Postgres Database as our default 'go-to' destination.

We will use manual sync for the new connection to avoid unnecessary runs while we still creating the solution.

After the first sync, we can find all the data in the Postgres Database. Another handy feature of Airbyte is normalization, which spreads the data into normalized tables, making it ready for queries.

For analyzing and browsing the data, Metabase is a great tool. Filtering by price, searching keywords, and grouping by buyer's name or checking classification codes, we can create numerous dashboards and find some very interesting statistics.

Simply Browse:

Or visualise:

In this blog post, we created two new Source API Connectors in Airbyte and set up jobs to replicate new tenders into a local PostgreSQL database. We are looking to onboard a few more similar frameworks' APIs and use natural language processing to go through the descriptions, creating a short summary of the requirements.

Stay tuned for more, and subscribe to be the first to learn how to drive more business decisions using data.

Airbyte is the Swiss Army knife for ETL tasks

Airbyte is the Swiss Army knife for ETL tasks

Airbyte is the Swiss Army knife for ETL tasks

Andrey Kozichev

Subscribe for the latest blogs and news updates!

Related Posts

finance

Sep 4, 2024

For any financial organisation, being able to access all relevant client data quickly is not just a competitive advantage in the current market - it’s an absolute necessity for the company’s survival.

fivetran

Jun 11, 2024

Incremental sync and truncating the data in raw tables can improve the performance of your syncs.

© MetaOps 2024

© MetaOps 2024

© MetaOps 2024