data engineering

Jim Collins

Palantir and the NHS, the future of healthcare data

Why the NHS loves Palantir

Palantir have been brought in to provide a federated data platform for the NHS. The NHS has a fragmented data landscape: data sits in functional silos split across hospitals, regions and departments.

Data collected by GPs is not connected to the wider NHS; it is stored in practices' own local systems. There are more than 200 NHS trusts, each using a different flavour of EHR (electronic health record). On top of that there are statistical outputs from hospitals (the number of Accident and Emergency attendances, for example) and many more sources of data.

It is the ultimate in data challenges.

How do you get all of this into a data product that supports informed decisions on where to recruit, identifies trends and provides an accurate picture of healthcare? The NHS has a global advantage here: its health system is both large and centralised, unlike the US, where healthcare data is fragmented across private providers.

The NHS have chosen to go with Palantir. It is a controversial choice given the involvement of contrarian billionaire Peter Thiel, but in defence of NHS procurement, Palantir are likely the best placed to make progress.


Why would you go with Palantir?

Palantir emerged from the "PayPal mafia", the group of former PayPal founders and employees who went on to build a string of highly successful technology companies in the early 2000s.

They offer a full data engineering solution through their Foundry product, and their consultants are widely recognised as among the best in the industry.

The Palantir Foundry platform brings data pipelines, transformations, monitoring and quality control into a low-code environment. This makes it straightforward to build data products that harmonise disparate datasets.

Data connectors are exposed to ingest data in a variety of ways: flat files, APIs, database transfers, CSVs and so on. This is all Foundry-specific and best-in-class.
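To make that concrete, here is a minimal sketch of the kind of flat-file ingestion step such a connector automates. This is plain PySpark rather than Foundry's connector tooling, and all paths and dataset names are invented for illustration:

```python
# Illustrative sketch only: the sort of flat-file ingestion a Foundry
# data connector would automate. Paths and dataset names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ae_attendance_ingest").getOrCreate()

# Read a CSV export dropped by a (hypothetical) trust system
raw = spark.read.csv(
    "/landing/example_trust/ae_attendances.csv",
    header=True,
    inferSchema=True,
)

# Land the data unchanged in a staging area for downstream transforms
raw.write.mode("overwrite").parquet("/staging/ae_attendances")
```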

Also provided in the suite of products:

Ontology

Binds data to Objects, Actions and Processes. It allows data assets to behave like digital twins and exist in relationships with other assets.
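As a rough conceptual illustration (plain Python dataclasses, not Foundry's actual ontology API; the object types, fields and values are invented), an ontology binds rows of data to typed objects and the links between them:

```python
# Conceptual sketch only, not the Foundry ontology API.
# Object types, fields and values are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ward:
    ward_id: str
    specialty: str
    bed_count: int


@dataclass
class Hospital:
    ods_code: str
    name: str
    wards: List[Ward] = field(default_factory=list)  # relationship to other objects


# Each object is backed by rows in the underlying datasets, so it behaves
# like a digital twin of the real-world asset it represents.
example = Hospital(
    ods_code="EX1",
    name="Example General Hospital",
    wards=[Ward(ward_id="W01", specialty="Cardiology", bed_count=24)],
)
```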

Data lineage

Tracks where data originated, where and how it was transformed, and where it is being used. This is an important capability for getting clients to trust data products.
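As a rough illustration (plain Python, not Foundry's lineage tooling; dataset and transform names are invented), lineage boils down to a graph of which inputs and transforms produced which outputs, which you can walk upstream from any dataset:

```python
# Conceptual sketch of lineage metadata, not Foundry's lineage tooling.
# Dataset and transform names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class LineageEdge:
    inputs: List[str]   # upstream datasets
    transform: str      # the code asset that produced the output
    output: str         # downstream dataset


edges = [
    LineageEdge(["/NHS/raw/ae_attendances"], "clean_attendances",
                "/NHS/clean/ae_attendances"),
    LineageEdge(["/NHS/clean/ae_attendances"], "harmonise_attendances",
                "/warehouse/ae_attendances"),
]


def upstream(dataset: str) -> List[str]:
    """Walk the graph upstream to find everything a dataset depends on."""
    parents = [i for e in edges if e.output == dataset for i in e.inputs]
    return parents + [p for parent in parents for p in upstream(parent)]


print(upstream("/warehouse/ae_attendances"))
# ['/NHS/clean/ae_attendances', '/NHS/raw/ae_attendances']
```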

Code repos

Transforms and pipelines are created and stored as code assets. This supports robust testing and version control.
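To give a feel for what that looks like, here is a short sketch written in the style of Foundry's Python transforms API; the dataset paths and column names are invented, and the details may differ from the real platform:

```python
# Sketch in the style of Foundry's Python transforms API.
# Dataset paths and column names are hypothetical.
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/NHS/clean/ae_attendances"),
    source=Input("/NHS/raw/ae_attendances"),
)
def clean_attendances(source):
    # Keep only Type 1 (major A&E) attendances and drop obvious duplicates
    return (
        source
        .filter(source.department_type == "Type 1")
        .dropDuplicates(["attendance_id"])
    )
```

Because the transform lives in a repository as ordinary code, it can be unit tested, reviewed and versioned like any other software asset.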

Foundry BI

A Power BI equivalent specifically built to browse Palantir warehouses.


Palantir drawbacks

Lock-in, of the most severe sort…

The main drawback with Palantir is vendor lock-in. Everything is bespoke to the Palantir ecosystem: the database, the pipelines, the ETL processes, the connectors, the scheduling. It will be close to impossible to migrate away from Palantir should the NHS sour on the contract.

Integrating regional data silos comes at huge cost and is frequently underestimated. This will become another lock-in property that keeps Palantir in the NHS for the long run.

Complex configuration

Palantir have FAANG-level engineering capability and pay salaries to match: 200k+ and share options to boot. This means they can build systems of a complexity beyond most organisations' software engineering capabilities.

While this delivers great functionality, it means that replacing this level of expertise will require eye-wateringly expensive staff to maintain the system. More likely, the NHS will be paying high consulting fees indefinitely.

Cost

The contract is already set at £330 million, plus a £120 million extension. This is likely to rise, given how vast the data integration challenge is.

Secrecy

UK patient data should be guarded with national-level security. While the data will be hosted locally, there are concerns about what data sharing actually entails. Anonymised health data is highly prized across the world. Given the unique position the NHS is in to collect it, the UK Government needs to be wary of what contracts and agreements are in place.


What does the rest of the field bring

If someone asked MetaOps tomorrow to build a similar system, our answer would be: of course! But also: be wary of the incredible scope of this project. A solution using a typical open source stack would include:

  • iPaaS and data integration solutions such as AWS Glue to ETL data out of existing systems.

  • Source-specific data engineering teams, i.e. teams that own the data pipelines from each of the disparate sources.

  • Databricks experts in place to set up staging areas and configure transforms.

  • Data catalogues and Ontology, open to the market and ideally interchangeable.

  • PySpark expertise to provide transformations (see the sketch after this list).

  • dbt to create the necessary views and exports on top of the transformed data.

  • Power BI as a warehouse browser.
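As a flavour of the PySpark piece referenced above, here is a minimal harmonisation sketch: two source-specific feeds are mapped onto a common schema and unioned. All paths, columns and provider codes are invented for illustration:

```python
# Minimal harmonisation sketch: two source-specific feeds mapped onto a
# shared schema and unioned. Paths, columns and codes are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("harmonise_attendances").getOrCreate()

trust_a = spark.read.parquet("/staging/trust_a/ae_attendances")
trust_b = spark.read.parquet("/staging/trust_b/emergency_visits")

# Map each source's local column names onto the common schema
common_a = trust_a.select(
    F.col("attendance_id"),
    F.col("arrival_ts").alias("arrival_time"),
    F.lit("TRUST_A").alias("provider_code"),
)
common_b = trust_b.select(
    F.col("visit_ref").alias("attendance_id"),
    F.col("admitted_at").alias("arrival_time"),
    F.lit("TRUST_B").alias("provider_code"),
)

# Union into one harmonised dataset ready for dbt views and Power BI
harmonised = common_a.unionByName(common_b)
harmonised.write.mode("overwrite").parquet("/warehouse/ae_attendances")
```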


What might go wrong

The major problem we see with the Palantir initiative is scale. The sheer heft of moving all data sources into a single warehouse may take decades to achieve. Given the nature of the NHS, this will require serious coordination and some incredible leadership.

While the project may show quick wins, organisational inertia could be the killer here. That should not diminish the potential for progress, however.

The second major barrier will be educating staff. Palantir configuration requires experienced professionals, and this is likely to be a major stumbling block in the transition to business as usual (BAU).

It's an exciting phase for NHS data engineering and the most important technical transformation for the UK government this decade.


Jim Collins



© MetaOps 2024
