Have you encountered tasks in your organisation that can only be handled by specific individuals? These are typically non-urgent and can wait for a few days or even a week. Even if the responsible person goes on vacation, the problem quietly awaits their return. These tasks are often perceived as boring and labour-intensive, making them less appealing to others.
DO NOT IGNORE THEM!
This type of activity, hidden from the eyes of peers, is usually a ticking time bomb. It tends to come with:
Lack of accountability
Absence of change tracking
Lack of documentation
Violation of company policies
The role of every leader in the organisation is to identify it and take it down.
Until recently, the data industry offered an ideal environment for shadow IT and cottage industries to flourish, and for good reason.
With no well-defined structure, established framework or approved tooling, engineers had to build ETL jobs and reporting dashboards with whatever they had at their disposal:
and many other ad-hoc tools
Every engineer has been through at least one “Migrating cronjobs” project in their career.
This is why adopting the right tooling and implementing a framework around Data Engineering practices are crucial. Documenting how data can be extracted and accessed, and providing the necessary tools, is the key to successful Data Engineering practices.
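A "Migrating cronjobs" project usually starts with finding out what is actually running. Here is a minimal sketch of that first step in Python, assuming the jobs live in ordinary user crontabs (five time fields followed by the command; system crontabs such as /etc/crontab carry an extra user field and would need a small change). The names `CronEntry`, `parse_crontab` and the `source` label are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CronEntry:
    schedule: str  # the five time fields, or an @-shortcut such as @daily
    command: str   # everything after the schedule
    source: str    # hypothetical label: the host or file the line came from


def parse_crontab(text: str, source: str) -> List[CronEntry]:
    """Turn user-crontab text into a list of entries.

    Comments, blank lines and environment assignments (MAILTO=..., PATH=...)
    are skipped, because they do not describe a job.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Job lines start with a digit, "*" or "@"; anything else is
        # a comment, a blank line or an environment assignment.
        if not line or line[0] not in "*0123456789@":
            continue
        if line.startswith("@"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                entries.append(CronEntry(parts[0], parts[1], source))
            continue
        parts = line.split(None, 5)
        if len(parts) == 6:
            entries.append(CronEntry(" ".join(parts[:5]), parts[5], source))
    return entries
```

Feeding it the output of `crontab -l` from each server gives you one list you can sort, review and put under version control, which is already more change tracking than most of these jobs ever had.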
With the widespread adoption of Public Cloud, every user can have their own Data Lab at their fingertips. The beauty of adopting Cloud services is that they can grow along with the organisation. Starting from simple UI-managed logic, you can evolve into a full-blown infrastructure-as-code pipeline with CI/CD and testing.
Don’t trust anyone? No problem! Build it yourself and host it on-prem. There are many excellent open-source products in this space.
"But we are too small for this. Our bash cronjobs work just fine." - Not a problem! Just get in touch when you finally decide to migrate them. I have the experience :)
Some of my favourites:
.NET applications in a Java/Linux shop
internal web portal written in PHP + bash
bash python “backticks” jobs
hundreds of unmanaged cronjobs scattered across random unmanaged servers
bash scripts calling a combination of Java, Python and Perl
tools using third-party SaaS services with personal accounts of people who left the org
numerous bash scripts that ran ad-hoc reports and had to be modified for each run
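
That last one, scripts edited by hand before every run, is often the easiest to retire: move whatever changes between runs into command-line parameters. A minimal sketch in Python, assuming the report is a date-ranged SQL query; the `orders` table, its columns and the `--region` option are hypothetical and only there to make the example concrete.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Read the values that used to be edited inside the script."""
    parser = argparse.ArgumentParser(
        description="Run an ad-hoc report for a date range.")
    parser.add_argument("--start", type=date.fromisoformat, required=True,
                        help="first day of the report, YYYY-MM-DD")
    parser.add_argument("--end", type=date.fromisoformat, required=True,
                        help="last day of the report, YYYY-MM-DD")
    parser.add_argument("--region", default="all",
                        help="hypothetical filter; 'all' disables it")
    args = parser.parse_args(argv)
    if args.end < args.start:
        parser.error("--end must not be earlier than --start")
    return args


def build_query(args):
    """Return the SQL text and its bound parameters.

    Values are passed as parameters rather than pasted into the SQL string,
    so the script itself never has to change between runs.
    """
    sql = ("SELECT order_date, SUM(amount) FROM orders "
           "WHERE order_date BETWEEN ? AND ?")
    params = [args.start.isoformat(), args.end.isoformat()]
    if args.region != "all":
        sql += " AND region = ?"
        params.append(args.region)
    sql += " GROUP BY order_date ORDER BY order_date"
    return sql, params
```

A run then looks like `report.py --start 2024-01-01 --end 2024-01-31 --region emea`, and the script can finally sit in version control without a new diff for every report.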