Your Data Platform is a Mess. Here's How I Built an AI-Powered Tool to Fix It (Without Asking Anyone for Permission)
A pragmatic guide for data engineers to build custom, AI-assisted tooling that actually solves their problems and avoids corporate red tape.
Tired of wrestling with a sprawling data platform, jumping between countless dashboards and logs just to figure out why a pipeline failed? You're not alone. I was there too, drowning in alerts and yearning for a single pane of glass to make sense of the chaos. Instead of waiting for some mythical centralized tooling team, I decided to take matters into my own hands, and with the help of modern AI, you can too. This approach is agile, driven by your actual needs, and surprisingly easy to get started with, letting you solve your specific pain points right now.
Step 1: Find Your First Itch
Before you even think about lines of code, ask yourself: what's the most annoying, time-consuming task in your day-to-day management of the data platform? What makes you sigh audibly every time you have to do it? For me, it was two things:
Seeing a clear list of failing ETLs without diving into alert storms or CloudWatch metrics. I wanted a simple, product-centric view of what was broken.
Tracking data contract violations efficiently. Manually checking logs and comparing schemas felt like a Sisyphean task.
Your "itch" might be different. Maybe it's deploying new pipelines, checking data quality, or onboarding new data sources. The key is to identify a concrete problem that you experience regularly. There's no point in building a tool for a problem that doesn't exist.
Step 2: Embrace the AI-Powered Green Field
Here's the exciting part. Building the underlying platform for your tooling used to be a significant hurdle. But with the advent of AI coding assistants like GitHub Copilot, Cursor, or Kiro, this has become the easiest stage. These tools shine brightest when you're starting from scratch because there's no existing code to confound them: only your prompt and their training. That lets them rapidly scaffold your application, and, just as for a human engineer, a fresh codebase is far easier to iterate on.
My recommendation? Start with the tech stack you and your team are most comfortable with. For me, as a data engineer with a penchant for shiny things, Python and Flask felt like a natural fit.
Before diving into the functionalities, take a moment to think about the look, feel, and navigation of your application. Trust me, spending a little time on this upfront will save you headaches later. How do you want it to look? Where should the navigation live? A simple left-side menu? A top bar?
My initial prompt to my AI coding assistant looked something like this:
Create a flask app called Data Control that is very pretty and is eye-candy. I want the navigation to be on the left side. The first page in my navigation is called "Dashboard". When this page is loaded, it should call <an API to your preferred service to filter> and list <failing jobs> in expandable boxes. When I expand the box, <call an API for more information about that service>.

Be prepared to iterate. You might not get it perfect on the first try. The AI might generate something close but not quite right. That's okay! Refine your prompts, provide more specific instructions, modify a bit of the code yourself, and guide the AI towards your vision. It's like pair programming with a super-efficient, if sometimes slightly clueless, colleague.
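To make that concrete, here is a minimal sketch of the kind of scaffold such a prompt produces. This is my own illustrative reconstruction, not the AI's actual output; get_failing_jobs() is a hypothetical stand-in for whatever monitoring API you point it at:

```python
# app.py: a sketch of the scaffold such a prompt might produce.
# get_failing_jobs() is a hypothetical placeholder for your monitoring API.
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """
<nav style="float: left; width: 200px;">
  <a href="/">Dashboard</a>
</nav>
<main style="margin-left: 220px;">
  {% for job in jobs %}
  <details>
    <summary>{{ job["name"] }} (failed {{ job["failed_at"] }})</summary>
    <pre>{{ job["details"] }}</pre>
  </details>
  {% endfor %}
</main>
"""

def get_failing_jobs():
    # Placeholder: call your orchestrator, CloudWatch, etc. here.
    return [{"name": "orders_etl", "failed_at": "today", "details": "stack trace..."}]

@app.route("/")
def dashboard():
    return render_template_string(PAGE, jobs=get_failing_jobs())

if __name__ == "__main__":
    app.run(debug=True)
```

The `<details>` element gives you the expandable boxes essentially for free; the AI will usually dress this up with far more CSS than shown here.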
Step 3: Run It Locally, Solve Your Immediate Need
Now you have a basic web application. The next thought might be about hosting it somewhere central. Hold your horses. If you've been "vibe-coding" like me, the security team might have a few questions (and rightfully so) about a web app with access to your cloud credentials that hasn't been thoroughly vetted.
In the early stages, don't worry about deploying it. Just run it locally on your machine. This setup is low-risk: the application runs under your existing, already-approved credentials and is never exposed to the network. This allows you to rapidly build and iterate without the friction of deployment pipelines and security reviews.
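For illustration, "running locally" can be this literal. A sketch assuming an AWS setup, where boto3's default credential chain reuses whatever your CLI is already authorized to do:

```python
# Sketch: nothing new to secure. boto3's default credential chain picks up
# the same env vars / ~/.aws config / SSO session your CLI already uses,
# and binding to 127.0.0.1 keeps the app reachable only from your machine.
import boto3
from flask import Flask

app = Flask(__name__)
session = boto3.Session()  # your existing, already-approved credentials

@app.route("/whoami")
def whoami():
    # Same identity that `aws sts get-caller-identity` would report
    return session.client("sts").get_caller_identity()["Arn"]

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)  # localhost only, not on the network
```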
Pragmatic Iteration: Build When You Bleed (and Don't Forget Your Discipline)
This is where the real magic happens. Instead of trying to build a comprehensive, all-encompassing tool from the outset, adopt a truly agile approach: build when you need it.
I encountered a situation where I needed to create a new data pipeline from six different sources. I had to find sample inputs from each to create robust tests and compare them against existing data contracts. The traditional way would involve navigating through my raw data stores, running queries on various folders and subfolders – a tedious and time-consuming process.
Instead of resigning myself to this multi-hour task, I went to my local "Data Control" app. I added a new function that allowed me to browse through my data lake folders and, with a single click, display the latest five prettified JSON files within that folder, all on one screen. Suddenly, I could compare multiple datasets side-by-side, drastically reducing the time and mental effort required.
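Here is a rough sketch of what that feature boiled down to, assuming an S3-backed data lake; the bucket name and route are illustrative, not the exact code from my app:

```python
# Sketch of the "latest five prettified JSONs per folder" feature,
# assuming an S3 data lake. Bucket and route names are placeholders.
import json
import boto3
from flask import Flask, render_template_string

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "my-data-lake"  # placeholder

@app.route("/browse/<path:prefix>")
def browse(prefix):
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    objs = resp.get("Contents", [])
    # Newest five JSON files under this prefix
    latest = sorted(
        (o for o in objs if o["Key"].endswith(".json")),
        key=lambda o: o["LastModified"],
        reverse=True,
    )[:5]
    samples = []
    for obj in latest:
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        samples.append((obj["Key"], json.dumps(json.loads(body), indent=2)))
    return render_template_string(
        "{% for key, doc in samples %}<h3>{{ key }}</h3><pre>{{ doc }}</pre>{% endfor %}",
        samples=samples,
    )
```

Twenty-odd lines, one afternoon of prompting, and a multi-hour chore became a single click.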
Every time you encounter a task that involves jumping between multiple services, performing repetitive inputs, or generally feeling like there must be a better way, consider if you can add a small, targeted feature to your local tool. Over time, these small additions will accumulate into a surprisingly powerful and personalized data platform management solution.
And what if you build the "wrong" thing? It doesn't matter! Because AI makes these tools so cheap to build, it's painless to throw one away and create a new one for newer processes, systems, or technologies. Even when you're "vibe-coding" with AI, don't forget your discipline. You're still a software engineer at the end of the day, and you still have to read and maintain the code. Make sure you guide the AI to organize the code appropriately, ensuring it remains readable and manageable as it grows.
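One concrete way to enforce that discipline is to ask the AI to give each feature its own Flask blueprint rather than piling every route into one file. A minimal sketch, with illustrative names:

```python
# Sketch: one Flask blueprint per feature keeps a growing tool readable.
# Names are illustrative; split these into separate modules as the app grows.
from flask import Blueprint, Flask

# e.g. features/failing_jobs.py
failing_jobs = Blueprint("failing_jobs", __name__, url_prefix="/jobs")

@failing_jobs.route("/")
def list_failing():
    return "..."  # dashboard logic lives with its feature

# e.g. features/lake_browser.py
lake_browser = Blueprint("lake_browser", __name__, url_prefix="/lake")

@lake_browser.route("/<path:prefix>")
def browse(prefix):
    return f"browsing {prefix}"  # data-lake browsing logic

# app.py just wires the features together
app = Flask(__name__)
app.register_blueprint(failing_jobs)
app.register_blueprint(lake_browser)
```

When a feature stops being useful, deleting its blueprint removes it cleanly without disturbing the rest of the tool.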
The Future: Centralization (Maybe)
As your tool grows in functionality and becomes an indispensable part of your workflow (and potentially your team's), that's the time to start thinking about hosting it centrally and managing access properly. By then, you'll have a tangible, working application that solves real problems, making it much easier to justify the effort and navigate security considerations.
Building data platform tooling doesn't have to be a grand, top-down initiative. By embracing a pragmatic, AI-assisted, and "build-what-you-need" approach, you can start improving your daily life as a data engineer today – one small, locally-run, problem-solving tool at a time. And who knows, maybe your little side project will eventually become the envy of your entire organization.