Imagine being a healthcare compliance officer. Your desk is piled high with thousands of pages of clinical documentation. Your job is to read through every single unstructured note to ensure that patient care meets a strict set of 11 regulatory requirements. It's a manual, mind-numbing process that's both slow and prone to human error.

What if we could teach an AI to be our tireless compliance assistant, reading and analyzing these documents around the clock and applying the same criteria to every one?

That’s exactly what I set out to do. I built a fully automated data pipeline that uses Google's Gemini model to turn this manual review into a structured, queryable workflow. This isn't just a fun experiment; it's a practical answer to a costly business problem.

The ETL Pipeline: From Raw Text to Actionable Insights

This project is a classic ETL (Extract, Transform, Load) data pipeline, but with an AI brain at its heart. Here’s how it works:

  1. Extract: The pipeline starts by automatically scanning a Google Cloud Storage bucket for new, unprocessed clinical notes.
  2. Transform: This is where the magic happens. For each document, a Python script sends the text to the Google Gemini model. But it doesn't just ask, "Is this compliant?" Instead, it uses a detailed prompt with 11 specific questions, instructing the model to act like a trained reviewer and give a "Yes," "No," or "Not Applicable" answer for each one, along with its reasoning.
  3. Load: The script then parses the model's natural language response, structures it into clean data rows, and loads those rows directly into a Google BigQuery table.
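
The Transform and Load steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the two sample questions, the pipe-delimited reply format, the row field names, and the `ask_model` callable (a stand-in for the Gemini call) are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical checklist; the real pipeline uses 11 questions not listed here.
QUESTIONS = [
    "Is the patient's consent documented?",
    "Is the note signed by a clinician?",
]

# Assumed reply format: "<number> | <answer> | <reasoning>", one line per question.
LINE_RE = re.compile(
    r"^\s*(\d+)\s*\|\s*(Yes|No|Not Applicable)\s*\|\s*(.+?)\s*$",
    re.IGNORECASE,
)


def build_prompt(note_text: str, questions: List[str]) -> str:
    """Combine the numbered questions and the note into one prompt."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "You are reviewing a clinical note for compliance.\n"
        "For each question, reply on its own line in the form\n"
        "'<number> | <Yes, No, or Not Applicable> | <one-sentence reasoning>'.\n\n"
        f"Questions:\n{numbered}\n\nNote:\n{note_text}"
    )


def parse_response(document_id: str, response_text: str) -> List[Dict]:
    """Turn the model's reply into one row per answered question."""
    rows = []
    for line in response_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip preamble or lines that do not follow the format
        number, answer, reasoning = match.groups()
        rows.append({
            "document_id": document_id,
            "check_number": int(number),
            "answer": answer.title(),  # normalize case, e.g. "yes" to "Yes"
            "reasoning": reasoning,
        })
    return rows


def review_note(document_id: str, note_text: str,
                ask_model: Callable[[str], str]) -> List[Dict]:
    """Run one note through prompt building, the model call, and parsing."""
    prompt = build_prompt(note_text, QUESTIONS)
    return parse_response(document_id, ask_model(prompt))


def fake_model(prompt: str) -> str:
    """Stand-in for the real model call, returning a canned reply."""
    return (
        "Here is my review:\n"
        "1 | Yes | Consent form is referenced.\n"
        "2 | not applicable | No signature required for this note type."
    )


rows = review_note("note-001", "Patient seen for follow-up.", fake_model)
```

Keeping the model call behind a plain callable means the prompt and parsing logic can be checked without network access. In a real run, the returned rows would then be handed to a BigQuery client for loading.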

The Payoff: From Days of Reading to Seconds of Querying

The result changes how compliance review works. Findings that once took days of manual reading to collect now sit in a structured dataset. Instead of searching through documents, we can ask pointed questions with simple SQL:

"Show me all patient accounts that failed compliance check #4 this week."
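
As a rough illustration, that question might translate into a query like the one below. This sketch uses Python's built-in `sqlite3` as a local stand-in for BigQuery so it can run anywhere. The table name `compliance_results`, its columns, and the sample rows are assumptions for illustration, and the BigQuery version would differ in dialect details such as date handling and parameter syntax.

```python
import sqlite3

# In-memory database standing in for the BigQuery table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compliance_results (
        account_id   TEXT,
        check_number INTEGER,
        answer       TEXT,
        reviewed_on  TEXT  -- ISO date, so string comparison orders correctly
    )
""")
conn.executemany(
    "INSERT INTO compliance_results VALUES (?, ?, ?, ?)",
    [
        ("A-100", 4, "No", "2024-05-06"),
        ("A-101", 4, "Yes", "2024-05-06"),
        ("A-102", 4, "No", "2024-04-01"),
        ("A-100", 7, "No", "2024-05-07"),
    ],
)

FAILED_CHECK_SQL = """
    SELECT DISTINCT account_id
    FROM compliance_results
    WHERE check_number = ?
      AND answer = 'No'
      AND reviewed_on >= ?
    ORDER BY account_id
"""


def failed_accounts(conn, check_number, week_start):
    """Return accounts that failed the given check on or after week_start."""
    rows = conn.execute(FAILED_CHECK_SQL, (check_number, week_start)).fetchall()
    return [account_id for (account_id,) in rows]


failed = failed_accounts(conn, 4, "2024-05-05")
print(failed)  # ['A-100']
```

Only account A-100 is returned: A-101 passed the check, and A-102 failed it before the week began.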

This structured data can then be fed into a business intelligence tool like Looker Studio to create a live compliance dashboard. Managers can see trends at a glance, identify problem areas, and drill down into specific cases, all without ever touching a raw text file.

This project demonstrates the immense value of data engineering and generative AI in a practical business setting. It’s about more than just technology; it’s about creating systems that save time, reduce errors, and unlock insights that were previously buried in mountains of unstructured data. I’ve made all the code available on GitHub for anyone who wants to explore this workflow.