All Projects

PE CoPilot

A GCP-native financial data normalisation engine for private equity fund managers.

Role Full-stack Developer
Stack Python, FastAPI, GCP, Claude API
Year 2025–2026
pe-copilot-hero.jpg

Overview

PE CoPilot is a data normalisation engine built for private equity fund managers. Portfolio companies submit financial reports in wildly inconsistent formats — different labels, different accounting conventions, different file types. PE CoPilot ingests all of it and produces unified, comparable metrics through a six-layer processing pipeline powered by AI.

The Problem

In private equity, every portfolio company reports differently. One sends a Sage export as Excel, another sends Xero CSVs, a third submits PDF scans from QuickBooks. The fund team has to manually reconcile all of this into a single view just to compare performance across the portfolio. It's slow, error-prone, and doesn't scale.

The Solution

PE CoPilot automates the entire workflow. Files are uploaded via API, parsed automatically (Excel, CSV, or PDF), then run through an AI-driven mapping layer that normalises raw labels into nine canonical metrics: Revenue, Gross Profit, EBITDA, Net Income, Cash Balance, Total Debt, Net Assets, Operating Cashflow, and Headcount. Deterministic validation rules catch errors before anything reaches the dashboard.

pe-copilot-01.jpg
pe-copilot-02.jpg

How It's Built

The backend is a FastAPI application running on Google Cloud Run, containerised with Docker. Firestore handles the database layer, Cloud Storage manages file uploads, and Pub/Sub provides event-driven messaging between pipeline stages. Claude Sonnet handles the heavy lifting of financial label extraction, while Claude Haiku generates executive summaries — a tiering strategy that cut token costs by around 40–50%.

The processing pipeline has six layers: extraction (openpyxl, pandas, pdfplumber), calculation (company-specific derived metrics), AI normalisation, validation (completeness and variance checks), sanity checks (sign constraints, accounting identities), and AI summary generation.

Python 3.12 FastAPI Pydantic Google Cloud Run Firestore Cloud Storage Pub/Sub Claude API Docker

Challenges & Learnings

The biggest challenge was handling the sheer variety of financial report formats. Even within a single accounting system, companies customise their exports in unpredictable ways. Building robust parsing that could handle edge cases without breaking the pipeline required extensive testing — there are 224 automated tests covering the core logic, all running in under two seconds with mocked cloud services.

What's Next

Planned features include email ingestion (so companies can just forward reports), digest reporting, Google Sheets export, PDF report generation, OCR support for scanned documents, and multi-currency handling.

Next Project