This project was completed as my senior capstone at NC State University in partnership with Hitachi Energy, as part of the NCSU CSC Senior Design Program (Spring 2026).
Overview
Hitachi Energy’s teams managing the traction transformer program track project data across a fragmented mix of spreadsheets, SharePoint folders, and other sources. Before this project, there was no automated way to consolidate this data or to surface risks, schedule slippages, and resource conflicts to the right people promptly.
The goal of this project was to build an end-to-end BI pipeline that ingests project tracking data from across these sources daily, runs AI analysis over the combined dataset, and delivers persona-tailored reports and email digests to different stakeholder groups, all without any manual intervention.
The system runs as an Azure Container App Job on a cron schedule. Each run syncs fresh data from SharePoint, generates insights using a GPT model, builds interactive HTML reports, and dispatches email digests to configured recipients.
Pipeline
flowchart TD
subgraph ingest["Ingestion"]
SP["SharePoint / OneDrive"]
B1[("bi-data")]
end
subgraph ai["AI Analysis"]
IA["Insight Agent\ngpt-5.4-mini"]
TL[/"filter · stats · cross_ref · search"/]
CP["Chart Planner\ngpt-5.4-mini"]
end
subgraph deliver["Delivery"]
RPT["HTML Report · Plotly · Matplotlib"]
B2[("bi-reports")]
EM["Email Dispatch\nAzure Comms"]
end
SP -->|"Graph API"| B1
B1 --> IA
IA <-->|"tool calls"| TL
IA --> CP
CP --> RPT
RPT --> B2
B2 -.->|"report link"| EM
IA --> EM
EM --> PM(["Project Managers"])
EM --> EN(["Engineers"])
EM --> RD(["R&D"])
EM --> OP(["Operations"])
Each scheduled run executes the following steps:
- SharePoint sync: the pipeline authenticates against the Microsoft Graph API and downloads all CSV and Excel files from a configured OneDrive/SharePoint folder into Azure Blob Storage.
- Data ingestion: CSV files are downloaded from the `bi-data` blob container and loaded into Pandas DataFrames. The pipeline raises an error early if no data is found.
- Insight generation: a LangChain agent backed by Azure AI Foundry (GPT-5.4-mini) is given a detailed system prompt and a suite of data tools. It calls tools like `filter_rows`, `calculate_column_statistics`, `cross_reference`, and `search_text` to explore the data before synthesizing prioritized, evidence-backed insights. The agent call is retried automatically on Azure OpenAI 429 rate-limit errors, respecting the `Retry-After` header.
- Chart planning: a second agent call produces a structured chart plan describing which visualizations best support the insights for each persona.
- HTML report generation: the chart plan is executed to build interactive Plotly and Matplotlib charts, which are assembled into a self-contained HTML report and uploaded to the `bi-reports` blob container.
- Email dispatch: Azure Communication Services sends each persona a styled HTML email containing the AI-generated insights and a link to the HTML report.
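The retry behaviour mentioned in the insight-generation step can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the project's actual code: `RateLimitError`, `call_with_retry`, and their parameters are hypothetical names, and in a real run the exception would come from the Azure OpenAI client rather than this stand-in class.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by the model client."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("429 Too Many Requests")
        # Seconds suggested by the Retry-After header, if the server sent one.
        self.retry_after = retry_after


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    default_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError and honouring Retry-After."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Prefer the server's hint; otherwise back off exponentially.
            delay = err.retry_after
            if delay is None:
                delay = default_delay * 2 ** (attempt - 1)
            sleep(delay)
    # Only reached when the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice made for this sketch: it lets a test substitute a recorder for `time.sleep`, so the wait logic can be checked without actually pausing.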
The pipeline is persona-driven: each of the four stakeholder groups (Project Managers, Engineers, R&D, and Operations) gets its own system prompt, its own chart plan, and its own recipient list.
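One way to picture the persona-driven design is a small table of per-persona settings that each run loops over. The sketch below is an assumption about the shape of such a configuration, not the project's code: `Persona`, `PERSONAS`, `run_for_personas`, the prompt strings, and the email addresses are all illustrative placeholders, and the three callables stand in for the insight, report, and email steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Persona:
    """Per-stakeholder settings (all values below are placeholders)."""

    name: str
    system_prompt: str
    recipients: Tuple[str, ...]


PERSONAS: Tuple[Persona, ...] = (
    Persona("Project Managers", "Prioritize schedule and risk items.", ("pm@example.com",)),
    Persona("Engineers", "Prioritize technical blockers.", ("eng@example.com",)),
    Persona("R&D", "Prioritize design and test findings.", ("rd@example.com",)),
    Persona("Operations", "Prioritize resource conflicts.", ("ops@example.com",)),
)


def run_for_personas(
    personas: Iterable[Persona],
    generate_insights: Callable[[str], str],
    build_report: Callable[[str, str], str],
    send_email: Callable[[Tuple[str, ...], str, str], None],
) -> Dict[str, str]:
    """Run insights -> report -> email once per persona; return report URLs."""
    report_urls: Dict[str, str] = {}
    for persona in personas:
        # Each persona's own prompt drives its insights.
        insights = generate_insights(persona.system_prompt)
        # The report step returns a link to the uploaded HTML report.
        report_url = build_report(persona.name, insights)
        # The digest goes only to this persona's recipient list.
        send_email(persona.recipients, insights, report_url)
        report_urls[persona.name] = report_url
    return report_urls
```

Keeping the per-persona differences in data rather than in branching code means a fifth stakeholder group would only need one more entry in the tuple.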
Infrastructure
All cloud resources are provisioned with Azure Bicep IaC templates and deployed via a two-job GitHub Actions workflow. The provisioned resources are:
| Resource | Purpose |
|---|---|
| Azure Storage Account (HNS) | bi-data (input files) and bi-reports (HTML output) containers |
| Azure AI Foundry | GPT-5.4-mini deployment for insight and chart-plan generation |
| Azure Key Vault | Stores all secrets, so none appear in plaintext in source control |
| Azure Communication Services | Sends insight emails from a managed Azure domain |
| Azure Container Registry | Hosts the Docker image |
| Container App Job | Runs the pipeline on a cron schedule |
| User-Assigned Managed Identity | Grants the job least-privilege access to ACR and Key Vault |
The Container App Job reads secrets at deploy time via Key Vault secret references, so no credentials are passed as environment variables at runtime.
Testing
The test suite covers three layers:
- Unit tests: pure-function tests for data utilities, logging, email templating, chart generation, and tool logic.
- Integration tests: tests against live Azure services (Storage, AI, Communication Services) gated behind environment variable checks, so they are skipped in CI if credentials are absent.
- End-to-end test: runs the full pipeline against a small fixture dataset and asserts that an email was sent and a report was uploaded.
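The environment-variable gating used for the integration tests might look something like the sketch below. It is written with the standard library's `unittest` decorators so that it runs without extra dependencies (pytest also collects `unittest`-style tests, and `pytest.mark.skipif` offers the same behaviour). The variable names, `missing_env_vars`, and the test class are hypothetical; the real suite may check different variables.

```python
import os
import unittest
from typing import Iterable, List, Mapping, Optional

# Hypothetical variable names; the real suite may check different ones.
REQUIRED_ENV_VARS = ("AZURE_STORAGE_CONNECTION_STRING", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# The whole class is skipped when any required credential is absent,
# so a CI run without secrets reports skips rather than failures.
@unittest.skipIf(bool(missing_env_vars()), "Azure credentials not configured")
class StorageIntegrationTest(unittest.TestCase):
    def test_credentials_present(self):
        # Placeholder body: a real test would call the live service here.
        self.assertEqual(missing_env_vars(), [])
```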
Coverage is tracked via pytest-cov and the badge is automatically updated in CI.