At a time when data science and machine learning were still emerging fields, we faced the challenge of accurately and efficiently measuring the chemical composition of mushroom compost — a highly heterogeneous material that traditionally required labour-intensive, manual chemical testing.
I set out to develop a scalable, intelligent solution that could automate and accelerate chemical analysis while maintaining exceptional levels of accuracy and adaptability across varying client requirements and sample types.
I integrated spectral data (captured through molecular vibrations using near-infrared (NIR) spectroscopy) with laboratory reference results to build and validate a predictive model. Applying early machine learning and chemometric techniques, I delivered a full development lifecycle — including data preprocessing (noise reduction, normalization), multivariate calibration, cross-validation, and operational deployment. Simultaneously, I re-engineered laboratory workflows to reduce sample variability, enhancing measurement standards and boosting the reliability of the model.
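To illustrate the shape of that workflow, the sketch below uses present-day Python tooling (SciPy and scikit-learn) to stand in for the chemometric software of the time; the data arrays, smoothing window, and component count are placeholders rather than the original values, and the code is only a minimal illustration of preprocessing, partial least squares calibration, and cross-validation.

```python
# Illustrative sketch only: the original calibration predates these libraries,
# and the synthetic arrays below are placeholders for real NIR scans and
# laboratory reference values.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
spectra = rng.normal(size=(120, 700))    # placeholder: 120 samples x 700 NIR wavelengths
reference = rng.normal(size=120)         # placeholder: lab-measured composition values

def preprocess(x):
    """Noise reduction and normalization, mirroring the preprocessing step above."""
    smoothed = savgol_filter(x, window_length=11, polyorder=2, axis=1)  # smooth detector noise
    # Standard normal variate: scale each spectrum to zero mean and unit variance
    return (smoothed - smoothed.mean(axis=1, keepdims=True)) / smoothed.std(axis=1, keepdims=True)

# Multivariate calibration: PLS compresses many correlated wavelengths into a few latent factors.
model = make_pipeline(StandardScaler(), PLSRegression(n_components=8))

# Cross-validation estimates how well the calibration generalizes to unseen samples.
rmse = -cross_val_score(model, preprocess(spectra), reference,
                        cv=10, scoring="neg_root_mean_squared_error")
print(f"Cross-validated RMSE: {rmse.mean():.3f}")
```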
The deployed system consistently predicted the composition of unknown samples to within ±1%, generalizing well across all clients. Through the implementation of this predictive NIR system and the refinement of laboratory processes, I reduced laboratory operations from five days per week to just one (reserved only for periodic model updates). This transformation not only dramatically increased operational efficiency and reduced costs but also demonstrated the early potential of blending machine learning with traditional laboratory science to create scalable, data-driven solutions.
The original multi-client reporting system, which I developed using PowerShell and Tableau's TabCmd utility, automated only the report download stage. While it offered some efficiency over fully manual processes, the solution lacked end-to-end automation and presented challenges for cloud migration, as PowerShell was tightly coupled to on-prem infrastructure.
To address these limitations and improve the system’s efficiency, scalability, and cloud readiness, I was tasked with reengineering the entire dashboard delivery pipeline. Rather than applying a quick fix, I conducted a full process review and applied systems thinking to streamline operations end-to-end. Working closely with analysts, we created standardized dashboard templates to serve most clients, while building flexible, custom solutions for exceptions.
I then engineered a new reporting pipeline using Python and the Tableau API, a platform-independent approach more suited to cloud environments. The new system connected data sources directly to dashboard outputs and included built-in monitoring and error-handling for robust, automated delivery.
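A simplified sketch of that delivery pattern is shown below, using the tableauserverclient library as one common way of calling the Tableau REST API from Python. The server URL, site name, credentials, and dashboard name are placeholders, and the production pipeline included scheduling, alerting, and client-specific delivery steps not reproduced here.

```python
# Illustrative sketch, not the production code: shows the pattern of pulling
# dashboard output through the Tableau REST API with basic error handling.
# The server URL, site, credentials, and view names below are placeholders.
import logging
import tableauserverclient as TSC

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dashboard-delivery")

TABLEAU_URL = "https://tableau.example.com"                          # placeholder
SITE = "client-reporting"                                            # placeholder site name
auth = TSC.TableauAuth("svc_reporting", "********", site_id=SITE)    # placeholder credentials
server = TSC.Server(TABLEAU_URL, use_server_version=True)

def deliver_dashboards(view_names):
    """Render each named view to PDF; log failures instead of aborting the whole batch."""
    with server.auth.sign_in(auth):
        for view in TSC.Pager(server.views):
            if view.name not in view_names:
                continue
            try:
                server.views.populate_pdf(view)          # render the view server-side
                with open(f"{view.name}.pdf", "wb") as fh:
                    fh.write(view.pdf)
                log.info("Delivered %s", view.name)
            except Exception:
                # Monitoring hook: surface the failure, keep the batch running.
                log.exception("Failed to deliver %s", view.name)

if __name__ == "__main__":
    deliver_dashboards(["Client A - Monthly Summary"])   # hypothetical dashboard name
```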
This transformation turned a labour-intensive, partially automated process into a scalable, cloud-ready, data-driven reporting solution. It reduced analyst workload, improved delivery reliability, and enhanced system transparency through proactive monitoring and alerting.
Another client reporting system was built on a complex OLAP cube. Over time, the cube architecture became a significant bottleneck, causing slow refresh cycles, high maintenance costs, and delays in delivering client insights.
I was tasked with decommissioning the legacy OLAP cube and redesigning the ETL process to enable a faster, more scalable reporting infrastructure that could support business needs with greater agility.
Working collaboratively with a data engineer, we reengineered the ETL pipeline to transform and load data directly into a new set of flattened, business-optimized tables. We prioritized core reporting requirements to keep the schema lean, while modularizing it so that additional dimensions could be added when needed to maintain flexibility. This shift eliminated the dependency on complex cube processing and allowed reporting tools like Tableau to query the data more efficiently.
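The sketch below illustrates the flattening approach with hypothetical table names, column keys, and connection details; the actual warehouse schema and load tooling are not reproduced here. The idea is simply to join only the dimensions a report needs onto the fact data and publish one wide table for Tableau to query directly.

```python
# Illustrative sketch with hypothetical table and column names: replace cube
# processing with a flat, business-optimized reporting table, joining core
# dimensions by default and optional ones on demand.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://reporting:****@warehouse/clientdb")  # placeholder DSN

CORE_DIMENSIONS = {        # dimension table -> join key (core reporting needs)
    "dim_client": "client_id",
    "dim_date": "date_id",
}
OPTIONAL_DIMENSIONS = {    # added modularly only when a report needs them
    "dim_product": "product_id",
    "dim_region": "region_id",
}

def build_flat_table(extra_dims=None):
    """Flatten the fact table and its dimensions into one reporting-friendly table."""
    fact = pd.read_sql_table("fact_activity", engine)
    joins = dict(CORE_DIMENSIONS)
    for name in extra_dims or []:
        joins[name] = OPTIONAL_DIMENSIONS[name]
    for table, key in joins.items():
        dim = pd.read_sql_table(table, engine)
        fact = fact.merge(dim, on=key, how="left")   # denormalize: one wide row per fact
    return fact

# Publish the flattened result for Tableau to query directly, bypassing the cube.
build_flat_table(extra_dims=["dim_product"]).to_sql(
    "rpt_activity_flat", engine, if_exists="replace", index=False
)
```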
The decommissioning of the cube and the implementation of the new ETL and table structures significantly reduced data refresh times, cut maintenance effort by approximately 30%, and enhanced reporting performance. As a result, operational efficiency improved and technical debt was reduced.
At the NHS, a third-party application, Nesstar, was used to publish indicators to its website. However, the data ingestion process was entirely manual, leading to slow data refresh cycles and operational inefficiencies.
I was tasked with improving the reliability, speed, and accuracy of the data pipeline feeding into Nesstar. To address this, I took the initiative to learn Java and reverse-engineer Nesstar's backend processes, gaining a detailed understanding of its internal architecture. Leveraging this knowledge, I designed and implemented a custom automated data ingestion pipeline that integrated directly with the system's internal structures, bypassing the manual upload interface. I managed the full project lifecycle, from system analysis and automation design through to deployment in a live production environment, ensuring minimal disruption to ongoing operations.
The automation reduced data loading times from several days to a few hours, dramatically improving data refresh rates and operational efficiency. It also freed up technical staff for more strategic work, minimized human error, and laid the groundwork for a more scalable and resilient data publishing process.