What I've written

TFL Backbone Filtering

Network analysis of London's transport system using graph theory to identify critical infrastructure points. This project involved constructing a multi-modal network representation combining tube, bus, and rail networks.

[Figure: network topology of the London public transport grid]

The analysis revealed bottleneck stations and redundancy patterns using centrality measures such as betweenness centrality and eigenvector centrality. Key findings highlighted the vulnerability of interchange stations during service disruptions.

Results informed transportation planning by identifying which stations would have the greatest impact on network connectivity if removed, enabling strategic investment in infrastructure resilience.
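The removal analysis can be illustrated with a minimal sketch. It assumes an undirected adjacency-list graph and measures impact as the number of remaining stations cut off from the largest connected component; the toy network `TOY` and the function names are hypothetical, not taken from the project.

```python
from collections import deque

def largest_component_size(adj, removed=None):
    """Size of the largest connected component, skipping the `removed` node."""
    seen = set() if removed is None else {removed}
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

def removal_impact(adj):
    """Map each station to how many other stations lose access to the
    largest component once that station is removed."""
    baseline = largest_component_size(adj)
    return {
        s: (baseline - 1) - largest_component_size(adj, removed=s)
        for s in adj
    }

# Hypothetical toy network: A-B-C in a line, C-D-E forming a triangle.
TOY = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}

impact = removal_impact(TOY)
most_critical = max(impact, key=impact.get)
```

In this toy case the interchange "C" is the most critical node, since removing it splits the network into two halves. A real analysis would combine this with the centrality measures mentioned above.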

An approach to high-dimensional data (HDD)

Processing and analyzing high-dimensional datasets with thousands of features using dimensionality reduction techniques. The challenge involved maintaining information integrity while reducing computational complexity.

Implemented Principal Component Analysis (PCA) and t-SNE for visualization and feature extraction. These methods proved essential for uncovering hidden patterns and clusters within multi-angle image datasets.

The reduced feature space enabled faster model training and improved generalization, demonstrating that not all dimensions contribute equally to predictive power. The reduction cut dimensionality by 85% while retaining 95% of the original variance.
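The variance-retention criterion can be sketched as follows. This assumes the eigenvalues of the covariance matrix (the per-component variances from PCA) are already available; the function name and the sample eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical spectrum: most of the variance sits in the first components.
eigenvalues = [6.0, 3.0, 0.7, 0.2, 0.1]
k = components_for_variance(eigenvalues, target=0.95)
```

Here three of the five components are enough to pass the 95% threshold, which is the same trade-off the paragraph above describes at a larger scale.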

Voting system as a classification model

Developed an ensemble voting classifier that combines predictions from multiple diverse models. This approach leveraged the strengths of different algorithms—decision trees, random forests, and gradient boosting—through a democratic voting mechanism.

The voting strategy implemented both hard voting (majority class prediction) and soft voting (averaging probability distributions). Soft voting improved accuracy by 3-5% on multi-class problems by leveraging the confidence scores of base estimators.
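The difference between the two strategies shows up in a small sketch. It assumes each base model returns a list of class probabilities; the probability values below are made up for illustration.

```python
from collections import Counter

def hard_vote(probas):
    """Majority vote over each model's single most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Class with the highest probability averaged across models."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)

# Hypothetical predictions from three models over three classes.
probas = [
    [0.45, 0.40, 0.15],  # model 1 narrowly prefers class 0
    [0.40, 0.35, 0.25],  # model 2 narrowly prefers class 0
    [0.05, 0.90, 0.05],  # model 3 is confident in class 1
]

hard = hard_vote(probas)
soft = soft_vote(probas)
```

Hard voting picks class 0 because two models lean that way, while soft voting picks class 1 because the third model's confidence outweighs two narrow preferences. This is the mechanism by which soft voting can use the confidence scores of base estimators.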

Cross-validation showed the ensemble reaching 89% accuracy, outperforming every individual classifier. The method also proved particularly robust against overfitting and noise, which makes it a good fit for production systems.

Nested validation for training and evaluation

Implemented a nested cross-validation architecture that combines inner loops for hyperparameter tuning with outer loops for unbiased performance estimation. This addresses the critical problem of optimistic bias in machine learning model evaluation.

The outer loop provides honest performance estimates using unseen test folds, while the inner loop optimizes hyperparameters using separate validation folds. This dual-loop structure prevents data leakage and inflated performance metrics that plague naive evaluation approaches.
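The dual-loop structure can be sketched without any ML library. The sketch assumes a caller-supplied `fit_and_score(param, train_idx, held_out_idx)` callback, which is a hypothetical stand-in for training a model and scoring it; the fold-splitting helper is likewise illustrative.

```python
def k_fold_indices(indices, k):
    """Yield (train, held_out) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out

def nested_cv(n_samples, outer_k, inner_k, candidates, fit_and_score):
    """Return one outer-fold score per outer split."""
    indices = list(range(n_samples))
    outer_scores = []
    for outer_train, test in k_fold_indices(indices, outer_k):

        def inner_score(param):
            # Inner loop: validation folds come only from outer_train,
            # so the outer test fold never influences tuning.
            scores = [
                fit_and_score(param, train, val)
                for train, val in k_fold_indices(outer_train, inner_k)
            ]
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # Outer loop: refit with the chosen setting, score on the unseen fold.
        outer_scores.append(fit_and_score(best, outer_train, test))
    return outer_scores
```

The key design point is that hyperparameter selection happens inside each outer split, so every outer score is computed on data that played no part in choosing the setting.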

Applied this methodology across multiple classifiers with extensive grid search for hyperparameter optimization. Results showed the importance of proper validation—reported accuracies dropped by 4-7% when compared to naive single-loop cross-validation, providing more realistic performance expectations.

A statistical agent using MDP

Created an autonomous agent for automated statistical analysis and hypothesis testing on streaming datasets. The agent identifies significant patterns, anomalies, and relationships without manual intervention.

The system integrates Bayesian inference for probabilistic reasoning and sequential hypothesis testing for efficient decision-making. It automatically selects appropriate statistical tests based on data characteristics and research questions.
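One common form of sequential hypothesis testing is Wald's sequential probability ratio test (SPRT). The sketch below is an assumption about what such a component could look like for Bernoulli observations, not the agent's actual test; `p0`, `p1`, and the error rates are illustrative parameters.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 against H1: p = p1 on 0/1 data.

    Returns (decision, samples_used), where decision is 'accept_h1',
    'accept_h0', or 'continue' if the data ran out before a boundary
    was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing it favours H1
    lower = math.log(beta / (1 - alpha))   # crossing it favours H0
    llr = 0.0  # cumulative log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(observations)
```

The appeal for streaming data is that the test stops as soon as the evidence is strong enough, instead of waiting for a fixed sample size.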

Deployed in production environments for real-time monitoring and alerting when statistical significance thresholds are exceeded. The agent reduced manual analysis time by 70% while maintaining scientific rigor and providing detailed reports with confidence intervals and effect sizes.

A model-free agent using DQL

Developed a Q-Learning agent that learns optimal decision-making policies through interaction with its environment. The agent iteratively explores actions, observes rewards, and updates value estimates to maximize cumulative performance.

Implemented experience replay and target networks to stabilize learning. These techniques addressed temporal correlation issues and moving target problems inherent in Q-Learning, enabling faster convergence and better stability.
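The two stabilizers can be sketched with a tabular stand-in. This assumes Q-values stored in plain dictionaries of lists in place of neural networks, so it only illustrates the update logic; all class and function names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions.
        pool = list(self.data)
        return rng.sample(pool, min(batch_size, len(pool)))

def td_target(reward, next_state, done, q_target, gamma=0.99):
    """Bootstrap from the frozen target estimates, not the online ones."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state])

def train_step(q_online, q_target, batch, lr=0.1, gamma=0.99):
    """Move online estimates toward targets computed from q_target."""
    for state, action, reward, next_state, done in batch:
        y = td_target(reward, next_state, done, q_target, gamma)
        q_online[state][action] += lr * (y - q_online[state][action])

def sync_target(q_online, q_target):
    """Periodically copy online estimates into the target table."""
    for state, values in q_online.items():
        q_target[state] = list(values)
```

Because `q_target` changes only when `sync_target` is called, the regression target stays fixed between syncs, which is the moving-target fix described above.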

Achieved near-optimal performance on complex sequential decision tasks with branching action spaces. The agent demonstrated robust generalization to unseen states and maintained consistent performance across 100+ evaluation episodes, validating the learned policy's reliability for deployment.

Heuristic & A* search agents

Implemented A* search and heuristic-based agents for pathfinding and planning problems. The agents combine cost functions with domain-specific heuristics to efficiently navigate large state spaces toward optimal solutions.

Designed and evaluated multiple admissible heuristics including Manhattan distance, Euclidean distance, and custom domain heuristics. Comparative analysis demonstrated that well-designed heuristics reduce node expansions by up to 60% compared to uninformed search strategies.
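A minimal sketch of A* with the Manhattan heuristic follows. It assumes a 4-connected grid with unit step costs, where 0 marks a free cell and 1 a wall; the grid and function names are hypothetical. Passing a zero heuristic turns the same routine into uninformed uniform-cost search, which gives a baseline for counting node expansions.

```python
import heapq

def manhattan(a, b):
    """Admissible on a 4-connected grid with unit step costs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(grid, start, goal, heuristic=manhattan):
    """Return (path_cost, expansions); path_cost is None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    best_g = {start: 0}
    frontier = [(heuristic(start, goal), 0, start)]  # (f, g, cell)
    expansions = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found
        if node == goal:
            return g, expansions
        expansions += 1
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    f = ng + heuristic((nr, nc), goal)
                    heapq.heappush(frontier, (f, ng, (nr, nc)))
    return None, expansions

# Hypothetical map: a wall row with a single gap in the third column.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

cost, expanded = astar(GRID, (0, 0), (2, 0))
baseline_cost, baseline_expanded = astar(
    GRID, (0, 0), (2, 0), heuristic=lambda a, b: 0
)
```

Both runs return the same path cost, since each heuristic is admissible; comparing `expanded` against `baseline_expanded` on larger maps is one way to measure how much an informed heuristic prunes the search.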

Applied to constraint satisfaction and scheduling problems where informed search dramatically outperforms brute-force approaches. Results validated that A* with an admissible heuristic guarantees optimality while significantly reducing computational overhead.

Pattern mining algorithms

Exploration of pattern mining techniques such as frequent itemset mining, association rules, and sequential pattern mining applied to structured datasets.

Techniques include Apriori, FP-Growth, and SPADE for extracting frequent patterns and understanding sequences and co-occurrences in transactional data.
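As an illustration of the first of these, here is a minimal Apriori sketch. It assumes transactions given as sets of hashable items and an absolute support threshold; the sample baskets are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

The prune step is what keeps the candidate set small: the triple is never counted here because one of its pairs is already infrequent.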

Semantic classification

Semantic classification using word embeddings, transformer encoders and graph-based label propagation for rich text classification tasks.

Includes experiments with fine-tuning, zero-shot classification and hybrid rule-based systems to boost precision on domain-specific datasets.

Numerical Analysis

Approximation, interpolation and numerical integration: from Lagrange interpolation and the Lebesgue function to rational and adaptive approximations (Floater–Hormann, AAA).
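Lagrange interpolation is usually evaluated in barycentric form, which is also the form in which Floater–Hormann rational interpolants are written (with different weights). The sketch below covers only the polynomial case and assumes distinct nodes; function names and sample data are illustrative.

```python
def barycentric_weights(xs):
    """Weights w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes."""
    weights = []
    for j, xj in enumerate(xs):
        w = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                w /= (xj - xk)
        weights.append(w)
    return weights

def interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolant through (xs, ys) at x."""
    weights = barycentric_weights(xs)
    num = 0.0
    den = 0.0
    for xj, yj, wj in zip(xs, ys, weights):
        if x == xj:
            return yj  # exactly on a node: avoid division by zero
        t = wj / (x - xj)
        num += t * yj
        den += t
    return num / den

# Hypothetical samples of f(x) = x**2 + 1 at three nodes.
nodes = [0.0, 1.0, 2.0]
values = [1.0, 2.0, 5.0]
approx = interpolate(nodes, values, 1.5)
```

With three nodes the interpolant reproduces the quadratic exactly, so the value at 1.5 matches f(1.5) up to rounding.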

Regression splines and LSPIA for efficient spline fitting. Euler–Maclaurin expansion used for extrapolation and accelerated quadrature.