Portfolio Project

Minesweeper Solver

Reinforcement Learning (RL)

Machine Learning · Automation · Python · PyTorch · AWS · Docker

Context

I wanted to train reinforcement learning agents that can reliably play Minesweeper and to expose them through a friendly web demo.

Approach

  • Built a custom Minesweeper environment that generates boards on demand (5x5 to 10x10) with 10% to 20% mines and safe first-click logic.
  • Trained 12 model variants across DQN, Double DQN, and Dueling DQN architectures with multiple CNN pooling heads and replay buffers.
  • Packaged the best model behind an AWS Lambda + container image endpoint for interactive inference.
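
A minimal sketch of what that container-based Lambda endpoint could look like, assuming a TorchScript export named model.pt and a JSON body holding the visible board; the file name, payload shape, and handler wiring are illustrative, not the project's exact interface.

    # lambda_handler.py -- illustrative sketch; names and payload shape are assumptions.
    import json
    import torch

    # Load the exported policy once per container cold start.
    model = torch.jit.load("model.pt")
    model.eval()

    def handler(event, context):
        # Expect the visible board as a 2D list of normalized cell values.
        board = torch.tensor(json.loads(event["body"])["board"], dtype=torch.float32)
        with torch.no_grad():
            q_values = model(board.unsqueeze(0).unsqueeze(0))  # add batch and channel dims
        action = int(q_values.argmax())  # flat index of the next cell to reveal
        return {"statusCode": 200, "body": json.dumps({"action": action})}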

Impact

  • The top model progressed to 10x10 grids, winning about 35% of games on the hardest tier.
  • The current demo model plays 9x9 boards with roughly a 35% success rate.

Environment & Data Generation

The training data is created on the fly, so every episode is a fresh Minesweeper board.

  • Generated boards dynamically with randomized mine placement (10% to 20% density) and a protected 3x3 first click (sketched after this list).
  • Normalized cell values between -0.25 and 1 to speed convergence and keep inputs stable.
  • Validated board shapes and reveal rules with a debug mode to guarantee consistent training inputs.
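
A minimal sketch of the on-demand board generation and normalization, assuming mines are placed uniformly at random outside the protected 3x3 zone, with hidden cells mapped to -0.25 and revealed adjacency counts scaled to [0, 1]; the function names and the hidden-cell sentinel are illustrative, not the environment's actual API.

    import numpy as np

    def generate_mines(size=9, mine_density=0.15, first_click=(4, 4)):
        """Place mines randomly, keeping the 3x3 zone around the first click safe."""
        fr, fc = first_click
        protected = {(r, c) for r in range(fr - 1, fr + 2) for c in range(fc - 1, fc + 2)}
        candidates = [(r, c) for r in range(size) for c in range(size) if (r, c) not in protected]
        n_mines = int(round(size * size * mine_density))
        mines = np.zeros((size, size), dtype=bool)
        for i in np.random.choice(len(candidates), n_mines, replace=False):
            mines[candidates[i]] = True
        return mines

    def normalize(visible, hidden_value=-0.25):
        """Scale revealed adjacency counts (0..8) to [0, 1]; hidden cells (marked -1) become -0.25."""
        norm = visible.astype(np.float32) / 8.0
        norm[visible < 0] = hidden_value
        return norm

Reveal and flood-fill logic is omitted here; the point is that every episode can draw a fresh layout at whatever size and density the curriculum requests.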

Model Variants

  • Evaluated DQN, Double DQN, and Dueling DQN policies with max-pool, adaptive-pool, and global-average CNN heads (see the sketch after this list).
  • Compared regular vs. prioritized replay buffers across each architecture (12 total combinations).
  • Found Double DQN with adaptive pooling to be the strongest family on larger grid sizes.
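
A minimal sketch of one variant family, a Dueling DQN head on a small CNN with a configurable pooling layer; the channel counts, hidden sizes, and class name are assumptions for illustration rather than the exact architectures trained.

    import torch
    import torch.nn as nn

    class DuelingCNN(nn.Module):
        """Small convolutional Q-network with a dueling value/advantage split."""
        def __init__(self, board_size=9, pooling="adaptive"):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            )
            # The pooling head is the part that varies across model families.
            if pooling == "adaptive":
                self.pool = nn.AdaptiveAvgPool2d((4, 4))
                feat_dim = 64 * 4 * 4
            elif pooling == "max":
                self.pool = nn.MaxPool2d(2)
                feat_dim = 64 * (board_size // 2) ** 2
            else:  # global average pooling
                self.pool = nn.AdaptiveAvgPool2d((1, 1))
                feat_dim = 64
            n_actions = board_size * board_size
            self.value = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
            self.advantage = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

        def forward(self, x):  # x: (batch, 1, board_size, board_size)
            h = self.pool(self.features(x)).flatten(1)
            v, a = self.value(h), self.advantage(h)
            # Dueling combination: Q = V + (A - mean(A))
            return v + a - a.mean(dim=1, keepdim=True)

Swapping the pooling argument, or replacing the dueling split with a single Q head, covers the architecture side of the comparison; the regular vs. prioritized replay choice lives in the training loop.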

Training & Curriculum

  • Trained each model for 10,000 episodes with 64 parallel games per episode and a periodically updated target network.
  • Advanced grid size after hitting 50% success on a 100-game test set three times in a row.
  • Reward shaping favored safe reveals (+0.3), penalized blind guesses (-0.3), and heavily penalized mine hits (-1.0).
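
A minimal sketch of how that reward shaping and the curriculum gate could fit together; the reward values and the 50%-three-times rule come from the notes above, while the function names and the "informed vs. blind reveal" test are illustrative assumptions.

    # Reward values from the write-up; surrounding names are illustrative.
    REWARDS = {"safe_reveal": 0.3, "blind_guess": -0.3, "mine_hit": -1.0}

    def step_reward(hit_mine, touches_revealed_neighbor):
        """Favor reveals next to known information, punish blind guesses and mine hits."""
        if hit_mine:
            return REWARDS["mine_hit"]
        return REWARDS["safe_reveal"] if touches_revealed_neighbor else REWARDS["blind_guess"]

    def should_advance(win_rates, threshold=0.5, streak=3):
        """Move to a larger grid after three consecutive 100-game evals at or above 50% wins."""
        recent = win_rates[-streak:]
        return len(recent) == streak and all(rate >= threshold for rate in recent)

    # Example: win rates from the last four 100-game test sets.
    print(should_advance([0.42, 0.51, 0.55, 0.58]))  # True -> advance to the next grid size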

What I'd Improve

  • Add flagging actions and a second head for mine probability to reduce late-game guesswork.
  • Pair the RL policy with a lightweight search layer for harder boards.
  • Expand evaluation with a fixed benchmark suite to compare against classical Minesweeper solvers.

Links