Portfolio Project
Minesweeper Solver
Reinforcement Learning (RL)
Context
I wanted to train reinforcement learning agents that can reliably play Minesweeper and expose them in a friendly web demo.
Approach
- Built a custom Minesweeper environment that generates boards on demand (5x5 to 10x10) with 10% to 20% mines and safe first-click logic.
- Trained 12 model variants across DQN, Double DQN, and Dueling DQN architectures, combining different CNN pooling heads with regular and prioritized replay buffers.
- Packaged the best model behind an AWS Lambda + container image endpoint for interactive inference.
Impact
- Top model advanced through the curriculum to 10x10 grids, with about 35% success on the hardest tier.
- Current demo model runs 9x9 boards with roughly a 35% success rate.
Environment & Data Generation
The training data is created on the fly, so every episode is a fresh Minesweeper board.
- Generated boards dynamically with randomized mine placement (10% to 20% density) and a protected 3x3 first click.
- Normalized cell values to the range -0.25 to 1 to speed convergence and keep inputs stable.
- Validated board shapes and reveal rules with a debug mode to guarantee consistent training inputs.
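To make the board generation and normalization concrete, here is a minimal sketch and not the project's actual code. The function names, the raw code of -2 for hidden cells, and the divide-by-8 scaling are all assumptions, chosen only because they reproduce the -0.25 to 1 range described above.

```python
import random

HIDDEN = -2    # assumed raw code for an unrevealed cell (hypothetical)
MAX_COUNT = 8  # largest possible number of neighboring mines


def neighbors(r, c, size):
    """Yield in-bounds cells of the 3x3 block centered on (r, c), center included."""
    for rr in range(max(0, r - 1), min(size, r + 2)):
        for cc in range(max(0, c - 1), min(size, c + 2)):
            yield rr, cc


def generate_board(size, mine_density, first_click, rng=None):
    """Return a set of mine coordinates, keeping the 3x3 block around first_click clear."""
    rng = rng or random.Random()
    protected = set(neighbors(*first_click, size))
    candidates = [
        (r, c)
        for r in range(size)
        for c in range(size)
        if (r, c) not in protected
    ]
    # Never ask for more mines than there are unprotected cells.
    n_mines = min(round(size * size * mine_density), len(candidates))
    return set(rng.sample(candidates, n_mines))


def observe(mines, revealed, size):
    """Build the normalized grid: hidden cells map to -0.25, revealed counts to [0, 1]."""
    obs = []
    for r in range(size):
        row = []
        for c in range(size):
            if (r, c) in revealed:
                # A revealed cell is assumed safe, so counting the center adds nothing.
                count = sum(cell in mines for cell in neighbors(r, c, size))
                row.append(count / MAX_COUNT)
            else:
                row.append(HIDDEN / MAX_COUNT)
        obs.append(row)
    return obs
```

For example, generate_board(9, 0.15, (4, 4)) places 12 mines on a 9x9 board, none of them inside the 3x3 block around the center click.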
Model Variants
- Evaluated DQN, Double DQN, and Dueling DQN policies with max-pool, adaptive-pool, and global-average CNN heads.
- Compared regular vs. prioritized replay buffers across each architecture (12 total combinations).
- Found Double DQN with adaptive pooling to be the strongest family on larger grid sizes.
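The difference between the DQN and Double DQN families comes down to how the bootstrap target is computed. The sketch below shows the standard textbook form of both targets on plain Python lists; the discount factor gamma=0.99 and the function names are assumptions for illustration, not values taken from this project.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from evaluation reduces the overestimation that comes from taking a max over noisy Q-values, which is one plausible reason this family held up better on larger grids.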
Training & Curriculum
- Trained each model for 10,000 episodes with 64 parallel games per episode and a periodically updated target network.
- Advanced to the next grid size once the model reached 50% success on a 100-game test set in three consecutive evaluations.
- Reward shaping favored safe reveals (+0.3), penalized blind guesses (-0.3), and heavily penalized mine hits (-1.0).
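The reward values and the curriculum rule above can be sketched as follows. This is an illustration under assumptions, not the original training code: it treats a "blind guess" as a reveal with no already-revealed neighbor, applies the guess penalty even when the guess turns out safe, and steps the grid size by one from 5 to 10. The names shaped_reward and Curriculum are hypothetical.

```python
SAFE_REVEAL = 0.3
BLIND_GUESS = -0.3
MINE_HIT = -1.0


def shaped_reward(hit_mine, has_revealed_neighbor):
    """Reward for a single reveal action."""
    if hit_mine:
        return MINE_HIT
    if not has_revealed_neighbor:
        # Assumption: a reveal with no revealed neighbor counts as a blind guess.
        return BLIND_GUESS
    return SAFE_REVEAL


class Curriculum:
    """Advance the grid size after enough consecutive passing evaluations."""

    def __init__(self, sizes=range(5, 11), threshold=0.5, required_streak=3):
        self.sizes = list(sizes)
        self.threshold = threshold
        self.required_streak = required_streak
        self.index = 0
        self.streak = 0

    @property
    def grid_size(self):
        return self.sizes[self.index]

    def record_eval(self, wins, games=100):
        """Record one evaluation run and return the grid size to train on next."""
        if wins / games >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # a failed evaluation breaks the streak
        if self.streak >= self.required_streak and self.index < len(self.sizes) - 1:
            self.index += 1
            self.streak = 0
        return self.grid_size
```

Resetting the streak on any failed evaluation is what makes the rule "three times in a row" rather than "three times in total".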
What I'd Improve
- Add flagging actions and a second head for mine probability to reduce late-game guesswork.
- Pair the RL policy with a lightweight search layer for harder boards.
- Expand evaluation with a fixed benchmark suite to compare against classical Minesweeper solvers.