Blackjack vs AI
A browser blackjack game with an AI opponent that learned to play on its own, using Monte Carlo reinforcement learning.
- TypeScript
- R
- Reinforcement Learning
- Vite
Blackjack vs AI is a browser game where you play hands against a computer opponent that decides whether to hit or stay. Nothing about its strategy is handwritten. The agent taught itself by playing the game hundreds of thousands of times and keeping track of what tended to work.
The learning agent is built in R and the game runs in the browser in TypeScript. During training the agent plays out simulated hands, gets a reward at the end of each one, and slowly works backward to a policy: for any given situation, the move with the best expected payoff. That finished policy is what you play against in the browser.
How the agent learns
- It uses a first-visit Monte Carlo method. The agent plays a full hand, sees the result, and then credits every state it passed through with the final reward, counting each state only the first time it appears in that hand.
- Each state is three numbers: the dealer's visible card, the player's hand value, and whether the hand holds a soft ace. The agent picks between two actions, hit or stay.
- Rewards are simple: +1 for a win, 0 for a tie, and -1 when the dealer wins. Gamma is set to 1, so a win counts the same no matter how many moves it took to get there.
- Exploration uses an epsilon that decays separately for each state, so a state the agent has barely seen stays curious while a well-understood one settles on its best move.
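The learning loop described above can be sketched in a few lines. The real agent is trained in R and its code is not shown here, so this TypeScript version is only an illustration: the names (`updateFromEpisode`, `epsilonFor`, `chooseAction`), the use of per-action running averages, and the `1 / (1 + visits)` decay schedule are all assumptions, not the project's actual implementation.

```typescript
type Action = "hit" | "stay";

// The three-number state: dealer's visible card, player's hand value, soft-ace flag.
interface State { dealerCard: number; playerValue: number; softAce: boolean; }
interface Step { state: State; action: Action; }

const key = (s: State): string =>
  `${s.dealerCard}|${s.playerValue}|${s.softAce ? 1 : 0}`;

// Running average return per (state, action), plus visit counts per state.
const qValues = new Map<string, Record<Action, number>>();
const qCounts = new Map<string, Record<Action, number>>();
const stateVisits = new Map<string, number>();

// First-visit update. With gamma = 1 and a single reward at the end of the
// hand, the return from every step is simply the final reward.
function updateFromEpisode(steps: Step[], reward: number): void {
  const seen = new Set<string>();
  for (const { state, action } of steps) {
    const k = key(state);
    const pairKey = `${k}|${action}`;
    if (seen.has(pairKey)) continue; // only the first visit in this hand counts
    seen.add(pairKey);

    const q = qValues.get(k) ?? { hit: 0, stay: 0 };
    const n = qCounts.get(k) ?? { hit: 0, stay: 0 };
    n[action] += 1;
    q[action] += (reward - q[action]) / n[action]; // incremental mean
    qValues.set(k, q);
    qCounts.set(k, n);
    stateVisits.set(k, (stateVisits.get(k) ?? 0) + 1);
  }
}

// Assumed schedule: epsilon shrinks as this particular state is visited more.
function epsilonFor(state: State): number {
  const visits = stateVisits.get(key(state)) ?? 0;
  return 1 / (1 + visits);
}

// Epsilon-greedy choice: explore with probability epsilon, otherwise take
// the action with the higher estimated return (ties go to "stay").
function chooseAction(state: State, rand: () => number = Math.random): Action {
  if (rand() < epsilonFor(state)) return rand() < 0.5 ? "hit" : "stay";
  const q = qValues.get(key(state)) ?? { hit: 0, stay: 0 };
  return q.hit > q.stay ? "hit" : "stay";
}
```

Because epsilon is looked up per state, a state with no recorded visits always explores, while a frequently visited one acts almost entirely on its learned estimates.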
Training
- The policy is trained over 250,000 episodes of simulated play.
- Instead of dealing realistic hands, training starts each episode from a randomly generated state. That covers rare situations far more evenly than naturally dealt cards would, which sharpens the policy in spots a real game rarely reaches.
- The learned policy is exported so the browser game can look up the agent's move for any state instantly.
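As a rough illustration of the two ideas above, the sketch below draws a uniform random starting state and looks up a move in an exported policy table. The value ranges, the `"dealer|value|soft"` key format, and the function names `randomStartState` and `lookupMove` are assumptions made for this example; the project's actual export format is not described here.

```typescript
type Action = "hit" | "stay";
interface State { dealerCard: number; playerValue: number; softAce: boolean; }

// Assumed ranges: dealer card 1 (ace) to 10, hard totals 4 to 21,
// soft totals 12 to 21 (an ace counted as 11 plus at least one more point).
function randomStartState(rand: () => number = Math.random): State {
  const randInt = (lo: number, hi: number): number =>
    lo + Math.floor(rand() * (hi - lo + 1));
  const softAce = rand() < 0.5;
  return {
    dealerCard: randInt(1, 10),
    playerValue: softAce ? randInt(12, 21) : randInt(4, 21),
    softAce,
  };
}

// Hypothetical export shape: a flat object keyed by "dealer|value|soft".
type PolicyTable = Record<string, Action | undefined>;

const policyKey = (s: State): string =>
  `${s.dealerCard}|${s.playerValue}|${s.softAce ? 1 : 0}`;

// Constant-time lookup; falls back to a default if a state is missing.
function lookupMove(
  policy: PolicyTable,
  state: State,
  fallback: Action = "stay",
): Action {
  return policy[policyKey(state)] ?? fallback;
}
```

A flat keyed table keeps the browser side trivial: the game only has to build the key for the current state and read one property.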
Results and strategy
- The trained agent wins 43.3% of hands, compared to about 28% for a baseline that plays at random.
- Strategy heatmaps visualize the finished policy, showing where the agent chooses to hit or stay across every combination of dealer card and hand value.
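The data behind such a heatmap is just the policy laid out on a grid. A minimal text version might look like the sketch below, which assumes a hypothetical policy table keyed by `"dealer|value|soft"` and a helper named `policyGrid`; the project's real heatmaps are not necessarily produced this way.

```typescript
type Action = "hit" | "stay";
type PolicyTable = Record<string, Action | undefined>;

// One row per hand value (21 down to 12), one column per dealer card (1 to 10).
// "H" = hit, "S" = stay, "?" = state missing from the table.
function policyGrid(policy: PolicyTable, softAce: boolean): string[] {
  const rows: string[] = [];
  for (let value = 21; value >= 12; value--) {
    let row = `${value} `;
    for (let dealer = 1; dealer <= 10; dealer++) {
      const move = policy[`${dealer}|${value}|${softAce ? 1 : 0}`];
      row += move === "hit" ? "H" : move === "stay" ? "S" : "?";
    }
    rows.push(row);
  }
  return rows;
}
```

Calling `policyGrid(policy, false).join("\n")` would give a compact view of the hard-hand strategy, and passing `true` would give the soft-hand one.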
The game
- The frontend is TypeScript built with Vite, and it plays the learned policy live in the browser so you can take the agent on directly.
- The scope is deliberately tight: no splitting, doubling, or surrendering, blackjack pays 1:1, and there is no card counting. The point is the learning, not a full casino.
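With the rules this narrow, scoring a hand and settling a round fit in two small functions. The sketch below is one way to write them, assuming cards are represented as numbers 1 to 10 (ace as 1, face cards as 10); `handValue` and `settle` are illustrative names, not taken from the project's source.

```typescript
// Total a hand, counting one ace as 11 when that does not bust the hand.
// softAce is true exactly when an ace is currently being counted as 11.
function handValue(cards: number[]): { value: number; softAce: boolean } {
  const hardTotal = cards.reduce((sum, card) => sum + card, 0);
  const softAce = cards.includes(1) && hardTotal + 10 <= 21;
  return { value: softAce ? hardTotal + 10 : hardTotal, softAce };
}

// Outcome from the player's side: +1 win, 0 tie, -1 loss.
// Since blackjack pays 1:1, a natural 21 is just an ordinary win.
function settle(playerValue: number, dealerValue: number): 1 | 0 | -1 {
  if (playerValue > 21) return -1; // player busts first and loses
  if (dealerValue > 21) return 1;  // dealer busts
  if (playerValue > dealerValue) return 1;
  if (playerValue < dealerValue) return -1;
  return 0;
}
```

For example, an ace and a six score as a soft 17, and drawing a ten on top of that turns the same hand into a hard 17.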
Tech stack
The agent is trained in R with a first-visit Monte Carlo method. The game is built in TypeScript with Vite, linted with ESLint, and deployed on GitHub Pages.