Problem Description
Here is a program that can be taught to play
tictactoe. It learns from it's mistakes and successes to improve
it's play over time. This is the virtual world version of
a game that uses 304 match boxes representing the possible unique board positions
for the 1st player. Each match box has a diagram of the board that
it represents, and colored beads. Each bead color uniquely identifies an available cell on the board.
For the machine move, a bead is selected randomly from
the box representing the current board configuration (one color for
each available space on the board) and an X placed
in that square. The machine always plays first and uses X's.
The boxes used remain open and the bead selected remains on top of the
box, At the end of the game, if X wins, it is rewarded by adding
beads of the selected colors to each open box. If it loses,
the machine is punished by removing the beads that led to the
defeat.
The program learns by playing
against human or computer opponents.
Background & Techniques
This is another program based on an article
in Martin Gardner's The Colossal Book of Mathematics.
He describes the Naughts and Crosses machine designed by Donald Michie in
1961. Dr. Michie was a professor at the University of
Edinburgh, where tictactoe is known as Naughts and Crosses.
(Actually all Brits know tictactoe as Naughts and Crosses.) He called his machine MENACE
(Matchbox Educable Naughts And Crosses Engine).
Gardner devotes a full chapter to the machine, so I'll just present an
outline here. Michie placed 4 beads for each of the 9 colors
representing the 9 available cells in the initial move boxes, 3 beads for
each of the 7 available cells for the machine's 2nd move, 2 beads for
each of the 5 available cells for the machine's 3rd moves, 1 bead
for each of the 3 available cells for the machine's 4th move. A 5th
move might be required to fill in the last available cell, but we didn't
need matchboxes and beads for that case since there is only one
choice. When
the machine won, each open box was rewarded with 3 beads of the color on
top of the box. For draws, the reward was one bead; and for losses
the bead on top of the open boxes was confiscated.
Michie reports that he trained his machine in 220 games, being
beaten by the machine in 8 of the last 10 games. He probably
played intelligent games to train it that rapidly. My computerized
version takes 1000 or 2000 games to learn how to play well
when playing against a computerized opponent that always plays randomly in
any open cell. Or maybe there's just a bug in the program that
makes my machine a slow learner.
Nonprogrammers can jump to the bottom of the
page to download the executable version of the machine.
Notes for Programmers
The main problems solved were:
 Selecting a representation of board configurations. My
original design used 2 bits to represent the state of each of the 9
board positions forming an 18 bit integer. I decided this was
overkill, and made debugging difficult so I switch to using a 9
character string made up of X, O, and  characters. Much simpler
and fast enough. A TWeight record was defined to
contain information about the beads associated with each box.
The boxes are kept a Stringlist, Positions, and the Objects
property holds pointers to the TWeight records.
 Generating the virtual matchboxes  Gardner says that the Michie
machine used 300 matchboxes. It took a while but I finally got
down to 304 boxes. There are 2201 nonwinning positions
presented to the first player in the first four moves. The
transforms described below reduce this to 304 positions. Note
that the 2201 positions would work just fine in the computerized
version, probably faster than using the transform procedure. I
wanted to get down to 300 boxes "just for fun".
Procedure GenNext generates a list of positions.
The transform version takes several seconds, so the resulting list is
saved to a file named Positions.str and only recreated when the
file does not exist.
 The 2201 true board positions are transformed into the 304 virtual
matchboxes by rotating and/or flipping the board. I ended
up with 7 transforms, (8 counting the identity transform):
 rotate right 90 degrees
 rotate left 90 degrees
 rotate 180 degrees
 mirror (vertical flip, exchange 1st and 3rd rows)
 mirror + left 90
 mirror + right 90
 mirror + 180
Each transform is defined by a 9 digit array with the new index
positions for each of the input cells. This happens in procedure
Transform. An inverse transform procedure, InvTransform,
exists for putting transformed boards back in their original
configuration after a move has been made. All of the transforms
are symmetric except for rotateleft and rotateright. For the
other 5, performing the transform a second time get us back to the
original configuration.
 Selecting a move based on the relative frequency of the bead colors
in a box is a little tricky. It's probably
easiest to explain the procedure with an example: Assume we have
1 red bead, 2 blue beads and 3 green beads to choose from, represented
by the array [1,2,3]. We want to simulate putting them all in a
hat and drawing one randomly. I do it by summing the
number of beads (6), selecting a random number in the range 1 to total
beads, say 4, and running through the array of bead counts
accumulating partial sums, until the partial sum is equals or exceeds
our random number. In this case partial sums are 1, 3 and
6 so 6 is the first partial sum exceeding our random number so we select the third entry and place our X in the position
represented by the green bead.
At startup time we create a 3X3 array of Tedit components to represent
the playing board. An OnClick exit records opponent
moves. The machine moves by using function FindTransform
procedure to locate the Position list entry that corresponds to the
current board configuration. The Objects property in
the list contains a pointer to a TWeightRec record that has the
bead counts we'll use to make the move.
Quite a bit more to explore here if you are interested.
Addendum November 23, 2004: As a viewer recently
reported, on occasion. the beads remaining in a box seem to get out of
sync with the available open cells. This could result in the machine
placing its X in an occupied cell. I haven't found the cause of the
problem yet but did post a new version today that bypasses the
erroneous result. The problem was only detectable during
manual play when the machine would occasionally overwrite your blocking
"O" with his "X" and claim a win
anyway.
Running/Exploring the Program
Suggestions for Further Explorations

Would a smarter opponent train the program
faster? It bothers me that this program is such a slow learner
compared to Michie's version. 

It
would be nice to highlight the list entry used to make each move. 

The
list of board positions (matchboxes) must be sorted in order to
efficiently find our current position. It would more
pleasing if the level numbers were prepended to the key data when the
list is built and when doing lookups so that entries appeared in move
number sequence. 
Original: June 10, 2002 
Modified: February 18, 2016

