July 17, 2008

This page hosts an excellent example of gesture/writing recognition implemented in pure JavaScript. The author has a blog post here with more information about his implementation. The algorithm works fantastically on the web, and if you follow the gestures in the PDF you will find it gets good accuracy even with clumsy, mouse-drawn lines. Even better, the source code is compact and quite clear. Inspired by the simplicity of the algorithm and the small size of the code, I ported it to pure Python. Lately I have been thinking a lot about Jython and Python on the Nintendo DS, so I've kept the module clean and hopefully very portable, with no dependencies on other libraries and only one small dependency on a Python built-in module. I've also tried to remain dimensionally agnostic where possible, meaning that with a little tweaking you should be able to detect 'gestures' in data of any dimensionality (3d movements, musical pitches, whatever). See below for the full source code listing. Copy the first three bits of code and paste them into a file called OneStrokeGesture.py. The last couple of bits of code are demos, which require the pygame library.

First up is the Path class. It holds a single drawn line, and is what gets created when raw point data is passed to the library. The Path class also carries the module's only dependency: the sqrt function from Python's standard math library.

from math import sqrt

class Path:
    def __init__(self, points=()):
        self.lens = []
        self.points = []
        self.normalized = False

        for p in points:
            self.AddPoint(p)

        if len(self.points):
            self.Normalize()

    def __len__(self):
        return len(self.points)

    def AddPoint(self, point):
        self.points.append(point)

        if len(self.lens):
            l = self.Distance(self.points[-2], self.points[-1])
            self.lens.append(self.lens[-1] + l)
        else:
            self.lens.append(0)

    def Distance(self, a, b):
        return sqrt(sum([pow(a[i] - b[i], 2) for i in range(len(a))]))

    def DistancePath(self, path):
        if len(self) == len(path):
            return sum(self.Distance(self.points[i], path.points[i]) for i in range(len(self)))

    def Interpolate(self, array, index, ratio):
        return float(array[index]) + float(array[index + 1] - array[index]) * ratio

    def Normalize(self):
        if not self.normalized:
            self.normalized = True

            normLen = 10
            newPoints = []
            totalLen = self.lens[-1]

            mx_min = min([x for x, y in self.points])
            mx_max = max([x for x, y in self.points])
            my_min = min([y for x, y in self.points])
            my_max = max([y for x, y in self.points])
            mx_diff = mx_max - mx_min
            my_diff = my_max - my_min

            if my_diff == 0:
                # perfectly horizontal stroke: force the 'wide' branch below
                proportion = 1e13
            else:
                proportion = abs(float(mx_diff) / float(my_diff))

            curIndex = 0
            for i in range(normLen):
                targetLen = i * totalLen / normLen

                while self.lens[curIndex + 1] < targetLen:
                    curIndex += 1

                ratio = (targetLen - self.lens[curIndex]) / (self.lens[curIndex + 1] - self.lens[curIndex])
                x = self.Interpolate([px for px, py in self.points], curIndex, ratio)
                y = self.Interpolate([py for px, py in self.points], curIndex, ratio)

                if proportion < 0.2:
                    # nearly vertical stroke: collapse the x axis
                    newPoints.append((0, (y - my_min) / my_diff))
                elif proportion > 5:
                    # nearly horizontal stroke: collapse the y axis
                    newPoints.append(((x - mx_min) / mx_diff, 0))
                else:
                    newPoints.append(((x - mx_min) / mx_diff, (y - my_min) / my_diff))

            self.points = newPoints

    def PosValue(self, x, y):
        if (x < 0.5 and y < 0.5):
            return locals.StartTopLeft
        if (x >= 0.5 and y < 0.5):
            return locals.StartTopRight
        if (x < 0.5 and y >= 0.5):
            return locals.StartBottomLeft
        if (x >= 0.5 and y >= 0.5):
            return locals.StartBottomRight

    def MatchConstraint(self, c):
        if not c:
            return True
        if not self.normalized:
            return False

        startValue = self.PosValue(self.points[0][0], self.points[0][1])
        endValue = self.PosValue(self.points[-1][0], self.points[-1][1]) << 4

        return ((startValue | endValue) & (~c)) == 0

As you can see, the core algorithm requires that every path is 'normalized' so that it contains normLen points describing the generalised shape of the stroke. Each of those points is then compared by Euclidean distance against the corresponding point in every line in a library of potential lines, and the line with the smallest total distance is likely to be the one the user was trying to draw. If you want to recognise more complex lines you could increase normLen, but ten points should be fine for most letter/gesture detection using hand-drawn strokes.
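To make the resample-and-compare idea concrete, here is a small self-contained sketch of the same technique. Note this is not the module above: for simplicity it resamples to evenly spaced points including both endpoints, whereas the module steps by i * len / normLen.

```python
from math import hypot

def resample(points, n=10):
    # cumulative arc length at each input point
    lens = [0.0]
    for a, b in zip(points, points[1:]):
        lens.append(lens[-1] + hypot(b[0] - a[0], b[1] - a[1]))
    total = lens[-1]

    out = []
    j = 0
    for i in range(n):
        # target arc length of the i-th resampled point
        target = i * total / (n - 1)
        # advance to the segment containing the target length
        while j < len(lens) - 2 and lens[j + 1] < target:
            j += 1
        seg = lens[j + 1] - lens[j]
        t = 0.0 if seg == 0 else (target - lens[j]) / seg
        # linear interpolation within the segment
        out.append((points[j][0] + (points[j + 1][0] - points[j][0]) * t,
                    points[j][1] + (points[j + 1][1] - points[j][1]) * t))
    return out

def path_distance(a, b):
    # sum of Euclidean distances between corresponding points
    return sum(hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))
```

With every path reduced to the same number of points, path_distance gives a single comparable score per candidate line, which is exactly the quantity the matching step sorts on.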

The next piece of the puzzle is the following 'locals' class, which holds a bunch of static variables and a default, simplistic alphabet of gestures corresponding to the letters A to Z. This is a rough approximation of the Palm 'Graffiti' alphabet:

class locals:
    StartTopLeft = 1 # starting in the top left corner is allowed
    StartTopRight = 1 << 1
    StartBottomLeft = 1 << 2
    StartBottomRight = 1 << 3
    StartAny = StartTopLeft | StartTopRight | StartBottomLeft | StartBottomRight

    EndTopLeft = 1 << 4
    EndTopRight = 1 << 5
    EndBottomLeft = 1 << 6
    EndBottomRight = 1 << 7
    EndAny = EndTopLeft | EndTopRight | EndBottomLeft | EndBottomRight
    Any = StartAny | EndAny

    capitals = [
        ["A", Path(zip([0, 5, 10], [10, 0, 10])), StartBottomLeft | EndBottomRight],
        ["B", Path(zip([0, 0, 0, 3, 3, 0], [0, 10, 7, 7, 10, 10])), StartTopLeft | EndBottomLeft],
        ["C", Path(zip([10, 0, 0, 10], [0, 0, 10, 10])), StartTopRight | EndBottomRight],
        ["D", Path(zip([0, 0, 10, 10, 0], [10, 0, 0, 10, 10])), StartBottomLeft | EndBottomLeft],
        ["E", Path(zip([10, 0, 0, 3, 0, 0, 10], [0, 0, 5, 5, 5, 10, 10])), StartTopRight | EndBottomRight],
        ["F", Path(zip([10, 0, 0], [0, 0, 10])), StartTopRight | EndBottomLeft],
        ["G", Path(zip([10, 0, 0, 10, 10, 5], [0, 0, 10, 10, 5, 5])), StartTopRight | EndAny],
        ["H", Path(zip([0, 0, 0, 3, 3], [0, 10, 7, 7, 10])), StartTopLeft | EndBottomRight],
        ["I", Path(zip([5, 5], [0, 10])), StartTopLeft | EndBottomLeft],
        ["J", Path(zip([10, 10, 0], [0, 10, 10])), StartTopRight | EndBottomLeft | EndTopLeft],
        ["K", Path(zip([10, 0, 0, 10], [0, 10, 0, 10])), StartTopRight | EndBottomRight],
        ["L", Path(zip([0, 0, 10], [0, 10, 10])), StartTopLeft | EndBottomRight],
        ["M", Path(zip([0, 0, 5, 10, 10], [10, 0, 5, 0, 10])), StartBottomLeft | EndBottomRight],
        ["N", Path(zip([0, 0, 10, 10], [10, 0, 10, 0])), StartBottomLeft | EndTopRight],
        ["O", Path(zip([5, 0, 0, 10, 10, 5], [0, 0, 10, 10, 0, 0])), StartTopLeft | StartTopRight | EndTopLeft | EndTopRight],
        ["P", Path(zip([0, 0, 0, 10, 10, 0], [0, 10, 0, 0, 5, 5])), StartBottomLeft | EndAny],
        ["P2", Path(zip([0, 0, 10, 10, 0], [10, 0, 0, 5, 5])), StartBottomLeft | EndAny],
        ["Q", Path(zip([4, 0, 0, 4, 4], [0, 0, 4, 4, 7])), None],
        ["R", Path(zip([0, 0, 0, 10, 10, 0, 10], [0, 10, 0, 0, 5, 5, 10])), StartBottomLeft | EndAny],
        ["R2", Path(zip([0, 0, 10, 10, 0, 10], [10, 0, 0, 5, 5, 10])), StartBottomLeft | EndBottomRight],
        ["S", Path(zip([10, 0, 0, 10, 10, 0], [0, 2, 4, 6, 8, 10])), StartTopRight | EndBottomLeft],
        ["T", Path(zip([0, 8, 8], [0, 0, 10])), StartTopLeft | EndBottomRight],
        ["U", Path(zip([0, 5, 10], [0, 10, 0])), StartTopLeft | EndTopRight],
        ["U2", Path(zip([0, 0, 10, 10], [0, 10, 10, 0])), StartTopLeft | EndTopRight],
        ["V", Path(zip([10, 5, 0], [0, 10, 0])), StartTopLeft | EndTopRight],
        ["V2", Path(zip([0, 3, 6, 10], [0, 10, 0, 0])), StartTopLeft | EndTopRight],
        ["W", Path(zip([0, 0, 5, 10, 10], [0, 10, 5, 10, 0])), StartTopLeft | EndTopRight],
        ["X", Path(zip([0, 10, 10, 0], [10, 0, 10, 0])), StartBottomLeft | EndTopLeft],
        ["Y", Path(zip([0, 0, 5, 5, 5, 5, 5, 10], [0, 5, 5, 0, 5, 10, 5, 5])), StartTopLeft | EndAny],
        ["Z", Path(zip([0, 10, 0, 10], [0, 0, 10, 10])), StartTopLeft | EndBottomRight],
    ]
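As an aside, the constraint bitmasks can be understood in isolation: the start corner occupies bits 0-3, the end corner bits 4-7, and a path matches a constraint when it sets no bit outside the constraint. A condensed, standalone sketch of that check (mirroring PosValue and MatchConstraint above, outside the class for brevity):

```python
# bits 0-3: which quadrant the stroke starts in; bits 4-7: where it ends
StartTopLeft, StartTopRight, StartBottomLeft, StartBottomRight = 1, 2, 4, 8
EndTopLeft, EndTopRight, EndBottomLeft, EndBottomRight = 16, 32, 64, 128

def pos_value(x, y):
    # quadrant of a normalized (0..1) point, as a start-corner bit
    if y < 0.5:
        return StartTopLeft if x < 0.5 else StartTopRight
    return StartBottomLeft if x < 0.5 else StartBottomRight

def matches(start, end, constraint):
    # the path matches when its start/end bits all fall inside the constraint
    value = pos_value(*start) | (pos_value(*end) << 4)
    return (value & ~constraint) == 0
```

So a stroke from the top-left to the bottom-right corner satisfies the 'L' constraint StartTopLeft | EndBottomRight, but not the 'S' constraint StartTopRight | EndBottomLeft.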

Finally, the class below implements the search itself, matching the drawn path point by point against every stroke in the library:

class OneStrokeGestureException(Exception):
    pass

class OneStrokeGesture:
    """
        Can detect single-stroke pen gestures.
        Pass in the strokes variable to supply your own strokes to be recognised.

        Adapted by Chris McCormick <chris@mccormick.cx> from the Javascript code at:
        http://blog.monstuff.com/archives/images/JS-graffiti.html
    """
    def __init__(self, strokes=locals.capitals):
        # copy the stroke library so Learn can't mutate the shared default alphabet
        self.strokes = list(strokes)
        self.path = None
        self.down = False

    def PenDown(self, pos):
        self.path = Path()
        self.path.AddPoint(pos)
        self.down = True

    def PenTo(self, pos):
        self.path.AddPoint(pos)

    def PenUp(self, pos):
        self.down = False
        if len(self.path) > 1:
            self.path.AddPoint(pos)
            return self.FindBestMatch()

    def Learn(self, id, points):
        self.strokes += [[id, Path(points), locals.Any]]

    def FindGesture(self, points):
        self.path = Path(points)
        return self.FindBestMatch()

    def FindBestMatch(self):
        if len(self.path) > 1 and self.path.points[0] != self.path.points[1]:
            self.path.Normalize()
            results = [(s[0], self.path.DistancePath(s[1])) for s in self.strokes if self.path.MatchConstraint(s[2])]
            results.sort(key=lambda r: r[1])
            return results
        else:
            raise OneStrokeGestureException("Degenerate path (the first two points are identical)")

This module returns a list of (name, distance) tuples: the name of every stroke in the library whose constraint matched, together with how closely it matched, sorted best first. Generally the first item is the one you're looking for. You run the detection algorithm on the user's line by calling FindGesture with the list of points that make up the line, or you can call the PenDown, PenTo, and PenUp methods to have the class build the path incrementally as the user draws.
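This nearest-template idea is also what makes the approach dimensionally agnostic: any fixed-length sequences can be classified the same way. A toy sketch using 1-D 'pitch contour' templates (the template names here are made up for illustration):

```python
from math import sqrt

# hypothetical 1-D templates: fixed-length pitch contours instead of 2-D strokes
templates = {
    "rising":  [0, 1, 2, 3, 4],
    "falling": [4, 3, 2, 1, 0],
    "arch":    [0, 2, 4, 2, 0],
}

def distance(a, b):
    # Euclidean distance between two equal-length sequences
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(seq):
    # nearest template wins, just like the stroke search above
    return min(templates, key=lambda name: distance(templates[name], seq))
```

Swap the 1-D sequences for resampled 3-D motion data, or anything else of fixed length, and the same search applies unchanged.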

Together the above classes make up the gesture recognition code. Paste them into a file called OneStrokeGesture.py and then you are ready to begin detecting gestures. Here is a bit of test code which uses the Pygame library to help you try out the gesture recognition against the Graffiti alphabet of drawn symbols:

from OneStrokeGesture import OneStrokeGesture
import sys, pygame

pygame.init()

screen = pygame.display.set_mode((320, 240))
g = OneStrokeGesture()
paths = []

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()

        if event.type == pygame.MOUSEBUTTONDOWN:
            g.PenDown(event.pos)
            paths.append(g.path.points)
        if event.type == pygame.MOUSEMOTION:
            if g.down:
                g.PenTo(event.pos)
        if event.type == pygame.MOUSEBUTTONUP:
            print(g.PenUp(event.pos))

    screen.fill([255, 255, 255])
    [pygame.draw.aalines(screen, [0, 0, 0], False, p, 1) for p in paths if len(p) > 1]
    pygame.display.flip()

I added the Learn method, which lets you take paths the user has already drawn and add them to the library of known strokes. I have found this to be really effective: the algorithm finds learned paths with a high degree of accuracy, in some cases much higher than with the preset Graffiti alphabet. This next bit of test code has you teach five letters (a, b, c, d, e) three times each before starting recognition. Run it from the command line, since it prints the letter it wants to learn next. Once all the letters are learned, the demo switches into recognition mode and you can freely draw strokes, which will be matched against the ones you entered earlier.

import sys, pygame
from random import shuffle

from OneStrokeGesture import OneStrokeGesture

pygame.init()

screen = pygame.display.set_mode((320, 240))
paths = []
letters = [chr(c) for c in range(ord('a'), ord('f'))] * 3
g = OneStrokeGesture([])
print(g.strokes)

print("learning: " + letters[0])

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()

        if event.type == pygame.MOUSEBUTTONDOWN:
            g.PenDown(event.pos)
            paths.append(g.path.points)
        if event.type == pygame.MOUSEMOTION:
            if g.down:
                g.PenTo(event.pos)
        if event.type == pygame.MOUSEBUTTONUP:
            if len(letters):
                g.Learn(letters[0], g.path.points)
                letters = letters[1:]
                if len(letters):
                    print("learning: " + letters[0])
                g.PenUp(event.pos)
            else:
                print("found:", g.PenUp(event.pos))

    screen.fill([255, 255, 255])
    [pygame.draw.aalines(screen, [0, 0, 0], False, p, 1) for p in paths if len(p) > 1]
    pygame.display.flip()

Have fun!

July 5, 2008

Futuristic Building #1

I recently bought a 'digital notepad'. It's a device that looks kind of like a clipboard, and you can clip regular paper notepads into it. When you draw on the paper with a special pen, your pen strokes are digitally recorded. Later you can hook the thing up to a computer, copy the vectorised notes across, and get your hand-drawings in a nice, flexible vector format.

Futuristic Building #2

This one is made by a company called Laser here in Australia, but pretty much every digital notepad on the market is made by one Taiwanese company, Waltop, who sell them to other companies for rebranding. The device runs fine under Debian in both USB-disk mode and tablet mode. Here is a link to a blog post with a heap of information about the digital notepad and running it under GNU/Linux (there is lots of info in the comments too):

http://eddie.niese.net/20071129/new-digital-notepad-gadget/

Futuristic Building #3

Modes

The notepad operates in two distinct modes. The first is as a regular notepad that internally digitises everything you write in ink while it's not connected to your computer. This is reasonably effective, although I have found that very occasionally, maybe one to three times per page, the pen will miss a stroke. I think this is due to the angle I write at, or not pressing the pen down hard enough on a light stroke. For what I am using the notepad for it's no problem, since I import the notes as SVG files and then modify them in tablet mode later. I use a small Python script by Jorik Blaas to convert the notes from the .top file format they are stored in into SVG format. You can copy the .top files from the device to your computer when it's connected via the USB cable. Click here to download the Python script (I got it from the URL listed above).

Futuristic Building #4

Which brings me to the other mode: tablet mode. When the notepad is connected to a computer via USB it acts like a USB disk that you can copy notes off, but it simultaneously behaves like a primitive graphics tablet. I say primitive because it has no pressure sensitivity, and it really just emulates a mouse: as you move the pen about on the surface, the mouse cursor follows, and when you press the pen down, the mouse clicks. So far this has been fine for the little sketches I've been making.

Futuristic Building #5

Other stuff

The box also came with a black portfolio to carry the notepad around in, a USB cable, batteries, replacement ink cartridges, a plastic tipped cartridge for tablet mode, and a bunch of Windows software that I did not use. The whole package cost me about $160(AU) plus twenty five bucks or so postage.

Futuristic Building #6

I am pretty excited about this device and it's been fun playing with it over the last week or so. One of the difficult things about being a solo indie games programmer is not having graphics for your games. I have always found the hand-drawn aesthetic quite endearing in video games so I might have a shot at sketching some characters and backgrounds.

Futuristic Building #7

I have a bunch of other ideas for applications using tablet/touchscreen, so hopefully I'll get around to coding up some of those. Exciting times!

Futuristic Building #8

June 26, 2008

UfoLiberation is a game I wrote a couple of years ago in a one-week game-jam-like session with my friend Crispin (he made a Missile Command clone that totally ruled). After that week I spent a few months of spare time polishing it up, adding menus, a tutorial, icons, and an installer. My aim was to sell it as a casual kids' game, but I don't think I'm ever going to get around to that, so here it is. At the moment there's just a Windows installer, but I plan on adding a Mac OS X one sometime.

My aim was to make a game combining the mechanics of two of my favorite arcade games: Choplifter and Joust. In Choplifter the core mechanic revolves around landing on the ground to rescue people whilst enemies attack you. In Joust the mechanic is to try to land on enemies from above while avoiding being landed on yourself. I felt these two mechanics would nicely balance each other: if you need to fly low to pick people up while also trying to hit enemy UFOs from above, there is a risk/reward tradeoff the player must constantly make.

I find it most fun to play with a USB gamepad, but you can use a joystick, mouse, or keyboard.

Click here to download it.

Enjoy!

June 11, 2008

For the last few months I have been helping my friends Rodney Glick (artist) and Moshe Y Bernstein (rabbi) with the technical aspects of one of Rodney's artworks. The artwork, entitled "Master Of Prayer", will be showing among many other fascinating works in Rodney's exhibition, which opens tomorrow night (Thursday) at 6pm at the Lawrence Wilson Art Gallery at the University of Western Australia. I'm really excited about this project: I got to use Pure Data, voice synthesis (Mbrola), and a network-based pseudo-AI to help Rodney and Moshe create a really compelling and thought-provoking artwork. Here's the blurb that Moshe wrote about it:

"In the Jewish tradition the full prayer service can be performed only in a quorum of ten adult males known in Hebrew as a minyan. The main part of the service, which occurs three times daily, is the Shmona Esrei, or Eighteen Benedictions. These blessings are first recited silently by the entire congregation. Afterwards, during the morning and afternoon liturgies, they are repeated aloud by the cantor, often referred to as the Ba'al Tefillah or 'Master of Prayer'. In orthodox Judaism any male, whether layman or cleric, over the age of thirteen can lead the prayers. During the repetition of the Shmona Esrei, also called the Amidah, or 'standing prayer', the congregation answers responsively to each of the benedictions recited. In this installation each computer has been individually programmed to respond to the blessings recited by the main computer, the 'Master of Prayer', leading the afternoon Mincha service. Though the installation appears to parody the human condition of prayer by rote, on a deeper level it asks a haunting question about the inherent nature of artificial intelligence. The Jewish sages require kavannah, or 'proper intent' for prayer to be truly acceptable. To the extent that computers can be programmed to 'think', might they not be programmed to this 'proper intent' as well? In a tentative answer to that question, 'Master of Prayer' can be experienced as a high-tech, Jewish version of the Tibetan prayer-wheel or Christian rosary beads."

June 10, 2008

Sfxr is an inspiring little program which generates old-school sound effects for use in games, music, or whatever. It's one of those delightful little nuggets of software that does exactly what it says it will and nothing more. It's great fun, looks great too, and its source code fits in a single C++ file! Finally, it's cross-platform, running on all the major desktops. Wonderful stuff.

http://www.ludumdare.com/compo/2007/12/13/sfxr-sound-effects-for-all/