This page hosts an excellent example of gesture/writing recognition implemented in pure Javascript. The author has a blog post here with more information about his implementation. The algorithm works remarkably well on the web: if you follow the gestures in the PDF you will find it achieves good accuracy even with clumsy, mouse-drawn lines. Even better, the source code is compact and quite clear. Inspired by the simplicity of the algorithm and the small size of the code, I ported it to pure Python. Lately I have been thinking a lot about Jython and Python on the Nintendo DS, so I've kept the module clean and hopefully very portable, with no dependencies on third-party libraries and only one small dependency on a Python built-in module. I've also tried to remain dimensionally agnostic where possible, meaning that with a little tweaking you should be able to detect 'gestures' in data of any dimensionality (3d movements, musical pitches, whatever). The full source code listing is below. Copy the first three bits of code and paste them into a file called OneStrokeGesture.py. The last couple of bits of code are demos, which require the pygame library.

First up is the Path class. This class holds a single drawn line, and an instance is created whenever raw point data is passed to the library. Path carries the module's only dependency: the sqrt function from the standard math library included with Python.

```
from math import sqrt

class Path:
    def __init__(self, points=()):
        self.lens = []
        self.points = []
        self.normalized = False
        for p in points:
            self.AddPoint(p)
        if len(self.points):
            self.Normalize()

    def __len__(self):
        return len(self.points)

    def AddPoint(self, point):
        self.points.append(point)
        if len(self.lens):
            # store the cumulative arc length up to this point
            l = self.Distance(self.points[-2], self.points[-1])
            self.lens.append(self.lens[-1] + l)
        else:
            self.lens.append(0)

    def Distance(self, a, b):
        # Euclidean distance between two points of any dimensionality
        return sqrt(sum(pow(a[i] - b[i], 2) for i in range(len(a))))

    def DistancePath(self, path):
        # summed point-to-point distance between two equal-length paths
        if len(self) == len(path):
            return sum(self.Distance(self.points[i], path.points[i]) for i in range(len(self)))

    def Interpolate(self, array, index, ratio):
        return float(array[index]) + float(array[index + 1] - array[index]) * ratio

    def Normalize(self):
        if not self.normalized:
            self.normalized = True
            normLen = 10
            newPoints = []
            totalLen = self.lens[-1]
            mx_min = min(x for x, y in self.points)
            mx_max = max(x for x, y in self.points)
            my_min = min(y for x, y in self.points)
            my_max = max(y for x, y in self.points)
            mx_diff = mx_max - mx_min
            my_diff = my_max - my_min
            if my_diff == 0:
                proportion = 10000000000000.0
            else:
                proportion = abs(float(mx_diff) / float(my_diff))
            curIndex = 0
            # resample to normLen points evenly spaced by arc length
            for i in range(normLen):
                targetLen = i * totalLen / normLen
                while self.lens[curIndex + 1] < targetLen:
                    curIndex += 1
                ratio = (targetLen - self.lens[curIndex]) / (self.lens[curIndex + 1] - self.lens[curIndex])
                x = self.Interpolate([px for px, py in self.points], curIndex, ratio)
                y = self.Interpolate([py for px, py in self.points], curIndex, ratio)
                if proportion < 0.2:
                    # nearly vertical stroke: collapse the x axis
                    newPoints.append((0, (y - my_min) / my_diff))
                elif proportion > 5:
                    # nearly horizontal stroke: collapse the y axis
                    newPoints.append(((x - mx_min) / mx_diff, 0))
                else:
                    newPoints.append(((x - mx_min) / mx_diff, (y - my_min) / my_diff))
            self.points = newPoints

    def PosValue(self, x, y):
        # which quadrant of the normalized unit square the point falls in
        if x < 0.5 and y < 0.5:
            return locals.StartTopLeft
        if x >= 0.5 and y < 0.5:
            return locals.StartTopRight
        if x < 0.5 and y >= 0.5:
            return locals.StartBottomLeft
        return locals.StartBottomRight

    def MatchConstraint(self, c):
        if not c:
            return True
        if not self.normalized:
            return False
        startValue = self.PosValue(self.points[0][0], self.points[0][1])
        endValue = self.PosValue(self.points[-1][0], self.points[-1][1]) << 4
        return ((startValue | endValue) & ~c) == 0
```
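As a quick aside on the dimensional agnosticism mentioned at the top: the Distance method already works on points of any length, and only Normalize is hard-coded to 2d. Here is a standalone sketch of that same one-liner applied to 2d and 3d points:

```python
from math import sqrt

def distance(a, b):
    # Euclidean distance between two points of any dimensionality,
    # the same computation as Path.Distance above
    return sqrt(sum(pow(a[i] - b[i], 2) for i in range(len(a))))

# 2d and 3d points both work with the same code
print(distance((0, 0), (3, 4)))        # 5.0
print(distance((0, 0, 0), (1, 2, 2)))  # 3.0
```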

As you can see, the core algorithm requires that every path is 'normalized' so that it contains normLen points describing the generalised shape of the path. The Euclidean distance from each descriptive point to the corresponding point in each of a library of potential lines is then computed, and the line with the smallest total distance is likely to be the one the user was trying to draw. If you want more complex lines you can increase normLen, but the default should be fine for most letter/gesture detection using hand-drawn lines.
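To make the resampling concrete, here is a minimal standalone sketch of the same idea (the `resample` name is mine, not part of the module): walk the cumulative arc lengths and linearly interpolate to pick ten points evenly spaced along the stroke. Note that, like the code above, the last sample lands just short of the stroke's endpoint; aspect-ratio handling is omitted for clarity.

```python
from math import sqrt

def resample(points, norm_len=10):
    # cumulative arc length at each input point
    lens = [0.0]
    for a, b in zip(points, points[1:]):
        lens.append(lens[-1] + sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))
    out = []
    cur = 0
    for i in range(norm_len):
        # interpolate a point at each evenly spaced target length
        target = i * lens[-1] / norm_len
        while lens[cur + 1] < target:
            cur += 1
        ratio = (target - lens[cur]) / (lens[cur + 1] - lens[cur])
        x = points[cur][0] + (points[cur + 1][0] - points[cur][0]) * ratio
        y = points[cur][1] + (points[cur + 1][1] - points[cur][1]) * ratio
        out.append((x, y))
    return out

# a straight line from (0, 0) to (9, 0) resamples to 10 evenly spaced points
print(resample([(0, 0), (9, 0)]))
```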

The next piece of the puzzle is the following 'locals' class, which holds a bunch of static variables and a simplistic default alphabet of gestures corresponding to letters of the alphabet. This is a rough approximation of the Palm 'Graffiti' alphabet:

```
class locals:
    StartTopLeft = 1  # starting in the top left corner is allowed
    StartTopRight = 1 << 1
    StartBottomLeft = 1 << 2
    StartBottomRight = 1 << 3
    StartAny = StartTopLeft | StartTopRight | StartBottomLeft | StartBottomRight
    EndTopLeft = 1 << 4
    EndTopRight = 1 << 5
    EndBottomLeft = 1 << 6
    EndBottomRight = 1 << 7
    EndAny = EndTopLeft | EndTopRight | EndBottomLeft | EndBottomRight
    Any = StartAny | EndAny
    capitals = [
        ["A", Path(zip([0, 5, 10], [10, 0, 10])), StartBottomLeft | EndBottomRight],
        ["B", Path(zip([0, 0, 0, 3, 3, 0], [0, 10, 7, 7, 10, 10])), StartTopLeft | EndBottomLeft],
        ["C", Path(zip([10, 0, 0, 10], [0, 0, 10, 10])), StartTopRight | EndBottomRight],
        ["D", Path(zip([0, 0, 10, 10, 0], [10, 0, 0, 10, 10])), StartBottomLeft | EndBottomLeft],
        ["E", Path(zip([10, 0, 0, 3, 0, 0, 10], [0, 0, 5, 5, 5, 10, 10])), StartTopRight | EndBottomRight],
        ["F", Path(zip([10, 0, 0], [0, 0, 10])), StartTopRight | EndBottomLeft],
        ["G", Path(zip([10, 0, 0, 10, 10, 5], [0, 0, 10, 10, 5, 5])), StartTopRight | EndAny],
        ["H", Path(zip([0, 0, 0, 3, 3], [0, 10, 7, 7, 10])), StartTopLeft | EndBottomRight],
        ["I", Path(zip([5, 5], [0, 10])), StartTopLeft | EndBottomLeft],
        ["J", Path(zip([10, 10, 0], [0, 10, 10])), StartTopRight | EndBottomLeft | EndTopLeft],
        ["K", Path(zip([10, 0, 0, 10], [0, 10, 0, 10])), StartTopRight | EndBottomRight],
        ["L", Path(zip([0, 0, 10], [0, 10, 10])), StartTopLeft | EndBottomRight],
        ["M", Path(zip([0, 0, 5, 10, 10], [10, 0, 5, 0, 10])), StartBottomLeft | EndBottomRight],
        ["N", Path(zip([0, 0, 10, 10], [10, 0, 10, 0])), StartBottomLeft | EndTopRight],
        ["O", Path(zip([5, 0, 0, 10, 10, 5], [0, 0, 10, 10, 0, 0])), StartTopLeft | StartTopRight | EndTopLeft | EndTopRight],
        ["P", Path(zip([0, 0, 0, 10, 10, 0], [0, 10, 0, 0, 5, 5])), StartBottomLeft | EndAny],
        ["P2", Path(zip([0, 0, 10, 10, 0], [10, 0, 0, 5, 5])), StartBottomLeft | EndAny],
        ["Q", Path(zip([4, 0, 0, 4, 4], [0, 0, 4, 4, 7])), None],
        ["R", Path(zip([0, 0, 0, 10, 10, 0, 10], [0, 10, 0, 0, 5, 5, 10])), StartBottomLeft | EndAny],
        ["R2", Path(zip([0, 0, 10, 10, 0, 10], [10, 0, 0, 5, 5, 10])), StartBottomLeft | EndBottomRight],
        ["S", Path(zip([10, 0, 0, 10, 10, 0], [0, 2, 4, 6, 8, 10])), StartTopRight | EndBottomLeft],
        ["T", Path(zip([0, 8, 8], [0, 0, 10])), StartTopLeft | EndBottomRight],
        ["U", Path(zip([0, 5, 10], [0, 10, 0])), StartTopLeft | EndTopRight],
        ["U2", Path(zip([0, 0, 10, 10], [0, 10, 10, 0])), StartTopLeft | EndTopRight],
        ["V", Path(zip([10, 5, 0], [0, 10, 0])), StartTopLeft | EndTopRight],
        ["V2", Path(zip([0, 3, 6, 10], [0, 10, 0, 0])), StartTopLeft | EndTopRight],
        ["W", Path(zip([0, 0, 5, 10, 10], [0, 10, 5, 10, 0])), StartTopLeft | EndTopRight],
        ["X", Path(zip([0, 10, 10, 0], [10, 0, 10, 0])), StartBottomLeft | EndTopLeft],
        ["Y", Path(zip([0, 0, 5, 5, 5, 5, 5, 10], [0, 5, 5, 0, 5, 10, 5, 5])), StartTopLeft | EndAny],
        ["Z", Path(zip([0, 10, 0, 10], [0, 0, 10, 10])), StartTopLeft | EndBottomRight],
    ]
```
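The Start*/End* constants are bit flags. MatchConstraint packs the quadrant of a stroke's first point into the low four bits and the quadrant of its last point, shifted left by four, into the high four bits, then verifies that no bit falls outside the template's allowed mask. A standalone sketch of that check (the `matches` helper is mine, not part of the module):

```python
StartTopLeft = 1
StartBottomRight = 1 << 3
EndTopLeft = 1 << 4
EndBottomRight = 1 << 7

def matches(start_bit, end_bit, constraint):
    # the stroke passes only if both its start and end bits
    # fall inside the constraint mask
    return ((start_bit | end_bit) & ~constraint) == 0

# a stroke from top-left to bottom-right satisfies an "L"-like constraint...
print(matches(StartTopLeft, EndBottomRight, StartTopLeft | EndBottomRight))  # True
# ...but one ending back in the top-left does not
print(matches(StartTopLeft, EndTopLeft, StartTopLeft | EndBottomRight))      # False
```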

Finally, the stroke-by-stroke, point-by-point search algorithm is implemented in the following class:

```
class OneStrokeGestureException(Exception):
    pass

class OneStrokeGesture:
    """
    Can detect single-stroke pen gestures.
    Pass in the strokes variable to supply your own strokes to be recognised.
    Adapted by Chris McCormick <chris@mccormick.cx> from the Javascript code at:
    http://blog.monstuff.com/archives/images/JS-graffiti.html
    """
    def __init__(self, strokes=locals.capitals):
        self.strokes = strokes
        self.path = None
        self.down = False

    def PenDown(self, pos):
        self.path = Path()
        self.path.AddPoint(pos)
        self.down = True

    def PenTo(self, pos):
        self.path.AddPoint(pos)

    def PenUp(self, pos):
        self.down = False
        if len(self.path) > 1:
            self.path.AddPoint(pos)
            return self.FindBestMatch()

    def Learn(self, id, points):
        # add a user-drawn path to the library of known strokes,
        # with no constraint on where it may start or end
        self.strokes += [[id, Path(points), locals.Any]]

    def FindGesture(self, points):
        self.path = Path(points)
        return self.FindBestMatch()

    def FindBestMatch(self):
        if self.path.points[0] == self.path.points[1]:
            raise OneStrokeGestureException("Path is degenerate: first two points are identical")
        self.path.Normalize()
        # compare against every stroke whose start/end constraint matches,
        # and return the closest (smallest total distance) first
        results = [(s[0], self.path.DistancePath(s[1])) for s in self.strokes if self.path.MatchConstraint(s[2])]
        results.sort(key=lambda r: r[1])
        return results
```

This module returns a list of tuples containing the names of all of the paths that matched and how closely each was matched. Generally, the first item in the list will be the one you're looking for. You run the detection algorithm on the line the user has drawn by calling FindGesture and passing in the list of points which make up the line. You can also call the PenDown, PenTo, and PenUp methods to have the class build the path incrementally as the user draws.
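The matching step itself is just the summed point-to-point distance over the normalized templates. This standalone sketch (independent of the module; `distance_path` and the toy templates are mine) shows the shape of the results list you get back:

```python
def distance_path(a, b):
    # summed point-to-point Euclidean distance between two equal-length paths
    return sum(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
               for p, q in zip(a, b))

# two toy normalized templates: a vertical "I" and a diagonal stroke
templates = [
    ("I", [(0.0, i / 9) for i in range(10)]),
    ("diag", [(i / 9, i / 9) for i in range(10)]),
]

# a drawn stroke that is almost vertical
drawn = [(0.05, i / 9) for i in range(10)]

# closest template first, just like FindBestMatch's results list
results = sorted(((name, distance_path(drawn, t)) for name, t in templates),
                 key=lambda r: r[1])
print(results[0][0])  # prints "I"
```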

Together the above classes make up the gesture recognition code. Paste them into a file called OneStrokeGesture.py and then you are ready to begin detecting gestures. Here is a bit of test code which uses the Pygame library to help you try out the gesture recognition against the Graffiti alphabet of drawn symbols:

```
from OneStrokeGesture import OneStrokeGesture
import sys, pygame

pygame.init()
screen = pygame.display.set_mode((320, 240))
g = OneStrokeGesture()
paths = []
while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
        if event.type == pygame.MOUSEBUTTONDOWN:
            g.PenDown(event.pos)
            paths.append(g.path.points)
        if event.type == pygame.MOUSEMOTION:
            if g.down:
                g.PenTo(event.pos)
        if event.type == pygame.MOUSEBUTTONUP:
            print(g.PenUp(event.pos))
    screen.fill([255, 255, 255])
    for p in paths:
        if len(p) > 1:
            pygame.draw.aalines(screen, [0, 0, 0], False, p)
    pygame.display.flip()
```

I added the Learn method, which allows you to take existing paths drawn by the user and add them to the library of known paths. I have found this to be really effective: the algorithm finds already-learned paths with a high degree of accuracy, much higher in some cases than with the preset Graffiti alphabet. This next bit of test code makes you teach it five letters (a, b, c, d, e) three times each before starting recognition. You'll want to run it from the command line, as it prints out the letter it wants to learn next. Once all the letters are learned, the demo goes into recognition mode and you can freely draw strokes, which will be detected against those you input earlier.
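Under the hood, Learn just stores a normalized copy of the user's stroke with the permissive locals.Any constraint. This standalone sketch (the names are mine, not the module's) shows why learned templates match so well: a re-drawn stroke sits very close to its own stored, normalized form:

```python
def learn(templates, name, normalized_points):
    # learning is just storing the normalized stroke under a name;
    # the real Learn also attaches the permissive locals.Any constraint
    templates.append((name, normalized_points))

templates = []
stroke = [(i / 9, 0.0) for i in range(10)]  # a normalized horizontal stroke
learn(templates, "a", stroke)

# re-drawing a nearly identical stroke scores a tiny distance
redrawn = [(i / 9, 0.01) for i in range(10)]
dist = sum(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
           for p, q in zip(redrawn, templates[0][1]))
print(dist)  # ≈ 0.1, far smaller than a dissimilar template would score
```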

```
import sys, pygame
from OneStrokeGesture import OneStrokeGesture

pygame.init()
screen = pygame.display.set_mode((320, 240))
paths = []
letters = [chr(c) for c in range(ord('a'), ord('f'))] * 3
g = OneStrokeGesture([])
print("learning:", letters[0])
while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
        if event.type == pygame.MOUSEBUTTONDOWN:
            g.PenDown(event.pos)
            paths.append(g.path.points)
        if event.type == pygame.MOUSEMOTION:
            if g.down:
                g.PenTo(event.pos)
        if event.type == pygame.MOUSEBUTTONUP:
            if len(letters):
                g.Learn(letters[0], g.path.points)
                letters = letters[1:]
                if len(letters):
                    print("learning:", letters[0])
                g.PenUp(event.pos)
            else:
                print("found:", g.PenUp(event.pos))
    screen.fill([255, 255, 255])
    for p in paths:
        if len(p) > 1:
            pygame.draw.aalines(screen, [0, 0, 0], False, p)
    pygame.display.flip()
```

Have fun!