Dispatcher Pattern Safety

This post is a rehash of a lightening talk I gave at the last TriZPUG meeting.

One of the more useful programming patterns when applied to Python is the Dispatcher Pattern. The getattr built-in function practically implements the entire pattern. It's one of those Python goodies which leads people to say that programming patterns are already built into Python.

The Dispatcher Pattern allows you to create plug-in architectures for your code. The idea is that you have a number of handler code objects for handling different types of data. When your code encounters data which needs handling, dispatcher code selects which handler should be used.

Some code is worth a thousand words. Here's an example I use when teaching the Dispatcher Pattern at PyCamp:

class Plugin(object):
    """Implement a pluggable architecture."""

    def handle_html(self, name):
        print "The HTML file", name, "has been handled."

    def handle_pdf(self,name):
        print "The PDF file", name, "has been handled."

    def handle_rtf(self, name):
        print "The RTF file", name, "has been handled."

    def handle_default(self, name):
        print "The file", name, "has been handled."

if __name__ == '__main__':
    import sys, os.path
    name = sys.argv[1]
        ext = os.path.splitext(name)[1][1:].lower()
    except IndexError:
        ext = 'default'
    plugin = Plugin()
    getattr(plugin, 'handle_' + ext, plugin.handle_default)(name)

The string name of a handler method is created within getattr's argument list. getattr returns the appropriate handler method and the handler is directly dispatched by call.

The handler objects need not be methods in a class. The handlers could be functions in a module which is imported. getattr doesn't care what kind of object of which the handlers are attributes.

A plug-in architecture truly occurs when the handlers are modules or subpackages within a package for containing pluggable handlers. If the __all__ attribute of the package is properly maintained, then new handler modules may simply be dropped into the plug-in package's directory. By maintaining a naming convention for the handlers, the __init__.py module of the package may glob for a list of handlers to extend onto __all__:

import os, glob

__all__ = [os.path.splitext(os.path.basename(handler))[0]
           for path in __path__
           for handler in glob.glob(os.path.join(path, 'handle_*.py'))]

Such pluggable modules should implement well-defined functions by name (e.g., run, process, create, update) which may be accessed when the handler module is dispatched through getattr operating on the namespace into which the handlers are imported:

import sys
from plugins import *

datatype = 'pony'
getattr(sys.modules[__name__], 'handle_' + datatype).run()

The code above should execute the run function of the handle_pony module in the plugins package if the __all__ attribute of the plugins package was properly maintained in __init__.py.

Obviously, modules placed in a plugins package are trusted code. And as trusted code, we should expect such modules to handle all anticipated exceptions and clean up after themselves.

But in the real world, unanticipated exceptions occur. File formats being handled can change. Web service APIs might morph. Any number of conditions might occur which could lead to unhandled exceptions in a plug-in.

Hopefully our plug-ins don't simply swallow unhandled exceptions. But if they are unanticipated exceptions, we may expect them to bubble up to the dispatcher code.

If the dispatch code is one-shot, that is not executed repeatedly or in a long running process, then allowing the unanticipated exception to halt our program and display as a traceback may suffice. But for the most part, we are interested in not allowing plug-ins to crash our code as well as not silencing the exceptions which plug-ins don't handle.

Because we don't know the type of the unanticipated exception, our plug-in exception handler must cover all the bases. The traceback module is handy for making sure we know what occurred:

import sys, traceback

    getattr(sys.modules[__name__], 'handle_' + datatype).run()

For long running processes, one further refinement helps us stay sane:

import sys, traceback

    getattr(sys.modules[__name__], 'handle_' + datatype).run()
except KeyboardInterrupt:

By using these techniques, you can implement a plug-in architecture which won't crash your programs and will also let you know what went wrong when a plug-in goes awry.