Friday, February 24, 2012

Lesson Learned: Managing Large Numbers of Plots, with an Example in Python


Problem Statement:

A project generates hundreds, thousands, or more graphs over its life. These graphs are copied and pasted into e-mail, PowerPoint slides, etc. The plots become divorced from the documents they were originally distributed with. Invariably, at some point in the project, a plot is brought back along with the question: what assumptions were used to generate this graph? With only the graph available, it can be difficult or impossible to answer.

To complicate matters, the plots are generated using legacy code, and modifying the entire existing code base is a substantial undertaking.

How can this situation be improved?


There are two problems here. First, a given graph is not traceable to its origin. This can be remedied in several ways. If the source data is well controlled and can be described with a short phrase, then adding that phrase somewhere on the chart is helpful. If the source data changes constantly or needs more than a short phrase to describe, then something else is needed. A hash of the input data can help identify and verify the data set used to generate the graph. A universally unique identifier (UUID) can give each graph a unique name. If the source data, assumptions, etc. are stored under that same UUID, then when a graph is brought back for review, all of the necessary parts can be found.
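One way to make the "store everything under the same UUID" idea concrete is to keep a small sidecar record for each plot. The sketch below is only an illustration, and several details are assumptions that are not in the original post. The helper names `data_hash`, `save_provenance`, and `load_provenance` are hypothetical. Each record is a JSON file named `<uuid>.json` in a directory of your choosing. The hash is MD5 over the `repr` of the (x, y) pairs.

```python
import hashlib
import json
import tempfile
import uuid
from pathlib import Path


# Hypothetical helpers for illustration; not part of the original script.
def data_hash(x, y):
    """Return an MD5 hex digest of the (x, y) pairs behind a plot."""
    pairs = list(zip(x, y))
    return hashlib.md5(repr(pairs).encode("utf-8")).hexdigest()


def save_provenance(record_dir, x, y, assumptions):
    """Store the data hash and assumptions under a newly generated UUID.

    Returns the UUID string, which is what gets printed on the plot.
    """
    plot_id = str(uuid.uuid1())
    record = {"uuid": plot_id, "hash": data_hash(x, y), "assumptions": assumptions}
    path = Path(record_dir) / (plot_id + ".json")
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return plot_id


def load_provenance(record_dir, plot_id):
    """Fetch the stored record for a plot whose UUID was read off the graph."""
    path = Path(record_dir) / (plot_id + ".json")
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Demo only: a throwaway directory stands in for a real archive.
    with tempfile.TemporaryDirectory() as record_dir:
        plot_id = save_provenance(record_dir, [0.1, 0.2], [1.0, 2.0],
                                  {"model": "baseline", "units": "SI"})
        print(load_provenance(record_dir, plot_id))
```

The JSON format is a convenience choice. Any store that can be looked up by the UUID string would serve the same purpose.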

The second problem is handling legacy code. There are at least three choices.

  • The first is probably the easiest: create a function that adds the hash and UUID, and insert a call to it at the appropriate location in each of the major pieces of plotting code. This is problematic because the scripts use several different interfaces and actions that could make such a call work poorly. Also, every plotting routine would need to be modified.
  • A second choice is to wrap each plotting routine in a function, then pass that function and its data to a wrapper function that adds the hash and UUID as the last step. If the plotting functions already exist, this can be done without changing any of the plotting code.
  • A third choice is to create a decorator that wraps the plotting routines and adds the hash and UUID. This works much like the wrapper call, but it does require changes to the plotting routines' source. However, the changes consist only of an import statement and applying the decorator at the correct location.
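The second choice can be sketched without touching the legacy routine at all. This is a minimal sketch under several assumptions. The names `make_tag` and `plot_with_tag` are hypothetical. The hash is assumed to be MD5 over the `repr` of the (x, y) pairs. Drawing the tag is delegated to an `annotate` callable, so the wrapper does not depend on a particular plotting library.

```python
import hashlib
import uuid


# Hypothetical helpers illustrating the wrapper-function option;
# not from the original post.
def make_tag(x, y):
    """Build a 'hash=...,UUID=...' tag and return (tag, uuid_string)."""
    digest = hashlib.md5(repr(list(zip(x, y))).encode("utf-8")).hexdigest()
    this_uuid = str(uuid.uuid1())
    return "hash=" + digest + ",UUID=" + this_uuid, this_uuid


def plot_with_tag(plot_fn, x, y, annotate, **kwargs):
    """Call an unmodified plotting routine, then tag its output.

    plot_fn  -- existing routine, called as plot_fn(x, y, **kwargs)
    annotate -- callable that draws the tag string onto the current figure
    """
    plot_fn(x, y, **kwargs)          # legacy plotting code runs unchanged
    tag, this_uuid = make_tag(x, y)  # tagging is the last step
    annotate(tag)
    return this_uuid
```

With matplotlib, `annotate` could be something like `lambda tag: plt.figtext(1, 0.5, tag, rotation='vertical', size='x-small', ha='right', va='center')`. An existing routine `old_plot(x, y)` would then be called as `plot_with_tag(old_plot, x, y, annotate)`. The catch is that every call site has to change, whereas the plotting routines themselves do not.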

For this problem, the decorator solution is used. The code that follows implements a decorator that creates a hash of the plot data and a UUID, both of which are written along the right side of the plot. This way, no matter where the plot goes, there is a high likelihood that its pedigree can be traced.

"""Script to demonstrate the use of decorators to add a unique identifier to
   a plot. The identifier includes a hash of the input data, to help tell
   whether versions of a plot really have different data, and a UUID to
   uniquely identify this plot independent of when or where it was generated.
"""

__author__  = 'Ed Tate'
__email__   = 'edtate<AT>gmail-dot-com'
__website__ = ''
__license__ = 'Creative Commons Attribute By -'

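import functools
import hashlib  # replaces the md5 module, which was removed in Python 3
import random
import uuid

import matplotlib.pyplot as plt

def identifiable_plot(fn):
    @functools.wraps(fn)
    def newfun(*args, **kwargs):
        # do before decorated function call (nothing needed here)
        fn(*args, **kwargs)
        # do after decorated function call:
        # create the tag string from a hash of the data and a
        #    universally unique ID
        x = args[0]
        y = args[1]
        xy = list(zip(x, y))
        m = hashlib.md5(repr(xy).encode('utf-8'))
        this_uuid = str(uuid.uuid1())
        this_tag = 'hash=' + m.hexdigest() + ',' + 'UUID=' + this_uuid
        # write the tag to the figure for future reference; right-align it
        #    so the rotated text stays inside the figure's right edge
        plt.figtext(1, 0.5, this_tag, rotation='vertical',
                    size='x-small', ha='right', va='center')
        return this_uuid

    return newfun

@identifiable_plot
def my_plot(x, y):
    plt.plot(x, y, 'o')

x = [random.random() for i in range(100)]
y = [random.random() for i in range(100)]
plot_uuid = my_plot(x, y)
plt.show()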
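When a tagged plot comes back for review, the tag printed along its edge can be transcribed and split back into its parts. The UUID locates the stored assumptions, and the hash can be checked against a candidate data set. This is a minimal sketch with a few assumptions. The tag is assumed to have the `hash=...,UUID=...` form written by the script above. The hash is assumed to be MD5 over `repr(list(zip(x, y)))`. The names `parse_tag` and `matches_data` are hypothetical.

```python
import hashlib
import uuid


# Hypothetical helpers for reading a tag back; not from the original post.
def parse_tag(tag):
    """Split a 'hash=<md5>,UUID=<uuid>' tag into (hash, uuid_string)."""
    fields = dict(part.split("=", 1) for part in tag.split(","))
    return fields["hash"], fields["UUID"]


def matches_data(tag, x, y):
    """Return True if (x, y) hashes to the value recorded in the tag.

    Assumes the tag's hash is MD5 over repr(list(zip(x, y))).
    """
    recorded_hash, _ = parse_tag(tag)
    digest = hashlib.md5(repr(list(zip(x, y))).encode("utf-8")).hexdigest()
    return digest == recorded_hash
```

Splitting on commas is safe here because neither a hex digest nor a UUID string contains a comma.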



Test Configuration:
  • win7
  • PythonXY

This work is licensed under a Creative Commons Attribution By license.
