Sunday, April 11, 2021

Merging Terabytes of Files - Part 1

Problem: Given terabytes of data, in millions of files, on multiple drives merging this dataset is a daunting task. 


Solution

There are several things that need to happen.

  1. Duplicate files need to be identified, even if they have different names and dates.
  2. Duplicate directories need to be found,  even if the directories, subdirectories, and files have different names.
  3. Similar directories should be merged and naming differences reconciled.

This problem can be broken down in to three steps. The first is to build a catalog of the directories and files. The second is to process the catalog to find duplicates and similarities. The third and final step is to build the new, merged file system.


Step #1 - Cataloging the file system


To catalog the files the following is done.
  1. Each directory is crawled to collect the following
    • filename
    • file extension
    • full file path
    • md5 hash (to uniquely identify the file)
    • creation date
    • modification date
    • file size
  2. A text output file is generated with all of this data



The Cataloging Script


For python 3.8+, the following script will build a catalog with path, names, size, dates, and has for each file. Once created, it easy to load this file in Excel and review the catalog.


import glob
import os
import pathlib
import datetime
import hashlib

# starting points for the catalog
root_dir = 'D:/' 
# file to store the catalog
outfilename = "Directory-Report.txt"

with open(outfilename, 'w') as outfile:

    for fullfilename in glob.iglob(root_dir + '**/**', recursive=True):
         print(fullfilename)
         
         # try:
         if True:
            path, filenameext = os.path.split(fullfilename)
            filename, file_extension = os.path.splitext(filenameext)
             
            # get md5 hash
            try:
                with open(fullfilename, "rb") as f:
                    file_hash = hashlib.md5()
                    while chunk := f.read(8192):
                         file_hash.update(chunk)
                         file_hash_str = file_hash.hexdigest()
            except:
                file_hash_str = ""


            # get create date
            fname = pathlib.Path(fullfilename)
            createdate = datetime.datetime.fromtimestamp(fname.stat().st_ctime)
            # get modification date
            moddate = datetime.datetime.fromtimestamp(fname.stat().st_mtime)
            # get file size
            size = os.path.getsize(fullfilename)
                         
             outfile.write('"%s","%s","%s","%s","%s",%s,%s,%s,%s\n' % 
                   (fullfilename,path,filenameext,filename,file_extension,createdate,moddate,size,file_hash_str))
                   
                   

Tuesday, February 16, 2021

Simulating and Visualizing 3D Sine Waves and Interference Using Blender - An Antenna Array Factor Example

Complex equations make it hard to understand how a behavior of a system changes with different design. Visualizing the result of design changes helps develop insight and can lead to better designs. Matlab, Python, and other tools are typically used to do this kind of exploration. However, other tools can also be very useful. 

Blender is used to visualize scientific results from simulations and scripts like shown here. Surprisingly, it can also be used to simulate and gain insight into real physical systems. The animation below shows a visualization of the interactions that contribute to the 'Array Factor' for an array of antennas as the spacing between 5 elements varies. Because of how optimized Blender is for rendering and geometry, this animation can be interacted and rendered in real time. The way this is generated is to used the 'shaders' in Blender to calculate the array factor. 


Discussion

The array factor for an antenna is defined as 


This calculation describes how the radiation from an antenna is scaled when it is part of a phased array. A key aspect is that by putting antennas in arrays, the radiation becomes stronger in some directions and weaker in others. This equation describes how the waves from each antenna are combined. 
This equation uses complex numbers, can be seen differently using Euler's formula
By expanding the AF equation, it becomes evident that this can be simulated using a repeating wave. This is where Blender can be used.

In Blender, there is high performance, highly optimized code used to texture the geometry. This is referred to as shaders. There are standard shaders and it is even possible to write customer shaders in OSL. Shaders can be defined as functions of x,y,z in space along with inputs from geometry and other attributes in the Blender mode.

By using the shader logic its possible to solve for the interference patterns in array antennas and visualize the results.

References


Tools


Details

The first step is defining sinusoidal waves that are a function of the origin of each antenna. This is done using the wave texture configured for spherical waves. The 'Vector' input locates the wave in space.




This wave texture is generated for each antenna in the array. In this example, there are 5 antennas, so there are 5 wave textures used.


To tie this into the antennas, the coordinates from each antenna are introduced into the shader using a texture coordinate from the antenna object. This way, if the antenna moves, then the origin of each wave moves.



This logic describes the radiation anywhere in the model. It is just a matter of assigning it to a volume or surface to visualize. This requires converting the output of the sine waves into a color, then defining how it is used to color a surface.



Putting all of this together, the antennas can be positioned anywhere, and the resulting interference pattern of the waves will be quickly and correctly calculated, then visualized. The array factor for two different antenna arrays are shown below. 




Additional detail and rendering options can be found in this stack exchange answer - https://blender.stackexchange.com/questions/213194/i-need-to-simulate-the-interference-of-two-sinewaves/213260#213260




Monday, February 15, 2021

Animated GIFs for Scientific Visualization - An Example Antenna Array Animation using Python, Matplotlib & GIMP

This animation shows how the radiation patterns from an array of antennas as more antennas are added. Using an animation its a lot easier to quickly see how the patterns changes with the number of antennas. Saving the results as a GIF file rather than an animation using Python allows the results to be used in a website or in a Powerpoint presentation. As of Matplotlib 3.3, there is not a gif export capability, but this king of animation can be generated using Python, Matplotlib, and GIMP. 


Discussion

Often scientific information and simulation results need to be visualization to be explained and understood. Live animations in a script are useful, but distributing scripts has a lot of issues. Creating movies (mp4, avi) sometime have issues with encoders. GIF animations are a reliable way to share a lot of information. They can be embedded in PowerPoint, send via email, or embedded in a website with minimal effort and the results just work. 

Once a script to generate a Matplotlib animation, it can usually be modified to generate a sequence of images and save them using the 'savefig' command. Then there are several tools that can convert the images to an animated GIF. With GIMP, the GIF can be generated in a a few minutes and there are a lot of options to control the speed, resolution, and resolution of the result.


Steps to Generate An Animated GIF

  1. Generate a sequence of images of identical size using Matplotlib and save using savefig.
  2. Open GIMP
    • File -> Load the plots as layers
      • Select all image files
    • Filters -> Animation -> Optimize (for GIF)
      • Image->Mode->Index
      • Generate optimum palette
      • Maximum number of colors: 256
      • Color dithering: Positioned
      • Enable dithering of transparency: checked
      • Select Convert
    • Filter -> Animation -> Optimize (GIF)
    • File -> Export
      • Name the file with a .gif extension
      • Select 'Export'
      • On new dialog
        • Select as animation
        • Set timing to 20 msec

    References

    Sample Script


    
    # -*- coding: utf-8 -*-
    
    import matplotlib.pyplot as plt
    from matplotlib.colors import BoundaryNorm
    from matplotlib.ticker import MaxNLocator
    import numpy as np
    
    
    
    # make these smaller to increase the resolution
    dx, dy = 0.005, 0.005
    
    # list of antenna locations
    wavelength = 0.1
    spacing = 0.25
    x_list = np.array([0,1,-1,2,-2,3,-3,4,-4,5,-5,6,-6,7,-7])*spacing*wavelength
    y_list = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    phase_deg_list = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,] 
    phase_offset_deg_list = range(0,720,9)
    
    num_antennas = [1+int(pp*8/720)*2 for pp in phase_offset_deg_list]
    
    
    
    for idx,(phase_anim,antennas) in enumerate(zip(phase_offset_deg_list,num_antennas)):
        
        # generate 2 2d grids for the x & y bounds
        y, x = np.mgrid[slice(-1, 1 + dy, dy),
                        slice(-1, 1 + dx, dx)]
        
        z_list= []
        for xx,yy,phase_offset in zip(x_list[:antennas],y_list,phase_deg_list):
            phase_shift = 2*np.pi*(phase_offset + phase_anim)/360
            zz = np.real(np.exp(1j*phase_shift)*np.exp(np.pi*2/wavelength*-1j*np.sqrt((x-xx)**2 + (y-yy)**2)))
            z_list += [zz]
        
        
        z = sum(z_list)/len(z_list)
        
        # x and y are bounds, so z should be the value *inside* those bounds.
        # Therefore, remove the last value from the z array.
        z = z[:-1, :-1]
        levels = MaxNLocator(nbins=15).tick_values(z.min(), z.max())
        
        
        # pick the desired colormap, sensible levels, and define a normalization
        # instance which takes data values and translates those into levels.
        cmap = plt.get_cmap('bwr')
        norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
        
        fig, (ax1) = plt.subplots(figsize=(12,10),dpi=75)
        
        
        # contours are *point* based plots, so convert our bound into point
        # centers
        ax1.plot(x_list,y_list,'o',linestyle='',markerfacecolor='black')
        ax1.plot(x_list[:antennas],y_list[:antennas],'ko',linestyle='',markerfacecolor='white')
        cf = ax1.contourf(x[:-1, :-1] + dx/2.,
                          y[:-1, :-1] + dy/2., z, levels=levels,
                          cmap=cmap,
                          vmin=-1, vmax=1.0)
        #plt.clim(-1,1)
        fig.colorbar(cf, ax=ax1)
        ax1.set_title('Number of Antennas = %i' % antennas)
        
        
        # adjust spacing between subplots so `ax1` title and `ax0` tick labels
        # don't overlap
        fig.tight_layout()
        plt.savefig('image-%03i.png' % idx)
        
        plt.show()
    
    

    Sunday, February 14, 2021

    Generating 3D Plots and Exporting the Geometry - An Example with Antenna Radiation in Matplotlib, PyVista and Blender

    This was made starting from a Python script using Matplotlib. This post shows how. 

    Discussion

    Matplotlib is a great tool for generating beautiful 2D and 3D images quickly. 


    However, sometimes its desirable to generate something more visual and appealing to help explain a concept more clearly. Often this is not possible in Matplotlib. Other tools must be used.  However, there is not facility in Matplotlib (as of 3.2.1) to export that plot surface for use in other tools. 
    Fortunately, using almost the same syntax as Matplotlib, a 3D plot can be created in PyVista and exported in one of the common 3D file formation like stl, ply, or fbx. Once this is done, its possible to import into a tool like Blender and create a high quality rendering or animation to help explain a concept. 

    This posting will cover the key steps in doing this and use the plotting of the radiation pattern of an antenna to help illustrate.


    Setup

    You'll need Python 3 with Matplotlib, PyVista, and Blender to replicate this. My personal setup uses the portable version of Python 3 and Blender.
    1. Install WinPython portable (https://winpython.github.io/)
    2. Use pypi to install PyVista (https://pypi.org/project/pyvista/)
    3. Install Blender portable (https://www.blender.org/download/ and chose the portable version)

    Example

    Sunday, March 18, 2012

    Installing sailfish-CFD under PythonXY

    image

    Discussion

    Sailfish CFD is an interesting python program. It uses OpenCL to solve fluid flow problems using the Lattice-Boltzmann method. Sailfish uses PyOpenCL or PyCuda to manage the simulations and pygame as one of its visualization packages. The following steps can be used to add Sailfish CFD to PythonXY and execute the examples using OpenCL.

    Installation Steps:

    1. Install the OpenCL driver for the computer from the CPU/GPU vendor. NVIDIA drivers can be found here. Intel Drivers can be found here.
    2. Download the source using Git following the links here. If you do not use Git or don’t want to add it to your machine, a portable version is available here.
    3. Since the developers do not offer packages (yet?), use git to clone their repository. I clone to a temp directory, then work from there.
      • git clone git://gitorious.org/sailfish/sailfish.git c:\temp\sailfish
      • The repository can be viewed through a GUI using the gitk command.
    4. I like to keep the python pieces together, so I copy the contents of the sailfish directory from my temp location to the a sailfish directory created under c:\Python27.
    5. There appears to be an issue with the location of the Mako files. So following the recommendation on the Sailfish google group in this exchange, copy all of the template files from “C:\Python27\sailfish\sailfish\templates” to “C:\Python27\sailfish\sailfish”.
    6. Create a file named “sailfish.pth” at “C:\Python27\Lib\site-packages”. Edit the file and add a single line with “C:\Python27\sailfish\”. This tells python to look in that directory for the sailfish modules.
    7. Test the installation by opening Python and typing “import sailfish” at the prompt. If it imports without errors, your installation is probably good.
    8. Since PythonXY works with PyOpenCL out of the box and Sailfish works with OpenCL, the examples can now be run by using the “—backend=opencl” option. Open up “C:\Python27\sailfish\examples\lbm_cylinder.py” and try to run using the opencl option.
    9. If your system does not have a GPU, you might encounter an error like “pyopencl.LogicError: Context failed: invalid value”. This is because Sailfish is configured to only use GPUs. I was able to get my installation to work on a system without a GPU, but with Intel OpenCL installed. This was done by modifying “C:\Python27\sailfish\sailfish\backend_opencl.py”  modifying line 31 from “devices = platform.get_devices(device_type=cl.device_type.GPU)” to “devices = platform.get_devices()”.
    The image at the top of the posting was generated from “C:\Python27\sailfish\examples\lbm_cylinder.py”.

    References:


    Test Environment:

    • Win 7 Professional
    • PythonXY 2.7.2.1
    • Sailfish git commit with SHA 1 of 29d7c934be8c02cf714ce2307796a4550428b512 from 2011-11-06 at 14:35:56
    • Intel OpenCL
    • i7 CPU, no GPU
    This work is licensed under a Creative Commons Attribution By license.

    Saturday, March 17, 2012

    Yet Another Countdown Timer Application built using .HTA (Windows HTml Applications)

    image

    Discussion:

    The time until something is due is always important. There are a lot of countdown timers which are available. There are a lot of interesting ways to implement a countdown timer. In restricted IT environments, where installing applications is frowned upon, one approach to creating a lightweight application is using HTML Applications (HTA).  The following example shows how to use an HTA to build a standalone countdown timer which is displayed as a window.

    Several techniques are used to make this countdown timer work. First, the name of the file is parsed to determine the target date and time. This way, the remaining time can be set by renaming the file. Next, javascript is used to rewrite a portion of the document by assigning a value to the ‘innerHTML’  of  div element. Then, the countdown function is called once when the body of the document is loaded. However, once the function is called, the last thing it does is reschedule itself to run again in 1 second. Finally, this file, which looks like an HTML file is saved as with the extension HTA.

    Code Example:

    <html>
    <head>
    	<TITLE>Countdown Timer</TITLE>
    	<HTA:APPLICATION 
    		ID="oMyApp" 
    		APPLICATIONNAME="CountDownTimer" 
    		BORDER="yes"
    		CAPTION="yes"
    		SHOWINTASKBAR="yes"
    		SINGLEINSTANCE="yes"
    		SYSMENU="yes"
    		WINDOWSTATE="normal"
    		MAXIMIZEBUTTON="no">
        <script>
    		// set the window size and place it on the screen in the upper right hand corner
    		var windowWidth = 600
    		var windowHeight = 150
    		window.resizeTo(600,150);  // Can only be done in HTA
    		//half the screen height minus half the new window height (plus title and status bars).
    		iMyHeight = (window.screen.height/4) - (windowHeight/2 + 50)/2;
            window.moveTo(window.screen.width-windowWidth-5,5);
        </script>
    	<style type="text/css">
    		body {
    			background-color: green;
    			color: black; 
    			text-align: center; 
    			font-family: arial; 
    			font-weight: bold; 
    			font-size: 20px;
    			font-variant: small-caps;
    			border: medium double rgb(0,255,15);
    			width:100%;
    			overflow: hidden;
    			filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#00C000',
    						startColorstr='#00FFFF', gradientType='0'); 			
    			}
    	</style>
    </head>
    <body>
    	<!-- Define a division for displaying the countdown message -->
    	<div id="countdown" ></div>
    	<!-- Define the actions to take when the body loads -->
    	<script>
    	// parse the filename to get the target date and time
    	var filename = String(window.location)              // get filename
    	var iSubStrEnd = filename.lastIndexOf('.')       // get point in string where the date data may end
    	var iSubStrStart = filename.lastIndexOf('.',iSubStrEnd-1) // get point in string where the date data may start
    	if ((iSubStrEnd>0)&&(iSubStrStart>0)&&(iSubStrStart<iSubStrEnd)) {
    		// get the date string from the file name
    		endDateTime = filename.slice(iSubStrStart+1,iSubStrEnd)
    		}
    	else {
    		// enter the stop date here that will be used if the name is not formatted correctly
    		// format for this string is YYYY,MM,DD,HH,MM,SS
    		endDateTime = '2015,01,01,0,0,0'
    		}
    		
    	// parse the date
    	var params = endDateTime.split(',')
    	lyear  = parseInt(params[0])
    	lmonth = parseInt(params[1])
    	lday   = parseInt(params[2])
    	// call the countdown function which will hook itself into a timer and
    	//     continue to run every second.
    	countdown(lyear,lmonth,lday,params[3],params[4],params[5])	
    
    	//-------------------------------------------------------
    	function countdown(yr,m,d,hh,mm,ss){
    		var montharray=new Array("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
    		hh = ("00"+hh).slice(-2);
    		mm = ("00"+mm).slice(-2); //if (mm<10) mm = "0"+(mm+1);
    		ss = ("00"+ss).slice(-2); //if (ss<10) ss = "0"+ss;
    		theyear=yr; themonth=m; theday=d;
    		thehh=parseInt(hh); themm=parseInt(mm); thess=parseInt(ss);
    		var today=new Date();
    		var todayy=today.getFullYear();
    		var todaym=today.getMonth()+1;
    		var todayd=today.getDate();
    		var todayh=today.getHours();
    		var todaymin=today.getMinutes();
    		var todaysec=today.getSeconds();
    		var todaystring=montharray[todaym]+" "+todayd+", "+todayy+" "+todayh+":"+todaymin+":"+todaysec
    		futurestring   =montharray[m-1]   +" "+d     +", "+yr    +" "+hh    +":"+mm      +":"+ss
    		var dd    = Date.parse(futurestring)-Date.parse(todaystring);
    		var dday  = Math.floor(dd/(60*60*1000*24)*1);
    		var dhour = Math.floor((dd%(60*60*1000*24))/(60*60*1000)*1);
    		var dmin  = Math.floor(((dd%(60*60*1000*24))%(60*60*1000))/(60*1000)*1);
    		var dsec  = Math.floor((((dd%(60*60*1000*24))%(60*60*1000))%(60*1000))/1000*1);
    		if(dday<=0&&dhour<=0&&dmin<=0&&dsec<=0){
    			// put text in the "countdown" division to show the countdown expired
    			document.getElementById("countdown").innerHTML = "<br> Time has run out! <br>";
    			return;
    			}
    		else {
    			dhour = ("00"+dhour).slice(-2);
    			dmin  = ("00"+dmin).slice(-2);
    			dsec  = ("00"+dsec).slice(-2);
    			// format the delta time and put in the countdown division to display
    			document.getElementById("countdown").innerHTML = 
    				"<br> Times out in "+dday+ 
    				" days, "+dhour+" hours, "+dmin+" minutes, and "+dsec+
    				" seconds  <br>";
    			}
    		// trigger this routine to run again in 1 second
    		setTimeout("countdown(theyear,themonth,theday,thehh,themm,thess)",1000);
    		} 
    
    	</script>
    	</body>
    </html>

     

    TEsting Environment:

    • IE9
    • Win7
    This work is licensed under a Creative Commons Attribution By license.

    Friday, February 24, 2012

    Lesson Learned: Managing Large Numbers of Plots, with an Example in Python

    28056200-5ef7-11e1-8741-08002700a056

    Problem Statement:

    A project generates hundreds, thousands, or more graphs over its life. These graphs are copied and pasted into e-mail, power point slides, etc. The plots become divorced from any of the documents they were originally distributed with. Invariably, at some point in the project, a plot is brought back and the question is what were the assumptions used to generate this graph. With only the graph available, it can be difficult or impossible to answer this question.

    To complicate matters, the plots are generated using legacy codes and modifying all of the existing code base is a detailed endeavor.

    How can this situation be improved?

    Discussion:

    There are two problems here. First, a given graph is not traceable to its origin. This can be remedied in one of many ways.  If the source data is well controlled and can be described using a short phase, then adding that phrase somewhere on the chart is helpful. If the source data is constantly changing or requires too much information to describe with a short phrase, then something else is needed. A hash of the input data can help identify and verify the source data set used to generate the graph. A universally unique ID (UUID) can be used to give a graph a unique name. If the source data, assumptions, etc are stored using that same UUID, then when a graph if brought back for review, all of the necessary parts can be found.

    The second problem is handling legacy code. There are at least three choices.

    • The first is probably the easiest. A function to add hash and uuid could be created and inserted at the appropriate location in each of the major pieces of plotting code. This is problematic because there are several interfaces and actions in the scripts which could make this work poorly. Also, every plotting routine would need to be modified.
    • A second choice is to wrap the plotting routines into function, then pass this function and its data to a wrapper function which would add a hash and uuid as the last thing done by the plotting routine. If the plotting functions already exist, then this can be done without changing any of the plotting code.   
    • A third choice is to create a decorator which wraps plotting routines, adding the hash and uuid. This has the same issues as using a wrapper call and requires changes to the plotting routines source. However, the changes consist of an import statement and application of the decorator at the correct location.

    For this problem, the decorator solution is used. The code that following implements a decorator that creates a hash of the plot data and a UUID which are added to the right side of the plot.  This way, no matter where the plot goes, there is a high likelihood that its pedigree can be preserved.

    '''
    Script to demonstrate the use of decorators to add a unique identifier to
       a plot. The identifier incudes a hash of input data, to help see if 
       version of a plot really have different data, and a UUID to uniquely 
       identify this plot independent of when or where it was generated.
    '''
    
    __author__  = 'Ed Tate'
    __email__   = 'edtate<AT>gmail-dot-com'
    __website__ = 'exnumerus.blogspot.com'
    __license__ = 'Creative Commons Attribute By - http://creativecommons.org/licenses/by/3.0/us/'
    
    from matplotlib.pylab import *
    import random
    import md5
    import uuid
    
    def identifiable_plot(fn):
        def newfun(*args,**kwargs):
            # do before decorated function call
            fn(*args,**kwargs)
            # do after decorated function call
            # create the tag string from a hash of the data and a 
            #    universially unique ID
            x = args[0]
            y = args[1]
            xy = zip(x,y)
            m = md5.new()
            m.update(str(xy))
            this_uuid = str(uuid.uuid1())
            this_tag = 'hash=' + m.hexdigest() + ',' + 'UUID=' + this_uuid
            # write the tag to the figure for future reference
            figtext(1,0.5,this_tag ,rotation='vertical',
                    horizontalalignment='right',
                    verticalalignment='center',
                    size = 'x-small',
                    )
            return this_uuid
    
        return newfun
    
    ###############################
        
    @identifiable_plot
    def my_plot(x,y):
        plot(x,y,'o')
        grid(True)
        
    ###############################
        
    x = [random.random() for i in range(100)]
    y = [random.random() for i in range(100)]
        
    plot_uuid = my_plot(x,y)
    savefig(plot_uuid+'.png')
    
    show()
    

     


    Test Configuration:
    • win7
    • PythonXY 2.7.2.1

    References:
    This work is licensed under a Creative Commons Attribution By license.

    Using HTML to View Large Sets of Plots - An Example in Python



    This example doesn't work because of Blogger limitations. However if you run the example you will be able to select graphs in the generated HTML page.

    Problem Statement

    You have a program which generates lots of similar plots that end users would like to compare and explore. The end users may not be able to install any code. You can't setup a web server to nagivate the data set. You can not install any new programs on their window's desktop. How to you provide a solution?

    Discussion

    You can assume that any modern computer at least has a copy of Firefox, Safari, or Explorer. Since there browers support javascript (except in the worst case security settings), you can build a very lightweight data viewer using a few simple methods. The most important design decisions when generating the plots is to name the plots so they are easy to recreate from selections an end user might make.

    Example

    The following snippet of Python code generates 9 graphs that have random numbers plotted on two axes using different colors and markers. There are three choices of colors and three choices of markers. After generating the plots and saving them, the script creates an HTML file which simplies the navigation of the images. A user can open the HTML page and select the graph by changing the form selections at the top of the page.

    There are a couple of key concepts that help make this work:
    • The plot names can be created from selections using javascript. For example, in this example there are three colors and three different markers. All of the plot file names are formed by concatenating the color and marker description to form a plot name.
    • When a user changes their choice of color or marker a javascript function rebuilds the plot file name and causes the browser to reload the image by change the img source.
    • The python script uses templates to set up the bulk of the HTML page, then substitutes in specific options for the user after the plots have been generated. 

    import pylab as plt
    from random import random
    from string import Template
    
    colors  = {'Red_Plot':'r',
               'Blue_Plot':'b',
               'Green_Plot':'g',
               }
    markers = {'Circle_Plot':'o',
               'Square_Plot':'s',
               'Diamond_Plot':'d',
               } 
    
    plt.figure()
    for c_key in colors.keys():
        for m_key in markers.keys():
            plt.clf()
            plot_name = c_key + ',' + m_key + '.png'
            x = [random() for i in range(0,100)]
            y = [random() for i in range(0,100)]
            color = colors[c_key]
            marker = markers[m_key]
            plt.plot(x,y,color+marker,markersize=15)
            plt.savefig(plot_name)
            
    HTML_template = Template('''<head>
       <script language="JavaScript"><!--
          function sel_plot() {
             // only do this if the brower supports images
             if(document.images) {
                // get plot color name
                var e=document.getElementById("color_name");
                var c_name = e.options[e.selectedIndex].text;
                // get plot marker name
                var e=document.getElementById("marker_name");
                var m_name = e.options[e.selectedIndex].text; // create the filename
                // build the filename from these selections
                var plot_filename = c_name + "," + m_name + ".png";
                // cause the correct plot to be loaded
                document["plot"].src = plot_filename;
                }
             }
          // select the plot to display initially after loading document
          window.onload = function() { sel_plot() };
          // silence errors
          window.onerr = null;
       </script>
    </head>
    <body>
       <center>
          <form name="Plot Select Form" id="plot_select_form">
             <select id="color_name" size="3"
                onchange="sel_plot()">
                $color_options
             </select>
             <select id="marker_name" size="3"
                onchange="sel_plot()">
                $marker_options
             </select>
          </form>
          <img name="plot" src="dummy.png" height="200"
             width="500">
       </center>
    </body>
    ''')
            
    
    # build the option strings
    def build_options(opt_dict):
        s = ''
        for i,key in enumerate(opt_dict.keys()):
            if i==1:
                s += '<option selected>'
            else:
                s += '<option>'
            s += key
            s += '</option>\n'
        return s    
    
    color_options = build_options(colors)
    marker_options = build_options(markers)
    
    # build the HTML page        
    HTML = HTML_template.substitute(color_options=color_options,
                         marker_options=marker_options)
        
    
    # write the HTML page
    f = open('example.html','wb')
    f.write(HTML)
    f.close()

    Test Configuration

    • PythonXY 2.7.2.1
    • IE 9


    This work is licensed under a Creative Commons Attribution By license.

    Saturday, December 24, 2011

    How to Remove Noise from a Signal using Fourier Transforms: An Example in Python

    Problem Statement:

    Given a signal, which is regularly sampled over time and is “noisy”, how can the noise be reduced while minimizing the changes to the original signal.

    Discussion:

    A common problem in reconstructing data is elimination of noise. Noise can corrupt a signal through many means: quantization, measurement noise, errors in sampling time, sensor bias, sensor nonlinearities, signal cross coupling, etc. There are many methods which can be used to eliminate the noise on a signal. Some common approaches include use of a linear filter, Kalman filtering, Wiener filtering, construction of a custom optimization problem, and any number of ad-hoc approaches. Furthermore, the filtering of the signal can be done causally or noncausally. In a causal filter, the filtering for a particular sample if determined based only on the sample received before that sample. In acausal or noncausal filtering, all of the data before and after the sample can be used to determine the ‘best’ value for each sample.
    One way to quickly filter a dataset without much effort is to use a Fourier transform. A Fourier transform is a way to decompose a signal into a sum of sine waves. The amplitude and phase associated with each sine wave is known as the spectrum of a signal. If the spectrum of the noise if away from the spectrum of the original signal, then original signal can be filtered by taking a Fourier transform, filtering the Fourier transform, then using the inverse Fourier transform to reconstruct the signal.
    Lets start with a  simple example. Consider a signal consisting of a single sine wave, \(s(t)=sin(w*t)\). Let the signal be subject to white noise which is added in during measurement, \(s_{measured}(t)=s(t)+n\). Let the \(F\) be the Fourier transform of \(s\). Now by setting the value of \(F\) to zero for frequencies above and below \(w\), the noise is reduced. Let \(F_{filtered}\) be the filtered Fourier transform. Taking the inverse Fourier transform of \(F_{filtered}\) yields \(s_{filtered}(t)\). The code sample which follows illustrates these operations.

     """
    Script to demonstrate the use of Fourier Transforms to acausally filter 
       a signal.
    
    """
    
    
    __author__ = 'Ed Tate'
    __email__  = 'edtategmail-dot-com'
    __website__ = 'exnumerus.blogspot.com'
    __license__ = 'Creative Commons Attribute By - http://creativecommons.org/licenses/by/3.0/us/''' 
    
    
    from pylab import *
    
    # setup the problem
    num_samples  = 1000 # number of samples
    
    # generate an ideal signal
    f_signal  = 6   # signal frequency  in Hz
    dt = 0.01 # sample timing in sec
    p  = 30   # 30 degrees of phase shift
    a  = 1    # signal amplitude
    s = [a*sin((2*pi)*f_signal*k*dt) for k in range(0,num_samples)]
    s_time = [k*dt for k in range(0,num_samples)]
    
    # simulate measurement noise
    from random import gauss
    mu = 0
    sigma = 2
    n = [gauss(mu,sigma) for k in range(0,num_samples)]
    
    # measured signal
    s_measured = [ss+nn for ss,nn in zip(s,n)]
    
    # take the fourier transform of the data
    F = fft(s_measured)
        
    # calculate the frequencies for each FFT sample
    f = fftfreq(len(F),dt)  # get sample frequency in cycles/sec (Hz)
    
    # filter the Fourier transform
    def filter_rule(x,freq):
        band = 0.05
        if abs(freq)>f_signal+band or abs(freq)<f_signal-band:
            return 0
        else:
            return x
            
    F_filtered = array([filter_rule(x,freq) for x,freq in zip(F,f)])
    
    # reconstruct the filtered signal
    s_filtered = ifft(F_filtered)
    
    # get error
    err = [abs(s1-s2) for s1,s2 in zip(s,s_filtered) ]
    
    figure()
    subplot(4,1,1)
    plot(s_time,s)
    ylabel('Original Signal')
    xlabel('time [s]')
    
    subplot(4,1,2)
    plot(s_time,s_measured)
    ylabel('Measured Signal')
    xlabel('time [s]')
    
    subplot(4,1,3)
    semilogy(f,abs(F_filtered),'or')
    semilogy(f,abs(F),'.')
    legend(['Filtered Spectrum','Measured Spectrum',])
    xlabel('frequency [Hz]')
    
    subplot(4,1,4)
    plot(s_time,s_filtered,'r')
    plot(s_time,s,'b')
    legend(['Filtered Signal','Original Signal'])
    xlabel('time [s]')
    
    
    show()
    
    
    
    The output from this script shows the initial signal, followed by the measured signal which is corrupted with noise. The spectrum of this signal is filtered to recover the original signal.





    References and useful links:


    Test Configuration:
    • win7
    • PythonXY 2.7.2.1
    
    

    Thursday, August 4, 2011

    Random Notes: Validation of models vs physical systems when controls are developed independently

    Problem Statement

    How do you validate a model of a system against a physical system when a controller is necessary to make the system operate and the the operational policies of the controllers were developed independently.

    Discussion

    Consider a development process with a well defined operation cost model which drives both the model and physical system to optimization that operations cost. If the model uses full state feedback for control and the physical system uses either full state feedback or implements output feedback with state estimators and relies on certainty equivalence, there is no guarantee that under identical test conditions the trajectories of the cost and the trajectories of the states will be identical even if both system are optimal with respect to the operational costs. This is because optimal operational costs are unique, but there are no guarantees that optimal tratectories are unique. Furthermore, if the controls in both cases are not globally optimal, by only near optimal, then likelihood of non-unique trajectories is even more likely. However, because the operational costs can be unique, the validation exercise can be decomposed into two validation steps.

    First, the equations which model the physics can be validated against test data on the physical system by measuring the states in the real system, then substituting the integrator in the model with the state measurements. Ideally, the physical system could execute both policies. The error in costs, and derivative calculations can be compared to quantify the error between the model of the physics and the real physics.

    Second, once the errors in the model of the physics are quantified, the error in the costs under the different controllers can be quantified. Ideally, from this step, the optimality of the controls wrt to a globally optimal controller (or minimizing controller) can be established. Once interesting possibility is to use policy improvement to see if the independently developed policies can merged for better performance. Alternatively, if there are unexplained differences, then the constraints respected by the different policies need to be reconciled. Things like robustness may also contribute to differences. Robustness will in general be a driven by different view of noise and risk sensitivity. Following on that, there is the possibility that the equation structure in the different policies lead to different performance limitations. Again, policy improvement may provide a way to identify these structure imposed limitations.

    This work is licensed under a Creative Commons Attribution By license.

    Wednesday, August 3, 2011

    Random Notes: Thoughts on metrics for engineering tools

    • If training is required for successful usage, what is the average half-life of the training. In other words, if a group of users is trained, how long until 50% of the users will forget some key aspect of tool usage which drives them to abandon the tool?
    • What is the average time for user to need to go to the help files to complete a task if they do not use the tool constantly?
    • Can a user successfully use the tool without training?
    • Are the documentation and examples sufficient for self learning?
    • How many actions are required to complete a ‘quickstart’ example?
    • How many decisions are required to complete a ‘quickstart’ example?
    • How many choices are the in each decision in a typical workflow?
    • How difficult is it to integrate the tool into automated work flows?
    • How difficult is it to customize the tool?
      • Can a power user customize the tool?
    • How long does it take to introduce a new feature in the tool?
    • How many sentences does it take to describe why a user should adopt the tool?
    • In the absence of process enforcement, would the users naturally adopt this solution?
    • What is the time saving for the individual, team, and organization from the adoption of the tool?
    • If the tool reduces error rates, is there feedback to the users to help them understand the improvement?
    • Can the input and output to the tool be reused so that the effort can be reapplied?
    • What is the ‘activation potential’ to get a new user to adopt the tool?
      • Do new users request access to the tool?
    • In a corporate setting, how difficult are the permissions to manage?
      • If a new user if not setup, will the team be able to duplicate the permissions with without calling the developers?

    This work is licensed under a Creative Commons Attribution By license.

    Monday, August 1, 2011

    Draft: Notes on dynamic programming equations which solve cost models for dynamic systems

     

    Deterministic Cost Models

    Description

    Cost Model

    Dynamic Programming Equations

    Restrictions

    Finite Horizon Total Cost

    \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

    \( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\}  \)

    \(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

    \(0\leq\alpha<1\)

    Infinite Horizon Total Cost

    \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

    \(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

    \(0\leq\alpha<1\)

    Finite Horizon Shortest Path

    \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

    \( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \)

    \(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

    \(0\leq\alpha\leq1\)

    \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

    Infinite Horizon Shortest Path

    \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

    \(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

    \(0\leq\alpha\leq1\)

    \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

    Average Cost

    \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

    \(V^{\pi}\left(x\right)+\lambda=c\left(x,\pi\left(x\right)\right)+V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

    \(0\leq\alpha<1\)

    \(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)

     

    Stochastic Cost Models

    Description

    Cost Model

    Dynamic Programming Equations

    Restrictions

    Finite Horizon Total Cost

    \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

    \(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

    \(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

    \(0\leq\alpha<1\)

    Infinite Horizon Total Cost

    \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

    \(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

    \(0\leq\alpha<1\)

    Finite Horizon Shortest Path

    \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

    \(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

    \(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

    \(0\leq\alpha\leq1\)

    \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

    Infinite Horizon Shortest Path

    \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

    \(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

    \(0\leq\alpha\leq1\)

    \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

    Average Cost

    \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

    \(V^{\pi}\left(x\right)+\lambda=E\left[c\left(x,\pi\left(x\right),w\right)+V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

    \(0\leq\alpha<1\)

    \(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)

     

    Risk Aware/Averse Stochastic Cost Models

    Description

    Cost Model

    Dynamic Programming Equations

    Restrictions

    Certainty Equivalence with exponential utility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot\frac{1}{\gamma}\cdot\ln\left(E^{W}\left[\exp\left(\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\)    
    Mean-Variance      
           
           
           

     

    Cost Models That don’t work or have issues

    Description

    Cost Model

    Issues

    Expected exponential disutility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot E^{W}\left[\textrm{sgn}\left(\gamma\right)\cdot\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\) Does not discriminate among policies
    Different version of expected exponential disutility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{\gamma}\cdot\log\left(E^{W}\left[\exp\left(\gamma\cdot\frac{\gamma}{K}\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\) Generally reduces to cost average
         
         
         

     

    References

    This work is licensed under a Creative Commons Attribution By license.

    Sunday, July 31, 2011

    Notes on Policy Improvement and Controlled Dynamic Systems

    See introductory background.

    A controlled dynamic system has inputs which can steer the evolution of the state of the system.

     

    \(x_{k+1}=f\left(x_{k},u_{k}\right)\)

    (1)

    The inputs to the dynamic system can be determined by a policy, \(\pi\( that maps the state of a dynamic system to an input of the dynamic system. This policy makes the controlled dynamic system behave like an autonomous dynamic system.

     

    \(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\)

    (2)

    Given a cost of operation for the dynamic system,

     

    \(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k}\right)\right)\),

    (3)

    a value function which is a function of the control policy and the initial state can be found using a variation of dynamic programming. This value function is

     

    \(V^{\pi}\left(x,\pi\left(x\right)\right)=c\left(x,\right)+\alpha^{k}\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

    (4)

    One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement, is that if a change can be made in the policy that improves the immediate and future operational costs, then this change improves the policy. If

     

    \(c\left(x,u\right)+\alpha^{k}\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)

    (5)

    then the choice of \(u\) at \(x\) is an improvement on the policy \(\pi\) and will reduce the operating costs.

    Other key ideas:

    This work is licensed under a Creative Commons Attribution By license.

    Notes on Dynamic Systems and the Value Function

     

    Dynamic systems can be described by differential and difference equations. Without a loss of generality, consider a dynamic system represented by a difference equation: \(x_{k+1}=f\left(x_{k}\right)\). The state of the system is represented by \(x\) and the function \(f\) is the mapping from one state to the next. One way to characterize a dynamic system is with an additive cost: \(J\). An additive cost summarizes the cost of operation for the system from some initial state, \(x_{0}\). To ensure that the sums are finite an exponential weighting factor, \(alpha\), is introduced. This factor has a value between between 0 and 1.  Under some circumstances, \(alpha\) can be equal to one. One special case is where the system is guaranteed to eventually enter a zero cost state. However, in general, it will need to be less than one. This additive cost is a function of each initial condition:

     

    \(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k}\right)\right)\).

    (1)

    The value of the additive cost can be solved using the dynamic programming equations:

     

    \(V\left(x\right)=c\left(x\right)+\alpha^{k}\cdot V\left(f\left(x\right)\right)\).

    (2)

    The function \(V\) is referred to as the value function.

    This work is licensed under a Creative Commons Attribution By license.

    Wednesday, July 27, 2011

    Organizational Incentives and Responsibilities: An Observation

    In a large organization, properly aligning incentives and responsibilities is always a challenge. However, sometimes it all comes together like a grand plan. Saw a situation today that demonstrated this. A year ago, a particular facility was very undesirable:  old paint, old carpets, intermittent wireless, worn furniture, etc. In a large organization, fixing mundane issues like this can be a royal challenge: forms, approvals, policy, and more. By chance (at least to a distant observer), the machinery of the organization moved people responsible for facilities into the area. One year later… new paint, new carpets, new furniture. The other groups in the area had their productivity and satisfaction improved. Responsibilities met incentives and action was taken.

    This work is licensed under a Creative Commons Attribution By license.

    Thursday, July 7, 2011

    Where to find the Octave Grammar

    Problem Statement:

    Where is a grammar for Octave?

    Discussion:

    Octave implements an open source language which is very close to Matlab’s language. Octave is not an exact clone, but is close enough to make conversion of Matlab scripts into Octave relatively easy. Since this grammar is defined and available in an open source form, it can be reused in other projects.

    See:

    Also answers:

    • Where can I find a Matlab grammar
    This work is licensed under a Creative Commons Attribution By license.

    Friday, June 17, 2011

    An example of improving performance using PyOpenCL: Further improvements to the Ellipse Fit Program

    openCL objective call times

    Figure 1

    Problem Statement:

    Can using OpenCL with Python significantly improve the performance of the Ellipse Fit problem? This problem was introduced here and improved on here using vectorization, and here using parallelization.

    Discussion:

    OpenCL is an open, royalty free, cross platform standard for parallel programming on heterogeneous systems. PyOpenCL is a wrapper for OpenCL which allows blending an OpenCL program into a Python script. PyOpenCL requires some setup to incorporate into a script. Any OpenCL program must be configured to work with a Python script. In general, a text representation of the OpenCL function (work task) is created. The device which will execute the program needs to be selected. The function needs to have input and output buffers defined to move data between the main program and the device which will execute the work task. Before the work task is called, the input buffers must be filled with data. The text representation of the work task must be compiled before execution. Finally, the function is called and it loads data into the output buffers when it completes.

    One advantage of OpenCL is the ability to move a program among different devices for execution. One of the disadvantages is that there are multiple tuning aspects to tune which can affect performance. The best configuration for one device is usually not optimal for other devices.

    Testing Setup:

    To recreate these experiments, recreate the test setup here, then add the objective function from here. Then add the new function objective_openCL.py. Finally, overwrite the main.py script with the one here.

    objective_openCL.py

    '''
    This module contains an objective function for the ellipse fitting problem.
    The objective is coded using vectorized operations.
    
    '''
    
    from numpy import *
    import pyopencl as cl
    from string import Template
    
    # local imports
    import objective_scalar
    
    
    class Objective(objective_scalar.Objective):
    
        def __init__(self,parameters):
            objective_scalar.Objective.__init__(self,parameters)
       
            precision_dict = {
                'float':float32,
                'double':float64}    
                
            self.platform_name      = parameters.get('platform_name',None)
            self.platform_precision = parameters.get('platform_precision','double') 
            self.partial_sum_loc    = parameters.get('partial_sum_loc','__global')           
                
            self.numpy_prec = precision_dict[self.platform_precision]
            openCL_prec = self.platform_precision
        
            point_list = self._p    
        
            x,y  = zip(*point_list)
            self.x  = array(x,dtype=self.numpy_prec)
            self.y  = array(y,dtype=self.numpy_prec)
            self.f1 = array([0,0],copy=True,dtype=self.numpy_prec)
            self.f2 = array([0,0],copy=True,dtype=self.numpy_prec)
            self.a  = array([0],copy=True,dtype=self.numpy_prec)
            
            self.num_elements = len(x)
            
            platforms = cl.get_platforms()
            if len(platforms)>1 and not self.platform_name is None:
                # select the the first device in list
                for platform in platforms:
                    if self.platform_name in platform.name:
                        # use first device for context
                        for device in platform.get_devices():
                            self.ctx = cl.Context([device])
                            self.device = device
                            break
                        break
            else:
                self.ctx = cl.create_some_context()    
                
            if 'NVIDIA' in self.platform_name:
                compiler_options = '-cl-mad-enable -cl-fast-relaxed-math'
            else:
                compiler_options = ''
            
            #self.ctx = cl.create_some_context()
            self.queue = cl.CommandQueue(self.ctx)
            
            # define the input buffers and transfer data into them
            mf = cl.mem_flags
            self.x_buf  = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.x)
            self.y_buf  = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.y)
            self.f1_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.f1)
            self.f2_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.f2)
            self.a_buf  = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.a)
        
            # define the output buffer
            self.partial_sum_buf = cl.Buffer(self.ctx, mf.READ_WRITE, self.x.nbytes)
            self.full_sum_buf    = cl.Buffer(self.ctx, mf.READ_ONLY, array([0],dtype=self.numpy_prec).nbytes) 
    
        
            # prep the container to the output
            self.partial_sum = empty(self.x.shape,dtype=self.numpy_prec)
            self.full_sum    = empty(array([0],dtype=self.numpy_prec).shape,dtype=self.numpy_prec)
        
            # define local memory if needed
            if not self.partial_sum_loc=='__global':
                self.local_partial_sum = cl.LocalMemory(self.x.nbytes)            
        
            code_pattern= Template("""
                #pragma OPENCL EXTENSION cl_khr_fp64: enable
                __kernel void partial_sum(
                __global const ${precision} *x,
                __global const ${precision} *y, 
                __global const ${precision} *f1,
                __global const ${precision} *f2,
                __global const ${precision} *a,
                ${partial_sum_loc} ${precision} *partial_sum)
                {
                  int gid = get_global_id(0);
                  
                  ${precision} x_f1_diff = x[gid]-f1[0];
                  ${precision} x_f2_diff = x[gid]-f2[0];
                  ${precision} y_f1_diff = y[gid]-f1[1];
                  ${precision} y_f2_diff = y[gid]-f2[1];
                  
                  ${precision} x_f1_diff_sq = x_f1_diff*x_f1_diff; 
                  ${precision} x_f2_diff_sq = x_f2_diff*x_f2_diff;
                  ${precision} y_f1_diff_sq = y_f1_diff*y_f1_diff;
                  ${precision} y_f2_diff_sq = y_f2_diff*y_f2_diff;
                  
                  ${precision} norm_pt_to_f1 = sqrt(x_f1_diff_sq+y_f1_diff_sq);
                  ${precision} norm_pt_to_f2 = sqrt(x_f2_diff_sq+y_f2_diff_sq);
            
                  ${precision} t1 = norm_pt_to_f1+norm_pt_to_f2-2*a[0];
            
                  partial_sum[gid]=t1*t1;
                }
                
                __kernel void full_sum(
                ${partial_sum_loc} ${precision} *partial_sum,
                __global ${precision} *full_sum)
                {
                  long i;
                  ${precision} sum=0;
                  
                  for(i=0; i<${num_elements}; i++) {
                      sum += partial_sum[i];
                      }
                  full_sum[0] = sum;
                }            
                
                __kernel void full_sum2(
                __global ${precision} *partial_sum)
                {
                  int gid = get_global_id(0);
                  long i;
                  long mid_point = (${num_elements}+1)/2;
                  if (gid>mid_point) {
                      return;
                  } else {
                      partial_sum[gid]+=partial_sum[gid+mid_point];
                  }
                }             
            """)
            substitutions = {'precision':openCL_prec,
                             'num_elements':self.num_elements,
                             'partial_sum_loc':self.partial_sum_loc}
            prg = code_pattern.substitute(substitutions)
        
              
        
            # define the function to be executed
            self.prg = cl.Program(self.ctx, prg).build(options=compiler_options)
    
    
        def __call__(self,x):
            '''
            Calculate the objective cost in the optimization problem.
            '''
                
            a      = x[0]
            self.a  = array([x[0]],copy=False,dtype=self.numpy_prec)
            self.f1 = array([x[1],x[2]],copy=False,dtype=self.numpy_prec)
            self.f2 = array([x[3],x[4]],copy=False,dtype=self.numpy_prec)
            
            # trigger transfer of values to the GPU buffers
            # only those values which changed before this evaluation need to be updated
            mf = cl.mem_flags        
            self.f1_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.f1)
            self.f2_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.f2)
            self.a_buf  = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.a)
          
            # execute the partial_sum function
            if self.partial_sum_loc == '__global':
                self.prg.partial_sum(self.queue, self.x.shape, None,
                            self.x_buf,self.y_buf,self.f1_buf,self.f2_buf,self.a_buf,self.partial_sum_buf)
                # trigger the copy of the results into the partial_sum container after completion
                cl.enqueue_read_buffer(self.queue, self.partial_sum_buf, self.partial_sum).wait()
                # execute the full_sum function
                self.prg.full_sum(self.queue, (1,), None,
                                self.partial_sum_buf,self.full_sum_buf)
            else:
                self.prg.partial_sum(self.queue, self.x.shape, (8,),
                            self.x_buf,self.y_buf,self.f1_buf,self.f2_buf,
                            self.a_buf,self.local_partial_sum)
                self.prg.full_sum(self.queue, (1,), None,
                            self.local_partial_sum,self.full_sum_buf)
                            
        
            # trigger the copy of the results into the partial_sum container after completion
            cl.enqueue_read_buffer(self.queue, self.full_sum_buf, self.full_sum).wait()
    
            # compare sums
            #print (sum(self.full_sum)-sum(self.partial_sum))
            #print (sum(self.full_sum),sum(self.partial_sum))
    
            # finish up the calculations 
            point_list = self._p
            n = float(len(point_list))
            _lambda =0.1
            #full_sum = sum(self.partial_sum)
            total = (self.full_sum/float(n) + 
                _lambda*self._ahat_max*self._sigma*exp((a/self._ahat_max)**4))
            
            return total
    
    
        def _del_(self):
            pass
    
    
    if __name__=='__main__':
        import time
        from random import uniform,normalvariate,seed
        # local modules
        import ellipse
        
        ####################################################################
    
        # setup test conditions
        num_reps = 1   
        num_pts = 1e6
        precision = 'float'
        seed(1234567890)      # set the random generator to get repeatable results
        
        # setup the reference ellipse
        
        # define the foci locations
        foci1_ref = array([2,-1])
        foci2_ref = array([-2,1])
        # define distance from foci to ellipse circumference
        a_ref = 2.5
        
        point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
        
        parameters = { "point_list" : point_list,
                       "platform_name" : 'NVIDIA',
                       "platform_precisions" : precision}
        
        # test the function on an NVIDIA card
        t0 = time.time()
        my_objective = Objective(parameters)  
        x = my_objective.x0
    
        t1 = time.time()
        for i in range(0,num_reps):
            y  = my_objective(x)    
        t2 = time.time()
        
        print ''
        print 'NVIDIA initialization took %f sec' % (t1-t0)
        print 'NVIDIA execution took %f sec' % (t2-t1)
        print 'Executed %i times on %s' % (num_reps,my_objective.device.name)
        print ''
        
        # test the function on an intel CPU
        
        parameters = { "point_list" : point_list,
                       "platform_name" : 'Intel',
                       "platform_precisions" : precision}
        
        
        t0 = time.time()
        my_objective = Objective(parameters)
        x = my_objective.x0
    
        t1 = time.time()
        for i in range(0,num_reps):
            y  = my_objective(x)    
        t2 = time.time()    
        
        print ''
        print 'Intel initialization took %f sec' % (t1-t0)
        print 'Intel execution took %f sec' % (t2-t1)
        print 'Executed %i times on %s' % (num_reps,my_objective.device.name)
        print ''
        

     

    main.py

    '''
    This module tests the performance of different ways of
       forming the ellipse fit objective function and
       its impact on the ellipse fit optimization
       problem.
    '''
    
    __author__ = 'Ed Tate'
    __email__  = 'edtate<at>gmail-dot-com'
    __website__ = 'exnumerus.blogspot.com'
    __license__ = 'Creative Commons Attribute By - http://creativecommons.org/licenses/by/3.0/us/'''
    
    
    import time
    from random import uniform, normalvariate, seed
    from numpy import *
    import pylab as plt
    from openopt import NLP
    import gc
    
    # local modules
    import ellipse
    import objective_scalar
    import objective_vectorized
    import objective_vectorized_parallel
    import objective_openCL
    
    ####################################################################
    
    # setup test conditions
    num_reps = 2
    
    seed(1234567890)      # set the random generator to get repeatable results
    
    # setup the reference ellipse
    
    # define the foci locations
    foci1_ref = array([2,-1])
    foci2_ref = array([-2,1])
    # define distance from foci to ellipse circumference
    a_ref = 2.5
    
    ######################################################
    # setup the OpenCL Objective classes for CPU and GPU verions of the 
    #     OpenCL version of the objective function
    
    class objective_openCL_GPU_Objective(objective_openCL.Objective):
        def __init__(self,parameters):
            parameters["platform_name"]="NVIDIA"
            objective_openCL.Objective.__init__(self,parameters)
            
    class objective_openCL_CPU_Objective(objective_openCL.Objective):
        def __init__(self,parameters):
            parameters["platform_name"]="Intel"
            objective_openCL.Objective.__init__(self,parameters)
    
    #####################################################
    #
    objective_functions = [objective_scalar.Objective,
                           objective_vectorized.Objective,
                           objective_vectorized_parallel.Objective,
                           objective_openCL_GPU_Objective,
                           objective_openCL_CPU_Objective]
                          
    legends = ['scalar','vectorized','vectorized parallel',
               'OpenCL on NVIDIA GPU',
               'OpenCL on Intel i7 CPU']
    
    ######################################################
    # measure the timing to setup and call the objective function
    
    num_pts_list = [10,100,250,500,1000,2500,5000,10000,50000,1e5,1e6]
    #num_pts_list = [10,100,250]
    
    
    setup_times = []
    call_times  = []
    for num_pts in num_pts_list:
    
        seed(1234567890)      # set the random generator to get repeatable results    
        point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
        
        parameters = { "point_list" : point_list }
        
        test_setup_times = []
        test_call_times = []
        for objective_function in objective_functions:
            t0 = time.time()
            my_objective = objective_function(parameters)
            x0 = my_objective.x0
            t1 = time.time()
            for i in range(0,num_reps):
                y  = my_objective(x0)    
            t2 = time.time()
            test_setup_times.append(t1-t0)
            test_call_times.append(t2-t1)
            
            del my_objective
            
        setup_times.append(test_setup_times)
        call_times.append(test_call_times)
        print 'Evaluated %i points' % num_pts
    
    # plot the results
    plt.figure()
    plt.loglog(num_pts_list,call_times,lw=2)
    plt.legend(legends,loc='upper left')
    plt.xlabel('Number of points')
    plt.ylabel('Execution Time [sec]')
    plt.title('Times for %i calls to objective function' % num_reps)
    plt.grid(True)
    
    
    #############################################################
    # test the performance when used in an optimizer
    
    num_pts_list = [10,100,250,500,1000,2500,5000]
    #num_pts_list = [10,100,250]
    
    setup_times = []
    call_times  = []
    for num_pts in num_pts_list:
        seed(1234567890)      # set the random generator to get repeatable results    
        point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
        
        parameters = { "point_list" : point_list }
        
        test_setup_times = []
        test_call_times = []
        for objective_function in objective_functions:
            t0 = time.time()
            my_objective = objective_function(parameters)
            x0 = my_objective.x0
            p_vectorized = NLP(my_objective, x0, iprint = -1)
            t1 = time.time()
            opt_result = p_vectorized.solve('ralg')
            t2 = time.time()
            test_setup_times.append(t1-t0)
            test_call_times.append(t2-t1)
            
            print '%s : %s' % (opt_result.xf,objective_function)
            
            del my_objective
            gc.collect()   # force release of resources held by the objective
            
        setup_times.append(test_setup_times)
        call_times.append(test_call_times)
        print 'Evaluated %i points' % num_pts
        print
    
    # plot the results
    plt.figure()
    plt.loglog(num_pts_list,call_times,lw=2)
    plt.legend(legends,loc='upper left')
    plt.xlabel('Number of points')
    plt.ylabel('Execution Time [sec]')
    plt.title('Times to solve the ellipse fit problem')
    plt.grid(True)
    
    plt.show()

    Testing results:

       Figure 1 shows the difference in time to call the objective function for the various methods of implementing the objective function. From this chart, the worst performing method is to implement the objective function as a scalar. The best performing implementation is to implement the function using OpenCL on the CPU. This variation in performance is not as stark when the objective function is used in the Ellipse Fit problem. In fact, because the ellipse fit problem measures data transfer times and computation times, the ordering of the performance changes. When just measuring the time to evaluate the objective function, the increasing order of performance is scalar, parallelized & vectorized, vectorized, OpenCL on GPU, and the best performance is OpenCL on the CPU. Furthermore, as the problem scale changed, the relative performance ratios stay relatively constant.When the ellipse fit problem is solved, the ordering changes and the relative slopes change. One reason for this is the overhead associated with the optimization routines. Another reason is the increased time which includes time to transfer data into the GPU. Another interesting result is that the execution times do not change much as the problem size increases. This is probably due to the efficiency of the computation and the efficiency of block transfers of data.

    openCL fit call times

    Figure 2

    One other thing to note is that these different methods of computation have different resolutions in the floating point numbers. Because of these differences in the resolution, the optimization problem finds slightly different answers. The console output shows the differences in the results of the optimization. In general, for the larger problems, the difference are about one part in 10,0000.

    Console Output:

    Evaluated 10 points
    Evaluated 100 points
    Evaluated 250 points
    Evaluated 500 points
    Evaluated 1000 points
    Evaluated 2500 points
    Evaluated 5000 points
    Evaluated 10000 points
    Evaluated 50000 points
    Evaluated 100000 points
    Evaluated 1000000 points

     

    [ 2.53977781 -1.9791083   1.11795369  1.90068376 -0.97398384] : <class 'objective_scalar.Objective'>
    [ 2.53978396 -1.97911311  1.11797418  1.90068553 -0.97399285] : <class 'objective_vectorized.Objective'>
    [ 2.53980864 -1.9791325   1.11805566  1.90069183 -0.97402764] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.53972748 -1.97906577  1.1177896   1.90067428 -0.9739137 ] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.53977167 -1.97910332  1.11793354  1.90068234 -0.97397521] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 10 points

     

    [ 2.566147   -1.95135142  1.0549304   2.16680656 -1.10160299] : <class 'objective_scalar.Objective'>
    [ 2.56614697 -1.95134842  1.05492776  2.16681334 -1.10160238] : <class 'objective_vectorized.Objective'>
    [ 2.56614691 -1.95134982  1.05492914  2.16680969 -1.1016027 ] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.56614661 -1.95134401  1.05492394  2.16682259 -1.10160128] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.56614672 -1.95134937  1.05492891  2.16680991 -1.10160258] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 100 points

     

    [ 2.56330383 -2.0896741   1.06288154  2.04214077 -1.05030978] : <class 'objective_scalar.Objective'>
    [ 2.5633038  -2.08967399  1.06288158  2.04214065 -1.05031   ] : <class 'objective_vectorized.Objective'>
    [ 2.56330231 -2.08967161  1.06288221  2.04213788 -1.05031417] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.56330174 -2.08967069  1.0628825   2.04213675 -1.05031584] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.56330119 -2.08966978  1.0628827   2.04213573 -1.05031742] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 250 points

     

    [ 2.55213105 -2.06212704  1.05749405  2.02108216 -1.05369719] : <class 'objective_scalar.Objective'>
    [ 2.55390449 -2.0623902   1.05823288  2.02404738 -1.05682491] : <class 'objective_vectorized.Objective'>
    [ 2.54907192 -2.05769363  1.05956421  2.01575473 -1.05475975] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.55212491 -2.06209759  1.05747762  2.02105559 -1.05370312] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.5518505  -2.06217932  1.05803857  2.02239691 -1.05392966] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 500 points

     

    [ 2.61796899 -2.14539556  1.11170214  2.12027043 -1.09253186] : <class 'objective_scalar.Objective'>
    [ 2.61708437 -2.14479931  1.10835605  2.12087692 -1.090721  ] : <class 'objective_vectorized.Objective'>
    [ 2.61761133 -2.14669529  1.1089428   2.12139812 -1.09022015] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.61753131 -2.14662822  1.10869732  2.12152618 -1.09010436] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.61763094 -2.14672541  1.10895434  2.12136288 -1.09017078] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 1000 points

     

    [ 2.60471429 -2.11563461  1.06998647  2.13374471 -1.0697982 ] : <class 'objective_scalar.Objective'>
    [ 2.60416974 -2.11432462  1.07083334  2.13291023 -1.07026969] : <class 'objective_vectorized.Objective'>
    [ 2.60416139 -2.1136674   1.0722599   2.13226116 -1.07119697] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.60527276 -2.11598152  1.07174066  2.13404717 -1.07158798] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.60428855 -2.11418258  1.07163536  2.13309981 -1.07148272] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 2500 points

     

    [ 2.65236805 -2.17113736  1.11832436  2.18090649 -1.10477807] : <class 'objective_scalar.Objective'>
    [ 2.65236757 -2.17113597  1.11832359  2.18090679 -1.10477991] : <class 'objective_vectorized.Objective'>
    [ 2.65236655 -2.1711387   1.1183202   2.18090519 -1.10477049] : <class 'objective_vectorized_parallel.Objective'>
    [ 2.65124389 -2.17148832  1.11786432  2.17945815 -1.10311976] : <class '__main__.objective_openCL_GPU_Objective'>
    [ 2.65124383 -2.17148872  1.11786446  2.17945781 -1.10311962] : <class '__main__.objective_openCL_CPU_Objective'>
    Evaluated 5000 points

     

     

    Conclusions:

    • Using PyOpenCl is significantly faster then Numpy in this problem when using the CPU or the GPU to perform the calculation. The OpenCL version of the objective function solved on the GPU was over 10 times faster than the Numpy version of the objective function. The OpenCL version solved on the CPU was almost 100 times faster then the Numpy version.
    • The OpenCL version of the objective function was able to be redirected from the the GPU to the CPU using a a single line of code.
    • Using the native CPU with OpenCL is significantly faster then using the GPU for this problem. For this problem, the performance difference is related to the fact that the computational load is roughly proportional the quantity of data which is transferred into the device. When the OpenCL program executes on the CPU, the data transfer times are very low. When the OpenCL program runs on the GPU, global memory is used, which requires a data transfer from the CPU’s memory space to the GPU memory space. This is usually relatively slow.
    • Because of the differences in the numeric representation of the floating point values, the OpenCL calculations introduce differences in the calculations compared to the scalar, the Numpy, and parallelized Numpy representations.

     

    Hardware Configuration:

    • Pythonxy 2.6.6
    • PyOpenCL 0.92
    • Sony Vaio VPCF1
      • Intel i7  Q740 @ 1.73 GHz
      • NVIDIA GeForce GT 425M @ 1.12 GHz

     

    References:

    • Khronos Group OpenCL Standard
    • NVIDIA OpenCL manuals
    • Intel OpenCL manuals
    • PyOpenCL Home Page
    • PyOpenCL Documents
    This work is licensed under a Creative Commons Attribution By license.