## Problem Statement:

The optimization problem to fit an ellipse is too slow.

## Discussion:

The easiest thing to improve performance in a script is to try compiling it using a tool like psyco or py2exe. Beyond that, the time critical portions of the code can be written in C/C++ and embedded into the script using a library like scipy.weave or a tool like Cython.

If a Python script has elements which are naturally parallelizable, then there are additional approaches to improving the script’s performance. One approach is to break the code up and execute it among several CPU cores using a library like the multiprocessing library or parallel python. Another approach to try taking advantage of the CPU and GPU hardware using PyOpenCL. Finally, the script can be vectorized, where all of the equations are expressed as vector operations.

Since the objective in the Ellipse fitting problem is easily decomposed into a set of parallel equations, the solution considered here is the use of numpy and vectorization of the code.

## Solution:

The original problem, presented previously, is broken into a couple of modules. The first module generates the ellipse data used in the fitting exercise.

ellipse.py

"""
Module to generate noisy data samples about an ellipse.
"""
from numpy import *
from numpy import linalg as LA
from random import uniform,normalvariate

def generate_noisy_pts(a,foci1,foci2,
num_pts=200, angles=None,
x_noise = None, y_noise=None):

'''
Generate points for an ellipse given
the foci, and
the distance to the intersection of the minor axis and ellipse.

Optionally,
the number of points can be specified,
the angles for the points wrt to the centroid of the ellipse, and
a noise offset for each point in the x and y axis.
'''
c = (1/2.0)*LA.norm(foci1-foci2)
b = sqrt(a**2-c**2)
x1 = foci1[0]
y1 = foci1[1]
x2 = foci2[0]
y2 = foci2[1]
if angles is None:
t = arange(0,2*pi,2*pi/float(num_pts))
else:
t = array(angles)

ellipse_x = (x1+x2)/2 +(x2-x1)/(2*c)*a*cos(t) - (y2-y1)/(2*c)*b*sin(t)
ellipse_y = (y1+y2)/2 +(y2-y1)/(2*c)*a*cos(t) + (x2-x1)/(2*c)*b*sin(t)
try:
# try adding noise to the ellipse points
ellipse_x = ellipse_x + x_noise
ellipse_y = ellipse_y + y_noise
except TypeError:
pass
return (ellipse_x,ellipse_y)

def generate_point_list(num_samples,a_ref,foci1,foci2,sigma=0.2):
# generate list of noisy samples on the ellipse
angles = [uniform(-pi,pi) for i in range(0,num_samples)]
sigma = 0.2
x_noise = [normalvariate(0,sigma) for t in angles]
y_noise = [normalvariate(0,sigma) for t in angles]
x_list,y_list = generate_noisy_pts(a_ref,foci1,foci2,
angles  = angles,
x_noise = x_noise,
y_noise = y_noise)
point_list = [array([x,y]) for x,y in zip(x_list,y_list)]
return point_list

if __name__=='__main__':
import matplotlib.pylab as plt

# define the foci locations
foci1 = array([2,-1])
foci2 = array([-2,1])
# define distance from foci to ellipse circumference
a_ref = 2.5

num_pts = 200
point_list = generate_point_list(num_pts,a_ref,foci1,foci2)

plt.figure()
x,y = zip(*point_list)
plt.plot(x,y,'.b')
plt.show()

The next two modules define the objective function in two different ways. In the first, the code calculates the objective function using scalar math. In the second, the objective function is calculated using numpy vector math.

The first module, objective_scalar.py, uses a class to define the objective function. This is done for several reasons. The first is that the class allows the parameters, the values which will not be changed by the optimization search, to be bound to the objective without a need to pass them through the optimizer. Additionally, the __init__ and __del__ methods in the class provide a convenient place to allocate special purpose services and hardware which may be used when evaluating the objective. To simplify the use of the class, the __call__ method is used so the instance of the class looks like a function to the optimizer.

objective_scalar.py

'''
This module contains an objective function and logic to determine the starting
point for the ellipse fitting problem.

This objective is coded using operations on scalar quantities.

'''

from numpy import *
from numpy import linalg as LA

class Objective(object):
def __init__(self,parameters):
# setup parameters which will be used later
self._p = parameters['point_list']
self._sigma = parameters.get('sigma',0.2)
self.x0 = self._find_x0()
self._ahat_max = self.x0[0]

def _find_x0(self):
'''
Determine the initial value, x0, for the optimization problem.
'''
points = array(self._p)
x_list,y_list = zip(*points)
# find x mean
x_mean = array(x_list).mean()
# find y mean
y_mean = array(y_list).mean()
# find point farthest away from mean
center = array([x_mean,y_mean])
distances = zeros((len(x_list),1))
for i,point in enumerate(points):
distances[i,0]=LA.norm(point-center)
ind = where(distances==distances.max())
max_pt = points[ind[0],:][0]
# find point between mean and max point
foci1 = (max_pt+center)/2.0
# find point opposite from
foci2 = 2*center - max_pt
return [distances.max(), foci1[0],foci1[1],foci2[0],foci2[1]]

def __call__(self,x):
'''
Calculate the objective cost in the optimization problem using scalar
equations which are summed.
'''

point_list = self._p

foci1 = array([x[1],x[2]])
foci2 = array([x[3],x[4]])
a     = x[0]
n = float(len(point_list))
lambda_   = 0.1
sigma    = self._sigma
ahat_max = self._ahat_max
total = 0
for point in point_list:
total += ((LA.norm(point-foci1,2)+LA.norm(point-foci2,2)-2*a)**2)/n
total += lambda_*ahat_max*sigma*exp((a/ahat_max)**4)

if __name__=='__main__':
import time
from random import uniform, normalvariate, seed

# local modules
import ellipse

####################################################################

# setup test conditions
num_reps = 100
num_pts = 256
precision = 'float'
seed(1234567890)      # set the random generator to get repeatable results

# setup the reference ellipse

# define the foci locations
foci1_ref = array([2,-1])
foci2_ref = array([-2,1])
# define distance from foci to ellipse circumference
a_ref = 2.5

point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)

parameters = { "point_list" : point_list }

# test the function on an NVIDIA card
t0 = time.time()
my_objective = Objective(parameters)
x0 = my_objective.x0
t1 = time.time()
for i in range(0,num_reps):
y  = my_objective(x0)
t2 = time.time()

print ''
print 'Initialization took %f sec' % (t1-t0)
print 'Execution took %f sec' % (t2-t1)
print 'Executed %i times.' % (num_reps)
print ''



The next module vectorizes the codes for faster execution. To simplify the coding, the vectorized objective inherits the interfaces and initialization from the scalar objective code module.

objective_vectorized.py

'''
This module contains an objective function for the ellipse fitting problem.
The objective is coded using vectorized operations.

'''

from numpy import *
from numpy import linalg as LA
import objective_scalar

class Objective(objective_scalar.Objective):

def __call__(self,x):
'''
Calculate the objective cost in the optimization problem using
vectorized equations.
'''

point_list = self._p
foci1 = array([x[1],x[2]])
foci2 = array([x[3],x[4]])
a     = x[0]
n = float(len(point_list))
_lambda = 0.1
sigma    = self._sigma
ahat_max = self._ahat_max

pt_diff1 = point_list - foci1
pt_diff2 = point_list - foci2

x_f1_diff = pt_diff1[:,0]
x_f2_diff = pt_diff2[:,0]
y_f1_diff = pt_diff1[:,1]
y_f2_diff = pt_diff2[:,1]

x_f1_diff_sq = power(x_f1_diff,2)
x_f2_diff_sq = power(x_f2_diff,2)
y_f1_diff_sq = power(y_f1_diff,2)
y_f2_diff_sq = power(y_f2_diff,2)

norm_pt_to_f1 = power(x_f1_diff_sq+y_f1_diff_sq,0.5)
norm_pt_to_f2 = power(x_f2_diff_sq+y_f2_diff_sq,0.5)

temp = power(norm_pt_to_f1+norm_pt_to_f2-2*a,2)/n
total = sum(temp)
total += _lambda*ahat_max*sigma*exp((a/ahat_max)**4)

if __name__=='__main__':
import time
from random import seed

# local modules
import ellipse

####################################################################

# setup test conditions
num_reps = 100
num_pts = 256
precision = 'float'
seed(1234567890)      # set the random generator to get repeatable results

# setup the reference ellipse

# define the foci locations
foci1_ref = array([2,-1])
foci2_ref = array([-2,1])
# define distance from foci to ellipse circumference
a_ref = 2.5

point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)

parameters = { "point_list" : point_list }

# test the function on an NVIDIA card
t0 = time.time()
my_objective = Objective(parameters)
x0 = my_objective.x0
t1 = time.time()
for i in range(0,num_reps):
y  = my_objective(x0)
t2 = time.time()

print ''
print 'Initialization took %f sec' % (t1-t0)
print 'Execution took %f sec' % (t2-t1)
print 'Executed %i times.' % (num_reps)
print ''


Executing objective_scalar.py and objective_vectorized.py will provide some sense of the difference in speed between the two approach. However, it is a good idea to measure the performance under different conditions. The next script, main.py, evaluates the implementation of the ellipse fit objective when different numbers of points are available for the fit. It also evaluates how the time to solve the ellipse fit problem changes between the scalar and the vectorized objectives when the number of points changes. To reduce the variation due to changes in the data points, the random seed is set to the same value each time points are generated.

main.py

'''
This module contains an objective function for the ellipse fitting problem.
The objective is coded using vectorized operations.

'''

__author__ = 'Ed Tate'
__email__  = 'edtate<at>gmail-dot-com'
__website__ = 'exnumerus.blogspot.com'

import time
from random import uniform, normalvariate, seed
from numpy import *
import pylab as plt
from openopt import NLP

# local modules
import ellipse
import objective_scalar
import objective_vectorized

####################################################################

# setup test conditions
num_reps = 100

seed(1234567890)      # set the random generator to get repeatable results

# setup the reference ellipse

# define the foci locations
foci1_ref = array([2,-1])
foci2_ref = array([-2,1])
# define distance from foci to ellipse circumference
a_ref = 2.5

######################################################
# measure the timing to setup and call the objective function

num_pts_list = [10,100,250,500,1000,2500,5000,10000,50000]

objective_functions = [objective_scalar,objective_vectorized]
setup_times = []
call_times  = []
for num_pts in num_pts_list:
seed(1234567890)      # set the random generator to get repeatable results
point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)

parameters = { "point_list" : point_list }

test_setup_times = []
test_call_times = []
for objective_function in objective_functions:
t0 = time.time()
my_objective = objective_function.Objective(parameters)
x0 = my_objective.x0
t1 = time.time()
for i in range(0,num_reps):
y  = my_objective(x0)
t2 = time.time()
test_setup_times.append(t1-t0)
test_call_times.append(t2-t1)

setup_times.append(test_setup_times)
call_times.append(test_call_times)
print 'Evaluated %i points' % num_pts

# plot the results
plt.figure()
plt.loglog(num_pts_list,call_times)
plt.legend(['scalar','vectorized'],loc='upper left')
plt.xlabel('Number of points')
plt.ylabel('Execution Time [sec]')
plt.title('Times for %i calls to objective function' % num_reps)
plt.grid(True)

#############################################################
# test the performance when used in an optimizer

num_pts_list = [10,100,250,500,1000,2500,5000]

objective_functions = [objective_scalar,objective_vectorized]
setup_times = []
call_times  = []
for num_pts in num_pts_list:
seed(1234567890)      # set the random generator to get repeatable results
point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)

parameters = { "point_list" : point_list }

test_setup_times = []
test_call_times = []
for objective_function in objective_functions:
t0 = time.time()
my_objective = objective_function.Objective(parameters)
x0 = my_objective.x0
p_vectorized = NLP(my_objective, x0, iprint = -1)
t1 = time.time()
opt_result = p_vectorized.solve('ralg')
t2 = time.time()
test_setup_times.append(t1-t0)
test_call_times.append(t2-t1)

print opt_result.xf

setup_times.append(test_setup_times)
call_times.append(test_call_times)
print 'Evaluated %i points' % num_pts
print

# plot the results
plt.figure()
plt.loglog(num_pts_list,call_times)
plt.legend(['scalar','vectorized'],loc='upper left')
plt.xlabel('Number of points')
plt.ylabel('Execution Time [sec]')
plt.title('Times to solve the ellipse fit problem')
plt.grid(True)

plt.show()

## Testing Results and Conclusions

To easily see the results which cover a large range of numbers, the results are plotted on a log-log graph. Comparing the scalar calculations against the vectorized calculations of the objective function shows almost an order of magnitude improvement in computing time.

When the vectorized objective is used in the ellipse fit problem, the performance improvement is almost an order of magnitude. For a small number of points, the overhead from the optimization routine is significant and reduces the improvement (as should be expected).