Problem Statement:
The optimization problem to fit an ellipse is too slow.
Discussion:
The easiest thing to improve performance in a script is to try compiling it using a tool like psyco or py2exe. Beyond that, the time critical portions of the code can be written in C/C++ and embedded into the script using a library like scipy.weave or a tool like Cython.
If a Python script has elements which are naturally parallelizable, then there are additional approaches to improving the script’s performance. One approach is to break the code up and execute it among several CPU cores using a library like the multiprocessing library or parallel python. Another approach to try taking advantage of the CPU and GPU hardware using PyOpenCL. Finally, the script can be vectorized, where all of the equations are expressed as vector operations.
Since the objective in the Ellipse fitting problem is easily decomposed into a set of parallel equations, the solution considered here is the use of numpy and vectorization of the code.
Solution:
The original problem, presented previously, is broken into a couple of modules. The first module generates the ellipse data used in the fitting exercise.
ellipse.py
""" Module to generate noisy data samples about an ellipse. """ from numpy import * from numpy import linalg as LA from random import uniform,normalvariate def generate_noisy_pts(a,foci1,foci2, num_pts=200, angles=None, x_noise = None, y_noise=None): ''' Generate points for an ellipse given the foci, and the distance to the intersection of the minor axis and ellipse. Optionally, the number of points can be specified, the angles for the points wrt to the centroid of the ellipse, and a noise offset for each point in the x and y axis. ''' c = (1/2.0)*LA.norm(foci1-foci2) b = sqrt(a**2-c**2) x1 = foci1[0] y1 = foci1[1] x2 = foci2[0] y2 = foci2[1] if angles is None: t = arange(0,2*pi,2*pi/float(num_pts)) else: t = array(angles) ellipse_x = (x1+x2)/2 +(x2-x1)/(2*c)*a*cos(t) - (y2-y1)/(2*c)*b*sin(t) ellipse_y = (y1+y2)/2 +(y2-y1)/(2*c)*a*cos(t) + (x2-x1)/(2*c)*b*sin(t) try: # try adding noise to the ellipse points ellipse_x = ellipse_x + x_noise ellipse_y = ellipse_y + y_noise except TypeError: pass return (ellipse_x,ellipse_y) def generate_point_list(num_samples,a_ref,foci1,foci2,sigma=0.2): # generate list of noisy samples on the ellipse angles = [uniform(-pi,pi) for i in range(0,num_samples)] sigma = 0.2 x_noise = [normalvariate(0,sigma) for t in angles] y_noise = [normalvariate(0,sigma) for t in angles] x_list,y_list = generate_noisy_pts(a_ref,foci1,foci2, angles = angles, x_noise = x_noise, y_noise = y_noise) point_list = [array([x,y]) for x,y in zip(x_list,y_list)] return point_list if __name__=='__main__': import matplotlib.pylab as plt # define the foci locations foci1 = array([2,-1]) foci2 = array([-2,1]) # define distance from foci to ellipse circumference a_ref = 2.5 num_pts = 200 point_list = generate_point_list(num_pts,a_ref,foci1,foci2) plt.figure() x,y = zip(*point_list) plt.plot(x,y,'.b') plt.show()
The next two modules define the objective function in two different ways. In the first, the code calculates the objective function using scalar math. In the second, the objective function is calculated using numpy vector math.
The first module, objective_scalar.py, uses a class to define the objective function. This is done for several reasons. The first is that the class allows the parameters, the values which will not be changed by the optimization search, to be bound to the objective without a need to pass them through the optimizer. Additionally, the __init__ and __del__ methods in the class provide a convenient place to allocate special purpose services and hardware which may be used when evaluating the objective. To simplify the use of the class, the __call__ method is used so the instance of the class looks like a function to the optimizer.
objective_scalar.py
''' This module contains an objective function and logic to determine the starting point for the ellipse fitting problem. This objective is coded using operations on scalar quantities. ''' from numpy import * from numpy import linalg as LA class Objective(object): def __init__(self,parameters): # setup parameters which will be used later self._p = parameters['point_list'] self._sigma = parameters.get('sigma',0.2) self.x0 = self._find_x0() self._ahat_max = self.x0[0] def _find_x0(self): ''' Determine the initial value, x0, for the optimization problem. ''' points = array(self._p) x_list,y_list = zip(*points) # find x mean x_mean = array(x_list).mean() # find y mean y_mean = array(y_list).mean() # find point farthest away from mean center = array([x_mean,y_mean]) distances = zeros((len(x_list),1)) for i,point in enumerate(points): distances[i,0]=LA.norm(point-center) ind = where(distances==distances.max()) max_pt = points[ind[0],:][0] # find point between mean and max point foci1 = (max_pt+center)/2.0 # find point opposite from foci2 = 2*center - max_pt return [distances.max(), foci1[0],foci1[1],foci2[0],foci2[1]] def __call__(self,x): ''' Calculate the objective cost in the optimization problem using scalar equations which are summed. ''' point_list = self._p foci1 = array([x[1],x[2]]) foci2 = array([x[3],x[4]]) a = x[0] n = float(len(point_list)) lambda_ = 0.1 sigma = self._sigma ahat_max = self._ahat_max total = 0 for point in point_list: total += ((LA.norm(point-foci1,2)+LA.norm(point-foci2,2)-2*a)**2)/n total += lambda_*ahat_max*sigma*exp((a/ahat_max)**4) return total if __name__=='__main__': import time from random import uniform, normalvariate, seed # local modules import ellipse #################################################################### # setup test conditions num_reps = 100 num_pts = 256 precision = 'float' seed(1234567890) # set the random generator to get repeatable results # setup the reference ellipse # define the foci locations foci1_ref = array([2,-1]) foci2_ref = array([-2,1]) # define distance from foci to ellipse circumference a_ref = 2.5 point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref) parameters = { "point_list" : point_list } # test the function on an NVIDIA card t0 = time.time() my_objective = Objective(parameters) x0 = my_objective.x0 t1 = time.time() for i in range(0,num_reps): y = my_objective(x0) t2 = time.time() print '' print 'Initialization took %f sec' % (t1-t0) print 'Execution took %f sec' % (t2-t1) print 'Executed %i times.' % (num_reps) print ''
The next module vectorizes the codes for faster execution. To simplify the coding, the vectorized objective inherits the interfaces and initialization from the scalar objective code module.
objective_vectorized.py
''' This module contains an objective function for the ellipse fitting problem. The objective is coded using vectorized operations. ''' from numpy import * from numpy import linalg as LA import objective_scalar class Objective(objective_scalar.Objective): def __call__(self,x): ''' Calculate the objective cost in the optimization problem using vectorized equations. ''' point_list = self._p foci1 = array([x[1],x[2]]) foci2 = array([x[3],x[4]]) a = x[0] n = float(len(point_list)) _lambda = 0.1 sigma = self._sigma ahat_max = self._ahat_max pt_diff1 = point_list - foci1 pt_diff2 = point_list - foci2 x_f1_diff = pt_diff1[:,0] x_f2_diff = pt_diff2[:,0] y_f1_diff = pt_diff1[:,1] y_f2_diff = pt_diff2[:,1] x_f1_diff_sq = power(x_f1_diff,2) x_f2_diff_sq = power(x_f2_diff,2) y_f1_diff_sq = power(y_f1_diff,2) y_f2_diff_sq = power(y_f2_diff,2) norm_pt_to_f1 = power(x_f1_diff_sq+y_f1_diff_sq,0.5) norm_pt_to_f2 = power(x_f2_diff_sq+y_f2_diff_sq,0.5) temp = power(norm_pt_to_f1+norm_pt_to_f2-2*a,2)/n total = sum(temp) total += _lambda*ahat_max*sigma*exp((a/ahat_max)**4) return total if __name__=='__main__': import time from random import seed # local modules import ellipse #################################################################### # setup test conditions num_reps = 100 num_pts = 256 precision = 'float' seed(1234567890) # set the random generator to get repeatable results # setup the reference ellipse # define the foci locations foci1_ref = array([2,-1]) foci2_ref = array([-2,1]) # define distance from foci to ellipse circumference a_ref = 2.5 point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref) parameters = { "point_list" : point_list } # test the function on an NVIDIA card t0 = time.time() my_objective = Objective(parameters) x0 = my_objective.x0 t1 = time.time() for i in range(0,num_reps): y = my_objective(x0) t2 = time.time() print '' print 'Initialization took %f sec' % (t1-t0) print 'Execution took %f sec' % (t2-t1) print 'Executed %i times.' % (num_reps) print ''
Executing objective_scalar.py and objective_vectorized.py will provide some sense of the difference in speed between the two approach. However, it is a good idea to measure the performance under different conditions. The next script, main.py, evaluates the implementation of the ellipse fit objective when different numbers of points are available for the fit. It also evaluates how the time to solve the ellipse fit problem changes between the scalar and the vectorized objectives when the number of points changes. To reduce the variation due to changes in the data points, the random seed is set to the same value each time points are generated.
main.py
''' This module contains an objective function for the ellipse fitting problem. The objective is coded using vectorized operations. ''' __author__ = 'Ed Tate' __email__ = 'edtate<at>gmail-dot-com' __website__ = 'exnumerus.blogspot.com' __license__ = 'Creative Commons Attribute By - http://creativecommons.org/licenses/by/3.0/us/''' import time from random import uniform, normalvariate, seed from numpy import * import pylab as plt from openopt import NLP # local modules import ellipse import objective_scalar import objective_vectorized #################################################################### # setup test conditions num_reps = 100 seed(1234567890) # set the random generator to get repeatable results # setup the reference ellipse # define the foci locations foci1_ref = array([2,-1]) foci2_ref = array([-2,1]) # define distance from foci to ellipse circumference a_ref = 2.5 ###################################################### # measure the timing to setup and call the objective function num_pts_list = [10,100,250,500,1000,2500,5000,10000,50000] objective_functions = [objective_scalar,objective_vectorized] setup_times = [] call_times = [] for num_pts in num_pts_list: seed(1234567890) # set the random generator to get repeatable results point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref) parameters = { "point_list" : point_list } test_setup_times = [] test_call_times = [] for objective_function in objective_functions: t0 = time.time() my_objective = objective_function.Objective(parameters) x0 = my_objective.x0 t1 = time.time() for i in range(0,num_reps): y = my_objective(x0) t2 = time.time() test_setup_times.append(t1-t0) test_call_times.append(t2-t1) setup_times.append(test_setup_times) call_times.append(test_call_times) print 'Evaluated %i points' % num_pts # plot the results plt.figure() plt.loglog(num_pts_list,call_times) plt.legend(['scalar','vectorized'],loc='upper left') plt.xlabel('Number of points') plt.ylabel('Execution Time [sec]') plt.title('Times for %i calls to objective function' % num_reps) plt.grid(True) ############################################################# # test the performance when used in an optimizer num_pts_list = [10,100,250,500,1000,2500,5000] objective_functions = [objective_scalar,objective_vectorized] setup_times = [] call_times = [] for num_pts in num_pts_list: seed(1234567890) # set the random generator to get repeatable results point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref) parameters = { "point_list" : point_list } test_setup_times = [] test_call_times = [] for objective_function in objective_functions: t0 = time.time() my_objective = objective_function.Objective(parameters) x0 = my_objective.x0 p_vectorized = NLP(my_objective, x0, iprint = -1) t1 = time.time() opt_result = p_vectorized.solve('ralg') t2 = time.time() test_setup_times.append(t1-t0) test_call_times.append(t2-t1) print opt_result.xf setup_times.append(test_setup_times) call_times.append(test_call_times) print 'Evaluated %i points' % num_pts print # plot the results plt.figure() plt.loglog(num_pts_list,call_times) plt.legend(['scalar','vectorized'],loc='upper left') plt.xlabel('Number of points') plt.ylabel('Execution Time [sec]') plt.title('Times to solve the ellipse fit problem') plt.grid(True) plt.show()
Testing Results and Conclusions
To easily see the results which cover a large range of numbers, the results are plotted on a log-log graph. Comparing the scalar calculations against the vectorized calculations of the objective function shows almost an order of magnitude improvement in computing time.
When the vectorized objective is used in the ellipse fit problem, the performance improvement is almost an order of magnitude. For a small number of points, the overhead from the optimization routine is significant and reduces the improvement (as should be expected).
Test Conditions:
- Python 2.6.6 (PythonXY 2.6.6.0)
- win7 with Intel i7 CPU
No comments:
Post a Comment