Jon and I were discussing the minnow scheduler and I suggested to him that it might be possible to use GNU Octave to run an automated analysis of his code and determine how well it will scale as the number of messages sent increases. The general notion is to use application run times to try and determine if the application has exponential, linear, polynomial, or logarithmic complexity. We can characterize the runtime of a function using regression analysis to generate a polynomial that represents our data and comparing the properties of the generated function to the known algorithm complexities we are concerned about. Specifically, we will be looking at the acceleration (2nd derivative) and jerk (3rd derivative) of our function. Positive acceleration means that the each unit of work performed takes longer to execute than the previous unit took. Positive jerk means that the acceleration is increasing, which is very bad. The characteristics are as follows:

  1. Exponential ([math]2^n[/math]): Positive Acceleration ([math]2^n \log(2)^2[/math]), Positive Jerk ([math]\log(2)^3 2^n[/math])
  2. Polynomial ([math]n^2[/math]): Positive Acceleration (2), No Jerk
  3. Logarithmic ([math]n \log n[/math]): Low Positive Acceleration ([math]\frac{1}{n}[/math]), Low Negative Jerk ([math]\frac{-1}{n^2}[/math])
  4. Linear (n): No Acceleration, No Jerk

If the function generated by Octave for us presents any of the above characteristics we can broadly characterize the complexity of our application. To that end, we have created an application which runs the application to be tested through a number of different inputs. It takes the timing result from each input and uses Octave to create and analyze a trend line for our timing data.

Exponential Test:

jason@ubuntu:~/Programming$ ./run_timing ./two_to_the_n 20 25 26 27 28 29 30 
20: 36437 
25: 507890 
26: 807668 
27: 1584727 
28: 3093271 
29: 6157129 
30: 12325504 
Early in analysis:  
Time To Execute: 777300 
Velocity: -442890 
Acceleration: 422510 
Jerk: 396000 
Your run time is accelerating, this is bad, indicating polynomial complexity. 
Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad). 
------------------- 
Later in analysis:  
Time To Execute: 6.6098e+06 
Velocity: 4.4151e+06 
Acceleration: 2.0065e+06 Jerk: 396000 
Your run time is accelerating, this is bad, indicating polynomial complexity. 
Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad).

Polynomial Test:

jason@ubuntu:~/Programming$ ./run_timing ./n_squared 10000 11000 12000 15000 20000 22000 23000 25000 
10000: 1253335 
11000: 1401532 
12000: 1740649 
15000: 2596077 
20000: 4602049 
22000: 5576732 
23000: 6087936 
25000: 7166800 
Early in analysis:  
Time To Execute: 1.4535e+06 
Velocity: 231.89 
Acceleration: 0.0285 
Jerk: -7.0737e-07 
Your run time is accelerating, this is bad, indicating polynomial complexity. 
Your acceleration is neither increasing nor decreasing. 
------------------- 
Later in analysis:  
Time To Execute: 6.0845e+06 
Velocity: 522.96 
Acceleration: 0.020012 
Jerk: -7.0737e-07 
Your run time is accelerating, this is bad, indicating polynomial complexity. 
Your acceleration is neither increasing nor decreasing.

Linear Test:

jason@ubuntu:~/Programming$ ./run_timing ./n 400000 500000 600000 700000 800000 
400000: 4672616 
500000: 5808603 
600000: 6897412 
700000: 8039642 
800000: 9187032 
warning: dgelsd: rank deficient 5x4 matrix, rank = 3 
Early in analysis:  
Time To Execute: 5.7938e+06 
Velocity: 11.133 
Acceleration: 5.5004e-08 
Jerk: 1.123e-11 
Your run time has no acceleration, indicating linear complexity 
Your acceleration is neither increasing nor decreasing. 
------------------- 
Later in analysis:  
Time To Execute: 8.0366e+06 
Velocity: 11.369 
Acceleration: 2.301e-06 
Jerk: 1.123e-11 
Your run time has no acceleration, indicating linear complexity 
Your acceleration is neither increasing nor decreasing.

Unfortunately the logarithmic data tends to appear linear to my tests. My math, algorithm analysis skills, and Octave skills are bad enough that someone who understands this better may be able to solve that problem. Iā€™m sure there are improvements to be had all around. In general the process likes lower values of N and longer run times.

The source code of the analysis application:

#include <vector>
#include <string>
#include <stdlib.h>
#include <sstream>
#include <iostream>
#include <sys/time.h>
struct stats
{
  double position;
  double velocity;
  double acceleration;
  double jerk;
};

template<typename InItr>
std::vector<int> parse_counts(InItr begin, InItr end)
{
  std::vector<int> vals;

  while (begin != end)
  {
    vals.push_back(atoi(*begin));
    ++begin;
  }

  return vals;
}

int64_t timediff(timeval left, timeval right)
{
  return (((left.tv_sec - right.tv_sec) * 1000000) + (left.tv_usec - right.tv_usec));
  
}

int64_t time(const std::string &command, int count)
{
  timeval begin;
  timeval end;
  std::stringstream ss;
  ss << command << " " << count;
  std::string call(ss.str());
  
  gettimeofday(&begin, 0);
  system(call.c_str());
  gettimeofday(&end, 0);

  return timediff(end, begin);
}

std::pair<stats,stats> calculate_stats(const std::vector<std::pair<int, int> > &timings)
{
  std::stringstream ss;
  ss << "octave --eval \"x=[";
  std::stringstream xs;
  std::stringstream ys;
  
  for (std::vector<std::pair<int, int> >::const_iterator itr = timings.begin();
       itr != timings.end();
       ++itr)
  {
    xs << itr->first << ",";
    ys << itr->second << ",";
  }

  int secondtolast = timings[timings.size()-2].first;

  ss << xs.str();
  ss << "]; y=[" << ys.str() << "]; f = polyfit(x,y,3); v = polyderiv(f); a = polyderiv(v); j = polyderiv(a); disp(polyval(f,"<< timings[1].first << ")); disp(polyval(v," << timings[1].first << ")); disp(polyval(a," << timings[1].first << ")); disp(polyval(j," << timings[1].first << ")); disp(polyval(f," << secondtolast << ")); disp(polyval(v," << secondtolast << ")); disp(polyval(a," << secondtolast << ")); disp(polyval(j," << secondtolast << ")); \" --silent";

  FILE *p = popen(ss.str().c_str(), "r");
  std::stringstream ret;
  while (!feof(p) && !ferror(p))
  {
    char buf[256];
    int r = fread(buf, 1, sizeof(buf), p);
    ret << std::string(buf, r);
  }  

  std::pair<stats,stats> s;
  
  ret >> s.first.position;
  ret >> s.first.velocity;
  ret >> s.first.acceleration;
  ret >> s.first.jerk;

  ret >> s.second.position;
  ret >> s.second.velocity;
  ret >> s.second.acceleration;
  ret >> s.second.jerk;

  return s;
}

void analyze(const stats &s)
{
  std::cout << "Time To Execute: " << s.position << std::endl;
  std::cout << "Velocity: " << s.velocity << std::endl;
  std::cout << "Acceleration: " << s.acceleration << std::endl;
  std::cout << "Jerk: " << s.jerk << std::endl;

  if (s.acceleration > .001)
  {
    std::cout << "Your run time is accelerating, this is bad, indicating polynomial complexity." << std::endl;
  } else if (s.acceleration < -.001) {
    std::cout << "Your run time is decelerating, good, the algorithm is becoming more effecient as the number of units is increased." << std::endl;
  } else {
    std::cout << "Your run time has no acceleration, indicating linear complexity" << std::endl;
  }

  if (s.jerk > .001)
  {
    std::cout << "Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad)." << std::endl;
  } else if (s.jerk < -.001) {
    std::cout << "Your run time has negative jerk, this means that your acceleration is decreasing, indicating possible logarithmic complexity." << std::endl;
  } else {
    std::cout << "Your acceleration is neither increasing nor decreasing." << std::endl;
  }
       
}

int main(int argc, char *argv[])
{
  std::vector<int> vals = parse_counts(&argv[2], &argv[argc]);

  std::vector<std::pair<int, int> > timings;

  for (std::vector<int>::const_iterator itr = vals.begin();
       itr != vals.end();
       ++itr)
  {
    timings.push_back(std::make_pair(*itr, time(argv[1], *itr)));
    std::cout << timings.back().first << ": " << timings.back().second << std::endl;
  }

  std::pair<stats, stats> s = calculate_stats(timings);

  std::cout << "Early in analysis: " << std::endl;
  analyze(s.first);

  std::cout << "-------------------" << std::endl;
  std::cout << "Later in analysis: " << std::endl;
  analyze(s.second);
}