Jon and I were discussing the minnow scheduler and I suggested to him that it might be possible to use GNU Octave to run an automated analysis of his code and determine how well it will scale as the number of messages sent increases. The general notion is to use application run times to try and determine if the application has exponential, linear, polynomial, or logarithmic complexity. We can characterize the runtime of a function using regression analysis to generate a polynomial that represents our data and comparing the properties of the generated function to the known algorithm complexities we are concerned about. Specifically, we will be looking at the acceleration (2nd derivative) and jerk (3rd derivative) of our function. Positive acceleration means that the each unit of work performed takes longer to execute than the previous unit took. Positive jerk means that the acceleration is increasing, which is very bad. The characteristics are as follows:
If the function generated by Octave for us presents any of the above characteristics we can broadly characterize the complexity of our application. To that end, we have created an application which runs the application to be tested through a number of different inputs. It takes the timing result from each input and uses Octave to create and analyze a trend line for our timing data.
Exponential Test:
jason@ubuntu:~/Programming$ ./run_timing ./two_to_the_n 20 25 26 27 28 29 30
20: 36437
25: 507890
26: 807668
27: 1584727
28: 3093271
29: 6157129
30: 12325504
Early in analysis:
Time To Execute: 777300
Velocity: -442890
Acceleration: 422510
Jerk: 396000
Your run time is accelerating, this is bad, indicating polynomial complexity.
Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad).
-------------------
Later in analysis:
Time To Execute: 6.6098e+06
Velocity: 4.4151e+06
Acceleration: 2.0065e+06 Jerk: 396000
Your run time is accelerating, this is bad, indicating polynomial complexity.
Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad).
Polynomial Test:
jason@ubuntu:~/Programming$ ./run_timing ./n_squared 10000 11000 12000 15000 20000 22000 23000 25000
10000: 1253335
11000: 1401532
12000: 1740649
15000: 2596077
20000: 4602049
22000: 5576732
23000: 6087936
25000: 7166800
Early in analysis:
Time To Execute: 1.4535e+06
Velocity: 231.89
Acceleration: 0.0285
Jerk: -7.0737e-07
Your run time is accelerating, this is bad, indicating polynomial complexity.
Your acceleration is neither increasing nor decreasing.
-------------------
Later in analysis:
Time To Execute: 6.0845e+06
Velocity: 522.96
Acceleration: 0.020012
Jerk: -7.0737e-07
Your run time is accelerating, this is bad, indicating polynomial complexity.
Your acceleration is neither increasing nor decreasing.
Linear Test:
jason@ubuntu:~/Programming$ ./run_timing ./n 400000 500000 600000 700000 800000
400000: 4672616
500000: 5808603
600000: 6897412
700000: 8039642
800000: 9187032
warning: dgelsd: rank deficient 5x4 matrix, rank = 3
Early in analysis:
Time To Execute: 5.7938e+06
Velocity: 11.133
Acceleration: 5.5004e-08
Jerk: 1.123e-11
Your run time has no acceleration, indicating linear complexity
Your acceleration is neither increasing nor decreasing.
-------------------
Later in analysis:
Time To Execute: 8.0366e+06
Velocity: 11.369
Acceleration: 2.301e-06
Jerk: 1.123e-11
Your run time has no acceleration, indicating linear complexity
Your acceleration is neither increasing nor decreasing.
Unfortunately the logarithmic data tends to appear linear to my tests. My math, algorithm analysis skills, and Octave skills are bad enough that someone who understands this better may be able to solve that problem. Iām sure there are improvements to be had all around. In general the process likes lower values of N and longer run times.
The source code of the analysis application:
#include <vector>
#include <string>
#include <stdlib.h>
#include <sstream>
#include <iostream>
#include <sys/time.h>
struct stats
{
double position;
double velocity;
double acceleration;
double jerk;
};
template<typename InItr>
std::vector<int> parse_counts(InItr begin, InItr end)
{
std::vector<int> vals;
while (begin != end)
{
vals.push_back(atoi(*begin));
++begin;
}
return vals;
}
int64_t timediff(timeval left, timeval right)
{
return (((left.tv_sec - right.tv_sec) * 1000000) + (left.tv_usec - right.tv_usec));
}
int64_t time(const std::string &command, int count)
{
timeval begin;
timeval end;
std::stringstream ss;
ss << command << " " << count;
std::string call(ss.str());
gettimeofday(&begin, 0);
system(call.c_str());
gettimeofday(&end, 0);
return timediff(end, begin);
}
std::pair<stats,stats> calculate_stats(const std::vector<std::pair<int, int> > &timings)
{
std::stringstream ss;
ss << "octave --eval \"x=[";
std::stringstream xs;
std::stringstream ys;
for (std::vector<std::pair<int, int> >::const_iterator itr = timings.begin();
itr != timings.end();
++itr)
{
xs << itr->first << ",";
ys << itr->second << ",";
}
int secondtolast = timings[timings.size()-2].first;
ss << xs.str();
ss << "]; y=[" << ys.str() << "]; f = polyfit(x,y,3); v = polyderiv(f); a = polyderiv(v); j = polyderiv(a); disp(polyval(f,"<< timings[1].first << ")); disp(polyval(v," << timings[1].first << ")); disp(polyval(a," << timings[1].first << ")); disp(polyval(j," << timings[1].first << ")); disp(polyval(f," << secondtolast << ")); disp(polyval(v," << secondtolast << ")); disp(polyval(a," << secondtolast << ")); disp(polyval(j," << secondtolast << ")); \" --silent";
FILE *p = popen(ss.str().c_str(), "r");
std::stringstream ret;
while (!feof(p) && !ferror(p))
{
char buf[256];
int r = fread(buf, 1, sizeof(buf), p);
ret << std::string(buf, r);
}
std::pair<stats,stats> s;
ret >> s.first.position;
ret >> s.first.velocity;
ret >> s.first.acceleration;
ret >> s.first.jerk;
ret >> s.second.position;
ret >> s.second.velocity;
ret >> s.second.acceleration;
ret >> s.second.jerk;
return s;
}
void analyze(const stats &s)
{
std::cout << "Time To Execute: " << s.position << std::endl;
std::cout << "Velocity: " << s.velocity << std::endl;
std::cout << "Acceleration: " << s.acceleration << std::endl;
std::cout << "Jerk: " << s.jerk << std::endl;
if (s.acceleration > .001)
{
std::cout << "Your run time is accelerating, this is bad, indicating polynomial complexity." << std::endl;
} else if (s.acceleration < -.001) {
std::cout << "Your run time is decelerating, good, the algorithm is becoming more effecient as the number of units is increased." << std::endl;
} else {
std::cout << "Your run time has no acceleration, indicating linear complexity" << std::endl;
}
if (s.jerk > .001)
{
std::cout << "Your run time has positive jerk, this means that your acceleration is increasing, indicating possible exponential complexity (very bad)." << std::endl;
} else if (s.jerk < -.001) {
std::cout << "Your run time has negative jerk, this means that your acceleration is decreasing, indicating possible logarithmic complexity." << std::endl;
} else {
std::cout << "Your acceleration is neither increasing nor decreasing." << std::endl;
}
}
int main(int argc, char *argv[])
{
std::vector<int> vals = parse_counts(&argv[2], &argv[argc]);
std::vector<std::pair<int, int> > timings;
for (std::vector<int>::const_iterator itr = vals.begin();
itr != vals.end();
++itr)
{
timings.push_back(std::make_pair(*itr, time(argv[1], *itr)));
std::cout << timings.back().first << ": " << timings.back().second << std::endl;
}
std::pair<stats, stats> s = calculate_stats(timings);
std::cout << "Early in analysis: " << std::endl;
analyze(s.first);
std::cout << "-------------------" << std::endl;
std::cout << "Later in analysis: " << std::endl;
analyze(s.second);
}