Default Function Library - The Auto-pilot Benchmarking Suite


7.2 Default Function Library

The default function library can be broken down into several categories: manipulating global variables, time-related functions, warning primitives, built-in warnings, the default transformations that each relation undergoes, data analysis (predicate evaluation and t-tests), and relation renaming.

Manipulating Global Variables

There are several functions that manipulate global replacement variables (which are stored in %globals).

set(var, val)
setexpr(var, expr)
seteval(var, expr): Set the global replacement variable var to val, or to the result of processing expr. setexpr first performs global replacement on expr, and seteval first runs Perl's eval function on expr.
unset(var): Equivalent to set("var", 0).
push(var, val)
pusheval(var, expr): Push val onto the global replacement variable var, which must be a Perl array. With pusheval, expr is first passed through Perl's eval function.
pop(var): Pop an element from the global replacement variable var, which must be a Perl array.
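To make the differences between these functions concrete, here is a minimal Python sketch that models them on a plain dictionary standing in for %globals. It assumes that global replacement means substituting $name with the variable's current value, and it uses Python's eval in place of Perl's eval. GLOBALS, replace_globals, and set_ are hypothetical names, not part of Getstats.

```python
import re

# Hypothetical stand-in for the %globals hash.
GLOBALS = {}


def replace_globals(expr):
    """Assumed global replacement: substitute $name with str(GLOBALS[name])."""
    return re.sub(r"\$(\w+)", lambda m: str(GLOBALS.get(m.group(1), "")), expr)


def set_(var, val):
    # Named set_ to avoid shadowing Python's built-in set.
    GLOBALS[var] = val


def setexpr(var, expr):
    # Global replacement is applied to expr before it is stored.
    GLOBALS[var] = replace_globals(expr)


def seteval(var, expr):
    # Python's eval stands in for Perl's eval.
    GLOBALS[var] = eval(expr)


def unset(var):
    # Documented as equivalent to set("var", 0).
    set_(var, 0)


def push(var, val):
    # Assumption: an unset variable starts out as an empty list.
    GLOBALS.setdefault(var, []).append(val)


def pusheval(var, expr):
    push(var, eval(expr))


def pop(var):
    return GLOBALS[var].pop()
```

For example, after set_("file", "ext2"), calling setexpr("label", "run-$file") stores "run-ext2", while seteval("n", "2 + 3") stores the evaluated result 5.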

Time Related Functions

There are also several transformation functions for creating more useful output from times.

aggthreads: Aggregate multiple threads in an epoch into a single value.
procdiff: Transform the procdiff column from the total number of seconds of CPU time used into the percent of CPU used by non-measured processes.
savestats: Save the current summary statistics objects in $globals{'stathashes'}; these are used for computing overheads.
unifycommand: Aggregate multiple commands in a single thread within an epoch into a single value. This is used for benchmarks like compilations that separately time several commands (e.g., configure and make).
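As a rough illustration of unifycommand and procdiff, the sketch below assumes that rows are dictionaries with epoch, thread, and timing keys, that unifying several commands means summing their times (as for configure followed by make), and that the procdiff percentage is taken relative to elapsed time. All three are assumptions rather than documented Getstats behavior, and the function names are hypothetical.

```python
from collections import defaultdict


def unifycommand(rows, columns=("elapsed", "user", "sys")):
    """Collapse several timed commands per (epoch, thread) into one row.

    Assumption: the unified value is the sum of the per-command times.
    """
    totals = defaultdict(lambda: dict.fromkeys(columns, 0.0))
    for row in rows:
        acc = totals[(row["epoch"], row["thread"])]
        for col in columns:
            acc[col] += row[col]
    return [{"epoch": e, "thread": t, **vals} for (e, t), vals in sorted(totals.items())]


def procdiff_percent(procdiff_seconds, elapsed_seconds):
    """Assumed conversion: non-measured CPU seconds as a percentage of elapsed time."""
    return 100.0 * procdiff_seconds / elapsed_seconds
```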

Warning Primitives

There are three types of warning primitives: warnings that are evaluated for each row ("warnrow"), each column ("warncol"), or each cell ("warnval").

warnrow(expression, output)

An example of a row warning is that the exit status of each test should be zero. If this is false, it means that the test failed, and that particular measurement is likely bad.

Assuming that the exit status is stored in a column named status, the following transformation warns when tests fail:

          warnrow("$status != 0",
          	"Failure for epoch $epoch, thread $thread, exit status = $status.")

          epoch  thread  status
              1       1       0
              2       1       0
              3       1     127
              4       1       0
              5       1       0

Relation 7.15

When run on Relation 7.15, the output is:

          Failure for epoch 3, thread 1, exit status = 127.
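The row warning above can be mimicked in Python by testing each row and substituting its values into the message. This sketch uses a Python predicate function instead of a Perl expression string, and string.Template for the $name replacement; it illustrates the behavior only and is not how Getstats is implemented.

```python
from string import Template

# Relation 7.15 as a list of rows.
RELATION = [
    {"epoch": 1, "thread": 1, "status": 0},
    {"epoch": 2, "thread": 1, "status": 0},
    {"epoch": 3, "thread": 1, "status": 127},
    {"epoch": 4, "thread": 1, "status": 0},
    {"epoch": 5, "thread": 1, "status": 0},
]

MESSAGE = "Failure for epoch $epoch, thread $thread, exit status = $status."


def warnrow(rows, predicate, message):
    """Return the message, with $column replaced, for each row where predicate is true."""
    return [Template(message).substitute(row) for row in rows if predicate(row)]


for warning in warnrow(RELATION, lambda row: row["status"] != 0, MESSAGE):
    print(warning)  # Failure for epoch 3, thread 1, exit status = 127.
```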

warncol(expression, output)

Column warnings are evaluated once for each column in a relation. They take the same arguments as a row warning: the expression and the warning text. The expression undergoes column replacement, so summary statistics such as the mean are available, and is then passed to Perl's eval function.

For example, the following column warning fires if the half-width of a confidence interval is greater than 5% of the mean:

          warncol("$delta > $mean * 0.05", "$name has a half-width of $delta.")
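To show what $delta and $mean stand for, here is a Python sketch of the same check. It assumes the half-width is computed as t_crit * s / sqrt(n), where s is the sample standard deviation and t_crit is a Student's t critical value supplied by the caller (2.776 is the two-sided 95% value for 4 degrees of freedom, i.e. five samples). How Getstats obtains its critical value is not described here, and warncol_halfwidth is a hypothetical name.

```python
import math
import statistics


def half_width(values, t_crit):
    """Half-width of a confidence interval for the mean: t_crit * s / sqrt(n)."""
    return t_crit * statistics.stdev(values) / math.sqrt(len(values))


def warncol_halfwidth(name, values, t_crit, fraction=0.05):
    """Mirror of warncol("$delta > $mean * 0.05", ...) for a single column."""
    delta = half_width(values, t_crit)
    mean = statistics.mean(values)
    if delta > mean * fraction:
        return f"{name} has a half-width of {delta}."
    return None
```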

warnval(expression, output)

Value warnings combine row and column warnings. The expression is evaluated for each value in each row, and if it is true, the warning text is printed to stderr. Row, column, and value replacement are all performed. This is how Getstats implements z-score checking for each value.
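A per-value z-score check can be sketched as follows: for every value in a column, compute z = (x - mean) / s and report the values whose absolute z-score exceeds a threshold. The threshold argument stands in for the zscore-thresh global; using the absolute value and the message format are assumptions, and zscore_warnings is a hypothetical name.

```python
import statistics


def zscore_warnings(column, values, threshold):
    """Return a message for each value whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical: no value is an outlier
    warnings = []
    for row, x in enumerate(values, start=1):
        z = (x - mean) / sd
        if abs(z) > threshold:
            warnings.append(f"{column} row {row}: value {x} has z-score {z:.2f}")
    return warnings
```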

Built-in Warnings

The following transformations use the previously described warning primitives to raise warnings if $warn is set (which it is by default). You can turn these warnings off by setting $warn to 0.

exitfail: Warn if any test failed.
negio: Warn if wait time is negative (more CPU time was used than elapsed time).
otherexec: Warn if something else was executing (non-measured processes used more than 5% of the CPU time).
You can control the percentage by setting the global variable otherexec-thresh.
warnregress: Warn if the best-fit line is not within 5% of a horizontal line. For example, if there is a memory leak, then runs will get consistently slower.
You can control the slope by setting the global variable regress-thresh.
zscore: Warn if the zscore of any value is greater than
You can control the z-score threshold by setting the global variable zscore-thresh.
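The checks behind negio and warnregress can be approximated in a few lines. This sketch assumes that wait time is elapsed time minus user and system time, and it interprets "within 5% of a horizontal line" as: the total change of the least-squares line across the run is at most 5% of the mean. Both are assumptions about Getstats; the thresh default merely stands in for regress-thresh.

```python
import statistics


def wait_time(elapsed, user, sys_time):
    """Assumed definition: elapsed time not spent on the CPU."""
    return elapsed - (user + sys_time)


def negio(elapsed, user, sys_time):
    """True when more CPU time was used than elapsed time."""
    return wait_time(elapsed, user, sys_time) < 0


def regression_drift(values):
    """Change of the least-squares line over epochs 1..n, as a fraction of the mean."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = statistics.mean(values)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return abs(slope * (n - 1)) / y_mean


def warnregress(values, thresh=0.05):
    return regression_drift(values) > thresh
```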

Default Transformations

The following functions define the four default passes that Getstats performs over the relations. Any transformations specified on the command line are executed after readpass but before warnpass.

readpass: If the elapsed column exists, warn about failed executions, unify commands, aggregate threads, calculate wait time and CPU utilization, and update the pdiff column.
warnpass: Warn if another process used significant CPU time, if any test has negative IO time or a high z-score, or if any quantity has a high linear regression slope. Then remove excess columns, and reorder and rename the "elapsed", "sys", "user", "io", and "cpu" columns to "Elapsed", "System", "User", "Wait", and "CPU%".
ohpass: Save statistics and baseline for computing overheads.
summary: Create the summary table.
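The column work done by readpass and warnpass can be sketched in Python as below. The formulas for wait time (elapsed minus user and system time) and CPU utilization (user plus system time as a percentage of elapsed time) are assumptions, as is keeping the columns in the order listed for warnpass; the renaming map itself comes from the warnpass description.

```python
# Renaming map from the warnpass description, in the listed order.
RENAMES = {"elapsed": "Elapsed", "sys": "System", "user": "User", "io": "Wait", "cpu": "CPU%"}


def add_derived_columns(row):
    """Assumed readpass step: derive io (wait) and cpu (utilization) from the times."""
    busy = row["user"] + row["sys"]
    row["io"] = row["elapsed"] - busy
    row["cpu"] = 100.0 * busy / row["elapsed"]
    return row


def rename_and_order(row):
    """warnpass-style step: keep only the five timing columns, renamed and ordered."""
    return {RENAMES[col]: row[col] for col in RENAMES}
```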

Analyzing Data

predicate(expr)

This is used for predicate evaluation from within Auto-pilot. After a certain number of epochs, Auto-pilot can optionally run an external script to decide whether it should continue. If the script exits with a zero status code, Auto-pilot stops the test; if the script exits with a non-zero (failure) code, Auto-pilot continues the test.

predicate takes a single argument: the predicate to evaluate over each column of data. If the predicate is false for any of the columns, Getstats dies with a failure status. Auto-pilot picks up on this failure and continues running the tests.
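The stopping protocol boils down to mapping per-column predicate results onto an exit status: 0 (stop) when the predicate holds for every column, non-zero (keep running) otherwise. In this sketch, narrow_enough is a hypothetical predicate (confidence-interval half-width within 5% of the mean, with t_crit = 2.776 assuming five epochs); a wrapper script would pass the returned status to sys.exit.

```python
import math
import statistics


def predicate_exit_status(columns, predicate):
    """Return 0 if predicate holds for every column, else 1.

    Getstats signals failure by dying; a non-zero status tells Auto-pilot
    to keep running more epochs.
    """
    return 0 if all(predicate(values) for values in columns.values()) else 1


def narrow_enough(values, t_crit=2.776, fraction=0.05):
    """Hypothetical predicate: CI half-width is at most 5% of the mean."""
    delta = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return delta <= statistics.mean(values) * fraction
```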

twosamplet

Each input relation is statistically compared with the baseline file (the first file specified). This option prints a confidence interval for the difference between each column's mean in the current file and the baseline.

It also performs two-sample t-tests with the null hypotheses "current mean - baseline mean <= twosampledelta", "current mean - baseline mean >= twosampledelta", and "current mean - baseline mean = twosampledelta". For each test, the p-value is printed: the probability of observing data at least as extreme as this, assuming the null hypothesis is true. If the p-value is small, you can reject the null hypothesis. "REJECT" or "ACCEPT" is also printed, based on the confidence level you have specified (95% by default).
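The three tests can be sketched in Python with only the standard library. This is a Welch (unequal-variance) t-test; whether Getstats uses Welch's test or a pooled-variance test is an assumption. Student's t tail probabilities come from the regularized incomplete beta function, evaluated with the classic continued-fraction (Lentz) method, and the REJECT rule assumes "reject when p < 1 - confidencelevel".

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """P(T > t) for Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return tail if t > 0 else 1.0 - tail


def welch_ttest(sample1, sample2, delta=0.0):
    """p-values for H_0: u1-u2 <= delta, u1-u2 >= delta, and u1-u2 == delta."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2) - delta) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_le = t_sf(t, df)   # small when u1 - u2 is clearly above delta
    p_ge = t_sf(-t, df)  # small when u1 - u2 is clearly below delta
    p_eq = min(1.0, 2.0 * min(p_le, p_ge))
    return p_le, p_ge, p_eq


def verdict(p, confidencelevel=0.95):
    return "REJECT" if p < 1.0 - confidencelevel else "ACCEPT"
```

The sketch divides by the standard error, so two samples that both have zero variance would need special handling.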

The following example compares a two-threaded Postmark run (Sample 1) against a one-threaded Postmark run (Sample 2, the baseline).

          Comparing samples/ext2:2.res (Sample 1) to samples/ext2:1.res (Sample 2).
          Elapsed: 95%CI for samples/ext2:2.res - samples/ext2:1.res = (7.945, 8.029)
          H_0: u1 - u2 <= 0.000  H_a: u1 - u2 >  0.000  P = 1.000  ACCEPT H_0
          H_0: u1 - u2 >= 0.000  H_a: u1 - u2 <  0.000  P = 0.000  REJECT H_0
          H_0: u1 - u2 == 0.000  H_a: u1 - u2 != 0.000  P = 0.000  REJECT H_0

As we can see, we must accept the hypothesis that two threads take no more time than one thread. We can reject the hypotheses that two threads take at least as long as one thread and that they take the same amount of time. Remember that Getstats runs all of these tests, but you need to choose which hypothesis makes sense for your case. For example, if you have code that should improve performance, you cannot assume that it does improve performance. Instead, you must assume that it does not improve performance, and then either accept or reject that hypothesis.

Several global variables control the t-test:

ttestcolumns: A comma-separated list of columns to compare.
twosampledelta: The delta for the t-test (zero by default). This is used to show that two samples differ by more than some value.
rejectonly: Display only rejected hypotheses. This can greatly reduce the amount of output when comparing mostly similar samples.
verbosettest: Print the relation names instead of u1 and u2.
confidencelevel: Controls when to reject or accept tests, and the width of the confidence interval.
precision: Controls how many decimal places are printed.
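How these globals shape the output can be illustrated with a small formatter that reproduces the layout of the H_0 lines in the Postmark example. The parameter names mirror rejectonly, verbosettest (via names), confidencelevel, and precision, but format_test is a hypothetical sketch, not Getstats code.

```python
# Alternative hypothesis for each null-hypothesis operator.
ALTERNATIVE = {"<=": ">", ">=": "<", "==": "!="}


def format_test(op, p, delta=0.0, names=("u1", "u2"), confidencelevel=0.95,
                precision=3, rejectonly=False):
    """Format one hypothesis line; return None when rejectonly hides it."""
    verdict = "REJECT" if p < 1.0 - confidencelevel else "ACCEPT"
    if rejectonly and verdict == "ACCEPT":
        return None
    a, b = names
    d = f"{delta:.{precision}f}"
    return (f"H_0: {a} - {b} {op:<2} {d}  "
            f"H_a: {a} - {b} {ALTERNATIVE[op]:<2} {d}  "
            f"P = {p:.{precision}f}  {verdict} H_0")
```

For example, format_test("<=", 1.0) yields the first H_0 line of the Postmark output, and the same call with rejectonly=True yields None.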

pairwiset

This function is identical to twosamplet, but each sample is compared with every other sample (i.e., sample 1 is compared with samples 2 through n, sample 2 is compared with samples 3 through n, and so on).
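The comparison order described above is exactly what itertools.combinations produces: each unordered pair once, earlier sample first. For n relations that is n(n-1)/2 comparisons, so the output grows quickly with the number of input files. The function name is hypothetical, and treating the earlier relation as Sample 1 in each pair is an assumption.

```python
from itertools import combinations


def pairwise_comparisons(relations):
    """Every (earlier, later) pair of relations, in the order described for pairwiset."""
    return list(combinations(relations, 2))
```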

Relation Renaming

rename_relation(expr): By default, each relation is named after its input file. This transformation allows you to rename relations to more easily readable names. First, expr undergoes global replacement (the most useful global replacement variable for this function is $file, which is the relation's current name). Next, the resulting expression is evaluated using Perl's eval function. Finally, the relation is renamed to the result of this evaluation.
basename: Basename renames the relations using File::basename, which strips leading directories and any file extension. For example, "samples/ext2:1.res" is renamed to "ext2:1". This can make the output easier to read.
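The effect of basename, and of rename_relation with a naming function, can be sketched in Python using os.path in place of Perl's File::basename. Passing a Python callable instead of a Perl expression over $file is an assumption made for illustration, and both function names are hypothetical.

```python
import os.path


def basename(path):
    """Strip leading directories and the final extension."""
    return os.path.splitext(os.path.basename(path))[0]


def rename_relations(names, new_name):
    """Map each relation's current name ($file) to new_name(current_name)."""
    return {name: new_name(name) for name in names}
```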