NPS Universal Bench Test
When IBM handed me the test-drive keys of a Base+2 platform, I agreed to couch my assessment in terms of my own domain. Phrases like “much faster” are too vague. But I could say – “Compared to what I’ve seen, it’s (adjective)-faster.”
Why would this matter?
Expectations. Nobody wants to get their hopes up — only to have them dashed on the shores of Integration Island and subsumed by the green foam of the Sea of Inefficiency.
Whoa, where did that come from?
Anyhow – IBM saw this as a purely subjective endeavor. How could we say “how much faster” for a given customer’s queries? Everybody uses the machine in such wildly different ways.
But as an appliance, it’s good to get a baseline for how much faster the new machine is compared to prior machines – that is – machine-to-machine. Query efficiency, or lack thereof, is another matter entirely. And we still can’t tell folks they’ll get blazing speed if their queries are too inefficient to take advantage of it.
When we configured our first NPS on the Cloud at AWS, the AWS configurator let us choose from various tiers of disk drive speeds. We moved from the old system to the new, and on paper we should’ve been at least 3x faster, not slower. It’s an appliance, easily measured.
Or so I always told people.
Why does this matter? Invariably when we move a client from one system to another – and it runs slower – we must scramble to make sure we did everything right. It’s EASY to blame the hardware, but in my experience, it’s never been the hardware.
I’ve shared one case where the client upgraded to a system with almost 3x more hardware than the original, but the system ran at half the speed. Because it was a refurbished machine, they convinced IBM the problem was in the hard drives, and so IBM swapped out every hard drive on the system. Imagine changing 96 drives every day for two weeks.
But the problem remained.
The real problem: this was a national retailer, and most of their queries were on “transaction date”. The data had arrived in a natural order that kept zone maps healthy. When they moved from the smaller to the larger device, the move jumbled the zone maps and everything ran slower. When we optimized the tables, power was restored and it ran as expected, with 3x the power of the prior machine.
But the problem wasn’t the hardware.
My new bench test focused on hardware alone, regardless of the client’s tables or utilization. And for the first time ever, the new target hardware (this time on AWS) was actually, physically slower than the on-prem Striper. How could this be possible? And for that matter, how could we ever find such a problem if all we had were functional queries using the client’s data structures? The answer is “never”.
As you will soon see, my bench test does not lie, so we were in a bit of a pickle.
As it turns out, some diligent elf had directed the AWS instance to run the disk drives at half their capacity. Why anyone would want to do this is beyond me, so why would we expect it to be a configurable parameter?
Ahh, the cloud. So many choices.
So we fixed this, and hunted around for any other dead-weight keeping us bolted to the ground, and ran the test again.
This time, everybody whooped. The new machine was far-and-away faster, but smaller and less expensive to operate than their on-prem equivalent. The bench test helped them know this, not just take a vendor’s word for it. The bench test also — go figure — helped us diagnose the configuration problem.
Almost there – bear with me.
Years ago, some colleagues and I built test harnesses for our software and hardware configurations, and it saved us a lot of time. We wanted a performance check, much like the “speed test” app for Windows, which connects to the internet, runs a few pulls and pushes of a known size, and tells us how fast our connection is. (Hey, this could work in our favor either way, especially if we think our internet vendor might be short-changing us!)
The bench-test for the Netezza platform should be straightforward – it’s an appliance with so many predictable behaviors. All we need is a table of common composition and record length, a known table size, and some other canned values that are “random”, but within measurable limits.
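To make the idea of canned “random” values concrete, here is a minimal sketch and not the actual generator inside bench_build.sh. The function name gen_rows, the three-column layout, the filler text, and the seed are all assumptions for illustration. It shows how a fixed seed and a bounded range can produce rows of a known count and record shape.

```shell
# Hypothetical sketch: emit ROWS pipe-delimited records of a fixed layout.
# gen_rows ROWS SEED MAX
gen_rows() {
  awk -v n="$1" -v seed="$2" -v max="$3" 'BEGIN {
    srand(seed)                       # fixed seed: repeatable on the same awk
    for (i = 1; i <= n; i++) {
      v = int(rand() * max) + 1       # "random", but always within 1..max
      printf "%d|%d|%s\n", i, v, "FILLER_TEXT_OF_KNOWN_LENGTH"
    }
  }'
}

# Five sample rows with values bounded to 1..100
gen_rows 5 42 100
```

The exact values depend on the awk implementation, so a generator like this is repeatable per machine rather than byte-identical everywhere. The row count, record length, and value range are the parts that stay fixed.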
Many years ago, we thought using the customer’s data was more appropriate, but such structures aren’t portable. They create mystery where we want to remove it.
We pulled several examples of tests such as TPC-D and others, but each one had something subtle that didn’t test well, or it had too many variables. We had to tweak them here or there, and we just knew any modifications would require careful documentation, consummate faith in the end users, and infinite time to help them work through kinks.
Frankly, we were low on all that stuff.
We know what to test and what the results should be, so why not make our own, just for Netezza? The changes required for the end-user should be for credentials and database, not the data structures, algorithm, etc. Moreover, every instance of the test, worldwide, should be comparable to any other test instance.
I finally formulated one, and here it is for your review. You might wonder why it took so long to make something so simple, but the fact is, it’s the product of many heated debates and deliberation with data aficionados worldwide.
The spirit of the bench test is this:
(a) Run the performance test on the existing system.
(b) Run the same test on the newer model, such as the NPS on-prem or cloud.
(c) Compare the results.
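Step (c) boils down to a ratio. As a minimal sketch, assuming we have pulled one elapsed time in seconds from each machine’s log, a small helper can state “how much faster” as a number. The name speedup and the sample figures are illustrative only and are not measured results.

```shell
# speedup OLD_SECONDS NEW_SECONDS
# Prints old/new to two decimals; above 1.00 means the new machine is faster.
speedup() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (new + 0 <= 0) { print "n/a"; exit 1 }   # guard against divide-by-zero
    printf "%.2f\n", old / new
  }'
}

# Illustrative numbers only
speedup 90 30    # prints 3.00
```

A result under 1.00, as in our AWS half-speed disk episode, is the cue to go hunting through the configuration before blaming anything else.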
A drum-roll please—
At the end of this post is a link to a TAR file. Download it and extract the contents. Four are bash shell scripts and the other two are log files.
(1) bench_utils.sh – library used by bench_build.sh and bench_perf.sh
(2) bench_build.sh – builds all the data structures and data.
(3) bench_perf.sh – tests all the data structures and data.
(4) pivot.sh – from the log output of bench_build.sh and bench_perf.sh, pivots the metrics and readies them for Excel.
(5) build.log.out (Base+1 timing for bench_build.sh) – These metrics should come close to your own Base+1 metrics.
(6) perf.log.out (Base+1 timing for bench_perf.sh) – These metrics should also come close to your Base+1 metrics.
If you have a Base+2 or higher, expect the numbers in your NPS logs to scale linearly against these log files.
If you have a script server connected to both existing Netezza and to your new NPS, all the better – copy these files there. Otherwise, you may need to place a copy on each Netezza machine, ideally in a directory under the “nz” user (I usually make a subdir called “perf” and put them there).
Make each file executable with “chmod 775” followed by the file name.
Make sure the scripts can be found via PATH by prefixing a dot and a colon to PATH:
export PATH=.:$PATH
These scripts also presume you can invoke “nzsql” from the command line.
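Before kicking anything off, it can help to confirm those presumptions. The following is a minimal sketch of a pre-flight check, not part of the TAR file. The helper names need_cmd, need_exec, and preflight are my own inventions here, and it assumes we run it from the directory holding the four scripts.

```shell
# need_cmd NAME: succeeds only if NAME resolves on the current PATH
need_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# need_exec FILE: succeeds only if FILE exists and is executable
need_exec() {
  [ -f "$1" ] && [ -x "$1" ]
}

# preflight: report everything that is missing, then return non-zero if any
preflight() {
  status=0
  need_cmd nzsql || { echo "nzsql not found on PATH" >&2; status=1; }
  for f in bench_utils.sh bench_build.sh bench_perf.sh pivot.sh; do
    need_exec "$f" || { echo "$f missing or not executable" >&2; status=1; }
  done
  return "$status"
}
```

Calling preflight prints one line per problem and returns non-zero if anything is off, so it can gate the actual run.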
Lastly, we need a working database to run these scripts inside – we strongly recommend doing this in an isolated database and not any working databases. The test is not destructive, but an isolated database is a best practice.
I typically create a SCRATCHPAD database.
nzsql -d system -c "create database scratchpad"
In the file “bench_utils.sh”, the environment variable SCRATCHDB must be set to the name of your scratchpad. Then run the build, followed by the performance test:
bench_build.sh > build.log 2>&1
bench_perf.sh > perf.log 2>&1
In the above, all output will go to the log files. If we want to monitor progress, we can open another PuTTY window, go to the directory where these log files are located, and type:
tail -f build.log
tail -f perf.log
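If we would rather launch both steps in one go, a small wrapper can capture the output and skip the performance run when the build fails. This is a minimal sketch under my own assumptions. The function run_logged is hypothetical and not part of the TAR file, and the wall-clock figure it prints is only a convenience, separate from the metrics the scripts write themselves.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stdout and stderr captured in LOGFILE, then reports
# the exit status and wall-clock seconds.
run_logged() {
  log="$1"; shift
  start=$(date +%s)
  "$@" > "$log" 2>&1
  rc=$?
  end=$(date +%s)
  echo "$1 finished with status $rc in $((end - start))s (log: $log)"
  return "$rc"
}

# Intended use from the directory holding the scripts:
#   run_logged build.log bench_build.sh && run_logged perf.log bench_perf.sh
```

The “tail -f” commands still work against the same log files while this runs.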
Likely goes without saying, but I’ll say it anyhow — try to find a quiet moment on the current system to run these tests. The “bench_build” and “bench_perf” have the following durations depending on the machine:
Machine              Duration
Base+1 NPS           Less than 15 minutes
Mako, one-rack       Less than 30 minutes
Striper, one-rack    Less than 30 minutes
TwinFin, one-rack    Less than 45 minutes
Once the logs are in place, see the notes in the “pivot.sh” script on how it pivots the log output into single rows of pipe-delimited data, easily consumed by Excel or another tool.
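For a feel of what “pivoting” means here, the following is a minimal sketch and not the real pivot.sh. The actual log layout is defined by the scripts in the TAR file. The two-column “name value” input, the metric names, and the function pivot_pairs are purely hypothetical stand-ins.

```shell
# pivot_pairs: read "name value" lines on stdin and emit one header row
# and one data row, both pipe-delimited. Input layout is hypothetical.
pivot_pairs() {
  awk 'NF == 2 {
    names  = names  (n ? "|" : "") $1
    values = values (n ? "|" : "") $2
    n++
  }
  END { if (n) { print names; print values } }'
}

# Made-up sample input, not real bench output
printf 'load_secs 41\nscan_secs 12\njoin_secs 27\n' | pivot_pairs
# load_secs|scan_secs|join_secs
# 41|12|27
```

One row per test run is the shape that makes machine-to-machine comparison easy: paste the rows from the old and new systems into Excel, one above the other, and compare column by column.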