Latest Netezza Performance Server
Cloud Pak for Data (CP4D) and Netezza Performance Server (NPS) are the latest one-two market punch from IBM. I’ve already migrated a number of our clients onto this platform. The experience has been, in a word, fun.
Now don’t get me wrong, moving an enterprise to another platform is non-trivial, but it’s interesting that the feedback we hear from clients always involves a measure of laughter.
For my first interaction with the system, I connected to the device’s host and also launched a Linux instance on my laptop with a very old Netezza client (4.x or so). I set up the common parameters (NZ_USER, NZ_HOST, etc.) and executed “nzsql”. The interface connected the first time, no pushback. I was able to interact normally with all the usual nzsql commands. This one test proved (to me) that the technology was faithful to the original.
Except now the system is 64-bit, with flash drives, faster CPUs, and faster hardware infrastructure. In short, we were about to have a blast.
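For anyone who has not used the client before, the connection test described above can be sketched roughly like this. The NZ_* names are the usual Netezza client environment variables and “nzsql -c” runs a single statement. The helper names, the placeholder host and credentials, and the choice of Python are my assumptions for illustration only.

```python
import os
import shutil
import subprocess


def nz_environment(host, database, user, password):
    """Return a copy of the current environment with the NZ_* variables set."""
    env = dict(os.environ)
    env.update({
        "NZ_HOST": host,          # placeholder values, supplied by the caller
        "NZ_DATABASE": database,
        "NZ_USER": user,
        "NZ_PASSWORD": password,
    })
    return env


def smoke_test(env, sql="select 1;"):
    """Run one statement through nzsql.

    Returns None when the nzsql client is not on the PATH,
    otherwise True or False depending on the exit code.
    """
    if shutil.which("nzsql") is None:
        return None
    result = subprocess.run(
        ["nzsql", "-c", sql], env=env, capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    env = nz_environment("nps.example.com", "system", "admin", "changeme")
    print(smoke_test(env))
```

If the old client connects and the statement returns, the new host is speaking the same protocol the old one did.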
I formulated a benchmark script which can run on any Netezza system. It sets up a common data structure with consistent data types, builds tables of known sizes, and executes stress queries that join and scan the tables and exercise co-location and zone maps. The script has a “build” phase and a “run” phase, and outputs timings for each operation.
Why is this important? Running this test on an existing Netezza system (say, a Mako) and then on the new NPS gives us an apples-to-apples view of the baseline performance boost we should expect. It gives us raw numbers of course, but it can’t be fooled. Disk speeds on a prior machine are “what they are” and this test goes to the deepest machine metal. Shoot me an email at email@example.com and I will send you the tarball and instructions. It’s a simple bash shell script and easy to follow.
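The real script is the bash tarball mentioned above. What follows is only a minimal sketch of the build/run timing idea, written in Python, and not the script itself. The table names, the SQL text, and the “execute” callback are all hypothetical. In practice “execute” would hand each statement to nzsql.

```python
import time


def run_phase(phase, steps, execute):
    """Run each (label, sql) step through `execute` and record elapsed seconds."""
    timings = {}
    for label, sql in steps:
        start = time.perf_counter()
        execute(sql)
        timings[f"{phase}:{label}"] = time.perf_counter() - start
    return timings


def speedup(baseline, candidate):
    """Ratio of baseline seconds to candidate seconds for steps present in both."""
    return {
        label: baseline[label] / candidate[label]
        for label in baseline
        if label in candidate and candidate[label] > 0
    }


# Hypothetical steps: the real benchmark builds tables of known sizes.
BUILD = [
    ("fact", "create table bench_fact as select * from bench_seed distribute on (id)"),
]
RUN = [
    ("scan", "select count(*) from bench_fact where load_dt > '2020-01-01'"),
    ("join", "select count(*) from bench_fact a join bench_dim b on a.id = b.id"),
]

if __name__ == "__main__":
    executed = []
    timings = run_phase("build", BUILD, executed.append)
    timings.update(run_phase("run", RUN, executed.append))
    for label, seconds in timings.items():
        print(f"{label}\t{seconds:.6f}s")
```

Run the same steps on the old machine and the new one, feed both timing sets to “speedup”, and the ratio per step is the baseline boost.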
The system can be on-prem or provisioned on the cloud. It’s the real Netezza hardware: massively parallel with shared-nothing drives, all the familiar parts and nomenclature. I say this because in legend, some had suggested (years ago) that the Netezza cloud version might be commodity hardware with a Netezza host, which would be entirely anticlimactic. The Netezza host is merely a facade. The heavy lifting is in the hardware.
IBM provisioned us a POC instance on Amazon Web Services and our team executed the first-time implementation of Netezza in the cloud. The client pointed their ETL, their reports, and their Cognos application to the cloud version, and the experience was seamless. I mean, not a hiccup. Our challenges were the usual migration issues, but not the technology. This gave the client immense confidence to proceed. They moved their ETL and reporting into the instance, too, and unhooked themselves from any on-prem dependency for this solution.
We used a backup/restore scenario to move data up to the cloud, largely because this was the most stable means to guarantee one-for-one data. We could snapshot report outputs at the time of the backup and compare those to our results on the cloud instance. No sooner had we moved the data than we needed to move it again. Their engineers were busy with changes to the solution and we had to account for those in the new instance. We set up some simple automations that allowed us to kick off a migration of data and monitor it on the way up. Fourteen hours’ worth of stop-and-go tasks to move three databases, all automated from start to finish, largely because the machines were “so” identical and the AWS instance so predictable. It’s an appliance. What can we say?
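The snapshot-and-compare check can be sketched as follows. This is an illustration of the idea rather than the automation we actually ran. The report names, the row layout, and the function names are hypothetical. Each report’s rows are reduced to a row count plus an order-independent hash, so the same report pulled from the on-prem machine and from the cloud instance should produce the same fingerprint.

```python
import hashlib


def fingerprint(rows):
    """Return (row_count, sha256_hex) for a report, ignoring row order."""
    normalized = sorted("|".join(str(value) for value in row) for row in rows)
    digest = hashlib.sha256()
    for line in normalized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return len(normalized), digest.hexdigest()


def mismatched_reports(source, target):
    """Names of reports that are missing on the target or whose fingerprints differ."""
    return sorted(
        name
        for name, rows in source.items()
        if name not in target or fingerprint(rows) != fingerprint(target[name])
    )


if __name__ == "__main__":
    on_prem = {"sales_by_region": [("east", 10), ("west", 7)]}
    cloud = {"sales_by_region": [("west", 7), ("east", 10)]}
    print(mismatched_reports(on_prem, cloud))
```

An empty result means every snapshot taken at backup time matches what the restored instance returns.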
So — a shout-out to the folks at IBM who made this happen, both in its design and construction, and in assisting with its deployment.
In case there’s any doubt, Cloud Pak for Data (CP4D) interoperates with Netezza Performance Server as a harness, but does not require us to go through the CP4D layer to interact with the Netezza host. In fact, for this first instance of Netezza on the cloud, CP4D had a small configuration issue particular to our deployment. It was easier to unhook CP4D from Netezza, muscle through the migration, then hook the systems together again (all in software). IBM has since corrected this problem, but the interim fix was simple, and demonstrates the reality: these are separate systems in synergy. One does not constrain the other.
As many of our clients know, we have a replication technology for Netezza that we deploy as a service. Our replicator (nzREP) dives deep into the system catalog and is intimately connected to its API, so it would tell us quickly whether the new NPS was compatible. We set up a replication scenario between our on-prem instance and the cloud instance, and the systems talked flawlessly.
Since that first rollout, we’ve had a steady stream of people upgrading, moving, kicking the tires, and the verdict is in — they love the new appliance and the cloud with it.
Cloud Pak for Data is an IBM Cloud technology built on OpenShift. It’s an on-prem cloud that lets us pull down various applications (kind of like an app store) and launch them under Cloud Pak, which acts sort of like an operating system, but more like an integration platform. Data scientists can snap together components from various tools, test and examine what they like, and hand the result off as a working prototype for developers to make operational.
As its own cloud, CP4D helps us containerize and modernize our apps with no risk of exposure, so we can deploy what we like. Moreover, we can deploy some things to the external cloud and keep other things local. Push a volatile new app with a strong need for elasticity, and when it stabilizes, bring it back into the fold with no loss of functionality. Point being, the apps aren’t either/or, where we must choose which to permanently migrate to an external cloud. We can do both now, and the multi-cloud experience is just a key-click away.