IBM DB2 pureScale is very cool, IBM WebSphere eXtreme Scale makes it better - No Fluff Just Stuff

Posted by: Billy Newport on

DB2 pureScale has now been announced, and it's pretty cool. The team has worked on this for a long time, and it looks like a great achievement by the folks in DB2 land. It's a scale-out SQL database that, for now, runs on p-series hardware together with the new cluster accelerator component. The cluster accelerator is very similar to the coupling facility on z-series. As a result, the design of DB2 pureScale is very similar to that of DB2/390.

However, its primary competitor is Oracle RAC. The design is very different from Oracle RAC in that it actually does scale out well. RAC uses a peer-to-peer model for coordinating between its nodes, which leads to performance bottlenecks under read/write transactional workloads that the application has not partitioned. As a result, it appears most Oracle customers use RAC simply for two-node hot failover. It's a solution for high availability, not scale out...

DB2 pureScale is different. Instead of having each node coordinate with every other node, it centralizes elements of this communication in the cluster accelerator. Centralized sounds slower, but it's not here. The peer-to-peer model needs more operations per coordination because all nodes are involved in the coordination function. The centralized model needs far fewer because only one node handles it. That is a big difference: it reduces coordination latencies significantly, and that means better scaling. Of course, the cluster accelerator will slow down gradually as the cluster scales out, but you end up being able to scale significantly further with this approach than with the peer-to-peer approach. It's unlikely that customers will hit the limits in practice.
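To make the message-count argument concrete, here is a deliberately simplified toy model in Python. It assumes that, in the worst case, a peer-to-peer coordination exchanges a request and a reply with every other node, while the centralized design needs one round trip to a single coordinator. The function names and counting rules are illustrative assumptions only; they do not describe the actual protocols used by Oracle RAC or by the cluster accelerator.

```python
def peer_to_peer_messages(nodes: int) -> int:
    """Messages for one coordinated operation, assuming (toy worst case)
    the requester exchanges a request and a reply with every other node."""
    return 2 * (nodes - 1)


def centralized_messages(nodes: int) -> int:
    """Messages for one coordinated operation, assuming the requester only
    talks to a single coordinator: one request and one reply, whatever
    the cluster size."""
    return 2


def coordinator_load(nodes: int, ops_per_node: int) -> int:
    """Requests the single coordinator must serve per interval. This grows
    with the cluster, which is why the central component eventually
    becomes the limit."""
    return nodes * ops_per_node


def coordination_table(sizes):
    """Return (nodes, peer-to-peer messages, centralized messages) rows."""
    return [(n, peer_to_peer_messages(n), centralized_messages(n)) for n in sizes]


if __name__ == "__main__":
    for n, p2p, central in coordination_table([2, 4, 8, 16]):
        print(f"{n:>2} nodes: peer-to-peer={p2p:>2} messages, centralized={central}")
```

In this model the per-operation cost of the centralized design stays flat while the coordinator's total load grows with the cluster. That is consistent with the point above: the accelerator slows down gradually as nodes are added, but each individual operation does not get more expensive.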

Customers should also choose pureScale over previous HA solutions like DB2 HADR or queue replication because the second box actually contributes to the workload. Two active boxes give higher performance than a single box with a passive replica. However, replication between data centers will still require technologies like queue replication, because all pureScale nodes need to be on the same high-performance network and SAN.

So customers can, of course, use pureScale for two-node hot failover like Oracle RAC customers do, but they will also be able to use it to build larger scale-out databases, and unlike Oracle RAC it will perform and scale well. This will let customers build larger N-node database clusters with hot failover and get very reasonable scaling. Customers that previously looked at sharded databases might therefore view pureScale as an alternative to sharding, as it still looks like a single database. Operationally, a sharded database (i.e. a manually partitioned set of independent databases) may be better from a risk point of view, because the shards are independent and scale forever, whereas pureScale has a limit. For most customers, though, that limit will not be a factor. pureScale should give many customers looking at sharding a great alternative for scale out without the hassles of partitioning. Plus, there is no reason why a pair of pureScale nodes couldn't be a shard. Each shard still needs to be highly available, and if a shard gets really hot, you can add a node to it, which is impossible with a conventional database. DB2 pureScale solves the hot shard problem nicely.
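For readers who haven't built one, manual sharding and the hot shard problem can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each "shard" is just an in-memory dict standing in for an independent database, and `shard_for`, `ShardedStore` and `hottest_shard` are hypothetical names invented for this sketch, not part of any product.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard using a stable hash, so the same key always
    lands on the same shard for a given shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """A manually partitioned set of independent stores (dicts here)."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]
        self.hits = [0] * shard_count  # per-shard request counter

    def put(self, key: str, value) -> None:
        index = shard_for(key, len(self.shards))
        self.hits[index] += 1
        self.shards[index][key] = value

    def get(self, key: str):
        index = shard_for(key, len(self.shards))
        self.hits[index] += 1
        return self.shards[index].get(key)

    def hottest_shard(self) -> int:
        """Index of the shard that has served the most requests."""
        return max(range(len(self.hits)), key=self.hits.__getitem__)
```

If one key (or one customer) dominates the traffic, all of that load lands on a single shard, and adding more shards does not help. With a conventional database behind each shard, the only fix is a bigger box; the argument above is that a shard backed by pureScale could instead gain a node.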

Does this mean there is no need for caching products like WebSphere eXtreme Scale? Of course not. Let's examine why:

  • It's still an SQL database. You needed caches before for SQL databases, and you still need caches for DB2 pureScale databases.
  • IBM is not giving away DB2 pureScale for free. It runs on enterprise-class servers with a SAN and very high-speed networks. Companies will get significant value from buying pureScale, and as a result they will want to extract the best value from it. This means they will want to cache data in front of it, just as they would with an SMP database. Nothing has changed.
  • Data is served up in SQL form, and for many applications it then needs to be object-relationally mapped to objects. This means complex SQL against the database, which is slow, and a lot of path length in the application doing the actual mapping to objects. Products like IBM WebSphere eXtreme Scale cache data in the native application form. This makes it significantly faster to fetch from the cache than from a database, and it shortens the path on the application side because there is no mapping.
  • You can't collocate your application with your data. The data will still be in a separate tier, just like today's databases.
  • It's a single data center technology. The communication between the nodes and the cluster accelerator requires a very high-speed, low-latency network and access to a SAN. You will only be able to use a pureScale cluster within a single data center, or possibly within a single network switch for best performance. Products like IBM WebSphere eXtreme Scale have deployed and proven out-of-the-box multi-data center (more than two) active/active capabilities.
  • Customers used caches to reduce response times when accessing data. While DB2 pureScale absolutely scales out horizontally extremely well, each operation still does a small amount of communication with the cluster accelerator. This is work that isn't needed with a normal single-node database. It means slightly slower response times, so if you needed a cache before for response times, nothing has changed with pureScale.
  • It's not a cost-effective place to store short-lived persistent data like application/HTTP sessions or data related to application-customer conversations. Products like IBM WebSphere eXtreme Scale are designed to meet these needs in a more cost-effective manner.
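Several of the points above come down to the same pattern: keep already-mapped application objects in a cache in front of the database, with an expiry for short-lived data. Here is a minimal cache-aside sketch in Python. It is not the WebSphere eXtreme Scale API; `FakeDatabase`, `map_row`, `Customer` and `CacheAside` are hypothetical names, and the in-memory "database" exists only so the number of queries can be counted.

```python
import time
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


class FakeDatabase:
    """Stand-in for a SQL database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch_row(self, customer_id):
        self.queries += 1
        return self.rows.get(customer_id)


def map_row(row) -> Customer:
    """The object-relational mapping step that a cache hit avoids."""
    return Customer(customer_id=row["id"], name=row["name"])


class CacheAside:
    """Caches mapped objects with a time-to-live (assumed policy)."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # customer_id -> (expires_at, Customer)

    def get(self, customer_id):
        now = self.clock()
        entry = self._entries.get(customer_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: no SQL, no mapping
        row = self.db.fetch_row(customer_id)  # miss or expired: go to the database
        if row is None:
            self._entries.pop(customer_id, None)
            return None
        customer = map_row(row)
        self._entries[customer_id] = (now + self.ttl, customer)
        return customer
```

A second `get` for the same customer within the TTL returns the cached object without touching the database, which is the response-time and path-length saving described above. A real data grid adds what this sketch leaves out: replication, partitioning across JVMs, and invalidation when the database changes.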

Most of this is not specific to pureScale. These points apply to all databases, even MySQL. These are the reasons customers already used caches with databases or replaced databases with caches. So, I hope we sell a truckload of DB2 pureScale with a truckload of IBM WebSphere eXtreme Scale in front of it.


Billy Newport

About Billy Newport

Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/Scheduler APIs, which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy led the design of the WebSphere 6.0 non-blocking I/O framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.

Before IBM, Billy worked as an independent consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven; his first programming language was 6502 assembler.

Billy's current interests are lightweight non-invasive middleware, complex event processing systems and grid-based OLTP frameworks.
