memcached versus DataGrid, dumb server versus smart server...

Memcached is a pretty popular free open source distributed cache. It's pretty unique compared with the commercial data grids in that it has a completely unique design.

Memcached is basically a dumb server. This means it's a process running on a Linux box typically which accepts client requests for keys and returns the data. Updates and deletes are also supported. This is ALL the function in the server, hence, dumb server. The client is the only piece which is 'distributed' or even knows about other servers. The clients typically get a collection of host:ip addresses (they all NEED to use the same list) and the clients hash the key over the set of online servers. If there were 4 servers running then all clients will hash data for a key over these 4 servers. Clients will write data for a key K to a particular server using this hash algorithm. One server.

Now, thats what memcached does out of the box. Note, there is no replication on the server side. If a server fails then the clients will hash keys over 3 servers but this changes everything from a hashing point of view so it's likely you'll get a very high miss rate as it's unlikely keys still hash to the same server as before given the modulo part of the hashing is now 3 instead of 4.

Clients work around this by adding a consistent hashing layer on top of the normal client. This works better and writes may be written to a server and the next server in the hash ring. This slows down writes by 2x but allows some recoverability when a server fails. The next server in the ring has the data from the dead server so it's not as bad as before BUT if this server also fails then now the cache has lost the data. It's not replicated remember, it's just written twice. After the first server failed, the clients write changes to the next server and the following server in the hash ring. The new next server in the ring does not have a copy of all the data. It was never written there. It will just have new changes since the failure. There is no active replication going on here.

Most of the materiel on the web shows various workarounds to try to make memcached 'reliable'. A lot of the facebook engineering blog entries focus on this aspect. What they had to do to make it reliable. This works for facebook but won't work in general because I don't think the majority of customers are structured like they are from a build your own mentality. I was talking to someone from one of the big internet shops last week and they always run two memcached clusters for HA. If a server in cluster A fails then they stop using cluster A and switch to B. This works but thats a 50% loss in hardware because of a single failure.

Smart servers

Commercial data grids like IBM WebSphere eXtreme Scale are very different. They handle making the data fault tolerant on the server side with built-in replication and elastic scale out and scale in. Their clients understand the server side data layout or routing. If a server fails with WebSphere eXtreme Scale then no data is lost. The software actively maintains replicas on other servers. Clients that stored an entry with a key K before will still find it. It just works. You don't need consistent hashing on the client, our client does it using routing tables instead. Adding more server processes causes the product to automatically redistribute the minimum amount of data to leverage the new memory/CPU and network available through the new JVMs.

The smart servers allow proper replication which means that the cache isn't as brittle as the memcached cache when servers come and go. If nothing ever fails then the memcached approach works but the potential cache hit rates over time if multiple failures occur mean it can be a liability. The last facebook entry I read on this showed them crawling the logs to try to recover lost data to make the 'next server' more complete. They get this issue and are jumping through hoops to handle it but again, most commercial customers won't do this. They expect the product to do this sort of stuff like replication and they view it as table steaks.

So, memcached for sure has its uses but it's got some limitations which need a fair bit of work by its users to handle correctly if its not exactly what you want or risk some bad scenarios as the cache hit rate climbs. It's very, very fast but you gotta remember, it's not doing much either... No transactions, no consistency, no replication, no isolation and so on. If you don't need these features then it's pretty fast but if you do need this stuff then it starts to slow as you add code on the client side to try and compensate for the limitations.

Billy Newport's complete blog can be found at: http://www.devwebsphere.com/devwebsphere

About Billy Newport

Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/ Scheduler APIs which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy lead the design of the WebSphere 6.0 non blocking IO framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.

Before IBM, Billy worked as an independant consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven, his first programming language was 6502 assembler.

Billys current interests are lightweight non invasive middleware, complex event processing systems and grid based OLTP frameworks.

Why Attend the NFJS Tour?

» Cutting-Edge Technologies
» Agile Practices
» Peer Exchange

Current Topics:

Languages on the JVM: Scala, Groovy, Clojure
Enterprise Java
Core Java, Java 8
Agility
Testing: Geb, Spock, Easyb
REST
NoSQL: MongoDB, Cassandra
Hadoop
Spring 4
Cloud
Automation Tools: Gradle, Git, Jenkins, Sonar
HTML5, CSS3, AngularJS, jQuery, Usability
Mobile Apps - iPhone and Android
More...

Learn More »

memcached versus DataGrid, dumb server versus smart server...

Posted by: Billy Newport on

About Billy Newport

Why Attend the NFJS Tour?

Current Topics: