diff --git a/00-RELEASENOTES b/00-RELEASENOTES
index 36317ca35..81ff184fe 100644
--- a/00-RELEASENOTES
+++ b/00-RELEASENOTES
@@ -1,85 +1,16 @@
-Redis 2.6 release notes
+Hello! This file is just a placeholder, since this is the "unstable" branch
+of Redis, the place where all the development happens.
-Migrating from 2.4 to 2.6
-=========================
+There are no release notes for this branch; it gets forked into another branch
+every time there is a partial feature freeze in order to eventually create
+a new stable release.
-Redis 2.4 is mostly a strict subset of 2.6. However there are a few things
-that you should be aware of:
+Usually "unstable" is stable enough for you to use it in development environments;
+however, you should never use it in production environments. It is possible
+to download the latest stable release here:
-* You can't use .rdb and AOF files generated with 2.6 into a 2.4 instance.
-* 2.6 slaves can be attached to 2.4 masters, but not the contrary, and only
- for the time needed to perform the version upgrade.
+ http://download.redis.io/releases/redis-stable.tar.gz
-There are also a few API differences, that are unlikely to cause problems,
-but it is better to keep them in mind:
+More information is available at http://redis.io
-* SORT now will refuse to sort in numerical mode elements that can't be parsed
- as numbers.
-* EXPIREs now all have millisecond resolution (but this is very unlikely to
- break code that was not conceived exploting the previous resolution error
- in some way.)
-* INFO output is a bit different now, and contains empty lines and comments
- starting with '#'. All the major clients should be already fixed to work
- with the new INFO format.
-
-Also the following redis.conf and CONFIG GET / SET parameters changed name:
-
- * hash-max-zipmap-entries, now replaced by hash-max-ziplist-entries
- * hash-max-zipmap-value, now replaced by hash-max-ziplist-value
- * glueoutputbuf was now completely removed as it does not make sense
-
----------
-CHANGELOG
----------
-
-What's new in Redis 2.6.0
-=========================
-
-UPGRADE URGENCY: We suggest new users to start with 2.6.0, and old users to
- upgrade after some testing of the application with the new
- Redis version.
-
-* Server side Lua scripting, see http://redis.io/commands/eval
-* Virtual Memory removed (was deprecated in 2.4)
-* Hardcoded limits about max number of clients removed.
-* AOF low level semantics is generally more sane, and especially when used
- in slaves.
-* Milliseconds resolution expires, also added new commands with milliseconds
- precision (PEXPIRE, PTTL, ...).
-* Clients max output buffer soft and hard limits. You can specifiy different
- limits for different classes of clients (normal,pubsub,slave).
-* AOF is now able to rewrite aggregate data types using variadic commands,
- often producing an AOF that is faster to save, load, and is smaller in size.
-* Every redis.conf directive is now accepted as a command line option for the
- redis-server binary, with the same name and number of arguments.
-* Hash table seed randomization for protection against collisions attacks.
-* Performances improved when writing large objects to Redis.
-* Significant parts of the core refactored or rewritten. New internal APIs
- and core changes allowed to develop Redis Cluster on top of the new code,
- however for 2.6 all the cluster code was removed, and will be released with
- Redis 3.0 when it is more complete and stable.
-* Redis ASCII art logo added at startup.
-* Crash report on memory violation or failed asserts improved significantly
- to make debugging of hard to catch bugs simpler.
-* redis-benchmark improvements: ability to run selected tests,
- CSV output, faster, better help.
-* redis-cli improvements: --eval for comfortable development of Lua scripts.
-* SHUTDOWN now supports two optional arguments: "SAVE" and "NOSAVE".
-* INFO output split into sections, the command is now able to just show
- pecific sections.
-* New statistics about how many time a command was called, and how much
- execution time it used (INFO commandstats).
-* More predictable SORT behavior in edge cases.
-* INCRBYFLOAT and HINCRBYFLOAT commands.
-
--------------------------------------------------------------------------------
-
-Credits: Where not specified the implementation and design are done by
-Salvatore Sanfilippo and Pieter Noordhuis. Thanks to VMware for making all
-this possible. Also many thanks to all the other contributors and the amazing
-community we have.
-
-See commit messages for more credits.
-
-Cheers,
-Salvatore
+Happy hacking!
diff --git a/COPYING b/COPYING
index c8665ba67..a58de44dd 100644
--- a/COPYING
+++ b/COPYING
@@ -1,4 +1,4 @@
-Copyright (c) 2006-2012, Salvatore Sanfilippo
+Copyright (c) 2006-2014, Salvatore Sanfilippo
 All rights reserved.
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
diff --git a/Changelog b/Changelog
deleted file mode 100644
index f72746663..000000000
--- a/Changelog
+++ /dev/null
@@ -1,1032 +0,0 @@
-2010-07-01 gitignore modified (antirez)
-2010-06-22 redis.c split into many different C files. (antirez)
-2010-06-16 more pub/sub tests (Pieter Noordhuis)
-2010-06-15 initial basic pub/sub tests (Pieter Noordhuis)
-2010-06-15 fix BLPOP/BRPOP to use the wrapped function for list length (Pieter Noordhuis)
-2010-06-15 tests for BLPOP/BRPOP via an option in the tcl client that defers reading the reply (Pieter Noordhuis)
-2010-06-14 TODO updated (antirez)
-2010-06-14 Merge branch 'ltrim-tests' of git://github.com/pietern/redis (antirez)
-2010-06-14 rename "list" to "linkedlist" to be more verbose (Pieter Noordhuis)
-2010-06-14 allow running the test suite against an external Redis instance, without auto spawning (antirez)
-2010-06-14 change ltrim tests to cover all min/max cases and add stronger stresser (Pieter Noordhuis)
-2010-06-13 Fixed deps in makefile and mkreleasehdr.sh script to really take advantage of the new trick to avoid recompilation of redis.c on git sha1 or dirty status change (antirez)
-2010-06-13 hopefully faster recompiling with a trick (antirez)
-2010-06-13 fixed a bug in rdbLoadObject abount specially encoded objects (antirez)
-2010-06-13 use raw strings when loading a hash from the rdb into a zipmap (Pieter Noordhuis)
-2010-06-12 Merge branch 'expire' of git://github.com/pietern/redis (antirez)
-2010-06-11 Merge branch 'lists' of git://github.com/pietern/redis (antirez)
-2010-06-11 LPUSHX, RPUSHX, LINSERT only work on non-empty lists, so there are no clients waiting for a push (Pieter Noordhuis)
-2010-06-11 make LINSERT return -1 when the value could not be inserted (Pieter Noordhuis)
-2010-06-11 check if the list encoding needs to be changed on LPUSHX, RPUSHX, LINSERT (Pieter Noordhuis)
-2010-06-11 make sure the value to insert is string encoded (Pieter Noordhuis)
-2010-06-11 rename vars, move arguments, add comments (Pieter Noordhuis)
-2010-06-11 always iterate from head to tail on LINSERT (Pieter Noordhuis)
-2010-06-11 use REDIS_TAIL to insert AFTER an entry and REDIS_HEAD to insert BEFORE an entry (Pieter Noordhuis) -2010-06-11 move listTypeInsert to be grouped with other wrapper functions (Pieter Noordhuis) -2010-06-11 squashed merge from robey/twitter3: LINSERT BEFORE|AFTER, LPUSHX, RPUSHX (Pieter Noordhuis) -2010-06-09 remove pop function and the sds dependency; can be implemented using get+delete (Pieter Noordhuis) -2010-06-07 compute swappability for ziplist encoded lists (Pieter Noordhuis) -2010-06-07 reuse the sds from the main dictionary in the expiration dictionary (Pieter Noordhuis) -2010-06-07 TODO updated (antirez) -2010-06-07 encode integers while loading an hash (antirez) -2010-06-05 Merge branch 'lists' of git://github.com/pietern/redis (antirez) -2010-06-05 fixed two leaks for the dual encoded lists (Pieter Noordhuis) -2010-06-04 TODO updated (antirez) -2010-06-04 DISCSARD now unwatches all keys, as it should (antirez) -2010-06-04 generated tests for different encodings to avoid test code duplication (Pieter Noordhuis) -2010-06-04 refactor list tests to test both encodings; implemented assert functions (Pieter Noordhuis) -2010-06-04 renamed hash wrapper functions to match wrapper function naming convention: "Type" (Pieter Noordhuis) -2010-06-04 Merge branch 'lists' of git://github.com/pietern/redis (antirez) -2010-06-04 Merge branch 'smallkeys' (antirez) -2010-06-04 safety assert in listTypeNext (Pieter Noordhuis) -2010-06-04 renamed list wrapper functions to be more verbose (Pieter Noordhuis) -2010-06-04 add thresholds for converting a ziplist to a real list (Pieter Noordhuis) -2010-06-04 merge antirez/smallkeys (Pieter Noordhuis) -2010-06-03 test restored (antirez) -2010-06-03 memory leak introduced in the latest big changes fixed (antirez) -2010-06-03 Fixed VM bugs introduced with the top level keys as sds strings changes (antirez) -2010-06-03 top level keys are no longer redis objects but sds strings. 
There are still a few bugs to fix when VM is enabled (antirez) -2010-06-03 update Makefile to include ziplist.o (Pieter Noordhuis) -2010-06-03 use ziplists in SORT STORE until the thresholds are determined (Pieter Noordhuis) -2010-06-03 Merge branch 'testsuite' of git://github.com/pietern/redis (antirez) -2010-06-03 Merge branch 'testsuite' of git://github.com/pietern/redis into smallkeys (antirez) -2010-06-03 tag memory leak check on kill server as "leaks" (Pieter Noordhuis) -2010-06-03 tag test with sleep() as slow (Pieter Noordhuis) -2010-06-03 make sure the config it returned when called without code (Pieter Noordhuis) -2010-06-03 tag more slow tests (Pieter Noordhuis) -2010-06-03 change how arguments are passed from the AOF tests (Pieter Noordhuis) -2010-06-03 scope res variable outside test (Pieter Noordhuis) -2010-06-02 tags for existing tests (Pieter Noordhuis) -2010-06-02 pass tags to filter and match via arguments (Pieter Noordhuis) -2010-06-02 basic support to tag tests (Pieter Noordhuis) -2010-06-02 changed how server.tcl accepts options to support more directives without requiring more arguments to the proc (Pieter Noordhuis) -2010-06-02 removed obsolete code (Pieter Noordhuis) -2010-06-02 catch exceptions in the server proc, to be able to kill the entire chain of running servers (Pieter Noordhuis) -2010-06-02 Merge branch 'master' into smallkeys (antirez) -2010-06-02 smarter swapout policy on AOF too (antirez) -2010-06-02 better swapout policy while loading RDB file (antirez) -2010-06-02 minor code comment change (antirez) -2010-06-01 use integer types from stdint.h to be more verbose on the size in bytes of encoded elements. update list length to use 2 bytes instead of 1. (Pieter Noordhuis) -2010-06-01 added stress test for heavy i/o in ziplists (Pieter Noordhuis) -2010-06-01 fix signedness errors in ziplist testing code (Pieter Noordhuis) -2010-06-01 minor code movements and free object pull restored to 1 million (antirez) -2010-06-01 TODO updated with syslog plans for 2.2 (antirez) -2010-06-01 Debug message was printing stuff that are sometimes not initialized/valid (antirez) -2010-06-01 Merge branch 'smallkeys' of github.com:antirez/redis into smallkeys (antirez) -2010-06-01 fixed a few comments (antirez) -2010-06-01 fixed bugs introduced in the rewrite of the new VM engine (antirez) -2010-05-31 support rewriting the AOF with dual list encoding (Pieter Noordhuis) -2010-05-31 small refactor of fwrite* commands for AOF rewrite to allow writing a bulk long long (Pieter Noordhuis) -2010-05-31 use list wrapper functions in computing the dataset digest (Pieter Noordhuis) -2010-05-31 ziplistNext should work as expected when called with a pointer to ZIP_END (Pieter Noordhuis) -2010-05-31 update SORT to work with the dual list encoding (Pieter Noordhuis) -2010-05-31 function to create a new ziplist encoded list (Pieter Noordhuis) -2010-05-31 fixed missing incrRefCount (antirez) -2010-05-31 support rdb saving/loading with dual list encoding (Pieter Noordhuis) -2010-05-31 fixed signedness and disambiguate variable names (Pieter Noordhuis) -2010-05-31 added rdb save function to directly save long long values (Pieter Noordhuis) -2010-05-31 update RPOPLPUSH to support dual encoding (Pieter Noordhuis) -2010-05-31 update list iteration semantic to work as expected (i.e. 
"while(lNext(..))") (Pieter Noordhuis) -2010-05-31 ziplistDelete no longer needs a direction now ziplistPrev is fixed (Pieter Noordhuis) -2010-05-31 ziplistPrev should return the tail when the argument is ZIP_END (Pieter Noordhuis) -2010-05-31 first step of VM rewrite. blocking VM tests passing, more work needed in the async side (antirez) -2010-05-31 Merge branch 'no-appendfsync-on-rewrite' (antirez) -2010-05-30 fix LREM to remove *all* occurances when a zero argument is given (Pieter Noordhuis) -2010-05-30 fixed LINDEX to always return bulk response (Pieter Noordhuis) -2010-05-30 the tail offset must be an integer pointer to hold a 32-bit offset (Pieter Noordhuis) -2010-05-30 update LREM to support dual encoding via extra iteration primitives (Pieter Noordhuis) -2010-05-30 support dual encoding in LTRIM (Pieter Noordhuis) -2010-05-30 update LRANGE to use basic iteration code to support dual encoding (Pieter Noordhuis) -2010-05-30 inline support for dual encoding in the LINDEX and LSET commands (Pieter Noordhuis) -2010-05-30 generic pop and length function for ziplist encoding (Pieter Noordhuis) -2010-05-30 generic push function that supports the dual encoding (Pieter Noordhuis) -2010-05-30 change delete function to accept a direction argument, so "p" can be properly updated (Pieter Noordhuis) -2010-05-30 expose extra functionality from ziplist.c (Pieter Noordhuis) -2010-05-30 code style consistency fixes (Pieter Noordhuis) -2010-05-29 ziplistIndex now accepts negative indices (Pieter Noordhuis) -2010-05-29 fix compile warnings (Pieter Noordhuis) -2010-05-29 use simpler encoding for the length of the previous entry (Pieter Noordhuis) -2010-05-29 replace functions to get pointers to head and tail by macros (Pieter Noordhuis) -2010-05-29 function to insert an element at an arbitrary position in the list (Pieter Noordhuis) -2010-05-29 extract a generic delete function that can be used in pop and delete(range) (Pieter Noordhuis) -2010-05-29 use the entry struct in zipRawEntryLength (Pieter Noordhuis) -2010-05-29 rename argument names to s* to disambiguate from e* (Pieter Noordhuis) -2010-05-29 change ziplistRepr to use the entry struct (Pieter Noordhuis) -2010-05-29 modify compare function to check if the encoding is equal before comparing (Pieter Noordhuis) -2010-05-29 use a struct to retrieve all details for an entry (Pieter Noordhuis) -2010-05-29 initial implementation for making the ziplist doubly linked (Pieter Noordhuis) -2010-05-29 fix some warnings (Pieter Noordhuis) -2010-05-29 add function to retrieve ziplist size in bytes (Pieter Noordhuis) -2010-05-22 fix compare function of ziplist to only load integer from ziplist when it is encoded as integer (Pieter Noordhuis) -2010-05-22 add function to retrieve length of ziplist (Pieter Noordhuis) -2010-05-22 re-introduce ZIP_BIGLEN for clarity (Pieter Noordhuis) -2010-05-22 added header ziplist.h (Pieter Noordhuis) -2010-05-22 code to compare strings with entries in ziplist, regardless of their encoding (Pieter Noordhuis) -2010-05-22 updated iteration code to work well with different encodings (Pieter Noordhuis) -2010-05-22 move code from zip.c to ziplist.c (Pieter Noordhuis) -2010-05-22 partial revert of c80df5 because ziplist functions are starting to divert too much from zipmap functions (Pieter Noordhuis) -2010-05-22 initial work for integer encoding in ziplists (Pieter Noordhuis) -2010-05-22 move length housekeeping to a macro (Pieter Noordhuis) -2010-05-21 allow entries to be deleted in place when iterating over a ziplist (Pieter 
Noordhuis) -2010-05-21 allow pointer to be stored to current element when iterating over ziplist (Pieter Noordhuis) -2010-05-21 rename ziplistDelete to ziplistDeleteRange (Pieter Noordhuis) -2010-05-21 code to delete an inner range from the ziplist (Pieter Noordhuis) -2010-05-21 check if *value is non-NULL before setting it (Pieter Noordhuis) -2010-05-21 change iteration code to avoid allocating a new sds for each traversed entry (Pieter Noordhuis) -2010-05-21 code to iterate over a ziplist (Pieter Noordhuis) -2010-05-21 implementation for a ziplist with push and pop support (Pieter Noordhuis) -2010-05-21 extracted general methods to zip.c for reuse in other zip* structures (Pieter Noordhuis) -2010-05-28 command table size calculated with sizeof (antirez) -2010-05-28 use qsort and bsearch to lookup commands in O(log(N)) instead of O(N) (Pieter Noordhuis) -2010-05-28 Merge branch 'cli-stdin' of git://github.com/pietern/redis (antirez) -2010-05-28 Fixed ZINCR Nan bugs leading to server crash and added tests (antirez) -2010-05-28 redis.conf new features the new option, a minor typo preventing the compilation fixed (antirez) -2010-05-28 don't fsync after a rewrite if appendfsync is set to no. use aof_fsycn instead of fsync where appropriate (antirez) -2010-05-28 added new option no-appendfsync-on-rewrite to avoid blocking on fsync() in the main thread while a background process is doing big I/O (antirez) -2010-05-28 Added Git sha1 and dirty status in redis-server -v output (antirez) -2010-05-28 changed the message in the Makefile with the new command like to run the test suite (antirez) -2010-05-27 Fixed typo. (Vincent Palmer) -2010-05-27 new multi/exec tests (antirez) -2010-05-26 build command outside while loop (Pieter Noordhuis) -2010-05-26 require the flag "-c" to be used for redis-cli to read the last argument from stdin (Pieter Noordhuis) -2010-05-26 Merge branch 'master' into nested-multi (antirez) -2010-05-26 Fix EXEC bug that was leaving the client in dirty status when used with WATCH (antirez) -2010-05-26 raise error on nested MULTI and WATCH inside multi (antirez) -2010-05-25 allow regular sets to be passed to zunionstore/zinterstore (Pieter Noordhuis) -2010-05-25 Version is now 2.1.1 (antirez) -2010-05-25 RENAME is now WATCH-aware (antirez) -2010-05-25 TODO updated (antirez) -2010-05-25 WATCH is now able to detect keys removed by FLUSHALL and FLUSHDB (antirez) -2010-05-25 WATCH tests (antirez) -2010-05-25 minor bug fixed in WATCH (antirez) -2010-05-25 WATCH for MULTI/EXEC (CAS alike concurrency) (antirez) -2010-05-25 gitignore updated (antirez) -2010-05-21 Master is now already unfreezed, unstable, and ready to hacking sessions! 
(antirez) -2010-05-21 Merge branch 'solaris' of git://github.com/pietern/redis (antirez) -2010-05-21 Changelog updated (antirez) -2010-05-21 redis version is now 1.3.14 (aka 2.0.0 RC1) (antirez) -2010-05-21 html doc updated (antirez) -2010-05-21 by default test with valgrind does not show full leak info (antirez) -2010-05-21 minor fix for the skiplist code, resulting in a false positive with valgrind, and in general into a useless small allocation (antirez) -2010-05-21 Merge branch 'master' of git@github.com:antirez/redis (antirez) -2010-05-21 tests suite initial support for valgrind, fixed the old test suite until the new one is able to target a specific host/port (antirez) -2010-05-21 include solaris fixes in sha1.c (Pieter Noordhuis) -2010-05-20 Don't exit with error in tests temp file cleanup if there are no files to clean (antirez) -2010-05-20 fix memory leak on 32-bit builds (Pieter Noordhuis) -2010-05-20 Merge branch 'master' of github.com:antirez/redis (antirez) -2010-05-20 Fix for DEBUG DIGEST (antirez) -2010-05-20 Merge branch 'test_vm' of git://github.com/pietern/redis (antirez) -2010-05-20 code to enable running tests with the vm enabled (Pieter Noordhuis) -2010-05-20 minor change to shutdown (antirez) -2010-05-20 shutdown on SIGTERM (antirez) -2010-05-20 Merge http://github.com/ngmoco/redis (antirez) -2010-05-20 fix compile error on solaris (Pieter Noordhuis) -2010-05-20 added regression for zipmap bug (antirez) -2010-05-20 fix lookup of keys with length larger than ZIPMAP_BIGLEN (Pieter Noordhuis) -2010-05-19 TODO updated (antirez) -2010-05-19 initial tests for AOF (and small changes to server.tcl to support these) (Pieter Noordhuis) -2010-05-19 Merge branch 'master' into integration (Pieter Noordhuis) -2010-05-19 Fix for 'CONFIG SET appendonly no' (antirez) -2010-05-19 It's now possible to turn off and on the AOF via CONFIG (antirez) -2010-05-18 git hash 00000000 in reelase.h when git is not found enabled again after some shell scripting fix that is now compatible with most shells (antirez) -2010-05-18 build fixed when simpler shells are used to create release.h (antirez) -2010-05-18 use git diff when generating release.h to check for dirty status (antirez) -2010-05-18 Solaris fixes (antirez) -2010-05-18 html doc rebuild (antirez) -2010-05-18 buliding of release.h moved into an external script. Avoided recompialtion of redis.c if git sha1 is the same as the previous one (antirez) -2010-05-17 create release.h in make process and add this information to INFO listing (Pieter Noordhuis) -2010-05-16 Redis version is now 1.3.12 (antirez) -2010-05-16 redis version is now 1.3.11 (antirez) -2010-05-16 random refactoring and speedups (antirez) -2010-05-16 faster INCR with very little efforts... 
(antirez) -2010-05-15 print warnings in redis log when a test raises an exception (very likely to be caused by something like a failed assertion) (Pieter Noordhuis) -2010-05-15 Merge branch 'redis-cli-fix' of http://github.com/tizoc/redis (antirez) -2010-05-15 added pid info to the check memory leaks test, so that those tests don't appear to be duplicated (antirez) -2010-05-15 Merge branch 'integration' of git://github.com/pietern/redis (antirez) -2010-05-14 more endianess detection fix for SHA1 (antirez) -2010-05-14 fixed a warning seen with some GCC version under Linux (antirez) -2010-05-14 initial rough integration test for replication (Pieter Noordhuis) -2010-05-14 store entire server object on the stack instead of just the client (Pieter Noordhuis) -2010-05-14 proc to retrieve values from INFO properties (Pieter Noordhuis) -2010-05-14 one more fix for endianess detection (antirez) -2010-05-14 Fixed sha1.c compilation on Linux, due to endianess detection lameness (antirez) -2010-05-14 ZUNION,ZINTER -> ZUNIONSTORE,ZINTERSTORE (antirez) -2010-05-14 minor fixes to the new test suite, html doc updated (antirez) -2010-05-14 wait for redis-server to be settled and ready for connections (Pieter Noordhuis) -2010-05-14 fix cleaning up tmp folder (Pieter Noordhuis) -2010-05-14 update makefile to use the new test suite (Pieter Noordhuis) -2010-05-14 check for memory leaks before killing a server (Pieter Noordhuis) -2010-05-14 extract code to kill a server to a separate proc (Pieter Noordhuis) -2010-05-14 start servers on different ports to prevent conflicts (Pieter Noordhuis) -2010-05-14 use DEBUG DIGEST in new test suite (Pieter Noordhuis) -2010-05-14 split test suite into multiple files; runs redis-server in isolation (Pieter Noordhuis) -2010-05-14 use DEBUG DIGEST in the test instead of a function that was doing a similar work, but in a much slower and buggy way (antirez) -2010-05-14 Don't rely on cliReadReply being able to return on shutdown (Bruno Deferrari) -2010-05-14 If command is a shutdown, ignore errors on reply (Bruno Deferrari) -2010-05-14 DEBUG DIGEST implemented, in order to improve the ability to test persistence and replication consistency (antirez) -2010-05-13 Add SIGTERM shutdown handling. (Ashley Martens) -2010-05-13 makefile deps updated (antirez) -2010-05-13 conflicts resolved (antirez) -2010-05-13 feed SETEX as SET and EXPIREAT to AOF (Pieter Noordhuis) -2010-05-13 very strong speedup in saving time performance when there are many integers in the dataset. Instead of decoding the object before to pass them to the rdbSaveObject layer we check asap if the object is integer encoded and can be written on disk as an integer. (antirez) -2010-05-13 include limits.h otherwise no double precison macros (antirez) -2010-05-13 explicitly checks with ifdefs if our floating point and long long assumptions are verified (antirez) -2010-05-13 Yet another version of the double saving code, with comments explaining what's happening there (antirez) -2010-05-12 added overflow check in the double -> long long conversion trick to avoid integer overflows. I think this was not needed in practical terms, but it is safer (antirez) -2010-05-12 use withscores when performing the dataset digest (antirez) -2010-05-12 If a float can be casted to a long long without rounding loss, we can use the integer conversion function to write the score on disk. 
This is a seriuous speedup (antirez) -2010-05-12 fixed compilation warnings in the AOF sanity check tool (antirez) -2010-05-12 Merge branch 'vm-speedup' (antirez) -2010-05-11 fix to return error when calling INCR on a non-string type (Pieter Noordhuis) -2010-05-11 load objects encoded from disk directly without useless conversion (antirez) -2010-05-11 fixed a problem leading to crashes, as keys can't be currently specially encoded, so we can't encode integers at object loading time... For now this can be fixed passing a few flags, or later can be fixed allowing encoded keys as well (antirez) -2010-05-11 long long to string conversion speedup applied in other places as well. Still the code has bugs, fixing right now... (antirez) -2010-05-11 hand written code to turn a long long into a string -> very big speed win (antirez) -2010-05-11 added specialized function to compare string objects for perfect match that is optimized for this task (antirez) -2010-05-11 better use of encoding inforamtion in dictEncObjKeyCompare (antirez) -2010-05-10 CONFIG now can change appendfsync policy at run time (antirez) -2010-05-10 CONFIG command now supports hot modification of RDB saving parameters. (antirez) -2010-05-10 while loading the rdb file don't add the key to the dictionary at all if it's already expired, instead of removing it just after the insertion. (antirez) -2010-05-10 Merge branch 'check-aof' of git://github.com/pietern/redis (antirez) -2010-05-08 minor changes to improve code readability (antirez) -2010-05-08 swap objects out directly while loading an RDB file if we detect we can't stay in the vm max memory limits anyway (antirez) -2010-05-07 change command names no longer used to zunion/zinter (Pieter Noordhuis) -2010-05-07 DEBUG POPULATE command for fast creation of test databases (antirez) -2010-05-07 update TODO (Pieter Noordhuis) -2010-05-07 swap arguments in blockClientOnSwappedKeys to be consistent (Pieter Noordhuis) -2010-05-07 added function that preloads all keys needed to execute a MULTI/EXEC block (Pieter Noordhuis) -2010-05-07 add sanity check to zunionInterBlockClientOnSwappedKeys, as the number of keys used is provided as argument to the function (Pieter Noordhuis) -2010-05-07 make prototype of custom function to preload keys from the vm match the prototype of waitForMultipleSwappedKeys (Pieter Noordhuis) -2010-05-07 extract preloading of multiple keys according to the command prototype to a separate function (Pieter Noordhuis) -2010-05-07 make append only filename configurable (Pieter Noordhuis) -2010-05-07 don't load value from VM for EXISTS (Pieter Noordhuis) -2010-05-07 swap file name pid expansion removed. Not suited for mission critical software... (antirez) -2010-05-07 Swap file is now locked (antirez) -2010-05-06 Merge branch 'master' into aof-speedup (antirez) -2010-05-06 log error and quit when the AOF contains an unfinished MULTI (antirez) -2010-05-06 log error and quit when the AOF contains an unfinished MULTI (Pieter Noordhuis) -2010-05-06 Merge branch 'master' into check-aof (Pieter Noordhuis) -2010-05-06 hincrby should report an error when called against a hash key that doesn't contain an integer (Pieter Noordhuis) -2010-05-06 AOF writes are now accumulated into a buffer and flushed into disk just before re-entering the event loop. A lot less writes but still this guarantees that AOF is written before the client gets a positive reply about a write operation, as no reply is trasnmitted before re-entering into the event loop. 
(antirez) -2010-05-06 clarified a few messages in redis.conf (antirez) -2010-05-05 ask for confirmation before AOF is truncated (Pieter Noordhuis) -2010-05-05 str can be free'd outside readString (Pieter Noordhuis) -2010-05-05 moved argument parsing around (Pieter Noordhuis) -2010-05-05 ignore redis-check-aof binary (Pieter Noordhuis) -2010-05-05 allow AOF to be fixed by truncating to the portion of the file that is valid (Pieter Noordhuis) -2010-05-05 tool to check if AOF is valid (Pieter Noordhuis) -2010-05-02 included fmacros.h in linenose.c to avoid compilation warnings on Linux (antirez) -2010-05-02 compilation fix for mac os x (antirez) -2010-05-02 Merge branch 'master' of git@github.com:antirez/redis (antirez) -2010-05-02 On Linux now fdatasync() is used insetad of fsync() in order to flush the AOF file kernel buffers (antirez) -2010-04-30 More tests for APPEND and tests for SUBSTR (antirez) -2010-04-30 linenoise.c updated, now redis-cli can be used in a pipe (antirez) -2010-04-29 redis-cli minor fix (less segfault is better) (antirez) -2010-04-29 New MONITOR output format with timestamp, every command in a single line, string representations (antirez) -2010-04-29 redis-cli INFO output format is now raw again (antirez) -2010-04-29 Added more information about slave election in Redis Cluster alternative doc (antirez) -2010-04-29 Redis cluster version 2 (antirez) -2010-04-27 Fixed a redis-cli bug, was using free instead of zfree call (antirez) -2010-04-27 AOF is now rewritten on slave after SYNC with master. Thanks to @_km for finding this bug and any others' (antirez) -2010-04-27 redis-cli is now using only the new protocol (antirez) -2010-04-27 Minimal support for subscribe/psubscribe in redis-cli (antirez) -2010-04-26 don't output the newline when stdout is not a tty (antirez) -2010-04-26 redis-cli now is able to also output the string representation instead of the raw string. Much better for debugging (antirez) -2010-04-26 Initial support for quoted strings in redis-cli (antirez) -2010-04-23 SETEX implemented (antirez) -2010-04-23 Pub/Sub API change: now messages received via pattern matching have a different message type and an additional field representing the original pattern the message matched (antirez) -2010-04-22 typo fixed, reloaded (antirez) -2010-04-22 typo fixed (antirez) -2010-04-22 REDIS-CLUSTER doc updated (antirez) -2010-04-22 Virtual memory design document removed, no longer needed as we have a full specification and implementation (antirez) -2010-04-22 new units for bytes specification (antirez) -2010-04-22 Now in redis.conf it is possible to specify units where appropriate instead of amounts of bytes, like 2Gi or 4M and so forth (antirez) -2010-04-21 binary safe keys ready implementation of RANDOMKEYS (antirez) -2010-04-21 Now that's the right 1.3.10 (antirez) -2010-04-21 Revert "fsync always now uses O_DIRECT on Linux" (antirez) -2010-04-21 Revert "define __USE_GNU to get O_DIRECT" (antirez) -2010-04-21 Merge branch 'master' of github.com:antirez/redis (antirez) -2010-04-21 Revert "version 1.3.10" (antirez) -2010-04-21 version 1.3.10 (antirez) -2010-04-20 define __USE_GNU to get O_DIRECT (antirez) -2010-04-20 fsync always now uses O_DIRECT on Linux (antirez) -2010-04-20 More precise memory used guesswork in zmalloc.c (antirez) -2010-04-19 Fix for MULTI/EXEC and Replication/AOF: now the block is correctly sent as MULTI/..writing operations../EXEC. 
Ok for slaves but more work needed for the AOF as it should be a write-all-or-nothing business (antirez) -2010-04-19 running the test using tcl8.5 directly instead of tclsh that too often it's a symlink to 8.4 (antirez) -2010-04-19 Added package require Tcl 8.5 in redis.tcl so it will show a clear error when the test suit is attempted to run under 8.4 (antirez) -2010-04-18 Fix for a SORT bug introduced with commit 16fa22f1, regression test added (antirez) -2010-04-18 Guru mediation -> meditation (antirez) -2010-04-16 check eptr inline (Pieter Noordhuis) -2010-04-16 refactor code that retrieves value from object or replies to client (Pieter Noordhuis) -2010-04-17 Merge branch 'hash' of git://github.com/pietern/redis (antirez) -2010-04-17 redisAssert(0) => redisPanic("something meaningful") (antirez) -2010-04-17 make sure that the resulting value in hincrby is encoded when possible (Pieter Noordhuis) -2010-04-17 increment dirty counter after hmset (Pieter Noordhuis) -2010-04-17 strip tryObjectEncoding from hashSet, to enable the arguments being encoded in-place (Pieter Noordhuis) -2010-04-17 Added support for Guru Mediation, and raising a guru mediation if refCount <= 0 but decrRefCount is called against such an object (antirez) -2010-04-16 fix small error and memory leaks in SORT (Pieter Noordhuis) -2010-04-16 SORT/GET test added (antirez) -2010-04-16 Added tests for GET/BY against hashes fields (antirez) -2010-04-16 Merge branch 'hash-refactor' of git://github.com/pietern/redis (antirez) -2010-04-16 check object type in lookupKeyByPattern (Pieter Noordhuis) -2010-04-16 make sortCommand aware that lookupKeyByPattern always increased the refcount of the returned value (Pieter Noordhuis) -2010-04-16 revert 0c390a to stop using tricks with o->refcount (Pieter Noordhuis) -2010-04-16 store the hash iterator on the heap instead of the stack (Pieter Noordhuis) -2010-04-16 drop inline directive (Pieter Noordhuis) -2010-04-16 rename hashReplace to hashSet (Pieter Noordhuis) -2010-04-16 added dictFetchValue() to dict.c to make hash table API a bit less verbose in the common cases (antirez) -2010-04-03 Don't set expire to keys with ttl=0, remove them immediately. (antirez) -2010-04-15 make sure that cmpobj is in decoded form when sorting by ALPHA (this solves edge case from previous commit where (!sortby && alpha) == 1) (Pieter Noordhuis) -2010-04-15 enable hash dereference in SORT on BY and GET (Pieter Noordhuis) -2010-04-15 use shared replies for hset (Pieter Noordhuis) -2010-04-15 set refcount of string objects retrieved from zipmaps to 0, so we don't have to touch the refcount of the objects inside dicts (Pieter Noordhuis) -2010-04-15 added HSETNX (Pieter Noordhuis) -2010-04-14 refactor of hash commands to use specialized api that abstracts zipmap and dict apis (Pieter Noordhuis) -2010-04-13 move retrieval of long up to prevent an empty hash from being created (Pieter Noordhuis) -2010-04-15 more advanced leaks detection in test redis (antirez) -2010-04-15 ability to select port/host from make test (antirez) -2010-04-15 Active rehashing (antirez) -2010-04-15 Incrementally rehahsing hash table! Thanks to Derek Collison and Pieter Noordhuis for feedbacks/help (antirez) -2010-04-14 Does not allow commands other than Pub/Sub commands when there is at least one pattern (antirez) -2010-04-13 Fixed a tiny memory leak when loading the configuration file. (Alex McHale) -2010-04-13 Merge branch 'hmget' of git://github.com/pietern/redis (antirez) -2010-03-29 Validate numeric inputs. 
(Alex McHale) -2010-03-24 Remove trailing whitespace. (Alex McHale) -2010-04-12 Now all the commands returning a multi bulk reply against non existing keys will return an empty multi bulk, not a nil one (antirez) -2010-04-12 implemented HMGET (Pieter Noordhuis) -2010-04-12 implemented HMSET (Pieter Noordhuis) -2010-04-12 Sharing of small integer objects: may save a lot of memory with datasets having many of this (antirez) -2010-04-10 dict.c fixed to play well with enabling/disabling of the hash table (antirez) -2010-04-09 removed a no longer true assert in the VM code (antirez) -2010-04-09 shareobjects feautres killed - no gains most of the time, but VM complexities (antirez) -2010-04-09 use directly the real key object in VM I/O jobs to match by pointer, and to handle different keys with the same name living in different DBs, but being at the same moment in the IO job queues (antirez) -2010-04-08 last change reverted as it was unstable... more testing needed (antirez) -2010-04-08 Prevent hash table resize while there are active child processes in order to play well with copy on write (antirez) -2010-04-08 Merge branch 'issue_218' of git://github.com/pietern/redis (antirez) -2010-04-08 -1 not needed... (antirez) -2010-04-08 Skiplist theoretical fix (antirez) -2010-04-07 Now when a child is terminated by a signal, the signal number is logged as well (antirez) -2010-04-07 First version of evented Redis Tcl client, that will be used for BLPOP and Pub/Sub tests (antirez) -2010-04-05 use long long reply type for HINCRBY (Pieter Noordhuis) -2010-04-05 last argument is never encoded for HINCRBY (Pieter Noordhuis) -2010-04-02 Now PUBLISH commands are replicated to slaves (antirez) -2010-04-01 use the right object when cleaning up after zunion/zinter (fixes issue 216) (Pieter Noordhuis) -2010-04-01 Merge branch 'zipmap' of git://github.com/pietern/redis (antirez) -2010-04-01 reduce code complexity because zipmapLen now is O(1) (Pieter Noordhuis) -2010-04-01 update the zipmap entry in-place instead of appending it (Pieter Noordhuis) -2010-04-01 updated zipmap documentation to match the implementation (Pieter Noordhuis) -2010-04-01 allow 4 free trailing bytes for each value (Pieter Noordhuis) -2010-04-01 Pub/Sub pattern matching capabilities (antirez) -2010-04-01 use function to determine length of a single entry (Pieter Noordhuis) -2010-03-31 Deny EXEC under out of memory (antirez) -2010-03-29 No timeouts nor other commands for clients in a Pub/Sub context (antirez) -2010-03-29 free hash table entries about no longer active classes, so that PUBSUB can be abused with millions of different classes (antirez) -2010-03-29 Fixed a refcount stuff leading to PUBSUB crashes (antirez) -2010-03-29 fmacros added to linenoise, avoiding all the nice warnings... 
(antirez) -2010-03-29 First pubsub fix (antirez) -2010-03-29 PUBSUB implemented (antirez) -2010-03-29 Redis version is now 1.3.8 (antirez) -2010-03-28 removed references in code to ZIPMAP_EMPTY (Pieter Noordhuis) -2010-03-28 use first byte of zipmap to store length (Pieter Noordhuis) -2010-03-28 implemented strategy that doesn't use free blocks in zipmaps (Pieter Noordhuis) -2010-03-26 Merge branch 'hincrby' of git://github.com/pietern/redis (antirez) -2010-03-26 removed unnecessary refcount increase that caused the HINCRBY memleak (Pieter Noordhuis) -2010-03-26 implements HINCRBY and tests (todo: find and fix small memleak) (Pieter Noordhuis) -2010-03-26 Removed a useless if spotted by Pieter Noordhuis (antirez) -2010-03-26 Fixed a critical replication bug: binary values issued with the multi bulk protocol caused a protocol desync with slaves. (antirez) -2010-03-24 Fixed the reply about denied write commands under maxmemory reached condition: now the error will no longer lead to a client-server protocol desync (antirez) -2010-03-24 CONFIG command implemened -- just a start but already useful (antirez) -2010-03-24 redis-cli prompt is now redis> (antirez) -2010-03-23 with --help states that you can use - as config file name to feed config via stdin (antirez) -2010-03-23 New INFO field: expired_keys (antirez) -2010-03-23 the Cron timer function is now called 10 times per second instead of 1 time per second to make Redis more responsibe to BGSAVE and to delete expired keys more incrementally (antirez) -2010-03-23 Use linenoise for line editing on redis-cli. (Michel Martens) -2010-03-23 Fix authentication for redis-cli on non-interactive mode. (Michel Martens) -2010-03-23 key deletion on empty value fix + some refactoring (antirez) -2010-03-23 Empty value trigger key removal in all the operations (antirez) -2010-03-22 Merged gnrfan patches fixing issues 191, 193, 194 (antirez) -2010-03-22 Merge branch 'issue_193' of git://github.com/gnrfan/redis (antirez) -2010-03-22 Merge branch 'issue_191' of git://github.com/gnrfan/redis (antirez) -2010-03-22 Redis master version is now 1.3.7 (antirez) -2010-03-19 support for include directive in config parser (Jeremy Zawodny) -2010-03-19 Removed a stupid overriding of config values due to a wrong cut&paste (antirez) -2010-03-19 VM hash type swappability implemented. Handling of failed pthread_create() call. (antirez) -2010-03-19 Solving issue #191 on Google Code: -v and --version should print the version of Redis (Antonio Ognio) -2010-03-19 Solves issue #194 on Google Code: --help parameter to redis-srver prints the usage message (Antonio Ognio) -2010-03-19 Fixing issue 193 (Antonio Ognio) -2010-03-18 increment server.dirty on HDEL (antirez) -2010-03-18 Redis 1.3.6 (antirez) -2010-03-18 test-redis.tcl dataset digest function Hash support (antirez) -2010-03-18 zipmap fix for large values (antirez) -2010-03-18 Optimization fixed and re-activated (antirez) -2010-03-18 reverted an optimization that makes Redis not stable (antirez) -2010-03-18 Fixed redis-cli auth code (antirez) -2010-03-17 HDEL fix, an optimization for comparison of objects in hash table lookups when they are integer encoding (antirez) -2010-03-17 Version is now 1.3.5 (antirez) -2010-03-17 Merged Pietern patch for VM key args helper function. 
Fixed an obvious bug in the redis-cli passwd auth stuff (antirez) -2010-03-17 Merge branch 'aggregates' of git://github.com/pietern/redis (antirez) -2010-03-17 Added Authentication to redis-cli.c using -a switch Update usage fixed Makefile to delete redis-check-dump during make clean (root) -2010-03-17 HEXISTS and tests implemented (antirez) -2010-03-17 More hash tests (antirez) -2010-03-17 better HSET test (antirez) -2010-03-17 Fixed a bug in HSET, a memory leak, and a theoretical bug in dict.c (antirez) -2010-03-17 More Hash tests (antirez) -2010-03-13 added preloading keys from VM when using ZINTER or ZUNION (Pieter Noordhuis) -2010-03-13 added explicit AGGREGATE [SUM|MIN|MAX] option to ZUNION/ZINTER (Pieter Noordhuis) -2010-03-16 HGET fix for integer encoded field against zipmap encoded hash (antirez) -2010-03-16 zrevrank support in redis-cli (antirez) -2010-03-16 HKEYS / HVALS / HGETALL (antirez) -2010-03-16 Solved a memory leak with Hashes (antirez) -2010-03-15 pretty big refactoring (antirez) -2010-03-15 An interesting refactoring + more expressive internal API (antirez) -2010-03-15 Fixed the same problem in ZREVRANK (antirez) -2010-03-15 Fixed a ZRANK bug (antirez) -2010-03-15 zipmap to hash conversion in HSET (antirez) -2010-03-14 max zipmap entries and max zipmap value parameters added into INFO output (antirez) -2010-03-14 HDEL and some improvement in DEBUG OBJECT command (antirez) -2010-03-14 Append only file support for hashes (antirez) -2010-03-13 utility to check rdb files for unprocessable opcodes (Pieter Noordhuis) -2010-03-12 A minor fix and a few debug messages removed (antirez) -2010-03-12 Applied the replication bug patch provided by Jeremy Zawodny, removing temp file collision after the slave got the dump.rdb file in the SYNC stage (antirez) -2010-03-11 Fix for HGET against non Hash type, debug messages used to understand a bit better a corrupted rdb file (antirez) -2010-03-09 fix: use zmalloc instead of malloc (Pieter Noordhuis) -2010-03-09 Merged zsetops branch from Pietern (antirez) -2010-03-09 Merged ZREMBYRANK from Pietern (antirez) -2010-03-09 Merged ZREVRANK from Pietern (antirez) -2010-03-09 use a struct to store both a dict and its weight for ZUNION and ZINTER, so qsort can be applied (Pieter Noordhuis) -2010-03-09 Hash auto conversion from zipmap to hash table, type fixed for hashes, hash loading from disk (antirez) -2010-03-09 replaced ZMERGE by ZUNION and ZINTER. note: key preloading by the VM does not yet work (Pieter Noordhuis) -2010-03-08 Hashes saving / fixes (antirez) -2010-03-08 use ZMERGE as starting point (Pieter Noordhuis) -2010-03-07 HSET fixes, now the new pointer is stored back in the object pointer field (antirez) -2010-03-07 added ZREVRANK (Pieter Noordhuis) -2010-03-06 Fix for replicaiton with over 2GB dump file initial SYNC stage (antirez) -2010-03-06 first implementation of HSET/HSET. More work needed (antirez) -2010-03-05 zipmaps functions to get, iterate, test for existence. 
Initial works for Hash data type (antirez) -2010-03-04 redis-benchmark now implements Set commands benchmarks (antirez) -2010-03-04 zipmap iteration code (antirez) -2010-03-04 moved code to delete a single node from a zset to a separate function (Pieter Noordhuis) -2010-03-04 rename zslDeleteRange to zslDeleteRangeByScore (to differentiate between deleting using score or rank) (Pieter Noordhuis) -2010-03-04 use 1-based rank across zsl*Rank functions consistently (Pieter Noordhuis) -2010-03-04 implemented ZREMBYRANK (Pieter Noordhuis) -2010-03-04 A fix for initialization of augmented skip lists (antirez) -2010-03-04 A fix for an invalid access when VM is disabled (antirez) -2010-03-04 Merge branch 'zsl-get-rank' of git://github.com/pietern/redis (antirez) -2010-03-04 redis-cli now runs in interactive mode if no command is provided (antirez) -2010-03-04 merged memory reduction patch (Pieter Noordhuis) -2010-03-04 Now list push commands return the length of the new list, thanks to Gustavo Picon (antirez) -2010-03-04 first check if starting point is trivial (head or tail) before applying log(N) search (Pieter Noordhuis) -2010-03-04 use rank to find starting point for ZRANGE and ZREVRANGE (Pieter Noordhuis) -2010-03-04 lookup rank of a zset entry in a different function (Pieter Noordhuis) -2010-03-04 SUBSTR fix for integer encoded vals (antirez) -2010-03-04 fix ZRANK (realize that rank is 1-based due to the skip list header) (Pieter Noordhuis) -2010-03-03 initial implementation of SUBSTR (antirez) -2010-03-03 TODO updated (antirez) -2010-03-03 fpurge call removed from redis-cli (antirez) -2010-03-03 ZRANK stress tester (antirez) -2010-03-03 use less memory as element->span[0] will always be 1; any level 0 skip list is essentially a linked list (Pieter Noordhuis) -2010-03-03 rank is very unlikely to overflow integer range (Pieter Noordhuis) -2010-03-03 x->backward never equals zsl->header (Pieter Noordhuis) -2010-03-03 initial implementation for augmented zsets and the zrank command (Pieter Noordhuis) -2010-03-03 zipampDel() implemented (antirez) -2010-03-03 added quit and exit commands to redis-cli in order to quit the interactive mode (antirez) -2010-03-03 Merge remote branch 'djanowski/interactive' (antirez) -2010-03-02 Add support for MULTI/EXEC. (Damian Janowski & Michel Martens) -2010-03-02 Remove trailing newline in interactive mode. (Damian Janowski & Michel Martens) -2010-03-02 minor fix for a Linux warning (antirez) -2010-03-02 Add interactive mode to redis-cli. (Michel Martens & Damian Janowski) -2010-03-02 Better to increment the version minor number when a VM bug is fixed... it will be simpler to understand what's going on when users will report problems with the INFO trace. (antirez) -2010-03-02 Fixed a subtle VM bug... was not flushing the buffer so the child process read truncated data (antirez) -2010-03-01 KEYS now returns a multi bulk reply (antirez) -2010-02-27 Add DISCARD command to discard queued MULTI commands. 
(antirez) -2010-03-01 Swappability bug due to a typo fixed thanks to code review by Felix Geisendörfer @felixge (antirez) -2010-02-28 minor fixes for zipmap.c (antirez) -2010-02-27 first zipmap fix of a long sequence in the days to come ;) (antirez) -2010-02-27 initial zipmap.c implementation (antirez) -2010-02-27 Bug #169 fixed (BLOP/BRPOP interrupted connections are not cleared from the queue) (antirez) -2010-02-22 Fixed 32bit make target to work on Linux out of the box (antirez) -2010-02-19 A problem with replication with multiple slaves connectiong to a single master fixed. It was due to a typo, and reported on github by the user micmac. Also the copyright year fixed from many files. (antirez) -2010-02-10 Saner VM defaults for redis.conf (antirez) -2010-02-09 VM now is able to block clients on swapped keys for all the commands (antirez) -2010-02-07 ZCOUNT and ZRANGEBYSCORE new tests (antirez) -2010-02-07 ZRANGEBYSCORE now supports open intervals, prefixing double values with a open paren. Added ZCOUNT that can count the elements inside an interval of scores, this supports open intervals too (antirez) -2010-02-07 WITHSCORES in ZRANGEBYSCORE thanks to Sam Hendley (antirez) -2010-02-06 Added "withscores" option to zrangebyscore command. Based on withscores support in zrange function, ugliest part was the argument parsing to handle using it with the limit option. (Sam Hendley) -2010-02-06 DEBUG OBJECT provide info about serialized object length even when VM is disabled (antirez) -2010-02-06 multi bulk requests in redis-benchmark, default fsync policy changed to everysec, added a prefix character for DEBUG logs (antirez) -2010-02-04 APPEND tests (antirez) -2010-02-04 APPEND command (antirez) -2010-02-02 Faster version of the function hashing possibly encoded objects, leading to a general speed gain when working with Sets of integers (antirez) -2010-02-02 faster Set loading time from .rdb file resizing the hash table to the right size before loading elements (antirez) -2010-02-02 Log time taken to load the DB at startup, in seconds (antirez) -2010-01-31 Fixed VM corruption due to child fclosing the VM file directly or indirectly calling exit(), now replaced with _exit() in all the sensible places. Masked a few signals from IO threads. (antirez) -2010-01-28 loading side of the threaded VM (antirez) -2010-01-26 TODO cahnges (antirez) -2010-01-23 Fixed memory human style memory reporting, removed server.usedmemory, now zmalloc_used_memory() is used always. (antirez) -2010-01-22 VM tuning thanks to redis-stat vmstat. Now it performs much better under high load (antirez) -2010-01-21 Changelog updated (antirez) -2010-01-21 REDIS_MAX_COMPLETED_JOBS_PROCESSED is now in percentage, not number of jobs. Moved a debugging message a few lines forward as it was called where a few logged parameters where invalid, leading to a crash (antirez) -2010-01-20 fixed a deadlock caused by too much finished processes in queue so that I/O clients writing to the wirte side of the pipe used to awake the main thread where blocking. Then a BGSAVE started waiting for the last active thread to finish, condition impossible because all the I/O threads where blocking on threads. Takes this as a note to myself... (antirez) -2010-01-20 ae.c event loop does no longer support exception notifications, as they are fully pointless. Also a theoretical bug that never happens in practice fixed. (antirez) -2010-01-19 commercial tools stuff removed from the Redis makefile. 
cotools are now migrated into a different repos (antirez) -2010-01-19 removed a bug in the function to cancel an I/O job (antirez) -2010-01-17 static symbols update (antirez) -2010-01-16 removed support for REDIS_HELGRIND_FRIENDLY since Helgrind 3.5.0 is friendly enough even with many threads created and destroyed (antirez) -2010-01-15 now redis-cli understands -h (antirez) -2010-01-15 Create swap file only if not exists (antirez) -2010-01-15 I hate warnings (antirez) -2010-01-15 fixed a minor memory leak in configuration file parsing (antirez) -2010-01-15 minor fix (antirez) -2010-01-15 support for named VM swap file. Fixed a few important interaction issues between the background saving processes and IO threads (antirez) -2010-01-15 fix for the just added new test (antirez) -2010-01-15 useless debugging messages removed (antirez) -2010-01-15 new test added (antirez) -2010-01-15 thread safe zmalloc used memory counter (antirez) -2010-01-15 A define to make Redis more helgrind friendly (antirez) -2010-01-15 removed a few races from threaded VM (antirez) -2010-01-14 Fixed a never experienced, theoretical bug that can actually happen in practice. Basically when a thread is working on a I/O Job we need to wait it to finish before to cancel the Job in vmCancelThreadedIOJob(), otherwise the thread may mess with an object that is being manipulated by the main thread as well. (antirez) -2010-01-14 Set the new threads stack size to a LZF friendly amount (antirez) -2010-01-13 access to already freed job structure fixed by statements reoredering (antirez) -2010-01-13 removed a useless debugging message (antirez) -2010-01-13 Wait zero active threads condition before to fork() for BGSAVE or BGREWRITEAOF (antirez) -2010-01-13 list API is now thread safe (antirez) -2010-01-13 minor TODO and debugging info changes (antirez) -2010-01-12 support for blocking VM in config file (antirez) -2010-01-12 more non blocking VM changes (antirez) -2010-01-12 fix for test #11 (antirez) -2010-01-12 a few more stuff in INFO about VM. Test #11 changed a bit in order to be less lame (antirez) -2010-01-12 Added a define to configure how many completed IO jobs the handler should process at every call. (antirez) -2010-01-11 Fixed a bug in the IO Job canceling funtion (antirez) -2010-01-11 more steps towards a working non blocking VM (antirez) -2010-01-11 converted random printfs in debug logs (antirez) -2010-01-11 removed a bug introduced with non blocking VM (antirez) -2010-01-11 a few non blocking VM bugs fixed (antirez) -2010-01-11 More work on non-blocking VM. Should work in a few days (antirez) -2010-01-11 More threaded I/O VM work + Redis init script (antirez) -2010-01-10 more work on VM threaded I/O. Still nothing of usable (antirez) -2010-01-09 non-blocking VM data structures, just a start (antirez) -2010-01-08 used_memory_human added to INFO output. Human readable amount of memory used. 
(antirez) -2010-01-07 Now DEBUG OBJECT plays well with swapped out objects (antirez) -2010-01-07 fflush VM swap file after object swapping (antirez) -2010-01-07 added the fmacros to enable support for fseeko() lseeko() with 64bit off_t (antirez) -2010-01-07 VM now swaps objects out while loading datasets not fitting into vm-max-memory bytes of RAM (antirez) -2010-01-07 added process id information in INFO (antirez) -2010-01-06 vm-enabled set to no by default in redis.conf (antirez) -2010-01-06 a new default redis.conf (antirez) -2010-01-06 VM stats in INFO command (antirez) -2010-01-06 Introduced a new log verbosity level, so now DEBUG is really for debugging. Refactored a bit maxmemory. When virtual memory is short in RAM free the objects freelist as well as swapping things out. (antirez) -2010-01-05 fixed a bug in bgsave when VM is off but still it was testing for obj->storage field (antirez) -2010-01-05 converted a few calls to assert() => redisAssert() to print stack trace (antirez) -2010-01-05 BGREWRITEAOF now works with swapping on (antirez) -2010-01-05 A first fix for SET key overwrite (antirez) -2010-01-05 SAVE now works with VM (antirez) -2010-01-05 swapping algorithm a bit more aggressive under low memory (antirez) -2010-01-05 basic VM mostly working! (antirez) -2010-01-05 New object field (one of the unused bytes) to hold the type of the swapped out value object in key objects (antirez) -2010-01-05 VM internals bugfixes, set 1 (antirez) -2010-01-05 load key from swap on key lookup (antirez) -2010-01-05 more object-level VM primitives (antirez) -2010-01-05 Redis objects swapping / loading (antirez) -2010-01-05 rdbLoadObject() as a separated function to load objects from disk. Dropped support for RDB version 0, I guess no longer has this legacy DBs around (antirez) -2010-01-04 VM low level pages handling (antirez) -2010-01-04 vm swap file creation, and some basic configuration (antirez) -2010-01-04 version marked 1.3.2 (antirez) -2010-01-04 saving code refactored a bit, added a function returning the number of bytes an object will use on disk (antirez) -2010-01-02 Now the PUSH side of RPOPLPUSH is able to unblock clients blocked on BLPOP (antirez) -2010-01-02 Version is now 1.3.1 (antirez) -2010-01-02 New vararg BLPOP able to block against multiple keys (antirez) -2009-12-29 fixed a problem with BLPOP timeout of zero, now it blocks forever (antirez) -2009-12-29 BLPOP timeouts implemented (antirez) -2009-12-29 first working implementation of BLPOP and BRPOP, still everything is to test well (antirez) -2009-12-29 a few more fixes, still broken (antirez) -2009-12-29 First fix, still broken (antirez) -2009-12-29 minor fix for Linux 64 bit (antirez) -2009-12-29 not yet working BLPOP implementation (antirez) -2009-12-27 AOFSYNC removed, got a better idea... (antirez) -2009-12-27 AOFSYNC command implemented (antirez) -2009-12-27 Version changed to 1.3.0, welcome to the new unstable (antirez) -2009-12-27 Now MULTI returns +OK as well (antirez) -2009-12-27 MULTI/EXEC first implementation (antirez) -2009-12-24 Fixed a minor bug in GETSET, now the SET part is not performed if the GET fails because the key does not contain a string value (antirez) -2009-12-23 html doc readded (antirez) -2009-12-23 ZRANGE WITHSCORES test added (antirez) -2009-12-23 version is now 1.1.94 (antirez) -2009-12-23 Add the command name in the unknown command error message. 
(antirez) -2009-12-22 ZRANGE, ZREVRANGE now support WITHSCORES options (antirez) -2009-12-22 html docs update (ZINCRBY added) (antirez) -2009-12-18 TODO list update (antirez) -2009-12-18 the pipelining test was ran against DB 1 for error, now it runs on DB 9 like all the other tests (antirez) -2009-12-18 still more tests (antirez) -2009-12-18 SORT STORE test added (antirez) -2009-12-18 Now SORT returns an empty bulk reply if the key does not exist (antirez) -2009-12-18 modified a bit the ZREVRANGE test to cover a few lines of code more (antirez) -2009-12-18 SHUTDOWN now does the right thing when append only is on, that is, fsync instead to save the snapshot. (antirez) -2009-12-18 Added a missing server.dirty increment in a non critical place, added more tests (antirez) -2009-12-18 LTRIM stress testing test added (antirez) -2009-12-18 LTRIM now returns +OK against non existing keys. More tests in test-redis.tcl (antirez) -2009-12-18 added sdstoupper() declaration in sds.h (antirez) -2009-12-18 Fixed sds.c bug #124 (antirez) -2009-12-16 LZF compression re-enabled by default, but with INIT_HTAB set to 0 to avoid the very costly memset initialization. Note that with this option set valgrind will output some false positive about lzf_c.c (antirez) -2009-12-16 lzf compression switched off by default now, with config file option to enable it in redis.conf (antirez) -2009-12-16 Regression for epoll bug in redis-test.tcl, version is now 1.1.93 (antirez) -2009-12-16 Fixed a lame epoll issue (antirez) -2009-12-15 html doc updated (antirez) -2009-12-15 version is now 1.1.92 (antirez) -2009-12-15 Two important fixes to append only file: zero length values and expires. A pretty neat new test to check consistency of randomly build datasets against snapshotting and AOF. (antirez) -2009-12-15 debug loadaof implemented in order to add more consistency tests in test-redis.tcl (antirez) -2009-12-15 Added a new test able to stress a lot the snapshotting engine (antirez) -2009-12-15 Unified handling of empty queries with normal queries. (antirez) -2009-12-15 Fixed some subtle bug in the command processing code almost impossible to spot in the real world, thanks to gcov (antirez) -2009-12-15 Regression test for SINTERSTORE added (antirez) -2009-12-15 Fixed issue #121 (antirez) -2009-12-14 a few more tests and ability to run a specific test in test-redis.tcl (antirez) -2009-12-13 Changed the reply of BGSAVE and BGREWRITEAOF from +OK to a more meaningful message that makes the user aware of an operation that just started and is not yet finished. (antirez) -2009-12-13 Set the master->slave logical client as authenticated on creation, so that if the slave requires a password replication works anyway (antirez) -2009-12-13 TODO update (antirez) -2009-12-12 bgrewriteaof_in_progress added to INFO (antirez) -2009-12-12 TODO list modified. What's planned for 1.4 is now written in the stone ;) (antirez) -2009-12-12 better handling of non blocking connect on redis-benchmark: EPIPE on read does not print an error message now (antirez) -2009-12-11 some change to redis-sha1.rb utility to make it more robust against non-meaningful changes in the dataset (antirez) -2009-12-10 redis-sha1.rb utility updated (antirez) -2009-12-10 a bit more verbose -ERR wrong number o arguments error, now gives info about the command name causing the error (antirez) -2009-12-10 TODO change and minor SETNX optimization (antirez) -2009-12-06 in rdbLoadDoubleValue now the buffer is nul terminated correctly. Thanks valgrind. 
(antirez) -2009-12-06 printf format warnings fixed by casting (antirez) -2009-12-06 Regression tests for SETNX and MSETNX bugs added (antirez) -2009-12-06 SETNX and MSETNX now respect the delete-on-write operation of EXPIREing keys (antirez) -2009-12-06 Fixed daemonization when using kqueue/kevent. Now the server initialization is performed *after* the daemonization (antirez) -2009-12-05 more HTML doc changes (antirez) -2009-12-05 HTML doc update (antirez) -2009-12-05 a few redis-cli format specified fixed (antirez) -2009-12-05 use __attribute__ format in sdscatprintf() when the compiler is GCC. Fixed format bugs resulting from the new warnings. (antirez) -2009-12-01 TODO update (antirez) -2009-12-01 compilation problem on 64bit mac os x 10.5 possibly fixed (antirez) -2009-12-01 virtual memory design doc typos (antirez) -2009-12-01 design documents added to the project (antirez) -2009-11-30 Fixed issued #85 (getDecodedObject: Assertion 1 != 1 failed. While sorting a set), added a smarter assert() function to dump the stacktrace, provided a macro to initalize Redis objects on the stack to avoid this kind of bugs. (antirez) -2009-11-30 fixed a subtle bug in redis-cli not having visible effects (antirez) -2009-11-29 TODO updated (antirez) -2009-11-29 Version chagned to 1.100, also known as the first first 2.0 beta version (antirez) -2009-11-29 more tests in test-redis.tcl, some minor fix (antirez) -2009-11-29 SORT support for sorted sets (antirez) -2009-11-28 Implemented LIMIT option in ZRANGEBYSCORE. We now enter feature-freeze (antirez) -2009-11-28 Changelog updated (antirez) -2009-11-28 html doc updated (antirez) -2009-11-28 enable kqueue/kevent only for Mac OS X 10.6.x as it seems that 10.5.x has a broken implementation of this syscalls. (antirez) -2009-11-28 TODO updated (antirez) -2009-11-28 ZRANGEBYSCORE fuzzy test (antirez) -2009-11-28 ZRANGEBYSCORE memory leak fixed, ZRANGEBYSCORE initial test added (antirez) -2009-11-28 INFO refactored. Stack trace on memory corruption now dumps the same information as the INFO command (antirez) -2009-11-28 ifdefs added to use kevent on Free Open and Net BSD as well. INFO and ae.c modified in order to report the multiplexing API in use (antirez) -2009-11-28 Enabled object encoding for multiple keys in MSET. Added a test for memory leaks in test-redis.tcl when running on Mac OS X (antirez) -2009-11-28 Merge branch 'kqueue' of git://github.com/mallipeddi/redis (antirez) -2009-11-28 Changes to TODO list, commented a function in redis.c (antirez) -2009-11-28 Added support for kqueue. (Harish Mallipeddi) -2009-11-27 TODO updated (antirez) -2009-11-26 zero length bulk data reading fixed in loadAppendOnlyFile() (antirez) -2009-11-26 append only file fixes (antirez) -2009-11-26 log rebuilding, random refactoring, work in progress please wait for an OK commit before to use this version (antirez) -2009-11-24 DEBUG RELOAD implemented, and test-redis.tcl modified to use it to check for persistence consistency. (antirez) -2009-11-24 Redis version set to 1.07 (antirez) -2009-11-24 sorted sets saving fixed (antirez) -2009-11-24 minor TODO change (antirez) -2009-11-24 minor fix to avoid a false valgrind warning. (antirez) -2009-11-23 epoll support enabled by default for Linux builds (antirez) -2009-11-23 epoll module for ae.c implemented. 
Some more testing needed (antirez) -2009-11-23 commented the HAVE_EPOLL test in config.h to allow compilation under Linux now that the epoll module is still missing (antirez) -2009-11-23 ae_select module added (antirez) -2009-11-23 ae.c now supports multiple polling API modules, even if only ae_select.c is implemented currently. Also adding and removing an event is now O(1). (antirez) -2009-11-23 ae.c initial refactoring for epoll implementation (antirez) -2009-11-21 version incremented up to 1.06 (antirez) -2009-11-21 TODO aesthetic changes (antirez) -2009-11-21 TODO updated with plans up to 1.5 (antirez) -2009-11-21 SRANDMEMBER test (antirez) -2009-11-21 Fixed a SORT memory leak that should never happen in practice (antirez) -2009-11-21 SORT GET # implemented, with a test (antirez) -2009-11-21 EXPIREAT test (antirez) -2009-11-20 EXPIRE tests (antirez) -2009-11-20 more RPOPLPUSH tests (antirez) -2009-11-20 RPOPLPUSH tests added (antirez) -2009-11-20 ZINCRBY return value fixed (antirez) -2009-11-20 ZINCRSCOREBY => ZINCRBY (antirez) -2009-11-19 ZINCRSCOREBY implemented (antirez) -2009-11-19 writev() finally uncommented again (antirez) -2009-11-19 redis-benchmark hopefully last bug with multi bulk reply fixed (antirez) -2009-11-19 debug mode in redis-bench (antirez) -2009-11-19 Use writev(2) if glue output buffers is disabled (antirez) -2009-11-19 benchmark.c fixes (antirez) -2009-11-18 more experiments with long replies, glue output buffer, and writev. (antirez) -2009-11-18 benchmarking with different number of LRANGE elements. Ability to change the glue output buffer limit by #define (antirez) -2009-11-18 more writev tests/work (antirez) -2009-11-18 redis-benchmark multi bulk reply support hopefully fixed (antirez) -2009-11-17 support for writev implemented but currently ifdef-ed in order to understan why I can't see the improvements expected. Btw code provided by Stefano Barbato (antirez) -2009-11-17 multi-bulk reply support for redis-bench, and as a result LRANGE is not tested, providing some number for the tuning of multi-bulk requests performances server-side (antirez) -2009-11-12 Solaris fix thanks to Alan Harder (antirez) -2009-11-12 Merge git://github.com/ianxm/redis (antirez) -2009-11-12 ZSCORE fixed, now returns NULL on missing key or missing element (antirez) -2009-11-12 Redis test will not fail the SAVE test even if a background save is in progress (antirez) -2009-11-12 LPOPPUSH renamed into RPOPLPUSH (antirez) -2009-11-11 can select db num (ian) -2009-11-11 Workaround for test-redis.tcl and Tcl 8.4.x about ZSCORE test (antirez) -2009-11-11 Removed a long time warning compiling with recent GCC on Linux (antirez) -2009-11-11 TODO updated (antirez) -2009-11-11 LPUSHPOP first implementation (antirez) -2009-11-10 Tcl script, make target, and redis.c changes to build the static symbol table automagically (antirez) -2009-11-10 Implemented a much better lazy expiring algorithm for EXPIRE (antirez) -2009-11-10 Fixed issue 92 in redis: redis-cli (nil) return value lacks CR/LF (antirez) -2009-11-10 Minor TODO change with new expiring algorithm description. New expiring algorithm moved since it'll go in 1.1 (antirez) -2009-11-04 redis-test is now a better Redis citizen, testing everything against DB 9 and 10 and only if this DBs are empty. (antirez) -2009-11-04 fixed a refcounting bug with SORT ... 
STORE leading to random crashes (root) -2009-11-04 masterauth option merged, thanks to Anthony Lauzon (antirez) -2009-11-03 ZSets double to string serialization fixed (antirez) -2009-11-03 client-libraries directory readded (antirez) -2009-11-03 redis.tcl put at toplevel since it's uesd for the test-redis.tcl script (antirez) -2009-11-03 client libs removed from Redis git (antirez) -2009-11-03 redis-cli now accepts a -r (repeat) switch. Still there is a memory leaks to fix (antirez) -2009-11-01 TODO updated again (antirez) -2009-11-01 TODO updated (antirez) -2009-11-01 redis-cli now makes clear when the returned string is an integer (antirez) -2009-11-01 SORT STORE option (antirez) -2009-11-01 now Redis prints DB stats just after the startup without to wait a second for the first report (antirez) -2009-11-01 another fix for append only mode, now read-only operations are not appended (antirez) -2009-11-01 appendfsync parsing in config file fixed. If you benchmarked Redis against different appendfsync options is time to try again ;) (antirez) -2009-11-01 append only file loading fixed (antirez) -2009-11-01 first version of append only file loading -- STILL BROKEN don't use it (antirez) -2009-10-31 Fixed Issue 83:Using TYPE on a zset results in a malformed response from the Redis server (antirez) -2009-10-31 Fixed compilation on Linux (antirez) -2009-10-30 append only mode is now able to translate EXPIRE into EXPIREAT transparently (antirez) -2009-10-30 appendfsync is now set to NO by default (antirez) -2009-10-30 support for appendonly mode no, always, everysec (antirez) -2009-10-30 first fix for append only mode (antirez) -2009-10-30 Initial implementation of append-only mode. Loading still not implemented. (antirez) -2009-10-30 EXPIRE behaviour changed a bit, a negative TTL or an EXPIREAT with unix time in the past will now delete the key. It seems saner to me than doing nothing. (antirez) -2009-10-30 EXPIREAT implemented, will be useful for the append-only mode (antirez) -2009-10-29 Fixed Issue 74 (ERR just returned on invalid password), now the error message is -ERR invalid password. (antirez) -2009-10-29 Fixed issue 72 (SLAVEOF shutdowns redis-server on malformed reply) (antirez) -2009-10-29 Fixed issue 77 (Incorrect time in log files) thanks to youwantalex (antirez) -2009-10-29 Fixed Issue 76 (redis-server crashes when it can't connect to MASTER and client connects to SLAVE) (antirez) -2009-10-29 ZREMRANGEBYSCORE implemented. Remove a range of elements with score between min and max (antirez) -2009-10-28 TODO changes and mostly theoretical minor skiplist change (antirez) -2009-10-28 ZLEN renamed ZCARD for consistency with SCARD (antirez) -2009-10-27 TODO reworked to reflect the real roadmap (antirez) -2009-10-27 Fix for 'make 32bit' (antirez) -2009-10-27 a fix for the solaris fix itself ;) (antirez) -2009-10-27 More Solaris fixes (antirez) -2009-10-27 A lot of ZSETs tests implemented, and a bug fixed thanks to this new tests (antirez) -2009-10-27 zmalloc Solaris fixes thanks to Alan Harder (antirez) -2009-10-27 ZSCORE implemented (antirez) -2009-10-26 fix for ZRANGEBYSCORE (antirez) -2009-10-26 ZRANGEBYSCORE implemented. Redis got range queries! (antirez) -2009-10-26 A trivial change makes the new implementation O(log(N)) instead of O(log(N))+O(M) when there are M repeated scores! (antirez) -2009-10-26 ZSET now saved on disk like any other type (antirez) -2009-10-26 double serialization routines implemented (antirez) -2009-10-26 ZSETs random fixes. 
Now the implementation appears to be pretty stable (antirez) -2009-10-26 another leak fixed. Can't find more for now, but still a bug in ZSETs to fix (antirez) -2009-10-26 ZSETs memory leak #1 solved, another one missing (antirez) -2009-10-26 Fix for skiplists backward link (antirez) -2009-10-26 Merged Solaris patches provided by Alan Harder (antirez) -2009-10-26 backward support to skiplists for ZREVRANGE, still broken, committing since I've to merge the Solaris patches (antirez) -2009-10-26 TODO updated (antirez) -2009-10-26 ZREM implemented (antirez) -2009-10-24 fix for ZADD in score update mode (antirez) -2009-10-24 some work on ZADD against existing element (score update), still broken... (antirez) -2009-10-23 zrange now starts to work. zadd still does not support update and will crash or leak or b000mmmmm (antirez) -2009-10-23 zrange initial hack (not working for now) (antirez) -2009-10-23 first skiplist fix, courtesy of valgrind (antirez) -2009-10-23 zset symbols added to stack trace code. ZSets will simply crash at the moment (antirez) -2009-10-23 more work on ZSETs and a new make target called 32bit to build i386 binaries on mac os x leopard (antirez) -2009-10-23 initial skiplist implementation. Most memory checks removed and zmalloc() modified to fail with an error message and abort. Anyway Redis is not designed to recover from out of memory conditions. (antirez) -2009-10-23 Fixed compilation in mac os x snow leopard when compiling a 32 bit binary. (antirez) -2009-10-22 version incremented to 1.050 to distinguish from 1.001 stable and next stable versions with minor fixes (antirez) -2009-10-21 TODO updated (antirez) -2009-10-21 SRANDMEMBER added (antirez) -2009-10-20 Imporant bug leading to data corruption fixed (NOT affecting stable distribution), Tcl client lib MSET/MSETNX implementation fixed, Added new tests for MSET and MSETNX in test-redis.tcl (antirez) -2009-10-17 added multi-bulk protocol support to redis-cli and support for MSET and MSETNX (antirez) -2009-10-17 MSET fixed, was not able to replace keys already set for a stupid bug (antirez) -2009-10-16 some dead code removed (antirez) -2009-10-16 multi bulk input protocol fixed (antirez) -2009-10-16 MSET and MSETNX commands implemented (antirez) -2009-10-07 undoed all the sds hacking that lead just to random bugs and no memory saving ;) (antirez) -2009-10-07 initial multi-bulk query protocol, this will allow MSET and other interesting features. (antirez) -2009-10-03 benchmark now outputs the right command line to shorten the TIME_WAIT interval on Mac OS X when keep alive is set (antirez) -2009-10-02 Issue 69 fixed. Object integer encoding now works with replication and MONITORing again. (antirez) -2009-09-18 LREM fixed, used to crash since the new object integer encoding is on the stage (antirez) -2009-09-17 maxmemory didn't worked in 64 systems for values > 4GB since it used to be an unsigned int. Fixed (antirez) -2009-09-10 incremented version number to 1.001, AKA Redis edge is no longer stable... 
(antirez) -2009-09-10 in-memory specialized object encoding (for now 32 signed integers only) (antirez) -2009-09-03 Latest doc changes for 1.0 (antirez) -2009-09-03 Redis 1.0.0 release (antirez) -2009-09-02 Redis version pushed to 1.0 (antirez) -2009-09-02 Ruby client lib updated to the latest git version (antirez) -2009-09-02 update-scala-client script added (antirez) -2009-09-02 Scala client added thanks to Alejanro Crosa (antirez) -2009-09-02 QuickStart added (antirez) -2009-09-01 Fixed crash with only space and newline as command (issue 61), thanks to a guy having as nick "fixxxerrr" (antirez) -2009-08-11 TODO list modified (antirez) -2009-07-24 more snow leopard related fixes (for 32bit systems) (antirez) -2009-07-24 fixed compilation with Snow Leopard, thanks to Lon Baker for providing SSH access to Snow Leopard box (antirez) -2009-07-22 Fixed NetBSD compile problems (antirez) -2009-07-17 now the size of the shared pool can be really modified via config, also the number of objects in the sharing pool is logged when the log level is set to debug. Thanks to Aman Gupta (antirez) -2009-07-05 added utils/redis-copy.rb, a script that is able to copy data from one Redis server to another one on the fly. (antirez) -2009-07-04 Applied three different patches thanks to Chris Lamb, one to fix compilation and get the IP register value on Linux IA64 and other systems. One in order to log the overcommit problem on the logs instead of the standard output when Redis is demonized. The latest in order to suggest a more consistent way in order to switch to 1 the memory overcommit Linux feature. (antirez) -2009-07-03 bugfix: EXPIRE now propagates to the Slave. (antirez) -2009-06-16 Redis version modified to 0.900 (antirez) -2009-06-16 update-ruby-client script already points to ezmobius repo (antirez) -2009-06-16 client libraries updated (antirez) -2009-06-16 Redis release candidate 1 (antirez) -2009-06-16 Better handling of background saving process killed or crashed (antirez) -2009-06-14 number of keys info in INFO command thanks to Diego Rosario Brogna (antirez) -2009-06-14 SPOP documented (antirez) -2009-06-14 Clojure library thanks to Ragnar Dahlén (antirez) -2009-06-10 It is now possible to specify - as config file name to read it from stdin (antirez) -2009-06-10 sync with jodosha redis-rb (antirez) -2009-06-10 Redis-rb sync (antirez) -2009-06-10 max inline request raised again to 1024*1024*256 bytes (antirez) -2009-06-10 max bytes in an inline command raised to 1024*1024 bytes, in order to allow for very large MGETs and still protect from client crashes (antirez) -2009-06-08 SPOP implemented. Hash table resizing for Sets and Expires too. Changed the resize policy to play better with RANDOMKEY and SPOP. 
(antirez) -2009-06-07 some minor changes to the backtrace code (antirez) -2009-06-07 enable backtrace capabilities only for Linux and MacOSX (antirez) -2009-06-07 Dump a backtrace on sigsegv/sigbus, original coded thanks to Diego Rosario Brogna, modified in order to work on different OSes and to enhance reliability (antirez) -2009-06-06 Merge git://github.com/dierbro/redis (antirez) -2009-06-06 add more output (hrothgar) -2009-06-06 store static function pointer for a useful stack trace (hrothgar) -2009-06-06 TODO updated (antirez) -2009-06-06 Makefile dependencies updated (antirez) -2009-06-05 Avoid a busy loop while sending very large replies against very fast links, this allows to be more responsive with other clients even under a KEY * against the loopback interface (antirez) -2009-06-05 Kill the background saving process before performing SHUTDOWN to avoid races (antirez) -2009-06-05 LREM now returns :0 for non existing keys (antirez) -2009-06-05 - put some order in code - better output (hrothgar) -2009-06-05 added config.h for #ifdef business isolation, added fstat64 for Mac OS X (antirez) -2009-06-04 remove die() :-) (hrothgar) -2009-06-04 add compile options to debug (hrothgar) -2009-06-04 initial commit print stack trace (hrothgar) -2009-06-04 initial commit print stack trace (hrothgar) -2009-06-04 macosx specific zmalloc.c, uses malloc_size function in order to avoid to waste memory and time to put an additional header (antirez) -2009-06-04 DEBUG OBJECT implemented (antirez) -2009-06-04 backtrace support removed: unreliable stack trace :( (antirez) -2009-06-04 initial backtrace dumping on sigsegv/sigbus + debug command (antirez) -2009-06-03 Python lib updated (antirez) -2009-06-03 shareobjectspoolsize implemented in reds.conf, in order to control the pool size when object sharing is on (antirez) -2009-05-30 Erlang client updated (antirez) -2009-05-30 Python client library updated (antirez) -2009-05-29 Redis-rb minor bool convertion fix (antirez) -2009-05-29 ruby library client is not Redis-rb merged with RubyRedis "engine" by Brian McKinney (antirez) -2009-05-28 __P completely removed from pqsort.c/h (antirez) -2009-05-28 another minor fix for Solaris boxes (antirez) -2009-05-28 minor fix for Solaris boxes (antirez) -2009-05-28 minor fix for Solaris boxes (antirez) -2009-05-27 maxmemory implemented (antirez) -2009-05-26 Redis git version modified to 0.101 in order to distinguish that from the latest tar.gz via INFO ;) (antirez) -2009-05-26 Redis 0.100 released (antirez) -2009-05-26 client libraries synched in git (antirez) -2009-05-26 ignore gcc warning about write() return code not checked. It is esplicitily this way since the "max number of clients reached" is a best-effort error (antirez) -2009-05-26 max bytes of a received command enlarged from 1k to 16k (antirez) -2009-05-26 RubyRedis: set TCP_NODELAY TCP socket option to to disable the neagle algorithm. 
Makes a huge difference under some OS, notably Linux (antirez) -2009-05-25 maxclients implemented, see redis.conf for details (antirez) -2009-05-25 INFO command now reports replication info (antirez) -2009-05-25 minor fix to RubyRedis about bulk commands sent without arguments (antirez) -2009-05-24 Warns if using the default config (antirez) -2009-05-24 Issue with redis-client used in scripts solved, now to check if the latest argument must come from standard input we do not check that stdin is or not a tty but the command arity (antirez) -2009-05-23 RubyRedis: now sets are returned as arrays again, and not as Set objects (antirez) -2009-05-23 SLAVEOF command documented (antirez) -2009-05-23 SLAVEOF command implemented for replication remote control (antirez) -2009-05-22 Fix: no connection timeout for the master! (antirez) -2009-05-22 replication slave timeout when receiving the initial bulk data set to 3600 seconds, now that replication is non-blocking the server must save the db before to start the async replication and this can take a lot of time with huge datasets (antirez) -2009-05-22 README tutorial now reflects the new proto (antirez) -2009-05-22 critical bug about glueoutputbuffers=yes fixed. Under load and with pipelining and clients disconnecting on the middle of the chat with the server, Redis could block. Now it's ok (antirez) -2009-05-22 TTL command doc added (antirez) -2009-05-22 TTL command implemented (antirez) -2009-05-22 S*STORE now return the cardinality of the resulting set (antirez) -2009-05-22 rubyredis more compatible with Redis-rb (antirez) -2009-05-21 minor indentation fix (antirez) -2009-05-21 timeout support and Redis-rb compatibility aliases implemented in RubyRedis (antirez) -2009-05-21 RubyRedis info postprocessor rewritten in a more functional way (antirez) -2009-05-21 dead code removed from RubyRedis (antirez) -2009-05-21 command postprocessing implemented into RubyRedis (antirez) -2009-05-20 Automagically reconnection of RubyRedis (antirez) -2009-05-20 RubyRedis: Array alike operators implemented (antirez) -2009-05-20 random testing code removed (antirez) -2009-05-20 RubyRedis DB selection forced at object creation (antirez) -2009-05-20 Initial version of an alternative Ruby client added (antirez) -2009-05-20 SDIFF / SDIFFSTORE added to doc (antirez) -2009-05-20 Aman Gupta changes merged (antirez) -2009-05-20 Merge git://github.com/tmm1/redis (antirez) -2009-05-19 Allow timeout=0 config to disable client timeouts (Aman Gupta) -2009-05-19 Partial qsort implemented in SORT command, only when both BY and LIMIT is used. minor fix for a warning compiling under Linux. (antirez) -2009-05-19 psort.c/h added. This is a partial qsort implementation that Redis will use when SORT+LIMIT is requested (antirez) -2009-05-17 Fix SINTER/UNIONSTORE to allow for &=/|= style operations (i.e. 
SINTERSTORE set1 set1 set2) (Aman Gupta) -2009-05-17 Optimize SDIFF to return as soon as the result set is empty (Aman Gupta) -2009-05-17 SDIFF/SDIFFSTORE implemnted unifying it with the implementation of SUNION/SUNIONSTORE (antirez) -2009-05-11 timestamp in log lines (antirez) -2009-05-11 Python client updated pushing from Ludo's repository (antirez) -2009-05-11 disconnect when we cannot read from the socket (Ludovico Magnocavallo) -2009-05-11 benchmark utility now supports random keys (antirez) -2009-05-10 minor doc changes (antirez) -2009-05-09 added tests for vararg DEL (antirez) -2009-05-09 DEL is now a vararg, IMPORTANT: memory leak fixed in loading DB code (antirez) -2009-05-09 doc changes (antirez) -2009-05-09 CPP client added thanks to Brian Hammond (antirez) -2009-05-06 Infinite number of arguments for MGET and all the other commands (antirez) -2009-05-04 Warns if /proc/sys/vm/overcommit_memory is set to 0 on Linux. Also make sure to don't resize the hash tables while the child process is saving in order to avoid copy-on-write of memory pages (antirez) -2009-04-30 zmalloc fix, return NULL or real malloc failure (antirez) -2009-04-30 more fixes for dict.c and the 150 million keys limit (antirez) -2009-04-30 dict.c modified to be able to handle more than 150,000,000 keys (antirez) -2009-04-29 fuzz stresser implemented in redis-test (antirez) -2009-04-29 fixed for HT resize check 32bits overflow (antirez) -2009-04-29 Check for fork() failure in background saving (antirez) -2009-04-29 fix for the LZF off-by-one bug added (antirez) -2009-04-28 print bytes used at exit on SHUTDOWN (antirez) -2009-04-28 SMOVE test added (antirez) -2009-04-28 SMOVE command implemented (antirez) -2009-04-28 less CPU usage in command parsing, case insensitive config directives (antirez) -2009-04-28 GETSET command doc added (antirez) -2009-04-28 GETSET tests (antirez) -2009-04-28 GETSET implemented (antirez) -2009-04-27 ability to specify a different file name for the DB (antirez) -2009-04-27 log file parsing code improved a bit (antirez) -2009-04-27 bgsave_in_progress field in INFO output (antirez) -2009-04-27 INCRBY/DECRBY now support 64bit increments, with tests (antirez) -2009-04-23 RANDOMKEY regression test added (antirez) -2009-04-23 dictGetRandomKey bug fixed, RANDOMKEY will not block the server anymore (antirez) -2009-04-22 FLUSHALL/FLUSHDB no longer sync on disk. Just increment the dirty counter by the number of elements removed, that will probably trigger a background saving operation (antirez) -2009-04-21 forgot to comment testing code in PHP lib. Now it is ok (antirez) -2009-04-21 PHP client ported to PHP5 and fixed (antirez) -2009-04-21 doc update (antirez) -2009-04-20 Non blocking replication (finally!). C-side linked lists API improved. 
(antirez) -2009-04-19 SUNION, SUNIONSTORE, Initial work on non blocking replication (antirez) -2009-04-10 Redis 0.091 released (antirez) -2009-04-10 SINTER/SINTERSTORE/SLEMENTS fix: misisng keys are now not errors, but just like empty sets (antirez) -2009-04-09 doc changes (antirez) -2009-04-08 TODO changes, minor change to default redis.conf (antirez) -2009-04-08 html doc updated (antirez) -2009-04-08 library clients update scripts (antirez) -2009-04-08 Ruby client updated (antirez) -2009-04-08 Lua client updated (antirez) -2009-04-08 Changelog updated (antirez) -2009-04-08 Merge git://github.com/ludoo/redis (antirez) -2009-04-08 add expire command to the php lib (Ludovico Magnocavallo) -2009-04-08 fix decode bug, add flush and info commands (Ludovico Magnocavallo) -2009-04-07 Rearrange redisObject struct to reduce memory usage in 64bit environments (as recommended http://groups.google.com/group/redis-db/msg/68f5a743f8f4e287) (Bob Potter) -2009-04-07 ruby19 compat: use each_line on string (Bob Potter) -2009-04-07 64bit fixes for usedmemory (Bob Potter) -2009-04-08 RANDOMKEY issue 26 fixed, generic test + regression added (antirez) -2009-04-06 Don't accept SAVE if BGSAVE is in progress (antirez) -2009-04-06 add expire command to the python lib (Ludovico Magnocavallo) -2009-04-03 persistent EXPIRE (antirez) -2009-04-03 dirty increment was missing in two points. TODO updated (antirez) -2009-04-02 LZF configured to initalize the HT in order to be determinsitic and play well with valgrind (antirez) -2009-04-02 fix select test (Ludovico Magnocavallo) -2009-04-02 fix trailing cr+nl in values (Ludovico Magnocavallo) -2009-04-02 compression/decompression of large values on disk now working (antirez) -2009-04-02 disable LZF compression since it's not able to load the DB for now, the load part is missing (antirez) -2009-04-02 new LZF files added (antirez) -2009-04-02 Fixed issue 23 about AUTH (antirez) -2009-04-02 Issue 22 fixed (antirez) -2009-04-01 non-lazy expired keys purging implemented (antirez) -2009-04-01 fastlz dependence removed (antirez) -2009-04-01 Initial implementation of EXPIRE (antirez) -2009-03-30 TODO updated (antirez) -2009-03-30 changelog added (antirez) -2009-03-28 redis-sha1 utility added (antirez) -2009-03-28 Integer encoding implemented in dump file. Doc updated (antirez) -2009-03-27 feature macros defined to play well with C99 (antirez) -2009-03-27 feature macros defined to play well with C99 (antirez) -2009-03-27 now Redis is C99-ok (antirez) -2009-03-27 IMPORTANT FIX: new dump format implementation was broken. Now it's ok but tests for the 32-bit case values are needed (antirez) -2009-03-27 ANSI-C compatibility changes (antirez) -2009-03-27 Ruby client library updated. Important changes in this new version! (antirez) -2009-03-26 Lua client added thanks to Daniele Alessandri (antirez) -2009-03-26 Lua client added thanks to Daniele Alessandri (antirez) -2009-03-26 AUTH merged from Brian Hammond fork, reworked a bit to fix minor problems (antirez) -2009-03-25 Adds AUTH command. 
(Brian Hammond) -2009-03-25 Nasty bug of the new DB format fixed, objects sharing implemented (antirez) -2009-03-25 doc update (antirez) -2009-03-25 Erlang client synched with Valentiono's repo (antirez) -2009-03-25 New file dump format, perl client library added (antirez) -2009-03-25 New protocol fix for LREM (antirez) -2009-03-24 two typos fixed (antirez) -2009-03-24 Now the Redis test uses the proper Tcl client library (antirez) -2009-03-24 Tcl client library (antirez) -2009-03-24 redis-benchmark sync with the new protocol (antirez) -2009-03-24 git mess :) (Ludovico Magnocavallo) -2009-03-24 sync python client to the new protocol (Ludovico Magnocavallo) -2009-03-24 protocol fix in SORT reply with null elements (antirez) -2009-03-24 protocol doc changed (antirez) -2009-03-24 Server replies now in the new format, test-redis.tcl and redis-cli modified accordingly (antirez) -2009-03-24 Python client library updated, thanks to Ludo! (antirez) -2009-03-24 random tested mode for test-redis.tcl, minor other stuff, version switched to 0.8 (antirez) -2009-03-23 Now MONITOR/SYNC cannot be issued multiple times (antirez) -2009-03-23 MONITOR command implemented. (antirez) -2009-03-23 lucsky changes imported. pid file path can now be configured, redis-cli fixes (antirez) -2009-03-23 Merge git://github.com/lucsky/redis (antirez) -2009-03-23 another missing free->zfree replacement fixed. Thanks to Ludo (antirez) -2009-03-23 Fixed redis-cli readLine loop to correctly handle EOF. (Luc Heinrich) -2009-03-23 Display the port on server startup. (Luc Heinrich) -2009-03-23 Allow to specify the pid file from the config file. (Luc Heinrich) -2009-03-23 Added gitignore file. (Luc Heinrich) -2009-03-22 MGET tests added (antirez) -2009-03-22 doc changes (antirez) -2009-03-22 added doc for MGET (antirez) -2009-03-22 redis-cli now checks the arity of vararg commnads (antirez) -2009-03-22 INFO fixed, MGET implemented, redis-cli implements INFO/MGET (antirez) -2009-03-22 first commit (antirez) \ No newline at end of file diff --git a/README b/README index 329eb1cb3..b7a12b828 100644 --- a/README +++ b/README @@ -130,7 +130,7 @@ it the proper way for a production system, we have a script doing this for Ubuntu and Debian systems: % cd utils - % ./install_server + % ./install_server.sh The script will ask you a few questions and will setup everything you need to run Redis properly as a background daemon that will start again on diff --git a/deps/linenoise/linenoise.c b/deps/linenoise/linenoise.c index 4632f7de8..aef5cdd24 100644 --- a/deps/linenoise/linenoise.c +++ b/deps/linenoise/linenoise.c @@ -10,8 +10,8 @@ * * ------------------------------------------------------------------------ * - * Copyright (c) 2010, Salvatore Sanfilippo - * Copyright (c) 2010, Pieter Noordhuis + * Copyright (c) 2010-2013, Salvatore Sanfilippo + * Copyright (c) 2010-2013, Pieter Noordhuis * * All rights reserved. * @@ -45,12 +45,10 @@ * - http://www.3waylabs.com/nw/WWW/products/wizcon/vt220.html * * Todo list: - * - Switch to gets() if $TERM is something we can't support. * - Filter bogus Ctrl+ combinations. * - Win32 support * * Bloat: - * - Completion? * - History search like Ctrl+r in readline? * * List of escape sequences used by this program, we do everything just @@ -72,6 +70,17 @@ * Sequence: ESC [ n C * Effect: moves cursor forward of n chars * + * When multi line mode is enabled, we also use an additional escape + * sequence. However multi line editing is disabled by default. 
+ * + * CUU (Cursor Up) + * Sequence: ESC [ n A + * Effect: moves cursor up of n chars. + * + * CUD (Cursor Down) + * Sequence: ESC [ n B + * Effect: moves cursor down of n chars. + * * The following are used to clear the screen: ESC [ H ESC [ 2 J * This is actually composed of two sequences: * @@ -92,6 +101,7 @@ #include #include #include +#include #include #include #include @@ -99,19 +109,89 @@ #define LINENOISE_DEFAULT_HISTORY_MAX_LEN 100 #define LINENOISE_MAX_LINE 4096 -static char *unsupported_term[] = {"dumb","cons25",NULL}; +static char *unsupported_term[] = {"dumb","cons25","emacs",NULL}; static linenoiseCompletionCallback *completionCallback = NULL; -static struct termios orig_termios; /* in order to restore at exit */ -static int rawmode = 0; /* for atexit() function to check if restore is needed*/ -static int atexit_registered = 0; /* register atexit just 1 time */ +static struct termios orig_termios; /* In order to restore at exit.*/ +static int rawmode = 0; /* For atexit() function to check if restore is needed*/ +static int mlmode = 0; /* Multi line mode. Default is single line. */ +static int atexit_registered = 0; /* Register atexit just 1 time. */ static int history_max_len = LINENOISE_DEFAULT_HISTORY_MAX_LEN; static int history_len = 0; -char **history = NULL; +static char **history = NULL; + +/* The linenoiseState structure represents the state during line editing. + * We pass this state to functions implementing specific editing + * functionalities. */ +struct linenoiseState { + int ifd; /* Terminal stdin file descriptor. */ + int ofd; /* Terminal stdout file descriptor. */ + char *buf; /* Edited line buffer. */ + size_t buflen; /* Edited line buffer size. */ + const char *prompt; /* Prompt to display. */ + size_t plen; /* Prompt length. */ + size_t pos; /* Current cursor position. */ + size_t oldpos; /* Previous refresh cursor position. */ + size_t len; /* Current edited line length. */ + size_t cols; /* Number of columns in terminal. */ + size_t maxrows; /* Maximum num of rows used so far (multiline mode) */ + int history_index; /* The history index we are currently editing. */ +}; + +enum KEY_ACTION{ + KEY_NULL = 0, /* NULL */ + CTRL_A = 1, /* Ctrl+a */ + CTRL_B = 2, /* Ctrl-b */ + CTRL_C = 3, /* Ctrl-c */ + CTRL_D = 4, /* Ctrl-d */ + CTRL_E = 5, /* Ctrl-e */ + CTRL_F = 6, /* Ctrl-f */ + CTRL_H = 8, /* Ctrl-h */ + TAB = 9, /* Tab */ + CTRL_K = 11, /* Ctrl+k */ + CTRL_L = 12, /* Ctrl+l */ + ENTER = 13, /* Enter */ + CTRL_N = 14, /* Ctrl-n */ + CTRL_P = 16, /* Ctrl-p */ + CTRL_T = 20, /* Ctrl-t */ + CTRL_U = 21, /* Ctrl+u */ + CTRL_W = 23, /* Ctrl+w */ + ESC = 27, /* Escape */ + BACKSPACE = 127 /* Backspace */ +}; static void linenoiseAtExit(void); int linenoiseHistoryAdd(const char *line); +static void refreshLine(struct linenoiseState *l); + +/* Debugging macro. */ +#if 0 +FILE *lndebug_fp = NULL; +#define lndebug(...) \ + do { \ + if (lndebug_fp == NULL) { \ + lndebug_fp = fopen("/tmp/lndebug.txt","a"); \ + fprintf(lndebug_fp, \ + "[%d %d %d] p: %d, rows: %d, rpos: %d, max: %d, oldmax: %d\n", \ + (int)l->len,(int)l->pos,(int)l->oldpos,plen,rows,rpos, \ + (int)l->maxrows,old_rows); \ + } \ + fprintf(lndebug_fp, ", " __VA_ARGS__); \ + fflush(lndebug_fp); \ + } while (0) +#else +#define lndebug(fmt, ...) +#endif + +/* ======================= Low level terminal handling ====================== */ +/* Set if to use or not the multi line mode. 
*/ +void linenoiseSetMultiLine(int ml) { + mlmode = ml; +} + +/* Return true if the terminal name is in the list of terminals we know are + * not able to understand basic escape sequences. */ static int isUnsupportedTerm(void) { char *term = getenv("TERM"); int j; @@ -122,16 +202,7 @@ static int isUnsupportedTerm(void) { return 0; } -static void freeHistory(void) { - if (history) { - int j; - - for (j = 0; j < history_len; j++) - free(history[j]); - free(history); - } -} - +/* Raw mode: 1960 magic shit. */ static int enableRawMode(int fd) { struct termios raw; @@ -173,51 +244,83 @@ static void disableRawMode(int fd) { rawmode = 0; } -/* At exit we'll try to fix the terminal to the initial conditions. */ -static void linenoiseAtExit(void) { - disableRawMode(STDIN_FILENO); - freeHistory(); +/* Use the ESC [6n escape sequence to query the horizontal cursor position + * and return it. On error -1 is returned, on success the position of the + * cursor. */ +static int getCursorPosition(int ifd, int ofd) { + char buf[32]; + int cols, rows; + unsigned int i = 0; + + /* Report cursor location */ + if (write(ofd, "\x1b[6n", 4) != 4) return -1; + + /* Read the response: ESC [ rows ; cols R */ + while (i < sizeof(buf)-1) { + if (read(ifd,buf+i,1) != 1) break; + if (buf[i] == 'R') break; + i++; + } + buf[i] = '\0'; + + /* Parse it. */ + if (buf[0] != ESC || buf[1] != '[') return -1; + if (sscanf(buf+2,"%d;%d",&rows,&cols) != 2) return -1; + return cols; } -static int getColumns(void) { +/* Try to get the number of columns in the current terminal, or assume 80 + * if it fails. */ +static int getColumns(int ifd, int ofd) { struct winsize ws; - if (ioctl(1, TIOCGWINSZ, &ws) == -1) return 80; - return ws.ws_col; -} + if (ioctl(1, TIOCGWINSZ, &ws) == -1 || ws.ws_col == 0) { + /* ioctl() failed. Try to query the terminal itself. */ + int start, cols; -static void refreshLine(int fd, const char *prompt, char *buf, size_t len, size_t pos, size_t cols) { - char seq[64]; - size_t plen = strlen(prompt); - - while((plen+pos) >= cols) { - buf++; - len--; - pos--; - } - while (plen+len > cols) { - len--; + /* Get the initial position so we can restore it later. */ + start = getCursorPosition(ifd,ofd); + if (start == -1) goto failed; + + /* Go to right margin and get position. */ + if (write(ofd,"\x1b[999C",6) != 6) goto failed; + cols = getCursorPosition(ifd,ofd); + if (cols == -1) goto failed; + + /* Restore position. */ + if (cols > start) { + char seq[32]; + snprintf(seq,32,"\x1b[%dD",cols-start); + if (write(ofd,seq,strlen(seq)) == -1) { + /* Can't recover... */ + } + } + return cols; + } else { + return ws.ws_col; } - /* Cursor to left edge */ - snprintf(seq,64,"\x1b[0G"); - if (write(fd,seq,strlen(seq)) == -1) return; - /* Write the prompt and the current buffer content */ - if (write(fd,prompt,strlen(prompt)) == -1) return; - if (write(fd,buf,len) == -1) return; - /* Erase to right */ - snprintf(seq,64,"\x1b[0K"); - if (write(fd,seq,strlen(seq)) == -1) return; - /* Move cursor to original position. */ - snprintf(seq,64,"\x1b[0G\x1b[%dC", (int)(pos+plen)); - if (write(fd,seq,strlen(seq)) == -1) return; +failed: + return 80; } -static void beep() { +/* Clear the screen. Used to handle ctrl+l */ +void linenoiseClearScreen(void) { + if (write(STDOUT_FILENO,"\x1b[H\x1b[2J",7) <= 0) { + /* nothing to do, just to avoid warning. */ + } +} + +/* Beep, used for completion when there is nothing to complete or when all + * the choices were already shown. 
*/ +static void linenoiseBeep(void) { fprintf(stderr, "\x7"); fflush(stderr); } +/* ============================== Completion ================================ */ + +/* Free a list of completion option populated by linenoiseAddCompletion(). */ static void freeCompletions(linenoiseCompletions *lc) { size_t i; for (i = 0; i < lc->len; i++) @@ -226,28 +329,39 @@ static void freeCompletions(linenoiseCompletions *lc) { free(lc->cvec); } -static int completeLine(int fd, const char *prompt, char *buf, size_t buflen, size_t *len, size_t *pos, size_t cols) { +/* This is an helper function for linenoiseEdit() and is called when the + * user types the key in order to complete the string currently in the + * input. + * + * The state of the editing is encapsulated into the pointed linenoiseState + * structure as described in the structure definition. */ +static int completeLine(struct linenoiseState *ls) { linenoiseCompletions lc = { 0, NULL }; int nread, nwritten; char c = 0; - completionCallback(buf,&lc); + completionCallback(ls->buf,&lc); if (lc.len == 0) { - beep(); + linenoiseBeep(); } else { size_t stop = 0, i = 0; - size_t clen; while(!stop) { /* Show completion or original buffer */ if (i < lc.len) { - clen = strlen(lc.cvec[i]); - refreshLine(fd,prompt,lc.cvec[i],clen,clen,cols); + struct linenoiseState saved = *ls; + + ls->len = ls->pos = strlen(lc.cvec[i]); + ls->buf = lc.cvec[i]; + refreshLine(ls); + ls->len = saved.len; + ls->pos = saved.pos; + ls->buf = saved.buf; } else { - refreshLine(fd,prompt,buf,*len,*pos,cols); + refreshLine(ls); } - nread = read(fd,&c,1); + nread = read(ls->ifd,&c,1); if (nread <= 0) { freeCompletions(&lc); return -1; @@ -256,20 +370,18 @@ static int completeLine(int fd, const char *prompt, char *buf, size_t buflen, si switch(c) { case 9: /* tab */ i = (i+1) % (lc.len+1); - if (i == lc.len) beep(); + if (i == lc.len) linenoiseBeep(); break; case 27: /* escape */ /* Re-show original buffer */ - if (i < lc.len) { - refreshLine(fd,prompt,buf,*len,*pos,cols); - } + if (i < lc.len) refreshLine(ls); stop = 1; break; default: /* Update buffer and return */ if (i < lc.len) { - nwritten = snprintf(buf,buflen,"%s",lc.cvec[i]); - *len = *pos = nwritten; + nwritten = snprintf(ls->buf,ls->buflen,"%s",lc.cvec[i]); + ls->len = ls->pos = nwritten; } stop = 1; break; @@ -281,214 +393,526 @@ static int completeLine(int fd, const char *prompt, char *buf, size_t buflen, si return c; /* Return last read character */ } -void linenoiseClearScreen(void) { - if (write(STDIN_FILENO,"\x1b[H\x1b[2J",7) <= 0) { - /* nothing to do, just to avoid warning. */ +/* Register a callback function to be called for tab-completion. */ +void linenoiseSetCompletionCallback(linenoiseCompletionCallback *fn) { + completionCallback = fn; +} + +/* This function is used by the callback function registered by the user + * in order to add completion options given the input string when the + * user typed . See the example.c source code for a very easy to + * understand example. 
*/ +void linenoiseAddCompletion(linenoiseCompletions *lc, const char *str) { + size_t len = strlen(str); + char *copy, **cvec; + + copy = malloc(len+1); + if (copy == NULL) return; + memcpy(copy,str,len+1); + cvec = realloc(lc->cvec,sizeof(char*)*(lc->len+1)); + if (cvec == NULL) { + free(copy); + return; + } + lc->cvec = cvec; + lc->cvec[lc->len++] = copy; +} + +/* =========================== Line editing ================================= */ + +/* We define a very simple "append buffer" structure, that is an heap + * allocated string where we can append to. This is useful in order to + * write all the escape sequences in a buffer and flush them to the standard + * output in a single call, to avoid flickering effects. */ +struct abuf { + char *b; + int len; +}; + +static void abInit(struct abuf *ab) { + ab->b = NULL; + ab->len = 0; +} + +static void abAppend(struct abuf *ab, const char *s, int len) { + char *new = realloc(ab->b,ab->len+len); + + if (new == NULL) return; + memcpy(new+ab->len,s,len); + ab->b = new; + ab->len += len; +} + +static void abFree(struct abuf *ab) { + free(ab->b); +} + +/* Single line low level line refresh. + * + * Rewrite the currently edited line accordingly to the buffer content, + * cursor position, and number of columns of the terminal. */ +static void refreshSingleLine(struct linenoiseState *l) { + char seq[64]; + size_t plen = strlen(l->prompt); + int fd = l->ofd; + char *buf = l->buf; + size_t len = l->len; + size_t pos = l->pos; + struct abuf ab; + + while((plen+pos) >= l->cols) { + buf++; + len--; + pos--; + } + while (plen+len > l->cols) { + len--; + } + + abInit(&ab); + /* Cursor to left edge */ + snprintf(seq,64,"\x1b[0G"); + abAppend(&ab,seq,strlen(seq)); + /* Write the prompt and the current buffer content */ + abAppend(&ab,l->prompt,strlen(l->prompt)); + abAppend(&ab,buf,len); + /* Erase to right */ + snprintf(seq,64,"\x1b[0K"); + abAppend(&ab,seq,strlen(seq)); + /* Move cursor to original position. */ + snprintf(seq,64,"\x1b[0G\x1b[%dC", (int)(pos+plen)); + abAppend(&ab,seq,strlen(seq)); + if (write(fd,ab.b,ab.len) == -1) {} /* Can't recover from write error. */ + abFree(&ab); +} + +/* Multi line low level line refresh. + * + * Rewrite the currently edited line accordingly to the buffer content, + * cursor position, and number of columns of the terminal. */ +static void refreshMultiLine(struct linenoiseState *l) { + char seq[64]; + int plen = strlen(l->prompt); + int rows = (plen+l->len+l->cols-1)/l->cols; /* rows used by current buf. */ + int rpos = (plen+l->oldpos+l->cols)/l->cols; /* cursor relative row. */ + int rpos2; /* rpos after refresh. */ + int old_rows = l->maxrows; + int fd = l->ofd, j; + struct abuf ab; + + /* Update maxrows if needed. */ + if (rows > (int)l->maxrows) l->maxrows = rows; + + /* First step: clear all the lines used before. To do so start by + * going to the last row. */ + abInit(&ab); + if (old_rows-rpos > 0) { + lndebug("go down %d", old_rows-rpos); + snprintf(seq,64,"\x1b[%dB", old_rows-rpos); + abAppend(&ab,seq,strlen(seq)); + } + + /* Now for every row clear it, go up. */ + for (j = 0; j < old_rows-1; j++) { + lndebug("clear+up"); + snprintf(seq,64,"\x1b[0G\x1b[0K\x1b[1A"); + abAppend(&ab,seq,strlen(seq)); + } + + /* Clean the top line. 
*/ + lndebug("clear"); + snprintf(seq,64,"\x1b[0G\x1b[0K"); + abAppend(&ab,seq,strlen(seq)); + + /* Write the prompt and the current buffer content */ + abAppend(&ab,l->prompt,strlen(l->prompt)); + abAppend(&ab,l->buf,l->len); + + /* If we are at the very end of the screen with our prompt, we need to + * emit a newline and move the prompt to the first column. */ + if (l->pos && + l->pos == l->len && + (l->pos+plen) % l->cols == 0) + { + lndebug(""); + abAppend(&ab,"\n",1); + snprintf(seq,64,"\x1b[0G"); + abAppend(&ab,seq,strlen(seq)); + rows++; + if (rows > (int)l->maxrows) l->maxrows = rows; + } + + /* Move cursor to right position. */ + rpos2 = (plen+l->pos+l->cols)/l->cols; /* current cursor relative row. */ + lndebug("rpos2 %d", rpos2); + + /* Go up till we reach the expected positon. */ + if (rows-rpos2 > 0) { + lndebug("go-up %d", rows-rpos2); + snprintf(seq,64,"\x1b[%dA", rows-rpos2); + abAppend(&ab,seq,strlen(seq)); + } + + /* Set column. */ + lndebug("set col %d", 1+((plen+(int)l->pos) % (int)l->cols)); + snprintf(seq,64,"\x1b[%dG", 1+((plen+(int)l->pos) % (int)l->cols)); + abAppend(&ab,seq,strlen(seq)); + + lndebug("\n"); + l->oldpos = l->pos; + + if (write(fd,ab.b,ab.len) == -1) {} /* Can't recover from write error. */ + abFree(&ab); +} + +/* Calls the two low level functions refreshSingleLine() or + * refreshMultiLine() according to the selected mode. */ +static void refreshLine(struct linenoiseState *l) { + if (mlmode) + refreshMultiLine(l); + else + refreshSingleLine(l); +} + +/* Insert the character 'c' at cursor current position. + * + * On error writing to the terminal -1 is returned, otherwise 0. */ +int linenoiseEditInsert(struct linenoiseState *l, char c) { + if (l->len < l->buflen) { + if (l->len == l->pos) { + l->buf[l->pos] = c; + l->pos++; + l->len++; + l->buf[l->len] = '\0'; + if ((!mlmode && l->plen+l->len < l->cols) /* || mlmode */) { + /* Avoid a full update of the line in the + * trivial case. */ + if (write(l->ofd,&c,1) == -1) return -1; + } else { + refreshLine(l); + } + } else { + memmove(l->buf+l->pos+1,l->buf+l->pos,l->len-l->pos); + l->buf[l->pos] = c; + l->len++; + l->pos++; + l->buf[l->len] = '\0'; + refreshLine(l); + } + } + return 0; +} + +/* Move cursor on the left. */ +void linenoiseEditMoveLeft(struct linenoiseState *l) { + if (l->pos > 0) { + l->pos--; + refreshLine(l); + } +} + +/* Move cursor on the right. */ +void linenoiseEditMoveRight(struct linenoiseState *l) { + if (l->pos != l->len) { + l->pos++; + refreshLine(l); + } +} + +/* Move cursor to the start of the line. */ +void linenoiseEditMoveHome(struct linenoiseState *l) { + if (l->pos != 0) { + l->pos = 0; + refreshLine(l); + } +} + +/* Move cursor to the end of the line. */ +void linenoiseEditMoveEnd(struct linenoiseState *l) { + if (l->pos != l->len) { + l->pos = l->len; + refreshLine(l); + } +} + +/* Substitute the currently edited line with the next or previous history + * entry as specified by 'dir'. */ +#define LINENOISE_HISTORY_NEXT 0 +#define LINENOISE_HISTORY_PREV 1 +void linenoiseEditHistoryNext(struct linenoiseState *l, int dir) { + if (history_len > 1) { + /* Update the current history entry before to + * overwrite it with the next one. */ + free(history[history_len - 1 - l->history_index]); + history[history_len - 1 - l->history_index] = strdup(l->buf); + /* Show the new entry */ + l->history_index += (dir == LINENOISE_HISTORY_PREV) ? 
1 : -1; + if (l->history_index < 0) { + l->history_index = 0; + return; + } else if (l->history_index >= history_len) { + l->history_index = history_len-1; + return; + } + strncpy(l->buf,history[history_len - 1 - l->history_index],l->buflen); + l->buf[l->buflen-1] = '\0'; + l->len = l->pos = strlen(l->buf); + refreshLine(l); + } +} + +/* Delete the character at the right of the cursor without altering the cursor + * position. Basically this is what happens with the "Delete" keyboard key. */ +void linenoiseEditDelete(struct linenoiseState *l) { + if (l->len > 0 && l->pos < l->len) { + memmove(l->buf+l->pos,l->buf+l->pos+1,l->len-l->pos-1); + l->len--; + l->buf[l->len] = '\0'; + refreshLine(l); + } +} + +/* Backspace implementation. */ +void linenoiseEditBackspace(struct linenoiseState *l) { + if (l->pos > 0 && l->len > 0) { + memmove(l->buf+l->pos-1,l->buf+l->pos,l->len-l->pos); + l->pos--; + l->len--; + l->buf[l->len] = '\0'; + refreshLine(l); } } -static int linenoisePrompt(int fd, char *buf, size_t buflen, const char *prompt) { - size_t plen = strlen(prompt); - size_t pos = 0; - size_t len = 0; - size_t cols = getColumns(); - int history_index = 0; - size_t old_pos; +/* Delete the previosu word, maintaining the cursor at the start of the + * current word. */ +void linenoiseEditDeletePrevWord(struct linenoiseState *l) { + size_t old_pos = l->pos; size_t diff; - buf[0] = '\0'; - buflen--; /* Make sure there is always space for the nulterm */ + while (l->pos > 0 && l->buf[l->pos-1] == ' ') + l->pos--; + while (l->pos > 0 && l->buf[l->pos-1] != ' ') + l->pos--; + diff = old_pos - l->pos; + memmove(l->buf+l->pos,l->buf+old_pos,l->len-old_pos+1); + l->len -= diff; + refreshLine(l); +} + +/* This function is the core of the line editing capability of linenoise. + * It expects 'fd' to be already in "raw mode" so that every key pressed + * will be returned ASAP to read(). + * + * The resulting string is put into 'buf' when the user type enter, or + * when ctrl+d is typed. + * + * The function returns the length of the current buffer. */ +static int linenoiseEdit(int stdin_fd, int stdout_fd, char *buf, size_t buflen, const char *prompt) +{ + struct linenoiseState l; + + /* Populate the linenoise state that we pass to functions implementing + * specific editing functionalities. */ + l.ifd = stdin_fd; + l.ofd = stdout_fd; + l.buf = buf; + l.buflen = buflen; + l.prompt = prompt; + l.plen = strlen(prompt); + l.oldpos = l.pos = 0; + l.len = 0; + l.cols = getColumns(stdin_fd, stdout_fd); + l.maxrows = 0; + l.history_index = 0; + + /* Buffer starts empty. */ + l.buf[0] = '\0'; + l.buflen--; /* Make sure there is always space for the nulterm */ /* The latest history entry is always our current buffer, that * initially is just an empty string. */ linenoiseHistoryAdd(""); - if (write(fd,prompt,plen) == -1) return -1; + if (write(l.ofd,prompt,l.plen) == -1) return -1; while(1) { char c; int nread; - char seq[2], seq2[2]; + char seq[3]; - nread = read(fd,&c,1); - if (nread <= 0) return len; + nread = read(l.ifd,&c,1); + if (nread <= 0) return l.len; /* Only autocomplete when the callback is set. It returns < 0 when * there was an error reading from fd. Otherwise it will return the * character that should be handled next. 
*/ if (c == 9 && completionCallback != NULL) { - c = completeLine(fd,prompt,buf,buflen,&len,&pos,cols); + c = completeLine(&l); /* Return on errors */ - if (c < 0) return len; + if (c < 0) return l.len; /* Read next character when 0 */ if (c == 0) continue; } switch(c) { - case 13: /* enter */ + case ENTER: /* enter */ history_len--; free(history[history_len]); - return (int)len; - case 3: /* ctrl-c */ + return (int)l.len; + case CTRL_C: /* ctrl-c */ errno = EAGAIN; return -1; - case 127: /* backspace */ + case BACKSPACE: /* backspace */ case 8: /* ctrl-h */ - if (pos > 0 && len > 0) { - memmove(buf+pos-1,buf+pos,len-pos); - pos--; - len--; - buf[len] = '\0'; - refreshLine(fd,prompt,buf,len,pos,cols); - } + linenoiseEditBackspace(&l); break; - case 4: /* ctrl-d, remove char at right of cursor */ - if (len > 1 && pos < (len-1)) { - memmove(buf+pos,buf+pos+1,len-pos); - len--; - buf[len] = '\0'; - refreshLine(fd,prompt,buf,len,pos,cols); - } else if (len == 0) { + case CTRL_D: /* ctrl-d, remove char at right of cursor, or of the + line is empty, act as end-of-file. */ + if (l.len > 0) { + linenoiseEditDelete(&l); + } else { history_len--; free(history[history_len]); return -1; } break; - case 20: /* ctrl-t */ - if (pos > 0 && pos < len) { - int aux = buf[pos-1]; - buf[pos-1] = buf[pos]; - buf[pos] = aux; - if (pos != len-1) pos++; - refreshLine(fd,prompt,buf,len,pos,cols); + case CTRL_T: /* ctrl-t, swaps current character with previous. */ + if (l.pos > 0 && l.pos < l.len) { + int aux = buf[l.pos-1]; + buf[l.pos-1] = buf[l.pos]; + buf[l.pos] = aux; + if (l.pos != l.len-1) l.pos++; + refreshLine(&l); } break; - case 2: /* ctrl-b */ - goto left_arrow; - case 6: /* ctrl-f */ - goto right_arrow; - case 16: /* ctrl-p */ - seq[1] = 65; - goto up_down_arrow; - case 14: /* ctrl-n */ - seq[1] = 66; - goto up_down_arrow; + case CTRL_B: /* ctrl-b */ + linenoiseEditMoveLeft(&l); break; - case 27: /* escape sequence */ - if (read(fd,seq,2) == -1) break; - if (seq[0] == 91 && seq[1] == 68) { -left_arrow: - /* left arrow */ - if (pos > 0) { - pos--; - refreshLine(fd,prompt,buf,len,pos,cols); - } - } else if (seq[0] == 91 && seq[1] == 67) { -right_arrow: - /* right arrow */ - if (pos != len) { - pos++; - refreshLine(fd,prompt,buf,len,pos,cols); - } - } else if (seq[0] == 91 && (seq[1] == 65 || seq[1] == 66)) { -up_down_arrow: - /* up and down arrow: history */ - if (history_len > 1) { - /* Update the current history entry before to - * overwrite it with tne next one. */ - free(history[history_len-1-history_index]); - history[history_len-1-history_index] = strdup(buf); - /* Show the new entry */ - history_index += (seq[1] == 65) ? 1 : -1; - if (history_index < 0) { - history_index = 0; + case CTRL_F: /* ctrl-f */ + linenoiseEditMoveRight(&l); + break; + case CTRL_P: /* ctrl-p */ + linenoiseEditHistoryNext(&l, LINENOISE_HISTORY_PREV); + break; + case CTRL_N: /* ctrl-n */ + linenoiseEditHistoryNext(&l, LINENOISE_HISTORY_NEXT); + break; + case ESC: /* escape sequence */ + /* Read the next two bytes representing the escape sequence. + * Use two calls to handle slow terminals returning the two + * chars at different times. */ + if (read(l.ifd,seq,1) == -1) break; + if (read(l.ifd,seq+1,1) == -1) break; + + /* ESC [ sequences. */ + if (seq[0] == '[') { + if (seq[1] >= '0' && seq[1] <= '9') { + /* Extended escape, read additional byte. */ + if (read(l.ifd,seq+2,1) == -1) break; + if (seq[2] == '~') { + switch(seq[1]) { + case '3': /* Delete key. 
*/ + linenoiseEditDelete(&l); + break; + } + } + } else { + switch(seq[1]) { + case 'A': /* Up */ + linenoiseEditHistoryNext(&l, LINENOISE_HISTORY_PREV); + break; + case 'B': /* Down */ + linenoiseEditHistoryNext(&l, LINENOISE_HISTORY_NEXT); + break; + case 'C': /* Right */ + linenoiseEditMoveRight(&l); + break; + case 'D': /* Left */ + linenoiseEditMoveLeft(&l); + break; + case 'H': /* Home */ + linenoiseEditMoveHome(&l); break; - } else if (history_index >= history_len) { - history_index = history_len-1; + case 'F': /* End*/ + linenoiseEditMoveEnd(&l); break; } - strncpy(buf,history[history_len-1-history_index],buflen); - buf[buflen] = '\0'; - len = pos = strlen(buf); - refreshLine(fd,prompt,buf,len,pos,cols); } - } else if (seq[0] == 91 && seq[1] > 48 && seq[1] < 55) { - /* extended escape */ - if (read(fd,seq2,2) == -1) break; - if (seq[1] == 51 && seq2[0] == 126) { - /* delete */ - if (len > 0 && pos < len) { - memmove(buf+pos,buf+pos+1,len-pos-1); - len--; - buf[len] = '\0'; - refreshLine(fd,prompt,buf,len,pos,cols); - } + } + + /* ESC O sequences. */ + else if (seq[0] == 'O') { + switch(seq[1]) { + case 'H': /* Home */ + linenoiseEditMoveHome(&l); + break; + case 'F': /* End*/ + linenoiseEditMoveEnd(&l); + break; } } break; default: - if (len < buflen) { - if (len == pos) { - buf[pos] = c; - pos++; - len++; - buf[len] = '\0'; - if (plen+len < cols) { - /* Avoid a full update of the line in the - * trivial case. */ - if (write(fd,&c,1) == -1) return -1; - } else { - refreshLine(fd,prompt,buf,len,pos,cols); - } - } else { - memmove(buf+pos+1,buf+pos,len-pos); - buf[pos] = c; - len++; - pos++; - buf[len] = '\0'; - refreshLine(fd,prompt,buf,len,pos,cols); - } - } + if (linenoiseEditInsert(&l,c)) return -1; break; - case 21: /* Ctrl+u, delete the whole line. */ + case CTRL_U: /* Ctrl+u, delete the whole line. */ buf[0] = '\0'; - pos = len = 0; - refreshLine(fd,prompt,buf,len,pos,cols); + l.pos = l.len = 0; + refreshLine(&l); break; - case 11: /* Ctrl+k, delete from current to end of line. */ - buf[pos] = '\0'; - len = pos; - refreshLine(fd,prompt,buf,len,pos,cols); + case CTRL_K: /* Ctrl+k, delete from current to end of line. */ + buf[l.pos] = '\0'; + l.len = l.pos; + refreshLine(&l); break; - case 1: /* Ctrl+a, go to the start of the line */ - pos = 0; - refreshLine(fd,prompt,buf,len,pos,cols); + case CTRL_A: /* Ctrl+a, go to the start of the line */ + linenoiseEditMoveHome(&l); break; - case 5: /* ctrl+e, go to the end of the line */ - pos = len; - refreshLine(fd,prompt,buf,len,pos,cols); + case CTRL_E: /* ctrl+e, go to the end of the line */ + linenoiseEditMoveEnd(&l); break; - case 12: /* ctrl+l, clear screen */ + case CTRL_L: /* ctrl+l, clear screen */ linenoiseClearScreen(); - refreshLine(fd,prompt,buf,len,pos,cols); + refreshLine(&l); break; - case 23: /* ctrl+w, delete previous word */ - old_pos = pos; - while (pos > 0 && buf[pos-1] == ' ') - pos--; - while (pos > 0 && buf[pos-1] != ' ') - pos--; - diff = old_pos - pos; - memmove(&buf[pos], &buf[old_pos], len-old_pos+1); - len -= diff; - refreshLine(fd,prompt,buf,len,pos,cols); + case CTRL_W: /* ctrl+w, delete previous word */ + linenoiseEditDeletePrevWord(&l); break; } } - return len; + return l.len; +} + +/* This special mode is used by linenoise in order to print scan codes + * on screen for debugging / development purposes. It is implemented + * by the linenoise_example program using the --keycodes option. 
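For reference, the two-byte tails handled by the switch above map to editing actions as in this standalone, illustrative decoder (not the library's code; the --keycodes mode described next prints the same raw bytes as they arrive):

    #include <stdio.h>

    static const char *csiTailName(char a, char b) {
        if (a != '[') return "not an ESC [ sequence";
        switch (b) {
        case 'A': return "up (history prev)";
        case 'B': return "down (history next)";
        case 'C': return "right";
        case 'D': return "left";
        case 'H': return "home";
        case 'F': return "end";
        default:  return "other/extended";
        }
    }

    int main(void) {
        printf("%s\n", csiTailName('[', 'A')); /* prints: up (history prev) */
        return 0;
    }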
*/ +void linenoisePrintKeyCodes(void) { + char quit[4]; + + printf("Linenoise key codes debugging mode.\n" + "Press keys to see scan codes. Type 'quit' at any time to exit.\n"); + if (enableRawMode(STDIN_FILENO) == -1) return; + memset(quit,' ',4); + while(1) { + char c; + int nread; + + nread = read(STDIN_FILENO,&c,1); + if (nread <= 0) continue; + memmove(quit,quit+1,sizeof(quit)-1); /* shift string to left. */ + quit[sizeof(quit)-1] = c; /* Insert current char on the right. */ + if (memcmp(quit,"quit",sizeof(quit)) == 0) break; + + printf("'%c' %02x (%d) (type quit to exit)\n", + isprint(c) ? c : '?', (int)c, (int)c); + printf("\x1b[0G"); /* Go left edge manually, we are in raw mode. */ + fflush(stdout); + } + disableRawMode(STDIN_FILENO); } +/* This function calls the line editing function linenoiseEdit() using + * the STDIN file descriptor set in raw mode. */ static int linenoiseRaw(char *buf, size_t buflen, const char *prompt) { - int fd = STDIN_FILENO; int count; if (buflen == 0) { @@ -496,6 +920,7 @@ static int linenoiseRaw(char *buf, size_t buflen, const char *prompt) { return -1; } if (!isatty(STDIN_FILENO)) { + /* Not a tty: read from file / pipe. */ if (fgets(buf, buflen, stdin) == NULL) return -1; count = strlen(buf); if (count && buf[count-1] == '\n') { @@ -503,14 +928,20 @@ static int linenoiseRaw(char *buf, size_t buflen, const char *prompt) { buf[count] = '\0'; } } else { - if (enableRawMode(fd) == -1) return -1; - count = linenoisePrompt(fd, buf, buflen, prompt); - disableRawMode(fd); + /* Interactive editing. */ + if (enableRawMode(STDIN_FILENO) == -1) return -1; + count = linenoiseEdit(STDIN_FILENO, STDOUT_FILENO, buf, buflen, prompt); + disableRawMode(STDIN_FILENO); printf("\n"); } return count; } +/* The high level function that is the main API of the linenoise library. + * This function checks if the terminal has basic capabilities, just checking + * for a blacklist of stupid terminals, and later either calls the line + * editing function or uses dummy fgets() so that you will be able to type + * something even in the most desperate of the conditions. */ char *linenoise(const char *prompt) { char buf[LINENOISE_MAX_LINE]; int count; @@ -534,29 +965,50 @@ char *linenoise(const char *prompt) { } } -/* Register a callback function to be called for tab-completion. */ -void linenoiseSetCompletionCallback(linenoiseCompletionCallback *fn) { - completionCallback = fn; +/* ================================ History ================================= */ + +/* Free the history, but does not reset it. Only used when we have to + * exit() to avoid memory leaks are reported by valgrind & co. */ +static void freeHistory(void) { + if (history) { + int j; + + for (j = 0; j < history_len; j++) + free(history[j]); + free(history); + } } -void linenoiseAddCompletion(linenoiseCompletions *lc, char *str) { - size_t len = strlen(str); - char *copy = malloc(len+1); - memcpy(copy,str,len+1); - lc->cvec = realloc(lc->cvec,sizeof(char*)*(lc->len+1)); - lc->cvec[lc->len++] = copy; +/* At exit we'll try to fix the terminal to the initial conditions. */ +static void linenoiseAtExit(void) { + disableRawMode(STDIN_FILENO); + freeHistory(); } -/* Using a circular buffer is smarter, but a bit more complex to handle. */ +/* This is the API call to add a new entry in the linenoise history. 
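Putting the pieces together, a typical embedding is a plain read-eval loop. This is only a sketch (the prompt text and history file name are illustrative), but the calls are the ones exported by linenoise.h, and the returned line is heap allocated, so the caller frees it:

    #include <stdio.h>
    #include <stdlib.h>
    #include "linenoise.h"

    int main(void) {
        char *line;

        linenoiseHistoryLoad("history.txt");        /* Missing file on first run is fine. */
        while ((line = linenoise("repl> ")) != NULL) {
            if (line[0] != '\0') {
                printf("echo: %s\n", line);
                linenoiseHistoryAdd(line);          /* Duplicate of the last entry is skipped. */
                linenoiseHistorySave("history.txt");
            }
            free(line);
        }
        return 0;
    }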
+ * It uses a fixed array of char pointers that are shifted (memmoved) + * when the history max length is reached in order to remove the older + * entry and make room for the new one, so it is not exactly suitable for huge + * histories, but will work well for a few hundred of entries. + * + * Using a circular buffer is smarter, but a bit more complex to handle. */ int linenoiseHistoryAdd(const char *line) { char *linecopy; if (history_max_len == 0) return 0; + + /* Initialization on first call. */ if (history == NULL) { history = malloc(sizeof(char*)*history_max_len); if (history == NULL) return 0; memset(history,0,(sizeof(char*)*history_max_len)); } + + /* Don't add duplicated lines. */ + if (history_len && !strcmp(history[history_len-1], line)) return 0; + + /* Add an heap allocated copy of the line in the history. + * If we reached the max length, remove the older line. */ linecopy = strdup(line); if (!linecopy) return 0; if (history_len == history_max_len) { @@ -569,6 +1021,10 @@ int linenoiseHistoryAdd(const char *line) { return 1; } +/* Set the maximum length for the history. This function can be called even + * if there is already some history, the function will make sure to retain + * just the latest 'len' elements if the new history length value is smaller + * than the amount of items already inside the history. */ int linenoiseHistorySetMaxLen(int len) { char **new; @@ -578,8 +1034,16 @@ int linenoiseHistorySetMaxLen(int len) { new = malloc(sizeof(char*)*len); if (new == NULL) return 0; - if (len < tocopy) tocopy = len; - memcpy(new,history+(history_max_len-tocopy), sizeof(char*)*tocopy); + + /* If we can't copy everything, free the elements we'll not use. */ + if (len < tocopy) { + int j; + + for (j = 0; j < tocopy-len; j++) free(history[j]); + tocopy = len; + } + memset(new,0,sizeof(char*)*len); + memcpy(new,history+(history_len-tocopy), sizeof(char*)*tocopy); free(history); history = new; } @@ -591,7 +1055,7 @@ int linenoiseHistorySetMaxLen(int len) { /* Save the history in the specified file. On success 0 is returned * otherwise -1 is returned. */ -int linenoiseHistorySave(char *filename) { +int linenoiseHistorySave(const char *filename) { FILE *fp = fopen(filename,"w"); int j; @@ -607,7 +1071,7 @@ int linenoiseHistorySave(char *filename) { * * If the file exists and the operation succeeded 0 is returned, otherwise * on error -1 is returned. */ -int linenoiseHistoryLoad(char *filename) { +int linenoiseHistoryLoad(const char *filename) { FILE *fp = fopen(filename,"r"); char buf[LINENOISE_MAX_LINE]; diff --git a/deps/linenoise/linenoise.h b/deps/linenoise/linenoise.h index 76a703c28..e22ebd3fd 100644 --- a/deps/linenoise/linenoise.h +++ b/deps/linenoise/linenoise.h @@ -3,39 +3,44 @@ * * See linenoise.c for more information. * + * ------------------------------------------------------------------------ + * * Copyright (c) 2010, Salvatore Sanfilippo * Copyright (c) 2010, Pieter Noordhuis * * All rights reserved. - * + * * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions are met: + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. * - * * Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. 
- * * Redistributions in binary form must reproduce the above copyright + * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. - * * Neither the name of Redis nor the names of its contributors may be used - * to endorse or promote products derived from this software without - * specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" - * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #ifndef __LINENOISE_H #define __LINENOISE_H +#ifdef __cplusplus +extern "C" { +#endif + typedef struct linenoiseCompletions { size_t len; char **cvec; @@ -43,13 +48,19 @@ typedef struct linenoiseCompletions { typedef void(linenoiseCompletionCallback)(const char *, linenoiseCompletions *); void linenoiseSetCompletionCallback(linenoiseCompletionCallback *); -void linenoiseAddCompletion(linenoiseCompletions *, char *); +void linenoiseAddCompletion(linenoiseCompletions *, const char *); char *linenoise(const char *prompt); int linenoiseHistoryAdd(const char *line); int linenoiseHistorySetMaxLen(int len); -int linenoiseHistorySave(char *filename); -int linenoiseHistoryLoad(char *filename); +int linenoiseHistorySave(const char *filename); +int linenoiseHistoryLoad(const char *filename); void linenoiseClearScreen(void); +void linenoiseSetMultiLine(int ml); +void linenoisePrintKeyCodes(void); + +#ifdef __cplusplus +} +#endif #endif /* __LINENOISE_H */ diff --git a/redis.conf b/redis.conf index 7fb4e4953..00a2f9193 100644 --- a/redis.conf +++ b/redis.conf @@ -44,6 +44,15 @@ pidfile /var/run/redis.pid # If port 0 is specified Redis will not listen on a TCP socket. port 6379 +# TCP listen() backlog. +# +# In high requests-per-second environments you need an high backlog in order +# to avoid slow clients connections issues. 
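The value configured here is handed straight to listen(2), as the anetListen() change later in this diff shows. An illustrative, reduced version of that path:

    #include <sys/socket.h>

    /* Sketch only: bind + listen with a configurable backlog. The kernel may
     * cap the value (see the note on somaxconn just below). */
    static int startListening(int s, struct sockaddr *sa, socklen_t len, int backlog) {
        if (bind(s, sa, len) == -1) return -1;
        if (listen(s, backlog) == -1) return -1;   /* e.g. backlog = 511 */
        return 0;
    }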
Note that the Linux kernel +# will silently truncate it to the value of /proc/sys/net/core/somaxconn so +# make sure to raise both the value of somaxconn and tcp_max_syn_backlog +# in order to get the desired effect. +tcp-backlog 511 + # By default Redis listens for connections from all the network interfaces # available on the server. It is possible to listen to just one or multiple # interfaces using the "bind" configuration directive, followed by one or @@ -407,15 +416,18 @@ slave-priority 100 # # The default is: # -# maxmemory-policy volatile-lru +# maxmemory-policy noeviction # LRU and minimal TTL algorithms are not precise algorithms but approximated -# algorithms (in order to save memory), so you can select as well the sample -# size to check. For instance for default Redis will check three keys and -# pick the one that was used less recently, you can change the sample size -# using the following configuration directive. +# algorithms (in order to save memory), so you can tune it for speed or +# accuracy. For default Redis will check five keys and pick the one that was +# used less recently, you can change the sample size using the following +# configuration directive. +# +# The default of 5 produces good enough results. 10 Approximates very closely +# true LRU but costs a bit more CPU. 3 is very fast but not very accurate. # -# maxmemory-samples 3 +# maxmemory-samples 5 ############################## APPEND ONLY MODE ############################### @@ -551,6 +563,25 @@ lua-time-limit 5000 # # cluster-node-timeout 15000 +# Cluster slaves are able to migrate to orphaned masters, that are masters +# that are left without working slaves. This improves the cluster ability +# to resist to failures as otherwise an orphaned master can't be failed over +# in case of failure if it has no working slaves. +# +# Slaves migrate to orphaned masters only if there are still at least a +# given number of other working slaves for their old master. This number +# is the "migration barrier". A migration barrier of 1 means that a slave +# will migrate only if there is at least 1 other working slave for its master +# and so forth. It usually reflects the number of slaves you want for every +# master in your cluster. +# +# Default is 1 (slaves migrate only if their masters remain with at least +# one slave). To disable migration just set it to a very large value. +# A value of 0 can be set but is useful only for debugging and dangerous +# in production. +# +# cluster-migration-barrier 1 + # In order to setup your cluster make sure to read the documentation # available at http://redis.io web site. @@ -651,6 +682,20 @@ set-max-intset-entries 512 zset-max-ziplist-entries 128 zset-max-ziplist-value 64 +# HyperLogLog sparse representation bytes limit. The limit includes the +# 16 bytes header. When an HyperLogLog using the sparse representation crosses +# this limit, it is converted into the dense representation. +# +# A value greater than 16000 is totally useless, since at that point the +# dense representation is more memory efficient. +# +# The suggested value is ~ 3000 in order to have the benefits of +# the space efficient encoding without slowing down too much PFADD, +# which is O(N) with the sparse encoding. The value can be raised to +# ~ 10000 when CPU is not a concern, but space is, and the data set is +# composed of many HyperLogLogs with cardinality in the 0 - 15000 range. 
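For context on the 16000 byte ceiling mentioned above: the dense encoding stores 2^14 = 16384 six-bit registers, i.e. 16384 * 6 / 8 = 12288 bytes, plus the 16 byte header, roughly 12304 bytes in total. A sparse representation larger than that is already bigger than the dense one, so limits beyond ~12 KB (and certainly beyond 16000) cannot save memory.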
+hll-sparse-max-bytes 3000 + # Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in # order to help rehashing the main Redis hash table (the one mapping top-level # keys to values). The hash table implementation Redis uses (see dict.c) diff --git a/runtest-cluster b/runtest-cluster new file mode 100755 index 000000000..27829a5fe --- /dev/null +++ b/runtest-cluster @@ -0,0 +1,14 @@ +#!/bin/sh +TCL_VERSIONS="8.5 8.6" +TCLSH="" + +for VERSION in $TCL_VERSIONS; do + TCL=`which tclsh$VERSION 2>/dev/null` && TCLSH=$TCL +done + +if [ -z $TCLSH ] +then + echo "You need tcl 8.5 or newer in order to run the Redis Sentinel test" + exit 1 +fi +$TCLSH tests/cluster/run.tcl $* diff --git a/runtest-sentinel b/runtest-sentinel new file mode 100755 index 000000000..3fb1ef615 --- /dev/null +++ b/runtest-sentinel @@ -0,0 +1,14 @@ +#!/bin/sh +TCL_VERSIONS="8.5 8.6" +TCLSH="" + +for VERSION in $TCL_VERSIONS; do + TCL=`which tclsh$VERSION 2>/dev/null` && TCLSH=$TCL +done + +if [ -z $TCLSH ] +then + echo "You need tcl 8.5 or newer in order to run the Redis Sentinel test" + exit 1 +fi +$TCLSH tests/sentinel/run.tcl $* diff --git a/sentinel.conf b/sentinel.conf index e44342221..114b8474f 100644 --- a/sentinel.conf +++ b/sentinel.conf @@ -4,6 +4,13 @@ # The port that this sentinel instance will run on port 26379 +# dir +# Every long running process should have a well-defined working directory. +# For Redis Sentinel to chdir to /tmp at startup is the simplest thing +# for the process to don't interferer with administrative tasks such as +# unmounting filesystems. +dir /tmp + # sentinel monitor # # Tells Sentinel to monitor this master, and to consider it in O_DOWN @@ -86,10 +93,10 @@ sentinel failover-timeout mymaster 180000 # or to reconfigure clients after a failover. The scripts are executed # with the following rules for error handling: # -# If script exists with "1" the execution is retried later (up to a maximum +# If script exits with "1" the execution is retried later (up to a maximum # number of times currently set to 10). # -# If script exists with "2" (or an higher value) the script execution is +# If script exits with "2" (or an higher value) the script execution is # not retried. 
# # If script terminates because it receives a signal the behavior is the same diff --git a/src/Makefile b/src/Makefile index 0b4cff7a1..289371666 100644 --- a/src/Makefile +++ b/src/Makefile @@ -107,7 +107,7 @@ endif REDIS_SERVER_NAME=redis-server REDIS_SENTINEL_NAME=redis-sentinel -REDIS_SERVER_OBJ=adlist.o ae.o anet.o dict.o redis.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o scripting.o bio.o rio.o rand.o memtest.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o +REDIS_SERVER_OBJ=adlist.o ae.o anet.o dict.o redis.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o scripting.o bio.o rio.o rand.o memtest.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o REDIS_CLI_NAME=redis-cli REDIS_CLI_OBJ=anet.o sds.o adlist.o redis-cli.o zmalloc.o release.o anet.o ae.o crc64.o REDIS_BENCHMARK_NAME=redis-benchmark @@ -204,6 +204,9 @@ distclean: clean test: $(REDIS_SERVER_NAME) $(REDIS_CHECK_AOF_NAME) @(cd ..; ./runtest) +test-sentinel: $(REDIS_SENTINEL_NAME) + @(cd ..; ./runtest-sentinel) + check: test lcov: diff --git a/src/Makefile.dep b/src/Makefile.dep index b66e00df4..d118050fd 100644 --- a/src/Makefile.dep +++ b/src/Makefile.dep @@ -14,6 +14,9 @@ bio.o: bio.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ bitops.o: bitops.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ ../deps/lua/src/luaconf.h ae.h sds.h dict.h adlist.h zmalloc.h anet.h \ ziplist.h intset.h version.h util.h rdb.h rio.h +blocked.o: blocked.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ + ../deps/lua/src/luaconf.h ae.h sds.h dict.h adlist.h zmalloc.h anet.h \ + ziplist.h intset.h version.h util.h rdb.h rio.h cluster.o: cluster.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ ../deps/lua/src/luaconf.h ae.h sds.h dict.h adlist.h zmalloc.h anet.h \ ziplist.h intset.h version.h util.h rdb.h rio.h cluster.h endianconv.h @@ -32,6 +35,10 @@ debug.o: debug.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ ziplist.h intset.h version.h util.h rdb.h rio.h sha1.h crc64.h bio.h dict.o: dict.c fmacros.h dict.h zmalloc.h redisassert.h endianconv.o: endianconv.c +hyperloglog.o: hyperloglog.c redis.h fmacros.h config.h \ + ../deps/lua/src/lua.h ../deps/lua/src/luaconf.h ae.h sds.h dict.h \ + adlist.h zmalloc.h anet.h ziplist.h intset.h version.h util.h rdb.h \ + rio.h intset.o: intset.c intset.h zmalloc.h endianconv.h config.h lzf_c.o: lzf_c.c lzfP.h lzf_d.o: lzf_d.c lzfP.h @@ -117,6 +124,3 @@ ziplist.o: ziplist.c zmalloc.h util.h sds.h ziplist.h endianconv.h \ config.h redisassert.h zipmap.o: zipmap.c zmalloc.h endianconv.h config.h zmalloc.o: zmalloc.c config.h zmalloc.h -blocked.o: blocked.c redis.h fmacros.h config.h ../deps/lua/src/lua.h \ - ../deps/lua/src/luaconf.h ae.h sds.h dict.h adlist.h zmalloc.h anet.h \ - ziplist.h intset.h version.h util.h rdb.h rio.h diff --git a/src/anet.c b/src/anet.c index a42fde304..cc850a1f8 100644 --- a/src/anet.c +++ b/src/anet.c @@ -261,11 +261,12 @@ static int anetCreateSocket(char *err, int domain) { #define ANET_CONNECT_NONE 0 #define ANET_CONNECT_NONBLOCK 1 -static int 
anetTcpGenericConnect(char *err, char *addr, int port, int flags) +static int anetTcpGenericConnect(char *err, char *addr, int port, + char *source_addr, int flags) { int s = ANET_ERR, rv; char portstr[6]; /* strlen("65535") + 1; */ - struct addrinfo hints, *servinfo, *p; + struct addrinfo hints, *servinfo, *bservinfo, *p, *b; snprintf(portstr,sizeof(portstr),"%d",port); memset(&hints,0,sizeof(hints)); @@ -285,6 +286,24 @@ static int anetTcpGenericConnect(char *err, char *addr, int port, int flags) if (anetSetReuseAddr(err,s) == ANET_ERR) goto error; if (flags & ANET_CONNECT_NONBLOCK && anetNonBlock(err,s) != ANET_OK) goto error; + if (source_addr) { + int bound = 0; + /* Using getaddrinfo saves us from self-determining IPv4 vs IPv6 */ + if ((rv = getaddrinfo(source_addr, NULL, &hints, &bservinfo)) != 0) { + anetSetError(err, "%s", gai_strerror(rv)); + goto end; + } + for (b = bservinfo; b != NULL; b = b->ai_next) { + if (bind(s,b->ai_addr,b->ai_addrlen) != -1) { + bound = 1; + break; + } + } + if (!bound) { + anetSetError(err, "bind: %s", strerror(errno)); + goto end; + } + } if (connect(s,p->ai_addr,p->ai_addrlen) == -1) { /* If the socket is non-blocking, it is ok for connect() to * return an EINPROGRESS error here. */ @@ -317,7 +336,7 @@ static int anetTcpGenericConnect(char *err, char *addr, int port, int flags) */ int anetTcpConnect(char *err, char *addr, int port) { - return anetTcpGenericConnect(err,addr,port,ANET_CONNECT_NONE); + return anetTcpGenericConnect(err,addr,port,NULL,ANET_CONNECT_NONE); } /* @@ -325,7 +344,12 @@ int anetTcpConnect(char *err, char *addr, int port) */ int anetTcpNonBlockConnect(char *err, char *addr, int port) { - return anetTcpGenericConnect(err,addr,port,ANET_CONNECT_NONBLOCK); + return anetTcpGenericConnect(err,addr,port,NULL,ANET_CONNECT_NONBLOCK); +} + +int anetTcpNonBlockBindConnect(char *err, char *addr, int port, char *source_addr) +{ + return anetTcpGenericConnect(err,addr,port,source_addr,ANET_CONNECT_NONBLOCK); } int anetUnixGenericConnect(char *err, char *path, int flags) @@ -409,17 +433,14 @@ int anetWrite(int fd, char *buf, int count) /* * 绑定并创建监听套接字 */ -static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len) { +static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) { if (bind(s,sa,len) == -1) { anetSetError(err, "bind: %s", strerror(errno)); close(s); return ANET_ERR; } - /* Use a backlog of 512 entries. 
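The new anetTcpNonBlockBindConnect() above performs a non-blocking connect while first bind()-ing the socket to a given local source address. A hypothetical call (the wrapper function and the addresses are made up):

    #include "anet.h"

    int dialFromSource(void) {
        char err[256];                 /* anet error strings are short. */
        int fd = anetTcpNonBlockBindConnect(err, "10.0.0.42", 6379, "10.0.0.7");
        if (fd == ANET_ERR) {
            /* 'err' holds the getaddrinfo()/bind()/connect() failure reason. */
            return -1;
        }
        return fd;
    }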
We pass 511 to the listen() call because - * the kernel does: backlogsize = roundup_pow_of_two(backlogsize + 1); - * which will thus give us a backlog of 512 entries */ - if (listen(s, 511) == -1) { + if (listen(s, backlog) == -1) { anetSetError(err, "listen: %s", strerror(errno)); close(s); return ANET_ERR; @@ -437,7 +458,7 @@ static int anetV6Only(char *err, int s) { return ANET_OK; } -static int _anetTcpServer(char *err, int port, char *bindaddr, int af) +static int _anetTcpServer(char *err, int port, char *bindaddr, int af, int backlog) { int s, rv; char _port[6]; /* strlen("65535") */ @@ -459,7 +480,7 @@ static int _anetTcpServer(char *err, int port, char *bindaddr, int af) if (af == AF_INET6 && anetV6Only(err,s) == ANET_ERR) goto error; if (anetSetReuseAddr(err,s) == ANET_ERR) goto error; - if (anetListen(err,s,p->ai_addr,p->ai_addrlen) == ANET_ERR) goto error; + if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) goto error; goto end; } if (p == NULL) { @@ -474,20 +495,20 @@ static int _anetTcpServer(char *err, int port, char *bindaddr, int af) return s; } -int anetTcpServer(char *err, int port, char *bindaddr) +int anetTcpServer(char *err, int port, char *bindaddr, int backlog) { - return _anetTcpServer(err, port, bindaddr, AF_INET); + return _anetTcpServer(err, port, bindaddr, AF_INET, backlog); } -int anetTcp6Server(char *err, int port, char *bindaddr) +int anetTcp6Server(char *err, int port, char *bindaddr, int backlog) { - return _anetTcpServer(err, port, bindaddr, AF_INET6); + return _anetTcpServer(err, port, bindaddr, AF_INET6, backlog); } /* * 创建一个本地连接用的服务器监听套接字 */ -int anetUnixServer(char *err, char *path, mode_t perm) +int anetUnixServer(char *err, char *path, mode_t perm, int backlog) { int s; struct sockaddr_un sa; @@ -498,7 +519,7 @@ int anetUnixServer(char *err, char *path, mode_t perm) memset(&sa,0,sizeof(sa)); sa.sun_family = AF_LOCAL; strncpy(sa.sun_path,path,sizeof(sa.sun_path)-1); - if (anetListen(err,s,(struct sockaddr*)&sa,sizeof(sa)) == ANET_ERR) + if (anetListen(err,s,(struct sockaddr*)&sa,sizeof(sa),backlog) == ANET_ERR) return ANET_ERR; if (perm) chmod(sa.sun_path, perm); @@ -529,7 +550,7 @@ int anetTcpAccept(char *err, int s, char *ip, size_t ip_len, int *port) { int fd; struct sockaddr_storage sa; socklen_t salen = sizeof(sa); - if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == ANET_ERR) + if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == -1) return ANET_ERR; if (sa.ss_family == AF_INET) { @@ -551,7 +572,7 @@ int anetUnixAccept(char *err, int s) { int fd; struct sockaddr_un sa; socklen_t salen = sizeof(sa); - if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == ANET_ERR) + if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == -1) return ANET_ERR; return fd; diff --git a/src/anet.h b/src/anet.h index 2ab9398ad..c4659cd35 100644 --- a/src/anet.h +++ b/src/anet.h @@ -45,14 +45,15 @@ int anetTcpConnect(char *err, char *addr, int port); int anetTcpNonBlockConnect(char *err, char *addr, int port); +int anetTcpNonBlockBindConnect(char *err, char *addr, int port, char *source_addr); int anetUnixConnect(char *err, char *path); int anetUnixNonBlockConnect(char *err, char *path); int anetRead(int fd, char *buf, int count); int anetResolve(char *err, char *host, char *ipbuf, size_t ipbuf_len); int anetResolveIP(char *err, char *host, char *ipbuf, size_t ipbuf_len); -int anetTcpServer(char *err, int port, char *bindaddr); -int anetTcp6Server(char *err, int port, char *bindaddr); -int 
anetUnixServer(char *err, char *path, mode_t perm); +int anetTcpServer(char *err, int port, char *bindaddr, int backlog); +int anetTcp6Server(char *err, int port, char *bindaddr, int backlog); +int anetUnixServer(char *err, char *path, mode_t perm, int backlog); int anetTcpAccept(char *err, int serversock, char *ip, size_t ip_len, int *port); int anetUnixAccept(char *err, int serversock); int anetWrite(int fd, char *buf, int count); diff --git a/src/aof.c b/src/aof.c index 787533fe8..a9a1fba6a 100644 --- a/src/aof.c +++ b/src/aof.c @@ -354,6 +354,7 @@ int startAppendOnly(void) { * 不过,如果 force 为 1 的话,那么不管后台是否正在 fsync , * 程序都直接进行写入。 */ +#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */ void flushAppendOnlyFile(int force) { ssize_t nwritten; int sync_in_progress = 0; @@ -439,39 +440,80 @@ void flushAppendOnlyFile(int force) { */ nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf)); if (nwritten != (signed)sdslen(server.aof_buf)) { - /* Ooops, we are in troubles. The best thing to do for now is - * aborting instead of giving the illusion that everything is - * working as expected. - * - * 糟糕了,成功写入的字节数不等于缓存的字节数 - * 可能是磁盘满了 0 <= nwritten < sdslen(server.aof_buf) , - * 也可能是写入失败 nwritten == -1 - * - * 立即停机,向用户报告错误 - */ + + static time_t last_write_error_log = 0; + int can_log = 0; + + /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */ + // 将日志的记录频率限制在每行 AOF_WRITE_LOG_ERROR_RATE 秒 + if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) { + can_log = 1; + last_write_error_log = server.unixtime; + } + + /* Lof the AOF write error and record the error code. */ + // 如果写入出错,那么尝试将该情况写入到日志里面 if (nwritten == -1) { - // 写入出错 - redisLog(REDIS_WARNING,"Exiting on error writing to the append-only file: %s",strerror(errno)); + if (can_log) { + redisLog(REDIS_WARNING,"Error writing to the AOF file: %s", + strerror(errno)); + server.aof_last_write_errno = errno; + } } else { - // 写入不完整 - redisLog(REDIS_WARNING,"Exiting on short write while writing to " - "the append-only file: %s (nwritten=%ld, " - "expected=%ld)", - strerror(errno), - (long)nwritten, - (long)sdslen(server.aof_buf)); + if (can_log) { + redisLog(REDIS_WARNING,"Short write while writing to " + "the AOF file: (nwritten=%lld, " + "expected=%lld)", + (long long)nwritten, + (long long)sdslen(server.aof_buf)); + } // 尝试移除新追加的不完整内容 if (ftruncate(server.aof_fd, server.aof_current_size) == -1) { - redisLog(REDIS_WARNING, "Could not remove short write " - "from the append-only file. Redis may refuse " - "to load the AOF the next time it starts. " - "ftruncate: %s", strerror(errno)); + if (can_log) { + redisLog(REDIS_WARNING, "Could not remove short write " + "from the append-only file. Redis may refuse " + "to load the AOF the next time it starts. " + "ftruncate: %s", strerror(errno)); + } + } else { + /* If the ftrunacate() succeeded we can set nwritten to + * -1 since there is no longer partial data into the AOF. */ + nwritten = -1; } + server.aof_last_write_errno = ENOSPC; } - // 服务器退出 - exit(1); + /* Handle the AOF write error. */ + if (server.aof_fsync == AOF_FSYNC_ALWAYS) { + /* We can't recover when the fsync policy is ALWAYS since the + * reply for the client is already in the output buffers, and we + * have the contract with the user that on acknowledged write data + * is synched on disk. */ + redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. 
Exiting..."); + exit(1); + } else { + /* Recover from failed write leaving data into the buffer. However + * set an error to stop accepting writes as long as the error + * condition is not cleared. */ + server.aof_last_write_status = REDIS_ERR; + + /* Trim the sds buffer if there was a partial write, and there + * was no way to undo it with ftruncate(2). */ + if (nwritten > 0) { + server.aof_current_size += nwritten; + sdsrange(server.aof_buf,nwritten,-1); + } + return; /* We'll try again on the next call... */ + } + } else { + /* Successful write(2). If AOF was in error state, restore the + * OK state and log the event. */ + if (server.aof_last_write_status == REDIS_ERR) { + redisLog(REDIS_WARNING, + "AOF write error looks solved, Redis can write again."); + server.aof_last_write_status = REDIS_OK; + } } // 更新写入后的 AOF 文件大小 @@ -748,6 +790,7 @@ struct redisClient *createFakeClient(void) { c->reply_bytes = 0; c->obuf_soft_limit_reached_time = 0; c->watched_keys = listCreate(); + c->peerid = NULL; listSetFreeMethod(c->reply,decrRefCountVoid); listSetDupMethod(c->reply,dupClientReplyValue); initClientMultiState(c); @@ -841,7 +884,7 @@ int loadAppendOnlyFile(char *filename) { */ if (!(loops++ % 1000)) { loadingProgress(ftello(fp)); - aeProcessEvents(server.el, AE_FILE_EVENTS|AE_DONT_WAIT); + processEventsWhileBlocked(); } // 读入文件内容到缓存 @@ -1405,9 +1448,9 @@ int rewriteAppendOnlyFile(char *filename) { /* Make sure data will not remain on the OS's output buffers */ // 冲洗并关闭新 AOF 文件 - fflush(fp); - aof_fsync(fileno(fp)); - fclose(fp); + if (fflush(fp) == EOF) goto werr; + if (aof_fsync(fileno(fp)) == -1) goto werr; + if (fclose(fp) == EOF) goto werr; /* Use RENAME to make sure the DB file is changed atomically only * if the generate DB file is ok. diff --git a/src/bitops.c b/src/bitops.c index 3d24ef626..6d2e86566 100644 --- a/src/bitops.c +++ b/src/bitops.c @@ -64,19 +64,25 @@ static int getBitOffsetFromArgument(redisClient *c, robj *o, size_t *offset) { // 这个函数只能在最大为 512 MB 的字符串上使用 size_t redisPopcount(void *s, long count) { size_t bits = 0; - unsigned char *p; - uint32_t *p4 = s; - + unsigned char *p = s; + uint32_t *p4; // 通过查表来计算,对于 1 字节所能表示的值来说 // 这些值的二进制表示所带有的 1 的数量 // 比如整数 3 的二进制表示 0011 ,带有两个 1 // 正好是查表 bitsinbyte[3] == 2 static const unsigned char bitsinbyte[256] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8}; + /* Count initial bytes not aligned to 32 bit. */ + while((unsigned long)p & 3 && count) { + bits += bitsinbyte[*p++]; + count--; + } + /* Count bits 16 bytes at a time */ // 每次统计 16 字节 // 关于这里所使用的优化算法,可以参考: // http://yesteapea.wordpress.com/2013/03/03/counting-the-number-of-set-bits-in-an-integer/ + p4 = (uint32_t*)p; while(count>=16) { uint32_t aux1, aux2, aux3, aux4; @@ -100,13 +106,100 @@ size_t redisPopcount(void *s, long count) { ((((aux4 + (aux4 >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24); } - /* Count the remaining bytes */ + /* Count the remaining bytes. 
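A slow but obviously correct reference for the counting done above is handy when testing the table/SWAR fast path; an illustrative version (not part of the patch):

    #include <stddef.h>

    static size_t popcountRef(const unsigned char *p, size_t n) {
        size_t bits = 0;
        while (n--) {
            unsigned char b = *p++;
            while (b) {              /* Kernighan's trick: clear the lowest set bit. */
                b &= (unsigned char)(b - 1);
                bits++;
            }
        }
        return bits;
    }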
*/ // 不足 16 字节的,剩下的每个字节通过查表来完成 p = (unsigned char*)p4; while(count--) bits += bitsinbyte[*p++]; return bits; } +/* Return the position of the first bit set to one (if 'bit' is 1) or + * zero (if 'bit' is 0) in the bitmap starting at 's' and long 'count' bytes. + * + * The function is guaranteed to return a value >= 0 if 'bit' is 0 since if + * no zero bit is found, it returns count*8 assuming the string is zero + * padded on the right. However if 'bit' is 1 it is possible that there is + * not a single set bit in the bitmap. In this special case -1 is returned. */ +long redisBitpos(void *s, long count, int bit) { + unsigned long *l; + unsigned char *c; + unsigned long skipval, word = 0, one; + long pos = 0; /* Position of bit, to return to the caller. */ + int j; + + /* Process whole words first, seeking for first word that is not + * all ones or all zeros respectively if we are lookig for zeros + * or ones. This is much faster with large strings having contiguous + * blocks of 1 or 0 bits compared to the vanilla bit per bit processing. + * + * Note that if we start from an address that is not aligned + * to sizeof(unsigned long) we consume it byte by byte until it is + * aligned. */ + + /* Skip initial bits not aligned to sizeof(unsigned long) byte by byte. */ + skipval = bit ? 0 : UCHAR_MAX; + c = (unsigned char*) s; + while((unsigned long)c & (sizeof(*l)-1) && count) { + if (*c != skipval) break; + c++; + count--; + pos += 8; + } + + /* Skip bits with full word step. */ + skipval = bit ? 0 : ULONG_MAX; + l = (unsigned long*) c; + while (count >= sizeof(*l)) { + if (*l != skipval) break; + l++; + count -= sizeof(*l); + pos += sizeof(*l)*8; + } + + /* Load bytes into "word" considering the first byte as the most significant + * (we basically consider it as written in big endian, since we consider the + * string as a set of bits from left to right, with the first bit at position + * zero. + * + * Note that the loading is designed to work even when the bytes left + * (count) are less than a full word. We pad it with zero on the right. */ + c = (unsigned char*)l; + for (j = 0; j < sizeof(*l); j++) { + word <<= 8; + if (count) { + word |= *c; + c++; + count--; + } + } + + /* Special case: + * If bits in the string are all zero and we are looking for one, + * return -1 to signal that there is not a single "1" in the whole + * string. This can't happen when we are looking for "0" as we assume + * that the right of the string is zero padded. */ + if (bit == 1 && word == 0) return -1; + + /* Last word left, scan bit by bit. The first thing we need is to + * have a single "1" set in the most significant position in an + * unsigned long. We don't know the size of the long so we use a + * simple trick. */ + one = ULONG_MAX; /* All bits set to 1.*/ + one >>= 1; /* All bits set to 1 but the MSB. */ + one = ~one; /* All bits set to 0 but the MSB. */ + + while(one) { + if (((one & word) != 0) == bit) return pos; + pos++; + one >>= 1; + } + + /* If we reached this point, there is a bug in the algorithm, since + * the case of no match is handled as a special case before. */ + redisPanic("End of redisBitpos() reached."); + return 0; /* Just to avoid warnings. */ +} + /* ----------------------------------------------------------------------------- * Bits related string commands: GETBIT, SETBIT, BITCOUNT, BITOP. 
* -------------------------------------------------------------------------- */ @@ -155,14 +248,7 @@ void setbitCommand(redisClient *c) { // 对象存在,检查类型是否字符串 if (checkType(c,o,REDIS_STRING)) return; - /* Create a copy when the object is shared or encoded. */ - // 如果对象被共享或者编码,那么创建一个复制对象 - if (o->refcount != 1 || o->encoding != REDIS_ENCODING_RAW) { - robj *decoded = getDecodedObject(o); - o = createRawStringObject(decoded->ptr, sdslen(decoded->ptr)); - decrRefCount(decoded); - dbOverwrite(c->db,c->argv[1],o); - } + o = dbUnshareStringValue(c->db,c->argv[1],o); } /* Grow sds value to the right length if necessary */ @@ -498,3 +584,90 @@ void bitcountCommand(redisClient *c) { addReplyLongLong(c,redisPopcount(p+start,bytes)); } } + +/* BITPOS key bit [start [end]] */ +void bitposCommand(redisClient *c) { + robj *o; + long bit, start, end, strlen; + unsigned char *p; + char llbuf[32]; + int end_given = 0; + + /* Parse the bit argument to understand what we are looking for, set + * or clear bits. */ + if (getLongFromObjectOrReply(c,c->argv[2],&bit,NULL) != REDIS_OK) + return; + if (bit != 0 && bit != 1) { + addReplyError(c, "The bit argument must be 1 or 0."); + return; + } + + /* If the key does not exist, from our point of view it is an infinite + * array of 0 bits. If the user is looking for the fist clear bit return 0, + * If the user is looking for the first set bit, return -1. */ + if ((o = lookupKeyRead(c->db,c->argv[1])) == NULL) { + addReplyLongLong(c, bit ? -1 : 0); + return; + } + if (checkType(c,o,REDIS_STRING)) return; + + /* Set the 'p' pointer to the string, that can be just a stack allocated + * array if our string was integer encoded. */ + if (o->encoding == REDIS_ENCODING_INT) { + p = (unsigned char*) llbuf; + strlen = ll2string(llbuf,sizeof(llbuf),(long)o->ptr); + } else { + p = (unsigned char*) o->ptr; + strlen = sdslen(o->ptr); + } + + /* Parse start/end range if any. */ + if (c->argc == 4 || c->argc == 5) { + if (getLongFromObjectOrReply(c,c->argv[3],&start,NULL) != REDIS_OK) + return; + if (c->argc == 5) { + if (getLongFromObjectOrReply(c,c->argv[4],&end,NULL) != REDIS_OK) + return; + end_given = 1; + } else { + end = strlen-1; + } + /* Convert negative indexes */ + if (start < 0) start = strlen+start; + if (end < 0) end = strlen+end; + if (start < 0) start = 0; + if (end < 0) end = 0; + if (end >= strlen) end = strlen-1; + } else if (c->argc == 3) { + /* The whole string. */ + start = 0; + end = strlen-1; + } else { + /* Syntax error. */ + addReply(c,shared.syntaxerr); + return; + } + + /* For empty ranges (start > end) we return -1 as an empty range does + * not contain a 0 nor a 1. */ + if (start > end) { + addReplyLongLong(c, -1); + } else { + long bytes = end-start+1; + long pos = redisBitpos(p+start,bytes,bit); + + /* If we are looking for clear bits, and the user specified an exact + * range with start-end, we can't consider the right of the range as + * zero padded (as we do when no explicit end is given). + * + * So if redisBitpos() returns the first bit outside the range, + * we return -1 to the caller, to mean, in the specified range there + * is not a single "0" bit. */ + if (end_given && bit == 0 && pos == bytes*8) { + addReplyLongLong(c,-1); + return; + } + if (pos != -1) pos += start*8; /* Adjust for the bytes we skipped. 
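To make the semantics implemented above concrete (bits are numbered from 0 starting at the most significant bit of the first byte): for a key holding "\xff\xf0\x00", BITPOS key 0 returns 12 and BITPOS key 1 returns 0; for a key holding only zero bytes, BITPOS key 1 returns -1 while BITPOS key 0 returns 0; and for a key holding only "\xff" bytes with no explicit range, BITPOS key 0 returns the first bit past the end of the string, because the value is treated as zero padded on the right.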
*/ + addReplyLongLong(c,pos); + } +} diff --git a/src/cluster.c b/src/cluster.c index 383fe3704..f53bc3652 100644 --- a/src/cluster.c +++ b/src/cluster.c @@ -39,6 +39,12 @@ #include #include #include +#include + +/* A global reference to myself is handy to make code more clear. + * Myself always points to server.cluster->myself, that is, the clusterNode + * that represents this node. */ +clusterNode *myself = NULL; clusterNode *createClusterNode(char *nodename, int flags); int clusterAddNode(clusterNode *node); @@ -58,20 +64,22 @@ int clusterDelNodeSlots(clusterNode *node); int clusterNodeSetSlotBit(clusterNode *n, int slot); void clusterSetMaster(clusterNode *n); void clusterHandleSlaveFailover(void); +void clusterHandleSlaveMigration(int max_slaves); int bitmapTestBit(unsigned char *bitmap, int pos); void clusterDoBeforeSleep(int flags); void clusterSendUpdate(clusterLink *link, clusterNode *node); +void resetManualFailover(void); +void clusterCloseAllSlots(void); +void clusterSetNodeAsMaster(clusterNode *n); +void clusterDelNode(clusterNode *delnode); /* ----------------------------------------------------------------------------- * Initialization * -------------------------------------------------------------------------- */ -/* This function is called at startup in order to set the currentEpoch - * (which is not saved on permanent storage) to the greatest configEpoch found - * in the loaded nodes (configEpoch is stored on permanent storage as soon as - * it changes for some node). */ -// 设置配置纪元 -void clusterSetStartupEpoch() { +/* Return the greatest configEpoch found in the cluster. */ +uint64_t clusterGetMaxEpoch(void) { + uint64_t max = 0; dictIterator *di; dictEntry *de; @@ -79,19 +87,43 @@ void clusterSetStartupEpoch() { di = dictGetSafeIterator(server.cluster->nodes); while((de = dictNext(di)) != NULL) { clusterNode *node = dictGetVal(de); - if (node->configEpoch > server.cluster->currentEpoch) - server.cluster->currentEpoch = node->configEpoch; + if (node->configEpoch > max) max = node->configEpoch; } dictReleaseIterator(di); + if (max < server.cluster->currentEpoch) max = server.cluster->currentEpoch; + return max; } // 载入集群配置 +/* Load the cluster config from 'filename'. + * + * If the file does not exist or is zero-length (this may happen because + * when we lock the nodes.conf file, we create a zero-length one for the + * sake of locking if it does not already exist), REDIS_ERR is returned. + * If the configuration was loaded from the file, REDIS_OK is returned. */ int clusterLoadConfig(char *filename) { FILE *fp = fopen(filename,"r"); + struct stat sb; char *line; int maxline, j; - - if (fp == NULL) return REDIS_ERR; + + if (fp == NULL) { + if (errno == ENOENT) { + return REDIS_ERR; + } else { + redisLog(REDIS_WARNING, + "Loading the cluster node config from %s: %s", + filename, strerror(errno)); + exit(1); + } + } + + /* Check if the file is zero-length: if so return REDIS_ERR to signal + * we have to write the config. */ + if (fstat(fileno(fp),&sb) != -1 && sb.st_size == 0) { + fclose(fp); + return REDIS_ERR; + } /* Parse the file. Note that single liens of the cluster config file can * be really long as they include all the hash slots of the node. @@ -126,6 +158,25 @@ int clusterLoadConfig(char *filename) { argv = sdssplitargs(line,&argc); if (argv == NULL) goto fmterr; + /* Handle the special "vars" line. Don't pretend it is the last + * line even if it actually is when generated by Redis. 
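As persisted by clusterSaveConfig() further down, this special line looks like the following (the epoch values are illustrative):

    vars currentEpoch 5 lastVoteEpoch 0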
*/ + if (strcasecmp(argv[0],"vars") == 0) { + for (j = 1; j < argc; j += 2) { + if (strcasecmp(argv[j],"currentEpoch") == 0) { + server.cluster->currentEpoch = + strtoull(argv[j+1],NULL,10); + } else if (strcasecmp(argv[j],"lastVoteEpoch") == 0) { + server.cluster->lastVoteEpoch = + strtoull(argv[j+1],NULL,10); + } else { + redisLog(REDIS_WARNING, + "Skipping unknown cluster config variable '%s'", + argv[j]); + } + } + continue; + } + /* Create this node if it does not exist */ // 检查节点是否已经存在 n = clusterLookupNode(argv[0]); @@ -150,7 +201,7 @@ int clusterLoadConfig(char *filename) { // 这是节点本身 if (!strcasecmp(s,"myself")) { redisAssert(server.cluster->myself == NULL); - server.cluster->myself = n; + myself = server.cluster->myself = n; n->flags |= REDIS_NODE_MYSELF; // 这是一个主节点 } else if (!strcasecmp(s,"master")) { @@ -264,15 +315,19 @@ int clusterLoadConfig(char *filename) { /* Config sanity check */ redisAssert(server.cluster->myself != NULL); - redisLog(REDIS_NOTICE,"Node configuration loaded, I'm %.40s", - server.cluster->myself->name); - // 设置配置纪元 - clusterSetStartupEpoch(); + redisLog(REDIS_NOTICE,"Node configuration loaded, I'm %.40s", myself->name); + /* Something that should never happen: currentEpoch smaller than + * the max epoch found in the nodes configuration. However we handle this + * as some form of protection against manual editing of critical files. */ + if (clusterGetMaxEpoch() > server.cluster->currentEpoch) { + server.cluster->currentEpoch = clusterGetMaxEpoch(); + } return REDIS_OK; fmterr: - redisLog(REDIS_WARNING,"Unrecoverable error: corrupted cluster config file."); + redisLog(REDIS_WARNING, + "Unrecoverable error: corrupted cluster config file."); fclose(fp); exit(1); } @@ -291,11 +346,21 @@ int clusterLoadConfig(char *filename) { * the file afterward. */ // 写入 nodes.conf 文件 int clusterSaveConfig(int do_fsync) { - sds ci = clusterGenNodesDescription(REDIS_NODE_HANDSHAKE); - size_t content_size = sdslen(ci); + sds ci; + size_t content_size; struct stat sb; int fd; - + + server.cluster->todo_before_sleep &= ~CLUSTER_TODO_SAVE_CONFIG; + + /* Get the nodes description and concatenate our "vars" directive to + * save currentEpoch and lastVoteEpoch. */ + ci = clusterGenNodesDescription(REDIS_NODE_HANDSHAKE); + ci = sdscatprintf(ci,"vars currentEpoch %llu lastVoteEpoch %llu\n", + (unsigned long long) server.cluster->currentEpoch, + (unsigned long long) server.cluster->lastVoteEpoch); + content_size = sdslen(ci); + if ((fd = open(server.cluster_configfile,O_WRONLY|O_CREAT,0644)) == -1) goto err; @@ -307,7 +372,10 @@ int clusterSaveConfig(int do_fsync) { } } if (write(fd,ci,sdslen(ci)) != (ssize_t)sdslen(ci)) goto err; - if (do_fsync) fsync(fd); + if (do_fsync) { + server.cluster->todo_before_sleep &= ~CLUSTER_TODO_FSYNC_CONFIG; + fsync(fd); + } /* Truncate the file if needed to remove the final \n padding that * is just garbage. */ @@ -332,6 +400,46 @@ void clusterSaveConfigOrDie(int do_fsync) { } } +/* Lock the cluster config using flock(), and leaks the file descritor used to + * acquire the lock so that the file will be locked forever. + * + * This works because we always update nodes.conf with a new version + * in-place, reopening the file, and writing to it in place (later adjusting + * the length with ftruncate()). + * + * On success REDIS_OK is returned, otherwise an error is logged and + * the function returns REDIS_ERR to signal a lock was not acquired. 
*/ +int clusterLockConfig(char *filename) { + /* To lock it, we need to open the file in a way it is created if + * it does not exist, otherwise there is a race condition with other + * processes. */ + int fd = open(filename,O_WRONLY|O_CREAT,0644); + if (fd == -1) { + redisLog(REDIS_WARNING, + "Can't open %s in order to acquire a lock: %s", + filename, strerror(errno)); + return REDIS_ERR; + } + + if (flock(fd,LOCK_EX|LOCK_NB) == -1) { + if (errno == EWOULDBLOCK) { + redisLog(REDIS_WARNING, + "Sorry, the cluster configuration file %s is already used " + "by a different Redis Cluster node. Please make sure that " + "different nodes use different cluster configuration " + "files.", filename); + } else { + redisLog(REDIS_WARNING, + "Impossible to lock %s: %s", filename, strerror(errno)); + } + close(fd); + return REDIS_ERR; + } + /* Lock acquired: leak the 'fd' by not closing it, so that we'll retain the + * lock to the file as long as the process exists. */ + return REDIS_OK; +} + // 初始化集群 void clusterInit(void) { int saveconf = 0; @@ -348,30 +456,28 @@ void clusterInit(void) { dictCreate(&clusterNodesBlackListDictType,NULL); server.cluster->failover_auth_time = 0; server.cluster->failover_auth_count = 0; + server.cluster->failover_auth_rank = 0; server.cluster->failover_auth_epoch = 0; - server.cluster->last_vote_epoch = 0; + server.cluster->lastVoteEpoch = 0; server.cluster->stats_bus_messages_sent = 0; server.cluster->stats_bus_messages_received = 0; - memset(server.cluster->migrating_slots_to,0, - sizeof(server.cluster->migrating_slots_to)); - memset(server.cluster->importing_slots_from,0, - sizeof(server.cluster->importing_slots_from)); - memset(server.cluster->slots,0, - sizeof(server.cluster->slots)); + memset(server.cluster->slots,0, sizeof(server.cluster->slots)); + clusterCloseAllSlots(); + + /* Lock the cluster config file to make sure every node uses + * its own nodes.conf. */ + if (clusterLockConfig(server.cluster_configfile) == REDIS_ERR) + exit(1); - // 载入 nodes.conf 配置文件 + /* Load or create a new nodes configuration. */ if (clusterLoadConfig(server.cluster_configfile) == REDIS_ERR) { /* No configuration found. We will just use the random name provided * by the createClusterNode() function. */ - // 未载入到配置文件,为节点创建一个随机名字 - server.cluster->myself = + myself = server.cluster->myself = createClusterNode(NULL,REDIS_NODE_MYSELF|REDIS_NODE_MASTER); redisLog(REDIS_NOTICE,"No cluster configuration found, I'm %.40s", - server.cluster->myself->name); - - // 将节点添加到集群中 - clusterAddNode(server.cluster->myself); - + myself->name); + clusterAddNode(myself); saveconf = 1; } @@ -381,6 +487,19 @@ void clusterInit(void) { /* We need a listening TCP port for our cluster messaging needs. */ // 监听 TCP 端口 server.cfd_count = 0; + + /* Port sanity check II + * The other handshake port check is triggered too late to stop + * us from trying to use a too-high cluster port number. */ + if (server.port > (65535-REDIS_CLUSTER_PORT_INCR)) { + redisLog(REDIS_WARNING, "Redis port number too high. " + "Cluster communication port is 10,000 port " + "numbers higher than your Redis port. " + "Your Redis port number must be " + "lower than 55535."); + exit(1); + } + if (listenToPort(server.port+REDIS_CLUSTER_PORT_INCR, server.cfd,&server.cfd_count) == REDIS_ERR) { @@ -400,6 +519,66 @@ void clusterInit(void) { /* The slots -> keys map is a sorted set. Init it. 
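Regarding the port sanity check above: the cluster bus always uses the base port plus 10000 (REDIS_CLUSTER_PORT_INCR), so a node listening on 6379 talks cluster gossip on 16379, and since 65535 - 10000 = 55535, any base port above 55535 is rejected at startup.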
*/ // slots -> keys 映射是一个有序集合 server.cluster->slots_to_keys = zslCreate(); + resetManualFailover(); +} + +/* Reset a node performing a soft or hard reset: + * + * 1) All other nodes are forget. + * 2) All the assigned / open slots are released. + * 3) If the node is a slave, it turns into a master. + * 5) Only for hard reset: a new Node ID is generated. + * 6) Only for hard reset: currentEpoch and configEpoch are set to 0. + * 7) The new configuration is saved and the cluster state updated. */ +void clusterReset(int hard) { + dictIterator *di; + dictEntry *de; + int j; + + /* Turn into master. */ + if (nodeIsSlave(myself)) { + clusterSetNodeAsMaster(myself); + replicationUnsetMaster(); + } + + /* Close slots, reset manual failover state. */ + clusterCloseAllSlots(); + resetManualFailover(); + + /* Unassign all the slots. */ + for (j = 0; j < REDIS_CLUSTER_SLOTS; j++) clusterDelSlot(j); + + /* Forget all the nodes, but myself. */ + di = dictGetSafeIterator(server.cluster->nodes); + while((de = dictNext(di)) != NULL) { + clusterNode *node = dictGetVal(de); + + if (node == myself) continue; + clusterDelNode(node); + } + dictReleaseIterator(di); + + /* Hard reset only: set epochs to 0, change node ID. */ + if (hard) { + sds oldname; + + server.cluster->currentEpoch = 0; + server.cluster->lastVoteEpoch = 0; + myself->configEpoch = 0; + + /* To change the Node ID we need to remove the old name from the + * nodes table, change the ID, and re-add back with new name. */ + oldname = sdsnewlen(myself->name, REDIS_CLUSTER_NAMELEN); + dictDelete(server.cluster->nodes,oldname); + sdsfree(oldname); + getRandomHexChars(myself->name, REDIS_CLUSTER_NAMELEN); + clusterAddNode(myself); + } + + /* Make sure to persist the new config and update the state. */ + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| + CLUSTER_TODO_UPDATE_STATE| + CLUSTER_TODO_FSYNC_CONFIG); } /* ----------------------------------------------------------------------------- @@ -446,35 +625,42 @@ void freeClusterLink(clusterLink *link) { } // 监听事件处理器 +#define MAX_CLUSTER_ACCEPTS_PER_CALL 1000 void clusterAcceptHandler(aeEventLoop *el, int fd, void *privdata, int mask) { int cport, cfd; + int max = MAX_CLUSTER_ACCEPTS_PER_CALL; char cip[REDIS_IP_STR_LEN]; clusterLink *link; REDIS_NOTUSED(el); REDIS_NOTUSED(mask); REDIS_NOTUSED(privdata); - // accept 连接 - cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport); - if (cfd == AE_ERR) { - redisLog(REDIS_VERBOSE,"Accepting cluster node: %s", server.neterr); - return; + /* If the server is starting up, don't accept cluster connections: + * UPDATE messages may interact with the database content. */ + if (server.masterhost == NULL && server.loading) return; + + while(max--) { + cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport); + if (cfd == ANET_ERR) { + if (errno != EWOULDBLOCK) + redisLog(REDIS_VERBOSE, + "Accepting cluster node: %s", server.neterr); + return; + } + anetNonBlock(NULL,cfd); + anetEnableTcpNoDelay(NULL,cfd); + + /* Use non-blocking I/O for cluster messages. */ + redisLog(REDIS_VERBOSE,"Accepted cluster node %s:%d", cip, cport); + /* Create a link object we use to handle the connection. + * It gets passed to the readable handler when data is available. + * Initiallly the link->node pointer is set to NULL as we don't know + * which node is, but the right node is references once we know the + * node identity. 
*/ + link = createClusterLink(NULL); + link->fd = cfd; + aeCreateFileEvent(server.el,cfd,AE_READABLE,clusterReadHandler,link); } - anetNonBlock(NULL,cfd); - anetEnableTcpNoDelay(NULL,cfd); - - /* Use non-blocking I/O for cluster messages. */ - /* IPV6: might want to wrap a v6 address in [] */ - redisLog(REDIS_VERBOSE,"Accepted cluster node %s:%d", cip, cport); - /* We need to create a temporary node in order to read the incoming - * packet in a valid contest. This node will be released once we - * read the packet and reply. */ - // 创建一个临时节点,并将其用于测试连接是否正常 - // 一旦连接测试完成,这个临时节点就会被释放 - link = createClusterLink(NULL); - link->fd = cfd; - // 关联读事件 - aeCreateFileEvent(server.el,cfd,AE_READABLE,clusterReadHandler,link); } /* ----------------------------------------------------------------------------- @@ -482,10 +668,31 @@ void clusterAcceptHandler(aeEventLoop *el, int fd, void *privdata, int mask) { * -------------------------------------------------------------------------- */ /* We have 16384 hash slots. The hash slot of a given key is obtained - * as the least significant 14 bits of the crc16 of the key. */ + * as the least significant 14 bits of the crc16 of the key. + * + * However if the key contains the {...} pattern, only the part between + * { and } is hashed. This may be useful in the future to force certain + * keys to be in the same node (assuming no resharding is in progress). */ // 计算给定键应该被分配到那个槽 unsigned int keyHashSlot(char *key, int keylen) { - return crc16(key,keylen) & 0x3FFF; + int s, e; /* start-end indexes of { and } */ + + for (s = 0; s < keylen; s++) + if (key[s] == '{') break; + + /* No '{' ? Hash the whole key. This is the base case. */ + if (s == keylen) return crc16(key,keylen) & 0x3FFF; + + /* '{' found? Check if we have the corresponding '}'. */ + for (e = s+1; e < keylen; e++) + if (key[e] == '}') break; + + /* No '}' or nothing betweeen {} ? Hash the whole key. */ + if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF; + + /* If we are here there is both a { and a } on its right. Hash + * what is in the middle between { and }. */ + return crc16(key+s+1,e-s-1) & 0x3FFF; } /* ----------------------------------------------------------------------------- @@ -745,6 +952,14 @@ void clusterNodeResetSlaves(clusterNode *n) { n->slaves = NULL; } +int clusterCountNonFailingSlaves(clusterNode *n) { + int j, okslaves = 0; + + for (j = 0; j < n->numslaves; j++) + if (!nodeFailed(n->slaves[j])) okslaves++; + return okslaves; +} + // 释放节点 void freeClusterNode(clusterNode *n) { sds nodename; @@ -772,7 +987,6 @@ void freeClusterNode(clusterNode *n) { // 将给定 node 添加到节点表里面 int clusterAddNode(clusterNode *node) { int retval; - // 将 node 添加到当前节点的 nodes 表中 // 这样接下来当前节点就会创建连向 node 的节点 retval = dictAdd(server.cluster->nodes, @@ -824,7 +1038,12 @@ void clusterDelNode(clusterNode *delnode) { } dictReleaseIterator(di); - /* 3) Free the node, unlinking it from the cluster. */ + /* 3) Remove this node from its master's slaves if needed. */ + // 将节点从它的主节点的从节点列表中移除 + if (nodeIsSlave(delnode) && delnode->slaveof) + clusterNodeRemoveSlave(delnode->slaveof,delnode); + + /* 4) Free the node, unlinking it from the cluster. 
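The hash tag rule in keyHashSlot() above is easiest to see with an example; a sketch using the function's prototype as defined in cluster.c (the key names are illustrative):

    #include <string.h>

    unsigned int keyHashSlot(char *key, int keylen);   /* from cluster.c */

    void hashTagExample(void) {
        unsigned int a = keyHashSlot((char*)"{user1000}.following",
                                     (int)strlen("{user1000}.following"));
        unsigned int b = keyHashSlot((char*)"{user1000}.followers",
                                     (int)strlen("{user1000}.followers"));
        /* a == b: only "user1000" is hashed for both keys, so they always
         * map to the same slot. A key like "foo{}bar" has nothing between
         * the braces, so the whole key is hashed instead. */
    }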
*/ // 释放节点 freeClusterNode(delnode); } @@ -833,7 +1052,7 @@ void clusterDelNode(clusterNode *delnode) { // 根据名字,查找给定的节点 clusterNode *clusterLookupNode(char *name) { sds s = sdsnewlen(name, REDIS_CLUSTER_NAMELEN); - struct dictEntry *de; + dictEntry *de; de = dictFind(server.cluster->nodes,s); sdsfree(s); @@ -855,7 +1074,7 @@ clusterNode *clusterLookupNode(char *name) { void clusterRenameNode(clusterNode *node, char *newname) { int retval; sds s = sdsnewlen(node->name, REDIS_CLUSTER_NAMELEN); - + redisLog(REDIS_DEBUG,"Renaming node %.40s into %.40s", node->name, newname); retval = dictDelete(server.cluster->nodes, s); @@ -1035,20 +1254,15 @@ void markNodeAsFailingIfNeeded(clusterNode *node) { // 标记为 FAIL 所需的节点数量,需要超过集群节点数量的一半 int needed_quorum = (server.cluster->size / 2) + 1; - // 不能对未进入 PFAIL 状态的节点标记 FAIL 状态 - if (!(node->flags & REDIS_NODE_PFAIL)) return; /* We can reach it. */ - - // 节点已经是 FAIL 状态的了 - if (node->flags & REDIS_NODE_FAIL) return; /* Already FAILing. */ + if (!nodeTimedOut(node)) return; /* We can reach it. */ + if (nodeFailed(node)) return; /* Already FAILing. */ // 统计将 node 标记为 PFAIL 或者 FAIL 的节点数量(不包括当前节点) failures = clusterNodeFailureReportsCount(node); /* Also count myself as a voter if I'm a master. */ // 如果当前节点是主节点,那么将当前节点也算在 failures 之内 - if (server.cluster->myself->flags & REDIS_NODE_MASTER) - failures += 1; - + if (nodeIsMaster(myself)) failures++; // 报告下线节点的数量不足节点总数的一半,不能将节点判断为 FAIL ,返回 if (failures < needed_quorum) return; /* No weak agreement from masters. */ @@ -1065,9 +1279,7 @@ void markNodeAsFailingIfNeeded(clusterNode *node) { * reachable nodes to flag the node as FAIL. */ // 如果当前节点是主节点的话,那么向其他节点发送报告 node 的 FAIL 信息 // 让其他节点也将 node 标记为 FAIL - if (server.cluster->myself->flags & REDIS_NODE_MASTER) - clusterSendFail(node->name); - + if (nodeIsMaster(myself)) clusterSendFail(node->name); clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG); } @@ -1081,16 +1293,16 @@ void markNodeAsFailingIfNeeded(clusterNode *node) { void clearNodeFailureIfNeeded(clusterNode *node) { mstime_t now = mstime(); - redisAssert(node->flags & REDIS_NODE_FAIL); + redisAssert(nodeFailed(node)); /* For slaves we always clear the FAIL flag if we can contact the * node again. */ // 如果 FAIL 的是从节点,那么当前节点会直接移除该节点的 FAIL - if (node->flags & REDIS_NODE_SLAVE) { + if (nodeIsSlave(node) || node->numslots == 0) { redisLog(REDIS_NOTICE, - "Clear FAIL state for node %.40s: slave is reachable again.", - node->name); - + "Clear FAIL state for node %.40s: %s is reachable again.", + node->name, + nodeIsSlave(node) ? 
"slave" : "master without slots"); // 移除 node->flags &= ~REDIS_NODE_FAIL; @@ -1111,8 +1323,7 @@ void clearNodeFailureIfNeeded(clusterNode *node) { * * 那么说明 FAIL 节点仍然有槽没有迁移完,那么当前节点移除该节点的 FAIL 标识。 */ - if (node->flags & REDIS_NODE_MASTER && - node->numslots > 0 && + if (nodeIsMaster(node) && node->numslots > 0 && (now - node->fail_time) > (server.cluster_node_timeout * REDIS_CLUSTER_FAIL_UNDO_TIME_MULT)) { @@ -1146,7 +1357,7 @@ int clusterHandshakeInProgress(char *ip, int port) { clusterNode *node = dictGetVal(de); // 跳过非握手状态的节点,之后剩下的都是正在握手的节点 - if (!(node->flags & REDIS_NODE_HANDSHAKE)) continue; + if (!nodeInHandshake(node)) continue; // 给定 ip 和 port 的节点正在进行握手 if (!strcasecmp(node->ip,ip) && node->port == port) break; @@ -1203,11 +1414,11 @@ int clusterStartHandshake(char *ip, int port) { if (sa.ss_family == AF_INET) inet_ntop(AF_INET, (void*)&(((struct sockaddr_in *)&sa)->sin_addr), - norm_ip,REDIS_CLUSTER_IPLEN); + norm_ip,REDIS_IP_STR_LEN); else inet_ntop(AF_INET6, (void*)&(((struct sockaddr_in6 *)&sa)->sin6_addr), - norm_ip,REDIS_CLUSTER_IPLEN); + norm_ip,REDIS_IP_STR_LEN); // 检查节点是否已经发送握手请求,如果是的话,那么直接返回,防止出现重复握手 if (clusterHandshakeInProgress(norm_ip,port)) { @@ -1287,12 +1498,8 @@ void clusterProcessGossipSection(clusterMsg *hdr, clusterLink *link) { if (node) { /* We already know this node. Handle failure reports, only when the sender is a master. */ - // 如果 sender 是一个主节点,那么我们需要处理下线报告 - if (sender && sender->flags & REDIS_NODE_MASTER && - node != server.cluster->myself) - { - + if (sender && nodeIsMaster(sender) && node != myself) { // 节点处于 FAIL 或者 PFAIL 状态 if (flags & (REDIS_NODE_FAIL|REDIS_NODE_PFAIL)) { @@ -1365,22 +1572,9 @@ void clusterProcessGossipSection(clusterMsg *hdr, clusterLink *link) { /* IP -> string conversion. 'buf' is supposed to at least be 46 bytes. */ // 将 ip 转换为字符串 void nodeIp2String(char *buf, clusterLink *link) { - struct sockaddr_storage sa; - socklen_t salen = sizeof(sa); - - if (getpeername(link->fd, (struct sockaddr*) &sa, &salen) == -1) - redisPanic("getpeername() failed."); - - if (sa.ss_family == AF_INET) { - struct sockaddr_in *s = (struct sockaddr_in *)&sa; - inet_ntop(AF_INET,(void*)&(s->sin_addr),buf,REDIS_CLUSTER_IPLEN); - } else { - struct sockaddr_in6 *s = (struct sockaddr_in6 *)&sa; - inet_ntop(AF_INET6,(void*)&(s->sin6_addr),buf,REDIS_CLUSTER_IPLEN); - } + anetPeerToString(link->fd, buf, REDIS_IP_STR_LEN, NULL); } - /* Update the node address to the IP address that can be extracted * from link->fd, and at the specified port. * @@ -1431,11 +1625,8 @@ int nodeUpdateAddressIfNeeded(clusterNode *node, clusterLink *link, int port) { /* Check if this is our master and we have to change the * replication target as well. 
*/ // 如果连接来自当前节点(从节点)的主节点,那么根据新地址设置复制对象 - if (server.cluster->myself->flags & REDIS_NODE_SLAVE && - server.cluster->myself->slaveof == node) - { + if (nodeIsSlave(myself) && myself->slaveof == node) replicationSetMaster(node->ip, node->port); - } return 1; } @@ -1448,7 +1639,7 @@ int nodeUpdateAddressIfNeeded(clusterNode *node, clusterLink *link, int port) { void clusterSetNodeAsMaster(clusterNode *n) { // 已经是主节点了。 - if (n->flags & REDIS_NODE_MASTER) return; + if (nodeIsMaster(n)) return; // 移除 slaveof if (n->slaveof) clusterNodeRemoveSlave(n->slaveof,n); @@ -1490,11 +1681,18 @@ void clusterSetNodeAsMaster(clusterNode *n) { * * 根据情况, sender 参数可以是消息的发送者,也可以是消息发送者的主节点。 */ -void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoch, - unsigned char *slots) -{ +void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoch, unsigned char *slots) { int j; clusterNode *curmaster, *newmaster = NULL; + /* The dirty slots list is a list of slots for which we lose the ownership + * while having still keys inside. This usually happens after a failover + * or after a manual cluster reconfiguration operated by the admin. + * + * If the update message is not able to demote a master to slave (in this + * case we'll resync with the master updating the whole key space), we + * need to delete all the keys in the slots we lost ownership. */ + uint16_t dirty_slots[REDIS_CLUSTER_SLOTS]; + int dirty_slots_count = 0; /* Here we set curmaster to this node or the node this node * replicates to if it's a slave. In the for loop we are @@ -1502,30 +1700,44 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc // 1)如果当前节点是主节点,那么将 curmaster 设置为当前节点 // 2)如果当前节点是从节点,那么将 curmaster 设置为当前节点正在复制的主节点 // 稍后在 for 循环中我们将使用 curmaster 检查与当前节点有关的槽是否发生了变动 - if (server.cluster->myself->flags & REDIS_NODE_MASTER) - curmaster = server.cluster->myself; - else - curmaster = server.cluster->myself->slaveof; + curmaster = nodeIsMaster(myself) ? myself : myself->slaveof; + + if (sender == myself) { + redisLog(REDIS_WARNING,"Discarding UPDATE message about myself."); + return; + } // 更新槽布局 for (j = 0; j < REDIS_CLUSTER_SLOTS; j++) { // 如果 slots 中的槽 j 已经被指派,那么执行以下代码 if (bitmapTestBit(slots,j)) { - /* We rebind the slot to the new node claiming it if: - * 1) The slot was unassigned. - * 2) The new node claims it with a greater configEpoch. */ - - // 槽 j 已经指派给 sender 了,略过 + /* The slot is already bound to the sender of this message. */ if (server.cluster->slots[j] == sender) continue; - // 槽 j 未指派 - // 或者当前槽 j 指派的节点的配置纪元比 sender 的配置纪元要低(可能发生了自动故障转移) - // 那么更新槽 j 的指派节点 + /* The slot is in importing state, it should be modified only + * manually via redis-trib (example: a resharding is in progress + * and the migrating side slot was already closed and is advertising + * a new config. We still want the slot to be closed manually). */ + if (server.cluster->importing_slots_from[j]) continue; + + /* We rebind the slot to the new node claiming it if: + * 1) The slot was unassigned or the new node claims it with a + * greater configEpoch. + * 2) We are not currently importing the slot. */ if (server.cluster->slots[j] == NULL || - server.cluster->slots[j]->configEpoch < - senderConfigEpoch) + server.cluster->slots[j]->configEpoch < senderConfigEpoch) { + /* Was this slot mine, and still contains keys? Mark it as + * a dirty slot. 
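+     *
+     * A rough, hypothetical illustration (slot number invented): suppose
+     * we served slot 1234, it still holds keys, and this UPDATE rebinds
+     * it to a node advertising a greater configEpoch. The slot index is
+     * then queued in dirty_slots; unless the same message ends up
+     * demoting us to a slave (in which case the full resync rebuilds the
+     * keyspace anyway), delKeysInSlot() is called for it at the end of
+     * this function.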
*/ + if (server.cluster->slots[j] == myself && + countKeysInSlot(j) && + sender != myself) + { + dirty_slots[dirty_slots_count] = j; + dirty_slots_count++; + } + // 负责槽 j 的原节点是当前节点的主节点? // 如果是的话,说明故障转移发生了,将当前节点的复制对象设置为新的主节点 if (server.cluster->slots[j] == curmaster) @@ -1563,17 +1775,91 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc * 这时应该将当前节点设置为新主节点的从节点。 */ if (newmaster && curmaster->numslots == 0) { - redisLog(REDIS_WARNING,"Configuration change detected. Reconfiguring myself as a replica of %.40s", sender->name); - + redisLog(REDIS_WARNING, + "Configuration change detected. Reconfiguring myself " + "as a replica of %.40s", sender->name); // 将 sender 设置为当前节点的主节点 clusterSetMaster(sender); clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| CLUSTER_TODO_UPDATE_STATE| CLUSTER_TODO_FSYNC_CONFIG); + } else if (dirty_slots_count) { + /* If we are here, we received an update message which removed + * ownership for certain slots we still have keys about, but still + * we are serving some slots, so this master node was not demoted to + * a slave. + * + * In order to maintain a consistent state between keys and slots + * we need to remove all the keys from the slots we lost. */ + for (j = 0; j < dirty_slots_count; j++) + delKeysInSlot(dirty_slots[j]); } } +/* This function is called when this node is a master, and we receive from + * another master a configuration epoch that is equal to our configuration + * epoch. + * + * BACKGROUND + * + * It is not possible that different slaves get the same config + * epoch during a failover election, because the slaves need to get voted + * by a majority. However when we perform a manual resharding of the cluster + * the node will assign a configuration epoch to itself without to ask + * for agreement. Usually resharding happens when the cluster is working well + * and is supervised by the sysadmin, however it is possible for a failover + * to happen exactly while the node we are resharding a slot to assigns itself + * a new configuration epoch, but before it is able to propagate it. + * + * So technically it is possible in this condition that two nodes end with + * the same configuration epoch. + * + * Another possibility is that there are bugs in the implementation causing + * this to happen. + * + * Moreover when a new cluster is created, all the nodes start with the same + * configEpoch. This collision resolution code allows nodes to automatically + * end with a different configEpoch at startup automatically. + * + * In all the cases, we want a mechanism that resolves this issue automatically + * as a safeguard. The same configuration epoch for masters serving different + * set of slots is not harmful, but it is if the nodes end serving the same + * slots for some reason (manual errors or software bugs) without a proper + * failover procedure. + * + * In general we want a system that eventually always ends with different + * masters having different configuration epochs whatever happened, since + * nothign is worse than a split-brain condition in a distributed system. + * + * BEHAVIOR + * + * When this function gets called, what happens is that if this node + * has the lexicographically smaller Node ID compared to the other node + * with the conflicting epoch (the 'sender' node), it will assign itself + * the greatest configuration epoch currently detected among nodes plus 1. 
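+ *
+ * A small invented example: masters A and B both claim configEpoch 5
+ * while the cluster currentEpoch is also 5. When A, whose Node ID sorts
+ * before B's, processes a packet from B, it bumps currentEpoch to 6 and
+ * adopts 6 as its own configEpoch; B's configEpoch stays at 5 and B does
+ * nothing.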
+ * + * This means that even if there are multiple nodes colliding, the node + * with the greatest Node ID never moves forward, so eventually all the nodes + * end with a different configuration epoch. + */ +void clusterHandleConfigEpochCollision(clusterNode *sender) { + /* Prerequisites: nodes have the same configEpoch and are both masters. */ + if (sender->configEpoch != myself->configEpoch || + !nodeIsMaster(sender) || !nodeIsMaster(myself)) return; + /* Don't act if the colliding node has a smaller Node ID. */ + if (memcmp(sender->name,myself->name,REDIS_CLUSTER_NAMELEN) <= 0) return; + /* Get the next ID available at the best of this node knowledge. */ + server.cluster->currentEpoch++; + myself->configEpoch = server.cluster->currentEpoch; + clusterSaveConfigOrDie(1); + redisLog(REDIS_VERBOSE, + "WARNING: configEpoch collision with node %.40s." + " Updating my configEpoch to %llu", + sender->name, + (unsigned long long) myself->configEpoch); +} + /* When this function is called, there is a packet to process starting * at node->rcvbuf. Releasing the buffer is up to the caller, so this * function should just handle the higher level stuff of processing the @@ -1617,7 +1903,8 @@ int clusterProcessPacket(clusterLink *link) { /* Perform sanity checks */ // 合法性检查 - if (totlen < 8) return 1; + if (totlen < 16) return 1; /* At least signature, version, totlen, count. */ + if (ntohs(hdr->ver) != 0) return 1; /* Can't handle versions other than 0.*/ if (totlen > sdslen(link->rcvbuf)) return 1; if (type == CLUSTERMSG_TYPE_PING || type == CLUSTERMSG_TYPE_PONG || type == CLUSTERMSG_TYPE_MEET) @@ -1641,7 +1928,9 @@ int clusterProcessPacket(clusterLink *link) { ntohl(hdr->data.publish.msg.message_len); if (totlen != explen) return 1; } else if (type == CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST || - type == CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK) { + type == CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK || + type == CLUSTERMSG_TYPE_MFSTART) + { uint32_t explen = sizeof(clusterMsg)-sizeof(union clusterMsgData); if (totlen != explen) return 1; @@ -1655,10 +1944,9 @@ int clusterProcessPacket(clusterLink *link) { /* Check if the sender is a known node. */ // 查找发送者节点 sender = clusterLookupNode(hdr->sender); - // 节点存在,并且不是 HANDSHAKE 节点 // 那么个更新节点的配置纪元信息 - if (sender && !(sender->flags & REDIS_NODE_HANDSHAKE)) { + if (sender && !nodeInHandshake(sender)) { /* Update our curretEpoch if we see a newer epoch in the cluster. */ senderCurrentEpoch = ntohu64(hdr->currentEpoch); senderConfigEpoch = ntohu64(hdr->configEpoch); @@ -1667,7 +1955,25 @@ int clusterProcessPacket(clusterLink *link) { /* Update the sender configEpoch if it is publishing a newer one. */ if (senderConfigEpoch > sender->configEpoch) { sender->configEpoch = senderConfigEpoch; - clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_FSYNC_CONFIG); + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| + CLUSTER_TODO_FSYNC_CONFIG); + } + /* Update the replication offset info for this node. */ + sender->repl_offset = ntohu64(hdr->offset); + sender->repl_offset_time = mstime(); + /* If we are a slave performing a manual failover and our master + * sent its offset while already paused, populate the MF state. 
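+             *
+             * This is the slave-side half of the manual failover
+             * handshake: the paused master keeps pinging us with
+             * CLUSTERMSG_FLAG0_PAUSED set, we record its offset in
+             * mf_master_offset, and clusterHandleManualFailover() later
+             * sets mf_can_start once replicationGetSlaveOffset() reaches
+             * that offset.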
*/ + if (server.cluster->mf_end && + nodeIsSlave(myself) && + myself->slaveof == sender && + hdr->mflags[0] & CLUSTERMSG_FLAG0_PAUSED && + server.cluster->mf_master_offset == 0) + { + server.cluster->mf_master_offset = sender->repl_offset; + redisLog(REDIS_WARNING, + "Received replication offset for paused " + "master manual failover: %lld", + server.cluster->mf_master_offset); } } @@ -1727,15 +2033,14 @@ int clusterProcessPacket(clusterLink *link) { // 连接的 clusterNode 结构存在 if (link->node) { - // 节点处于 HANDSHAKE 状态 - if (link->node->flags & REDIS_NODE_HANDSHAKE) { - + if (nodeInHandshake(link->node)) { /* If we already have this node, try to change the * IP/port of the node with the new one. */ if (sender) { - redisLog(REDIS_WARNING, - "Handshake error: we already know node %.40s, updating the address if needed.", sender->name); + redisLog(REDIS_VERBOSE, + "Handshake: we already know node %.40s, " + "updating the address if needed.", sender->name); // 如果有需要的话,更新节点的地址 if (nodeUpdateAddressIfNeeded(sender,link,ntohs(hdr->port))) { @@ -1791,10 +2096,11 @@ int clusterProcessPacket(clusterLink *link) { // 并且发送者不在 HANDSHAKE 状态 // 那么更新发送者的信息 if (sender && type == CLUSTERMSG_TYPE_PING && - !(sender->flags & REDIS_NODE_HANDSHAKE) && + !nodeInHandshake(sender) && nodeUpdateAddressIfNeeded(sender,link,ntohs(hdr->port))) { - clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE); + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| + CLUSTER_TODO_UPDATE_STATE); } /* Update our info about the node */ @@ -1819,14 +2125,13 @@ int clusterProcessPacket(clusterLink *link) { * 如果节点的状态为 FAIL , * 那么是否撤销该状态要根据 clearNodeFailureIfNeeded() 函数来决定。 */ - if (link->node->flags & REDIS_NODE_PFAIL) { - + if (nodeTimedOut(link->node)) { // 撤销 PFAIL link->node->flags &= ~REDIS_NODE_PFAIL; clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| CLUSTER_TODO_UPDATE_STATE); - } else if (link->node->flags & REDIS_NODE_FAIL) { + } else if (nodeFailed(link->node)) { // 看是否可以撤销 FAIL clearNodeFailureIfNeeded(link->node); } @@ -1853,8 +2158,7 @@ int clusterProcessPacket(clusterLink *link) { clusterNode *master = clusterLookupNode(hdr->slaveof); // sender 由主节点变成了从节点,重新配置 sender - if (sender->flags & REDIS_NODE_MASTER) { - + if (nodeIsMaster(sender)) { /* Master turned into a slave! Reconfigure the node. */ // 删除所有由该节点负责的槽 @@ -1874,9 +2178,9 @@ int clusterProcessPacket(clusterLink *link) { } /* Master node changed for this slave? */ - // 检查 sender 的主节点是否变更 - if (sender->slaveof != master) { + // 检查 sender 的主节点是否变更 + if (master && sender->slaveof != master) { // 如果 sender 之前的主节点不是现在的主节点 // 那么在旧主节点的从节点列表中移除 sender if (sender->slaveof) @@ -1913,8 +2217,7 @@ int clusterProcessPacket(clusterLink *link) { int dirty_slots = 0; /* Sender claimed slots don't match my view? */ if (sender) { - sender_master = (sender->flags & REDIS_NODE_MASTER) ? - sender : sender->slaveof; + sender_master = nodeIsMaster(sender) ? sender : sender->slaveof; if (sender_master) { dirty_slots = memcmp(sender_master->slots, hdr->myslots,sizeof(hdr->myslots)) != 0; @@ -1926,9 +2229,8 @@ int clusterProcessPacket(clusterLink *link) { * need to update our configuration. 
*/ // 如果 sender 是主节点,并且 sender 的槽布局出现了变动 // 那么检查当前节点对 sender 的槽布局设置,看是否需要进行更新 - if (sender && sender->flags & REDIS_NODE_MASTER && dirty_slots) { + if (sender && nodeIsMaster(sender) && dirty_slots) clusterUpdateSlotsConfigWith(sender,senderConfigEpoch,hdr->myslots); - } /* 2) We also check for the reverse condition, that is, the sender * claims to serve slots we know are served by a master with a @@ -1987,13 +2289,14 @@ int clusterProcessPacket(clusterLink *link) { if (server.cluster->slots[j]->configEpoch > senderConfigEpoch) { - redisLog(REDIS_WARNING, + redisLog(REDIS_VERBOSE, "Node %.40s has old slots configuration, sending " "an UPDATE message about %.40s", sender->name, server.cluster->slots[j]->name); // 向 sender 发送关于槽 j 的更新信息 - clusterSendUpdate(sender->link,server.cluster->slots[j]); + clusterSendUpdate(sender->link, + server.cluster->slots[j]); /* TODO: instead of exiting the loop send every other * UPDATE packet for other nodes that are the new owner @@ -2004,6 +2307,15 @@ int clusterProcessPacket(clusterLink *link) { } } + /* If our config epoch collides with the sender's try to fix + * the problem. */ + if (sender && + nodeIsMaster(myself) && nodeIsMaster(sender) && + senderConfigEpoch == myself->configEpoch) + { + clusterHandleConfigEpochCollision(sender); + } + /* Get info from the gossip section */ // 分析并提取出消息 gossip 协议部分的信息 clusterProcessGossipSection(hdr,link); @@ -2029,8 +2341,8 @@ int clusterProcessPacket(clusterLink *link) { failing->fail_time = mstime(); // 关闭 PFAIL 状态 failing->flags &= ~REDIS_NODE_PFAIL; - - clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE); + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| + CLUSTER_TODO_UPDATE_STATE); } } else { redisLog(REDIS_NOTICE, @@ -2088,8 +2400,7 @@ int clusterProcessPacket(clusterLink *link) { // 1) sender 是主节点 // 2) sender 正在处理至少一个槽 // 3) sender 的配置纪元大于等于当前节点的配置纪元 - if (sender->flags & REDIS_NODE_MASTER && - sender->numslots > 0 && + if (nodeIsMaster(sender) && sender->numslots > 0 && senderCurrentEpoch >= server.cluster->failover_auth_epoch) { // 增加支持票数 @@ -2100,12 +2411,22 @@ int clusterProcessPacket(clusterLink *link) { clusterDoBeforeSleep(CLUSTER_TODO_HANDLE_FAILOVER); } - // 这是一条更新消息: sender 告知当前节点,当前节点需要更新某个节点的槽布局 + } else if (type == CLUSTERMSG_TYPE_MFSTART) { + /* This message is acceptable only if I'm a master and the sender + * is one of my slaves. */ + if (!sender || sender->slaveof != myself) return 1; + /* Manual failover requested from slaves. Initialize the state + * accordingly. */ + resetManualFailover(); + server.cluster->mf_end = mstime() + REDIS_CLUSTER_MF_TIMEOUT; + server.cluster->mf_slave = sender; + pauseClients(mstime()+(REDIS_CLUSTER_MF_TIMEOUT*2)); + redisLog(REDIS_WARNING,"Manual failover requested by slave %.40s.", + sender->name); } else if (type == CLUSTERMSG_TYPE_UPDATE) { clusterNode *n; /* The node the update is about. */ - - // 消息中的配置纪元 - uint64_t reportedConfigEpoch = ntohu64(hdr->data.update.nodecfg.configEpoch); + uint64_t reportedConfigEpoch = + ntohu64(hdr->data.update.nodecfg.configEpoch); if (!sender) return 1; /* We don't know the sender. */ @@ -2120,7 +2441,12 @@ int clusterProcessPacket(clusterLink *link) { /* If in our current config the node is a slave, set it as a master. */ // 如果节点 n 为从节点,但它的槽配置更新了 // 那么说明这个节点已经变为主节点,将它设置为主节点 - if (n->flags & REDIS_NODE_SLAVE) clusterSetNodeAsMaster(n); + if (nodeIsSlave(n)) clusterSetNodeAsMaster(n); + + /* Update the node's configEpoch. 
*/ + n->configEpoch = reportedConfigEpoch; + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG| + CLUSTER_TODO_FSYNC_CONFIG); /* Check the bitmap of served slots and udpate our * config accordingly. */ @@ -2142,7 +2468,7 @@ int clusterProcessPacket(clusterLink *link) { this connection and will try to get it connected again. 我们将节点的状态设置为断开状态,Cluster Cron 会根据该状态尝试重新连接节点。 - + Instead if the node is a temporary node used to accept a query, we completely free the node on error. @@ -2206,23 +2532,24 @@ void clusterReadHandler(aeEventLoop *el, int fd, void *privdata, int mask) { // 检查输入缓冲区的长度 rcvbuflen = sdslen(link->rcvbuf); - - // 头信息(4字节)未读入完 - if (rcvbuflen < 4) { - /* First, obtain the first four bytes to get the full message + // 头信息(8 字节)未读入完 + if (rcvbuflen < 8) { + /* First, obtain the first 8 bytes to get the full message * length. */ - readlen = 4 - rcvbuflen; - - // 已读入完整的头信息 + readlen = 8 - rcvbuflen; + // 已读入完整的信息 } else { /* Finally read the full message. */ hdr = (clusterMsg*) link->rcvbuf; - if (rcvbuflen == 4) { - /* Perform some sanity check on the message length. */ - // 检查信息长度是否在合理范围 - if (ntohl(hdr->totlen) < CLUSTERMSG_MIN_LEN) { + if (rcvbuflen == 8) { + /* Perform some sanity check on the message signature + * and length. */ + if (memcmp(hdr->sig,"RCmb",4) != 0 || + ntohl(hdr->totlen) < CLUSTERMSG_MIN_LEN) + { redisLog(REDIS_WARNING, - "Bad message length received from Cluster bus."); + "Bad message length or signature received " + "from Cluster bus."); handleLinkIOError(link); return; } @@ -2255,8 +2582,7 @@ void clusterReadHandler(aeEventLoop *el, int fd, void *privdata, int mask) { /* Total length obtained? Process this packet. */ // 检查已读入内容的长度,看是否整条信息已经被读入了 - if (rcvbuflen >= 4 && rcvbuflen == ntohl(hdr->totlen)) { - + if (rcvbuflen >= 8 && rcvbuflen == ntohl(hdr->totlen)) { // 如果是的话,执行处理信息的函数 if (clusterProcessPacket(link)) { sdsfree(link->rcvbuf); @@ -2297,7 +2623,7 @@ void clusterSendMessage(clusterLink *link, unsigned char *msg, size_t msglen) { * a connected link. * * 向节点连接的所有其他节点发送信息。 - * + * * It is guaranteed that this function will never have as a side effect * some node->link to be invalidated, so it is safe to call this function * from event handlers that will do stuff with node links later. */ @@ -2328,7 +2654,7 @@ void clusterBroadcastMessage(void *buf, size_t len) { void clusterBuildMessageHdr(clusterMsg *hdr, int type) { int totlen = 0; uint64_t offset; - clusterNode *master, *myself = server.cluster->myself; + clusterNode *master; /* If this node is a master, we send its slots bitmap and configEpoch. * @@ -2343,12 +2669,17 @@ void clusterBuildMessageHdr(clusterMsg *hdr, int type) { * 因为接收信息的节点通过标识可以知道这个节点是一个从节点, * 所以接收信息的节点不会将从节点错认作是主节点。 */ - master = (myself->flags & REDIS_NODE_SLAVE && myself->slaveof) ? + master = (nodeIsSlave(myself) && myself->slaveof) ? myself->slaveof : myself; // 清零信息头 memset(hdr,0,sizeof(*hdr)); + hdr->sig[0] = 'R'; + hdr->sig[1] = 'C'; + hdr->sig[2] = 'm'; + hdr->sig[3] = 'b'; + // 设置信息类型 hdr->type = htons(type); @@ -2382,18 +2713,16 @@ void clusterBuildMessageHdr(clusterMsg *hdr, int type) { /* Set the replication offset. */ // 设置复制偏移量 - if (myself->flags & REDIS_NODE_SLAVE) { - if (server.master) - offset = server.master->reploff; - else if (server.cached_master) - offset = server.cached_master->reploff; - else - offset = 0; - } else { + if (nodeIsSlave(myself)) + offset = replicationGetSlaveOffset(); + else offset = server.master_repl_offset; - } hdr->offset = htonu64(offset); + /* Set the message flags. 
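+     *
+     * mflags[0] is used for the manual failover bits seen in this file:
+     * a master with a manual failover in progress advertises
+     * CLUSTERMSG_FLAG0_PAUSED here, while clusterRequestFailoverAuth()
+     * adds CLUSTERMSG_FLAG0_FORCEACK to its vote request so masters
+     * grant the vote even if the old master is still reachable.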
*/ + if (nodeIsMaster(myself) && server.cluster->mf_end) + hdr->mflags[0] |= CLUSTERMSG_FLAG0_PAUSED; + /* Compute the message length for certain messages. For other messages * this is up to the caller. */ // 计算信息的长度 @@ -2437,7 +2766,7 @@ void clusterSendPing(clusterLink *link, int type) { // 将当前节点的信息(比如名字、地址、端口号、负责处理的槽)记录到消息里面 clusterBuildMessageHdr(hdr,type); - + /* Populate the gossip fields */ // 从当前节点已知的节点中随机选出两个节点 // 并通过这条消息捎带给目标节点,从而实现 gossip 协议 @@ -2445,9 +2774,8 @@ void clusterSendPing(clusterLink *link, int type) { // 每个节点有 freshnodes 次发送 gossip 信息的机会 // 每次向目标节点发送 2 个被选中节点的 gossip 信息(gossipcount 计数) while(freshnodes > 0 && gossipcount < 3) { - // 从 nodes 字典中随机选出一个节点(被选中节点) - struct dictEntry *de = dictGetRandomKey(server.cluster->nodes); + dictEntry *de = dictGetRandomKey(server.cluster->nodes); clusterNode *this = dictGetVal(de); clusterMsgDataGossip *gossip; @@ -2464,7 +2792,7 @@ void clusterSendPing(clusterLink *link, int type) { * 4) Disconnected nodes if they don't have configured slots. * 因为不处理任何槽而被断开连接的节点 */ - if (this == server.cluster->myself || + if (this == myself || this->flags & (REDIS_NODE_HANDSHAKE|REDIS_NODE_NOADDR) || (this->link == NULL && this->numslots == 0)) { @@ -2535,8 +2863,16 @@ void clusterSendPing(clusterLink *link, int type) { * 因此广播 PONG 回复在配置发生变化(比如从节点转变为主节点), * 并且当前节点想让其他节点尽快知悉这一变化的时候, * 就会广播 PONG 回复。 + * + * The 'target' argument specifies the receiving instances using the + * defines below: + * + * CLUSTER_BROADCAST_ALL -> All known instances. + * CLUSTER_BROADCAST_LOCAL_SLAVES -> All slaves in my master-slaves ring. */ -void clusterBroadcastPong(void) { +#define CLUSTER_BROADCAST_ALL 0 +#define CLUSTER_BROADCAST_LOCAL_SLAVES 1 +void clusterBroadcastPong(int target) { dictIterator *di; dictEntry *de; @@ -2547,9 +2883,13 @@ void clusterBroadcastPong(void) { // 不向未建立连接的节点发送 if (!node->link) continue; - // 不向 HANDSHAKE 以及自己发送 - if (node->flags & (REDIS_NODE_MYSELF|REDIS_NODE_HANDSHAKE)) continue; - + if (node == myself || nodeInHandshake(node)) continue; + if (target == CLUSTER_BROADCAST_LOCAL_SLAVES) { + int local_slave = + nodeIsSlave(node) && node->slaveof && + (node->slaveof == myself || node->slaveof == myself->slaveof); + if (!local_slave) continue; + } // 发送 PONG 信息 clusterSendPing(node->link,CLUSTERMSG_TYPE_PONG); } @@ -2708,6 +3048,10 @@ void clusterRequestFailoverAuth(void) { // 设置信息头(包含当前节点的信息) clusterBuildMessageHdr(hdr,CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST); + /* If this is a manual failover, set the CLUSTERMSG_FLAG0_FORCEACK bit + * in the header to communicate the nodes receiving the message that + * they should authorized the failover even if the master is working. */ + if (server.cluster->mf_end) hdr->mflags[0] |= CLUSTERMSG_FLAG0_FORCEACK; totlen = sizeof(clusterMsg)-sizeof(union clusterMsgData); hdr->totlen = htonl(totlen); @@ -2729,6 +3073,19 @@ void clusterSendFailoverAuth(clusterNode *node) { clusterSendMessage(node->link,buf,totlen); } +/* Send a MFSTART message to the specified node. */ +void clusterSendMFStart(clusterNode *node) { + unsigned char buf[sizeof(clusterMsg)]; + clusterMsg *hdr = (clusterMsg*) buf; + uint32_t totlen; + + if (!node->link) return; + clusterBuildMessageHdr(hdr,CLUSTERMSG_TYPE_MFSTART); + totlen = sizeof(clusterMsg)-sizeof(union clusterMsgData); + hdr->totlen = htonl(totlen); + clusterSendMessage(node->link,buf,totlen); +} + /* Vote for the node asking for our vote if there are the conditions. 
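 *
 * In short, the checks below are: we must be a master serving at least
 * one slot; the request's currentEpoch must not be older than ours; we
 * vote at most once per epoch (lastVoteEpoch); the requester must be a
 * slave whose master is failing, unless the request carries
 * CLUSTERMSG_FLAG0_FORCEACK (manual failover); we do not vote again for
 * a slave of the same master within two node timeouts; and none of the
 * claimed slots may have a configEpoch in our own table newer than the
 * one carried by the request.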
*/ // 在条件满足的情况下,为请求进行故障转移的节点 node 进行投票,支持它进行故障转移 void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) { @@ -2744,18 +3101,17 @@ void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) { // 请求节点的槽布局 unsigned char *claimed_slots = request->myslots; - + int force_ack = request->mflags[0] & CLUSTERMSG_FLAG0_FORCEACK; int j; /* IF we are not a master serving at least 1 slot, we don't have the * right to vote, as the cluster size in Redis Cluster is the number * of masters serving at least one slot, and quorum is the cluster * size + 1 */ - // 非主节点无权投票 - if (!(server.cluster->myself->flags & REDIS_NODE_MASTER)) return; - // 没有处理任何槽的节点无权投票 - if (server.cluster->myself->numslots == 0) return; + // 如果节点为从节点,或者是一个没有处理任何槽的主节点, + // 那么它没有投票权 + if (nodeIsSlave(myself) || myself->numslots == 0) return; /* Request epoch must be >= our currentEpoch. */ // 请求的配置纪元必须大于等于当前节点的配置纪元 @@ -2763,13 +3119,13 @@ void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) { /* I already voted for this epoch? Return ASAP. */ // 已经投过票了 - if (server.cluster->last_vote_epoch == server.cluster->currentEpoch) return; + if (server.cluster->lastVoteEpoch == server.cluster->currentEpoch) return; - /* Node must be a slave and its master down. */ - // 请求节点必须是从服务器,并且它的主节点必须已经 FAIL - if (!(node->flags & REDIS_NODE_SLAVE) || - master == NULL || - !(master->flags & REDIS_NODE_FAIL)) return; + /* Node must be a slave and its master down. + * The master can be non failing if the request is flagged + * with CLUSTERMSG_FLAG0_FORCEACK (manual failover). */ + if (nodeIsMaster(node) || master == NULL || + (!nodeFailed(master) && !force_ack)) return; /* We did not voted for a slave about this master for two * times the node timeout. This is not strictly needed for correctness @@ -2788,7 +3144,10 @@ void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) { // 查找是否有某个槽的配置纪元大于节点请求的纪元 if (server.cluster->slots[j] == NULL || - server.cluster->slots[j]->configEpoch <= requestConfigEpoch) continue; + server.cluster->slots[j]->configEpoch <= requestConfigEpoch) + { + continue; + } // 如果有的话,说明节点请求的纪元已经过期,没有必要进行投票 /* If we reached this point we found a slot that in our current slots @@ -2800,12 +3159,39 @@ void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) { /* We can vote for this slave. */ // 为节点投票 clusterSendFailoverAuth(node); - // 更新时间值 - server.cluster->last_vote_epoch = server.cluster->currentEpoch; + server.cluster->lastVoteEpoch = server.cluster->currentEpoch; node->slaveof->voted_time = mstime(); } +/* This function returns the "rank" of this instance, a slave, in the context + * of its master-slaves ring. The rank of the slave is given by the number of + * other slaves for the same master that have a better replication offset + * compared to the local one (better means, greater, so they claim more data). + * + * A slave with rank 0 is the one with the greatest (most up to date) + * replication offset, and so forth. Note that because how the rank is computed + * multiple slaves may have the same rank, in case they have the same offset. + * + * The slave rank is used to add a delay to start an election in order to + * get voted and replace a failing master. Slaves with better replication + * offsets are more likely to win. 
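+ *
+ * A worked example with invented offsets: a master has three slaves at
+ * replication offsets 200, 150 and 150; their ranks are 0, 1 and 1.
+ * Since the election delay below is a fixed 500 ms, plus a random
+ * 0-500 ms, plus rank * 1000 ms, the most up to date slave normally
+ * starts (and wins) the election first.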
*/ +int clusterGetSlaveRank(void) { + long long myoffset; + int j, rank = 0; + clusterNode *master; + + redisAssert(nodeIsSlave(myself)); + master = myself->slaveof; + if (master == NULL) return 0; /* Never called by slaves without master. */ + + myoffset = replicationGetSlaveOffset(); + for (j = 0; j < master->numslaves; j++) + if (master->slaves[j] != myself && + master->slaves[j]->repl_offset > myoffset) rank++; + return rank; +} + /* This function is called if we are a slave node and our master serving * a non-zero amount of hash slots is in FAIL state. * @@ -2827,31 +3213,44 @@ void clusterHandleSlaveFailover(void) { mstime_t data_age; mstime_t auth_age = mstime() - server.cluster->failover_auth_time; int needed_quorum = (server.cluster->size / 2) + 1; + int manual_failover = server.cluster->mf_end != 0 && + server.cluster->mf_can_start; int j; + mstime_t auth_timeout, auth_retry_time; + + server.cluster->todo_before_sleep &= ~CLUSTER_TODO_HANDLE_FAILOVER; + + /* Compute the failover timeout (the max time we have to send votes + * and wait for replies), and the failover retry time (the time to wait + * before waiting again. + * + * Timeout is MIN(NODE_TIMEOUT*2,2000) milliseconds. + * Retry is two times the Timeout. + */ + auth_timeout = server.cluster_node_timeout*2; + if (auth_timeout < 2000) auth_timeout = 2000; + auth_retry_time = auth_timeout*2; + + /* Pre conditions to run the function, that must be met both in case + * of an automatic or manual failover: + * 1) We are a slave. + * 2) Our master is flagged as FAIL, or this is a manual failover. + * 3) It is serving slots. */ + if (nodeIsMaster(myself) || + myself->slaveof == NULL || + (!nodeFailed(myself->slaveof) && !manual_failover) || + myself->slaveof->numslots == 0) return; /* Set data_age to the number of seconds we are disconnected from * the master. */ // 将 data_age 设置为从节点与主节点的断开秒数 if (server.repl_state == REDIS_REPL_CONNECTED) { - data_age = (server.unixtime - server.master->lastinteraction) * 1000; + data_age = (mstime_t)(server.unixtime - server.master->lastinteraction) + * 1000; } else { - data_age = (server.unixtime - server.repl_down_since) * 1000; + data_age = (mstime_t)(server.unixtime - server.repl_down_since) * 1000; } - /* Pre conditions to run the function: - * 执行函数的条件: - * 1) We are a slave. - * 当前节点是从节点 - * 2) Our master is flagged as FAIL. - * 这个从节点的主节点状态为 FAIL - * 3) It is serving slots. - * FAIL 的主节点正在处理某个(或某些)槽 - */ - if (!(server.cluster->myself->flags & REDIS_NODE_SLAVE) || - server.cluster->myself->slaveof == NULL || - !(server.cluster->myself->slaveof->flags & REDIS_NODE_FAIL) || - server.cluster->myself->slaveof->numslots == 0) return; - /* Remove the node timeout from the data age as it is fine that we are * disconnected from our master at least for the time it was down to be * flagged as FAIL, that's the baseline. */ @@ -2861,31 +3260,70 @@ void clusterHandleSlaveFailover(void) { /* Check if our data is recent enough. For now we just use a fixed * constant of ten times the node timeout since the cluster should - * react much faster to a master down. */ + * react much faster to a master down. + * + * Check bypassed for manual failovers. */ // 检查这个从节点的数据是否较新: // 目前的检测办法是断线时间不能超过 node timeout 的十倍 if (data_age > - (server.repl_ping_slave_period * 1000) + + ((mstime_t)server.repl_ping_slave_period * 1000) + (server.cluster_node_timeout * REDIS_CLUSTER_SLAVE_VALIDITY_MULT)) - return; - - /* Compute the time at which we can start an election. 
*/ - // 在开始故障转移之前,先等待一段时间 - if (auth_age > - server.cluster_node_timeout * REDIS_CLUSTER_FAILOVER_AUTH_RETRY_MULT) { + if (!manual_failover) return; + } + + /* If the previous failover attempt timedout and the retry time has + * elapsed, we can setup a new one. */ + if (auth_age > auth_retry_time) { server.cluster->failover_auth_time = mstime() + 500 + /* Fixed delay of 500 milliseconds, let FAIL msg propagate. */ - data_age / 10 + /* Add 100 milliseconds for every second of age. */ random() % 500; /* Random delay between 0 and 500 milliseconds. */ server.cluster->failover_auth_count = 0; server.cluster->failover_auth_sent = 0; + server.cluster->failover_auth_rank = clusterGetSlaveRank(); + /* We add another delay that is proportional to the slave rank. + * Specifically 1 second * rank. This way slaves that have a probably + * less updated replication offset, are penalized. */ + server.cluster->failover_auth_time += + server.cluster->failover_auth_rank * 1000; + /* However if this is a manual failover, no delay is needed. */ + if (server.cluster->mf_end) { + server.cluster->failover_auth_time = mstime(); + server.cluster->failover_auth_rank = 0; + } redisLog(REDIS_WARNING, - "Start of election delayed for %lld milliseconds.", - server.cluster->failover_auth_time - mstime()); + "Start of election delayed for %lld milliseconds " + "(rank #%d, offset %lld).", + server.cluster->failover_auth_time - mstime(), + server.cluster->failover_auth_rank, + replicationGetSlaveOffset()); + /* Now that we have a scheduled election, broadcast our offset + * to all the other slaves so that they'll updated their offsets + * if our offset is better. */ + clusterBroadcastPong(CLUSTER_BROADCAST_LOCAL_SLAVES); return; } + /* It is possible that we received more updated offsets from other + * slaves for the same master since we computed our election delay. + * Update the delay if our rank changed. + * + * Not performed if this is a manual failover. */ + if (server.cluster->failover_auth_sent == 0 && + server.cluster->mf_end == 0) + { + int newrank = clusterGetSlaveRank(); + if (newrank > server.cluster->failover_auth_rank) { + long long added_delay = + (newrank - server.cluster->failover_auth_rank) * 1000; + server.cluster->failover_auth_time += added_delay; + server.cluster->failover_auth_rank = newrank; + redisLog(REDIS_WARNING, + "Slave rank updated to #%d, added %lld milliseconds of delay.", + newrank, added_delay); + } + } + /* Return ASAP if we can't still start the election. */ // 如果执行故障转移的时间未到,先返回 if (mstime() < server.cluster->failover_auth_time) return; @@ -2894,7 +3332,7 @@ void clusterHandleSlaveFailover(void) { // 如果距离应该执行故障转移的时间已经过了很久 // 那么不应该再执行故障转移了(因为可能已经没有需要了) // 直接返回 - if (auth_age > server.cluster_node_timeout) return; + if (auth_age > auth_timeout) return; /* Ask for votes if needed. */ // 向其他节点发送故障转移请求 @@ -2929,9 +3367,8 @@ void clusterHandleSlaveFailover(void) { /* Check if we reached the quorum. */ // 如果当前节点获得了足够多的投票,那么对下线主节点进行故障转移 if (server.cluster->failover_auth_count >= needed_quorum) { - // 旧主节点 - clusterNode *oldmaster = server.cluster->myself->slaveof; + clusterNode *oldmaster = myself->slaveof; redisLog(REDIS_WARNING, "Failover election won: I'm the new master."); @@ -2942,15 +3379,7 @@ void clusterHandleSlaveFailover(void) { * 1) Turn this node into a master. 
* 将当前节点的身份由从节点改为主节点 */ - // 在 slaves 字典中移除当前节点 - clusterNodeRemoveSlave(server.cluster->myself->slaveof, - server.cluster->myself); - // 关闭从节点标记 - server.cluster->myself->flags &= ~REDIS_NODE_SLAVE; - // 打开主节点标记 - server.cluster->myself->flags |= REDIS_NODE_MASTER; - // 清空 slaveof 对象 - server.cluster->myself->slaveof = NULL; + clusterSetNodeAsMaster(myself); // 让从节点取消复制,成为新的主节点 replicationUnsetMaster(); @@ -2961,14 +3390,13 @@ void clusterHandleSlaveFailover(void) { // 将槽设置为未分配的 clusterDelSlot(j); // 将槽的负责人设置为当前节点 - clusterAddSlot(server.cluster->myself,j); + clusterAddSlot(myself,j); } } /* 3) Update my configEpoch to the epoch of the election. */ // 更新集群配置纪元 - server.cluster->myself->configEpoch = - server.cluster->failover_auth_epoch; + myself->configEpoch = server.cluster->failover_auth_epoch; /* 4) Update state and save config. */ // 更新节点状态 @@ -2980,7 +3408,172 @@ void clusterHandleSlaveFailover(void) { * accordingly and detect that we switched to master role. */ // 向所有节点发送 PONG 信息 // 让它们可以知道当前节点已经升级为主节点了 - clusterBroadcastPong(); + clusterBroadcastPong(CLUSTER_BROADCAST_ALL); + + /* 6) If there was a manual failover in progress, clear the state. */ + resetManualFailover(); + } +} + +/* ----------------------------------------------------------------------------- + * CLUSTER slave migration + * + * Slave migration is the process that allows a slave of a master that is + * already covered by at least another slave, to "migrate" to a master that + * is orpaned, that is, left with no working slaves. + * -------------------------------------------------------------------------- */ + +/* This function is responsible to decide if this replica should be migrated + * to a different (orphaned) master. It is called by the clusterCron() function + * only if: + * + * 1) We are a slave node. + * 2) It was detected that there is at least one orphaned master in + * the cluster. + * 3) We are a slave of one of the masters with the greatest number of + * slaves. + * + * This checks are performed by the caller since it requires to iterate + * the nodes anyway, so we spend time into clusterHandleSlaveMigration() + * if definitely needed. + * + * The fuction is called with a pre-computed max_slaves, that is the max + * number of working (not in FAIL state) slaves for a single master. + * + * Additional conditions for migration are examined inside the function. + */ +void clusterHandleSlaveMigration(int max_slaves) { + int j, okslaves = 0; + clusterNode *mymaster = myself->slaveof, *target = NULL, *candidate = NULL; + dictIterator *di; + dictEntry *de; + + /* Step 1: Don't migrate if the cluster state is not ok. */ + if (server.cluster->state != REDIS_CLUSTER_OK) return; + + /* Step 2: Don't migrate if my master will not be left with at least + * 'migration-barrier' slaves after my migration. */ + if (mymaster == NULL) return; + for (j = 0; j < mymaster->numslaves; j++) + if (!nodeFailed(mymaster->slaves[j]) && + !nodeTimedOut(mymaster->slaves[j])) okslaves++; + if (okslaves <= server.cluster_migration_barrier) return; + + /* Step 3: Idenitfy a candidate for migration, and check if among the + * masters with the greatest number of ok slaves, I'm the one with the + * smaller node ID. + * + * Note that this means that eventually a replica migration will occurr + * since slaves that are reachable again always have their FAIL flag + * cleared. 
At the same time this does not mean that there are no + * race conditions possible (two slaves migrating at the same time), but + * this is extremely unlikely to happen, and harmless. */ + candidate = myself; + di = dictGetSafeIterator(server.cluster->nodes); + while((de = dictNext(di)) != NULL) { + clusterNode *node = dictGetVal(de); + int okslaves; + + /* Only iterate over working masters. */ + if (nodeIsSlave(node) || nodeFailed(node)) continue; + okslaves = clusterCountNonFailingSlaves(node); + + if (okslaves == 0 && target == NULL && node->numslots > 0) + target = node; + + if (okslaves == max_slaves) { + for (j = 0; j < node->numslaves; j++) { + if (memcmp(node->slaves[j]->name, + candidate->name, + REDIS_CLUSTER_NAMELEN) < 0) + { + candidate = node->slaves[j]; + } + } + } + } + + /* Step 4: perform the migration if there is a target, and if I'm the + * candidate. */ + if (target && candidate == myself) { + redisLog(REDIS_WARNING,"Migrating to orphaned master %.40s", + target->name); + clusterSetMaster(target); + } +} + +/* ----------------------------------------------------------------------------- + * CLUSTER manual failover + * + * This are the important steps performed by slaves during a manual failover: + * 1) User send CLUSTER FAILOVER command. The failover state is initialized + * setting mf_end to the millisecond unix time at which we'll abort the + * attempt. + * 2) Slave sends a MFSTART message to the master requesting to pause clients + * for two times the manual failover timeout REDIS_CLUSTER_MF_TIMEOUT. + * When master is paused for manual failover, it also starts to flag + * packets with CLUSTERMSG_FLAG0_PAUSED. + * 3) Slave waits for master to send its replication offset flagged as PAUSED. + * 4) If slave received the offset from the master, and its offset matches, + * mf_can_start is set to 1, and clusterHandleSlaveFailover() will perform + * the failover as usually, with the difference that the vote request + * will be modified to force masters to vote for a slave that has a + * working master. + * + * From the point of view of the master things are simpler: when a + * PAUSE_CLIENTS packet is received the master sets mf_end as well and + * the sender in mf_slave. During the time limit for the manual failover + * the master will just send PINGs more often to this slave, flagged with + * the PAUSED flag, so that the slave will set mf_master_offset when receiving + * a packet from the master with this flag set. + * + * The gaol of the manual failover is to perform a fast failover without + * data loss due to the asynchronous master-slave replication. + * -------------------------------------------------------------------------- */ + +/* Reset the manual failover state. This works for both masters and slavesa + * as all the state about manual failover is cleared. + * + * The function can be used both to initialize the manual failover state at + * startup or to abort a manual failover in progress. */ +void resetManualFailover(void) { + if (server.cluster->mf_end && clientsArePaused()) { + server.clients_pause_end_time = 0; + clientsArePaused(); /* Just use the side effect of the function. */ + } + server.cluster->mf_end = 0; /* No manual failover in progress. */ + server.cluster->mf_can_start = 0; + server.cluster->mf_slave = NULL; + server.cluster->mf_master_offset = 0; +} + +/* If a manual failover timed out, abort it. 
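+ *
+ * mf_end is the deadline armed when the failover started: the master
+ * sets it to mstime() + REDIS_CLUSTER_MF_TIMEOUT in the MFSTART handler
+ * above, and the slave sets its own deadline when CLUSTER FAILOVER is
+ * issued (step 1 in the comment above). Once the deadline passes, the
+ * whole manual failover state is simply wiped.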
*/ +void manualFailoverCheckTimeout(void) { + if (server.cluster->mf_end && server.cluster->mf_end < mstime()) { + redisLog(REDIS_WARNING,"Manual failover timed out."); + resetManualFailover(); + } +} + +/* This function is called from the cluster cron function in order to go + * forward with a manual failover state machine. */ +void clusterHandleManualFailover(void) { + /* Return ASAP if no manual failover is in progress. */ + if (server.cluster->mf_end == 0) return; + + /* If mf_can_start is non-zero, the failover was alrady triggered so the + * next steps are performed by clusterHandleSlaveFailover(). */ + if (server.cluster->mf_can_start) return; + + if (server.cluster->mf_master_offset == 0) return; /* Wait for offset... */ + + if (server.cluster->mf_master_offset == replicationGetSlaveOffset()) { + /* Our replication offset matches the master replication offset + * announced after clients were paused. We can start the failover. */ + server.cluster->mf_can_start = 1; + redisLog(REDIS_WARNING, + "All master replication stream processed, " + "manual failover can start."); } } @@ -2993,7 +3586,10 @@ void clusterHandleSlaveFailover(void) { void clusterCron(void) { dictIterator *di; dictEntry *de; - int j, update_state = 0; + int update_state = 0; + int orphaned_masters; /* How many masters there are without ok slaves. */ + int max_slaves; /* Max number of ok slaves for a single master. */ + int this_slaves; /* Number of ok slaves for our master (if we are slave). */ mstime_t min_pong = 0, now = mstime(); clusterNode *min_pong_node = NULL; // 迭代计数器,一个静态变量 @@ -3027,9 +3623,7 @@ void clusterCron(void) { /* A Node in HANDSHAKE state has a limited lifespan equal to the * configured node timeout. */ // 如果 handshake 节点已超时,释放它 - if (node->flags & REDIS_NODE_HANDSHAKE && - now - node->ctime > handshake_timeout) - { + if (nodeInHandshake(node) && now - node->ctime > handshake_timeout) { freeClusterNode(node); continue; } @@ -3040,15 +3634,21 @@ void clusterCron(void) { mstime_t old_ping_sent; clusterLink *link; - // 创建连接 - fd = anetTcpNonBlockConnect(server.neterr, node->ip, - node->port+REDIS_CLUSTER_PORT_INCR); - if (fd == -1) continue; + fd = anetTcpNonBlockBindConnect(server.neterr, node->ip, + node->port+REDIS_CLUSTER_PORT_INCR, + server.bindaddr_count ? server.bindaddr[0] : NULL); + if (fd == -1) { + redisLog(REDIS_DEBUG, "Unable to connect to " + "Cluster Node [%s]:%d -> %s", node->ip, + node->port+REDIS_CLUSTER_PORT_INCR, + server.neterr); + continue; + } link = createClusterLink(node); link->fd = fd; node->link = link; - // 关联读事件处理器 - aeCreateFileEvent(server.el,link->fd,AE_READABLE,clusterReadHandler,link); + aeCreateFileEvent(server.el,link->fd,AE_READABLE, + clusterReadHandler,link); /* Queue a PING in the new connection ASAP: this is crucial * to avoid false positives in failure detection. * @@ -3087,7 +3687,8 @@ void clusterCron(void) { */ node->flags &= ~REDIS_NODE_MEET; - redisLog(REDIS_DEBUG,"Connecting with Node %.40s at %s:%d", node->name, node->ip, node->port+REDIS_CLUSTER_PORT_INCR); + redisLog(REDIS_DEBUG,"Connecting with Node %.40s at %s:%d", + node->name, node->ip, node->port+REDIS_CLUSTER_PORT_INCR); } } dictReleaseIterator(di); @@ -3096,6 +3697,8 @@ void clusterCron(void) { * one random node every second. */ // clusterCron() 每执行 10 次(至少间隔一秒钟),就向一个随机节点发送 gossip 信息 if (!(iteration % 10)) { + int j; + /* Check a few random nodes and ping the one with the oldest * pong_received time. 
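         *
         * Concretely: a handful of random nodes are sampled, nodes with
         * no link, an outstanding PING, or the MYSELF/HANDSHAKE flags are
         * skipped, and the survivor with the oldest pong_received is
         * remembered in min_pong_node; that is the node pinged by this
         * (roughly once per second) pass.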
*/ // 随机 5 个节点,选出其中一个 @@ -3108,7 +3711,9 @@ void clusterCron(void) { /* Don't ping nodes disconnected or with a ping currently active. */ // 不要 PING 连接断开的节点,也不要 PING 最近已经 PING 过的节点 if (this->link == NULL || this->ping_sent != 0) continue; - if (this->flags & (REDIS_NODE_MYSELF|REDIS_NODE_HANDSHAKE)) continue; + + if (this->flags & (REDIS_NODE_MYSELF|REDIS_NODE_HANDSHAKE)) + continue; // 选出 5 个随机节点中最近一次接收 PONG 回复距离现在最旧的节点 if (min_pong_node == NULL || min_pong > this->pong_received) { @@ -3124,8 +3729,16 @@ void clusterCron(void) { } } - /* Iterate nodes to check if we need to flag something as failing */ // 遍历所有节点,检查是否需要将某个节点标记为下线 + /* Iterate nodes to check if we need to flag something as failing. + * This loop is also responsible to: + * 1) Check if there are orphaned masters (masters without non failing + * slaves). + * 2) Count the max number of non failing slaves for a single master. + * 3) Count the number of slaves for our master, if we are a slave. */ + orphaned_masters = 0; + max_slaves = 0; + this_slaves = 0; di = dictGetSafeIterator(server.cluster->nodes); while((de = dictNext(di)) != NULL) { clusterNode *node = dictGetVal(de); @@ -3137,6 +3750,17 @@ void clusterCron(void) { (REDIS_NODE_MYSELF|REDIS_NODE_NOADDR|REDIS_NODE_HANDSHAKE)) continue; + /* Orphaned master check, useful only if the current instance + * is a slave that may migrate to another master. */ + if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) { + int okslaves = clusterCountNonFailingSlaves(node); + + if (okslaves == 0 && node->numslots > 0) orphaned_masters++; + if (okslaves > max_slaves) max_slaves = okslaves; + if (nodeIsSlave(myself) && myself->slaveof == node) + this_slaves = okslaves; + } + /* If we are waiting for the PONG more than half the cluster * timeout, reconnect the link: maybe there is a connection * issue even if the node is alive. */ @@ -3171,6 +3795,17 @@ void clusterCron(void) { continue; } + /* If we are a master and one of the slaves requested a manual + * failover, ping it continuously. */ + if (server.cluster->mf_end && + nodeIsMaster(myself) && + server.cluster->mf_slave == node && + node->link) + { + clusterSendPing(node->link, CLUSTERMSG_TYPE_PING); + continue; + } + /* Check only if we have an active ping for this instance. */ // 以下代码只在节点发送了 PING 命令的情况下执行 if (node->ping_sent == 0) continue; @@ -3200,18 +3835,30 @@ void clusterCron(void) { * enable it if we know the address of our master and it appears to * be up. */ // 如果从节点没有在复制主节点,那么对从节点进行设置 - if (server.cluster->myself->flags & REDIS_NODE_SLAVE && + if (nodeIsSlave(myself) && server.masterhost == NULL && - server.cluster->myself->slaveof && - !(server.cluster->myself->slaveof->flags & REDIS_NODE_NOADDR)) + myself->slaveof && + nodeHasAddr(myself->slaveof)) { - replicationSetMaster(server.cluster->myself->slaveof->ip, - server.cluster->myself->slaveof->port); + replicationSetMaster(myself->slaveof->ip, myself->slaveof->port); + } + + /* Abourt a manual failover if the timeout is reached. */ + manualFailoverCheckTimeout(); + + if (nodeIsSlave(myself)) { + clusterHandleManualFailover(); + clusterHandleSlaveFailover(); + /* If there are orphaned slaves, and we are a slave among the masters + * with the max number of non-failing slaves, consider migrating to + * the orphaned masters. Note that it does not make sense to try + * a migration if there is no master with at least *two* working + * slaves. 
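+         *
+         * An invented example: master M1 has three working slaves and
+         * master M2 (serving slots) has lost all of its own, so
+         * orphaned_masters is 1 and max_slaves is 3. If we are one of
+         * M1's slaves, this_slaves == max_slaves and we call
+         * clusterHandleSlaveMigration(3), which moves the slave with the
+         * smallest Node ID over to M2, provided M1 still counts more
+         * than cluster_migration_barrier working slaves.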
*/ + if (orphaned_masters && max_slaves >= 2 && this_slaves == max_slaves) + clusterHandleSlaveMigration(max_slaves); } - // 如果条件满足的话,执行故障转移 - clusterHandleSlaveFailover(); - // 更新节点状态 + // 更新集群状态 if (update_state || server.cluster->state == REDIS_CLUSTER_FAIL) clusterUpdateState(); } @@ -3241,12 +3888,13 @@ void clusterBeforeSleep(void) { /* Save the config, possibly using fsync. */ // 保存 nodes.conf 配置文件 if (server.cluster->todo_before_sleep & CLUSTER_TODO_SAVE_CONFIG) { - int fsync = server.cluster->todo_before_sleep & CLUSTER_TODO_FSYNC_CONFIG; + int fsync = server.cluster->todo_before_sleep & + CLUSTER_TODO_FSYNC_CONFIG; clusterSaveConfigOrDie(fsync); } - /* Reset our flags. */ - // 重置 flag + /* Reset our flags (not strictly needed since every single function + * called for flags set should be able to clear its flag). */ server.cluster->todo_before_sleep = 0; } @@ -3371,6 +4019,15 @@ int clusterDelNodeSlots(clusterNode *node) { return deleted; } +/* Clear the migrating / importing state for all the slots. + * This is useful at initialization and when turning a master into slave. */ +void clusterCloseAllSlots(void) { + memset(server.cluster->migrating_slots_to,0, + sizeof(server.cluster->migrating_slots_to)); + memset(server.cluster->importing_slots_from,0, + sizeof(server.cluster->importing_slots_from)); +} + /* ----------------------------------------------------------------------------- * Cluster state evaluation function * -------------------------------------------------------------------------- */ @@ -3390,6 +4047,8 @@ void clusterUpdateState(void) { static mstime_t among_minority_time; static mstime_t first_call_time = 0; + server.cluster->todo_before_sleep &= ~CLUSTER_TODO_UPDATE_STATE; + /* If this is a master node, wait some time before turning the state * into OK, since it is not a good idea to rejoin the cluster as a writable * master, after a reboot, without giving the cluster a chance to @@ -3397,7 +4056,7 @@ void clusterUpdateState(void) { * the first call to this function and not since the server start, in order * to don't count the DB loading time. */ if (first_call_time == 0) first_call_time = mstime(); - if (server.cluster->myself->flags & REDIS_NODE_MASTER && + if (nodeIsMaster(myself) && mstime() - first_call_time < REDIS_CLUSTER_WRITABLE_DELAY) return; /* Start assuming the state is OK. We'll turn it into FAIL if there @@ -3433,7 +4092,7 @@ void clusterUpdateState(void) { while((de = dictNext(di)) != NULL) { clusterNode *node = dictGetVal(de); - if (node->flags & REDIS_NODE_MASTER && node->numslots) { + if (nodeIsMaster(node) && node->numslots) { server.cluster->size++; if (node->flags & (REDIS_NODE_FAIL|REDIS_NODE_PFAIL)) unreachable_masters++; @@ -3451,7 +4110,7 @@ void clusterUpdateState(void) { */ { int needed_quorum = (server.cluster->size / 2) + 1; - + if (unreachable_masters >= needed_quorum) { new_state = REDIS_CLUSTER_FAIL; among_minority_time = mstime(); @@ -3473,7 +4132,7 @@ void clusterUpdateState(void) { rejoin_delay = REDIS_CLUSTER_MIN_REJOIN_DELAY; if (new_state == REDIS_CLUSTER_OK && - server.cluster->myself->flags & REDIS_NODE_MASTER && + nodeIsMaster(myself) && mstime() - among_minority_time < rejoin_delay) { return; @@ -3519,7 +4178,7 @@ int verifyClusterConfigWithData(void) { /* If this node is a slave, don't perform the check at all as we * completely depend on the replication stream. 
*/ // 不对从节点进行检查 - if (server.cluster->myself->flags & REDIS_NODE_SLAVE) return REDIS_OK; + if (nodeIsSlave(myself)) return REDIS_OK; /* Make sure we only have keys in DB0. */ // 确保只有 0 号数据库有数据 @@ -3536,7 +4195,7 @@ int verifyClusterConfigWithData(void) { * In both cases check the next slot as the configuration makes * sense. */ // 跳过正在导入的槽 - if (server.cluster->slots[j] == server.cluster->myself || + if (server.cluster->slots[j] == myself || server.cluster->importing_slots_from[j] != NULL) continue; /* If we are here data and cluster config don't agree, and we have @@ -3550,7 +4209,7 @@ int verifyClusterConfigWithData(void) { redisLog(REDIS_WARNING, "I've keys about slot %d that is " "unassigned. Taking responsability " "for it.",j); - clusterAddSlot(server.cluster->myself,j); + clusterAddSlot(myself,j); } else { // 如果一个槽已经被其他节点接管 // 那么将槽中的资料发送给对方 @@ -3571,21 +4230,16 @@ int verifyClusterConfigWithData(void) { * SLAVE nodes handling * -------------------------------------------------------------------------- */ -/* Set the specified node 'n' as master. Setup the node as a slave if - * needed. */ -// 将节点 n 设置为当前节点的主节点 +/* Set the specified node 'n' as master for this node. + * If this node is currently a master, it is turned into a slave. */ void clusterSetMaster(clusterNode *n) { - - // 指向当前节点 - clusterNode *myself = server.cluster->myself; - redisAssert(n != myself); redisAssert(myself->numslots == 0); - // 设置当前节点的标识值 - if (myself->flags & REDIS_NODE_MASTER) { + if (nodeIsMaster(myself)) { myself->flags &= ~REDIS_NODE_MASTER; myself->flags |= REDIS_NODE_SLAVE; + clusterCloseAllSlots(); } else { if (myself->slaveof) clusterNodeRemoveSlave(myself->slaveof,myself); @@ -3597,6 +4251,7 @@ void clusterSetMaster(clusterNode *n) { // 设置主节点的 IP 和地址,开始对它进行复制 clusterNodeAddSlave(n,myself); replicationSetMaster(n->ip, n->port); + resetManualFailover(); } /* ----------------------------------------------------------------------------- @@ -3651,7 +4306,7 @@ sds clusterGenNodeDescription(clusterNode *node) { if (start == -1) start = j; } if (start != -1 && (!bit || j == REDIS_CLUSTER_SLOTS-1)) { - if (j == REDIS_CLUSTER_SLOTS-1) j++; + if (bit && j == REDIS_CLUSTER_SLOTS-1) j++; if (start == j-1) { ci = sdscatprintf(ci," %d",start); @@ -3751,11 +4406,12 @@ void clusterCommand(redisClient *c) { /* CLUSTER MEET */ // 将给定地址的节点添加到当前节点所处的集群里面 - long port; + long long port; // 检查 port 参数的合法性 - if (getLongFromObjectOrReply(c, c->argv[3], &port, NULL) != REDIS_OK) { - addReplyError(c,"Invalid TCP port specified"); + if (getLongLongFromObject(c->argv[3], &port) != REDIS_OK) { + addReplyErrorFormat(c,"Invalid TCP port specified: %s", + (char*)c->argv[3]->ptr); return; } @@ -3764,7 +4420,8 @@ void clusterCommand(redisClient *c) { errno == EINVAL) { // 连接失败 - addReplyError(c,"Invalid node address specified"); + addReplyErrorFormat(c,"Invalid node address specified: %s:%s", + (char*)c->argv[2]->ptr, (char*)c->argv[3]->ptr); } else { // 连接成功 addReply(c,shared.ok); @@ -3790,7 +4447,7 @@ void clusterCommand(redisClient *c) { return; } // 删除所有由该节点处理的槽 - clusterDelNodeSlots(server.cluster->myself); + clusterDelNodeSlots(myself); clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG); addReply(c,shared.ok); @@ -3860,7 +4517,7 @@ void clusterCommand(redisClient *c) { // 添加或者删除指定 slot retval = del ? 
clusterDelSlot(j) : - clusterAddSlot(server.cluster->myself,j); + clusterAddSlot(myself,j); redisAssertWithInfo(c,NULL,retval == REDIS_OK); } } @@ -3882,9 +4539,8 @@ void clusterCommand(redisClient *c) { // CLUSTER SETSLOT MIGRATING // 将本节点的槽 slot 迁移至 node id 所指定的节点 if (!strcasecmp(c->argv[3]->ptr,"migrating") && c->argc == 5) { - // 被迁移的槽必须属于本节点 - if (server.cluster->slots[slot] != server.cluster->myself) { + if (server.cluster->slots[slot] != myself) { addReplyErrorFormat(c,"I'm not the owner of hash slot %u",slot); return; } @@ -3904,7 +4560,7 @@ void clusterCommand(redisClient *c) { } else if (!strcasecmp(c->argv[3]->ptr,"importing") && c->argc == 5) { // 如果 slot 槽本身已经由本节点处理,那么无须进行导入 - if (server.cluster->slots[slot] == server.cluster->myself) { + if (server.cluster->slots[slot] == myself) { addReplyErrorFormat(c, "I'm already the owner of hash slot %u",slot); return; @@ -3944,29 +4600,50 @@ void clusterCommand(redisClient *c) { /* If this hash slot was served by 'myself' before to switch * make sure there are no longer local keys for this hash slot. */ // 如果这个槽之前由当前节点负责处理,那么必须保证槽里面没有键存在 - if (server.cluster->slots[slot] == server.cluster->myself && - n != server.cluster->myself) - { + if (server.cluster->slots[slot] == myself && n != myself) { if (countKeysInSlot(slot) != 0) { - addReplyErrorFormat(c, "Can't assign hashslot %d to a different node while I still hold keys for this hash slot.", slot); + addReplyErrorFormat(c, + "Can't assign hashslot %d to a different node " + "while I still hold keys for this hash slot.", slot); return; } } - - /* If this node was the slot owner and the slot was marked as - * migrating, assigning the slot to another node will clear + /* If this slot is in migrating status but we have no keys + * for it assigning the slot to another node will clear * the migratig status. */ - // 撤销本节点对 slot 的迁移计划 - if (server.cluster->slots[slot] == server.cluster->myself && + if (countKeysInSlot(slot) == 0 && server.cluster->migrating_slots_to[slot]) server.cluster->migrating_slots_to[slot] = NULL; /* If this node was importing this slot, assigning the slot to * itself also clears the importing status. */ // 撤销本节点对 slot 的导入计划 - if (n == server.cluster->myself && + if (n == myself && server.cluster->importing_slots_from[slot]) + { + /* This slot was manually migrated, set this node configEpoch + * to a new epoch so that the new version can be propagated + * by the cluster. + * + * Note that if this ever results in a collision with another + * node getting the same configEpoch, for example because a + * failover happens at the same time we close the slot, the + * configEpoch collision resolution will fix it assigning + * a different epoch to each node. 
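The epoch bump described by this comment (and performed just below via clusterGetMaxEpoch()) can be illustrated in isolation. Assuming clusterGetMaxEpoch() simply returns the largest configEpoch/currentEpoch known to the node, a toy version of the logic is:

    #include <stdint.h>
    #include <stdio.h>

    /* Standalone sketch of "bump my configEpoch after importing a slot".
     * Values are invented; the real code walks server.cluster->nodes. */
    int main(void) {
        uint64_t current_epoch = 6;          /* cluster currentEpoch */
        uint64_t node_epochs[] = {3, 6, 5};  /* configEpoch of known nodes */
        uint64_t my_epoch = 5;               /* myself->configEpoch */
        uint64_t max_epoch = current_epoch;
        size_t i;

        for (i = 0; i < sizeof(node_epochs)/sizeof(node_epochs[0]); i++)
            if (node_epochs[i] > max_epoch) max_epoch = node_epochs[i];

        /* Claim a brand new, cluster-wide unique epoch unless we already
         * hold the greatest one. */
        if (my_epoch == 0 || my_epoch != max_epoch) {
            current_epoch++;
            my_epoch = current_epoch;
        }
        printf("configEpoch after importing the slot: %llu\n",
               (unsigned long long) my_epoch);
        return 0;
    }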
*/ + uint64_t maxEpoch = clusterGetMaxEpoch(); + + if (myself->configEpoch == 0 || + myself->configEpoch != maxEpoch) + { + server.cluster->currentEpoch++; + myself->configEpoch = server.cluster->currentEpoch; + clusterDoBeforeSleep(CLUSTER_TODO_FSYNC_CONFIG); + redisLog(REDIS_WARNING, + "configEpoch set to %llu after importing slot %d", + (unsigned long long) myself->configEpoch, slot); + } server.cluster->importing_slots_from[slot] = NULL; + } // 将槽设置为未指派 clusterDelSlot(slot); @@ -3975,10 +4652,11 @@ void clusterCommand(redisClient *c) { clusterAddSlot(n,slot); } else { - addReplyError(c,"Invalid CLUSTER SETSLOT action or number of arguments"); + addReplyError(c, + "Invalid CLUSTER SETSLOT action or number of arguments"); return; } - clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG); + clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE); addReply(c,shared.ok); } else if (!strcasecmp(c->argv[1]->ptr,"info") && c->argc == 2) { @@ -3998,13 +4676,9 @@ void clusterCommand(redisClient *c) { // 统计已指派节点的数量 slots_assigned++; - - // 统计各个不同状态下的节点的数量 - if (n->flags & REDIS_NODE_FAIL) { - // 已下线节点 + if (nodeFailed(n)) { slots_fail++; - } else if (n->flags & REDIS_NODE_PFAIL) { - // 疑似下线节点 + } else if (nodeTimedOut(n)) { slots_pfail++; } else { // 正常节点 @@ -4090,7 +4764,8 @@ void clusterCommand(redisClient *c) { if (getLongLongFromObjectOrReply(c,c->argv[2],&slot,NULL) != REDIS_OK) return; // 取出 count 参数 - if (getLongLongFromObjectOrReply(c,c->argv[3],&maxkeys,NULL) != REDIS_OK) + if (getLongLongFromObjectOrReply(c,c->argv[3],&maxkeys,NULL) + != REDIS_OK) return; // 检查参数的合法性 if (slot < 0 || slot >= REDIS_CLUSTER_SLOTS || maxkeys < 0) { @@ -4119,11 +4794,10 @@ void clusterCommand(redisClient *c) { if (!n) { addReplyErrorFormat(c,"Unknown node %s", (char*)c->argv[2]->ptr); return; - } else if (n == server.cluster->myself) { + } else if (n == myself) { addReplyError(c,"I tried hard but I can't forget myself..."); return; - } else if (server.cluster->myself->flags & REDIS_NODE_SLAVE && - server.cluster->myself->slaveof == n) { + } else if (nodeIsSlave(myself) && myself->slaveof == n) { addReplyError(c,"Can't forget my master!"); return; } @@ -4132,8 +4806,8 @@ void clusterCommand(redisClient *c) { clusterBlacklistAddNode(n); // 从集群中删除该节点 clusterDelNode(n); - - clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG); + clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE| + CLUSTER_TODO_SAVE_CONFIG); addReply(c,shared.ok); } else if (!strcasecmp(c->argv[1]->ptr,"replicate") && c->argc == 3) { @@ -4151,7 +4825,7 @@ void clusterCommand(redisClient *c) { /* I can't replicate myself. */ // 指定节点是自己,不能进行复制 - if (n == server.cluster->myself) { + if (n == myself) { addReplyError(c,"Can't replicate myself"); return; } @@ -4167,11 +4841,11 @@ void clusterCommand(redisClient *c) { * slots nor keys to accept to replicate some other node. * Slaves can switch to another master without issues. 
*/ // 节点必须没有被指派任何槽,并且数据库必须为空 - if (server.cluster->myself->flags & REDIS_NODE_MASTER && - (server.cluster->myself->numslots != 0 || - dictSize(server.db[0].dict) != 0)) - { - addReplyError(c,"To set a master the node must be empty and without assigned slots."); + if (nodeIsMaster(myself) && + (myself->numslots != 0 || dictSize(server.db[0].dict) != 0)) { + addReplyError(c, + "To set a master the node must be empty and " + "without assigned slots."); return; } @@ -4191,7 +4865,7 @@ void clusterCommand(redisClient *c) { return; } - if (n->flags & REDIS_NODE_SLAVE) { + if (nodeIsSlave(n)) { addReplyError(c,"The specified node is not a master"); return; } @@ -4202,6 +4876,102 @@ void clusterCommand(redisClient *c) { addReplyBulkCString(c,ni); sdsfree(ni); } + } else if (!strcasecmp(c->argv[1]->ptr,"failover") && + (c->argc == 2 || c->argc == 3)) + { + /* CLUSTER FAILOVER [FORCE] */ + int force = 0; + + if (c->argc == 3) { + if (!strcasecmp(c->argv[2]->ptr,"force")) { + force = 1; + } else { + addReply(c,shared.syntaxerr); + return; + } + } + + if (nodeIsMaster(myself)) { + addReplyError(c,"You should send CLUSTER FAILOVER to a slave"); + return; + } else if (!force && + (myself->slaveof == NULL || nodeFailed(myself->slaveof) || + myself->slaveof->link == NULL)) + { + addReplyError(c,"Master is down or failed, " + "please use CLUSTER FAILOVER FORCE"); + return; + } + resetManualFailover(); + server.cluster->mf_end = mstime() + REDIS_CLUSTER_MF_TIMEOUT; + + /* If this is a forced failover, we don't need to talk with our master + * to agree about the offset. We just failover taking over it without + * coordination. */ + if (force) { + server.cluster->mf_can_start = 1; + } else { + clusterSendMFStart(myself->slaveof); + } + redisLog(REDIS_WARNING,"Manual failover user request accepted."); + addReply(c,shared.ok); + } else if (!strcasecmp(c->argv[1]->ptr,"set-config-epoch") && c->argc == 3) + { + /* CLUSTER SET-CONFIG-EPOCH + * + * The user is allowed to set the config epoch only when a node is + * totally fresh: no config epoch, no other known node, and so forth. + * This happens at cluster creation time to start with a cluster where + * every node has a different node ID, without to rely on the conflicts + * resolution system which is too slow when a big cluster is created. */ + long long epoch; + + if (getLongLongFromObjectOrReply(c,c->argv[2],&epoch,NULL) != REDIS_OK) + return; + + if (epoch < 0) { + addReplyErrorFormat(c,"Invalid config epoch specified: %lld",epoch); + } else if (dictSize(server.cluster->nodes) > 1) { + addReplyError(c,"The user can assign a config epoch only when the " + "node does not know any other node."); + } else if (myself->configEpoch != 0) { + addReplyError(c,"Node config epoch is already non-zero"); + } else { + myself->configEpoch = epoch; + /* No need to fsync the config here since in the unlucky event + * of a failure to persist the config, the conflict resolution code + * will assign an unique config to this node. */ + clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE| + CLUSTER_TODO_SAVE_CONFIG); + addReply(c,shared.ok); + } + } else if (!strcasecmp(c->argv[1]->ptr,"reset") && + (c->argc == 2 || c->argc == 3)) + { + /* CLUSTER RESET [SOFT|HARD] */ + int hard = 0; + + /* Parse soft/hard argument. Default is soft. 
*/ + if (c->argc == 3) { + if (!strcasecmp(c->argv[2]->ptr,"hard")) { + hard = 1; + } else if (!strcasecmp(c->argv[2]->ptr,"soft")) { + hard = 0; + } else { + addReply(c,shared.syntaxerr); + return; + } + } + + /* Slaves can be reset while containing data, but not master nodes + * that must be empty. */ + if (nodeIsMaster(myself) && dictSize(c->db->dict) != 0) { + addReplyError(c,"CLUSTER RESET can't be called with " + "master nodes containing keys"); + return; + } + clusterReset(hard); + addReply(c,shared.ok); } else { addReplyError(c,"Wrong CLUSTER subcommand or number of arguments"); } @@ -4351,7 +5121,7 @@ void restoreCommand(redisClient *c) { /* Make sure this key does not already exist here... */ // 如果没有给定 REPLACE 选项,并且键已经存在,那么返回错误 if (!replace && lookupKeyWrite(c->db,c->argv[1]) != NULL) { - addReplyError(c,"Target key name is busy."); + addReply(c,shared.busykeyerr); return; } @@ -4499,7 +5269,8 @@ int migrateGetSocket(redisClient *c, robj *host, robj *port, long timeout) { // 检查连接的超时设置 if ((aeWait(fd,AE_WRITABLE,timeout) & AE_WRITABLE) == 0) { sdsfree(name); - addReplySds(c,sdsnew("-IOERR error or timeout connecting to the client\r\n")); + addReplySds(c, + sdsnew("-IOERR error or timeout connecting to the client\r\n")); close(fd); return -1; } @@ -4604,7 +5375,7 @@ void migrateCommand(redisClient *c) { addReplySds(c,sdsnew("+NOKEY\r\n")); return; } - + /* Connect */ // 获取套接字连接 fd = migrateGetSocket(c,c->argv[1],c->argv[2],timeout); @@ -4635,7 +5406,8 @@ void migrateCommand(redisClient *c) { // 写入键名和过期时间 redisAssertWithInfo(c,NULL,sdsEncodedObject(c->argv[3])); - redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,c->argv[3]->ptr,sdslen(c->argv[3]->ptr))); + redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,c->argv[3]->ptr, + sdslen(c->argv[3]->ptr))); redisAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl)); /* Emit the payload argument, that is the serialized object using @@ -4780,33 +5552,43 @@ void readwriteCommand(redisClient *c) { } /* Return the pointer to the cluster node that is able to serve the command. - * For the function to succeed the command should only target a single - * key (or the same key multiple times). + * For the function to succeed the command should only target either: * - * 返回负责处理命令 cmd 的节点的 clusterNode ,集群目前只允许执行处理单个键的命令。 + * 1) A single key (even multiple times like LPOPRPUSH mylist mylist). + * 2) Multiple keys in the same hash slot, while the slot is stable (no + * resharding in progress). * - * If the returned node should be used only for this request, the *ask - * integer is set to '1', otherwise to '0'. This is used in order to - * let the caller know if we should reply with -MOVED or with -ASK. + * On success the function returns the node that is able to serve the request. + * If the node is not 'myself' a redirection must be perfomed. The kind of + * redirection is specified setting the integer passed by reference + * 'error_code', which will be set to REDIS_CLUSTER_REDIR_ASK or + * REDIS_CLUSTER_REDIR_MOVED. * - * 如果返回的节点仅被用于当此转向,那么将 ask 设置为 1 ,否则设置为 0 。 - * 根据 ask 的值,节点会判断应该是发送 -ASK 转向(临时转向)还是 -MOVED 转向(永久转向)。 + * When the node is 'myself' 'error_code' is set to REDIS_CLUSTER_REDIR_NONE. * - * If the command contains multiple keys, and as a consequence it is not - * possible to handle the request in Redis Cluster, NULL is returned. 
+ * If the command fails NULL is returned, and the reason of the failure is + * provided via 'error_code', which will be set to: * - * 如果命令包含多个键,那么这个命令不能被集群处理,函数返回 NULL 。 - */ -clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *ask) { + * REDIS_CLUSTER_REDIR_CROSS_SLOT if the request contains multiple keys that + * don't belong to the same hash slot. + * + * REDIS_CLUSTER_REDIR_UNSTABLE if the request contains mutliple keys + * belonging to the same slot, but the slot is not stable (in migration or + * importing state, likely because a resharding is in progress). */ +clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *error_code) { // 初始化为 NULL , // 如果输入命令是无参数命令,那么 n 就会继续为 NULL clusterNode *n = NULL; robj *firstkey = NULL; + int multiple_keys = 0; multiState *ms, _ms; multiCmd mc; - int i, slot = 0; + int i, slot = 0, migrating_slot = 0, importing_slot = 0, missing_keys = 0; + + /* Set error code optimistically for the base case. */ + if (error_code) *error_code = REDIS_CLUSTER_REDIR_NONE; /* We handle all the cases as if they were EXEC commands, so we have * a common code path for everything */ @@ -4816,7 +5598,7 @@ clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **arg if (cmd->proc == execCommand) { /* If REDIS_MULTI flag is not set EXEC is just going to return an * error. */ - if (!(c->flags & REDIS_MULTI)) return server.cluster->myself; + if (!(c->flags & REDIS_MULTI)) return myself; ms = &c->mstate; } else { /* In order to have a single codepath create a fake Multi State @@ -4830,9 +5612,8 @@ clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **arg mc.cmd = cmd; } - /* Check that all the keys are the same key, and get the slot and - * node for this key. */ - // 遍历事务中的命令 + /* Check that all the keys are in the same hash slot, and obtain this + * slot and the node associated. */ for (i = 0; i < ms->count; i++) { struct redisCommand *mcmd; robj **margv; @@ -4843,84 +5624,92 @@ clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **arg margv = ms->commands[i].argv; // 定位命令的键位置 - keyindex = getKeysFromCommand(mcmd,margv,margc,&numkeys, - REDIS_GETKEYS_ALL); - + keyindex = getKeysFromCommand(mcmd,margv,margc,&numkeys); // 遍历命令中的所有键 for (j = 0; j < numkeys; j++) { + robj *thiskey = margv[keyindex[j]]; + int thisslot = keyHashSlot((char*)thiskey->ptr, + sdslen(thiskey->ptr)); + if (firstkey == NULL) { // 这是事务中第一个被处理的键 // 获取该键的槽和负责处理该槽的节点 /* This is the first key we see. Check what is the slot * and node. */ - - // 键 - firstkey = margv[keyindex[j]]; - - // 计算负责处理键 firstkey 的槽 - slot = keyHashSlot((char*)firstkey->ptr, sdslen(firstkey->ptr)); - - // 指向负责处理槽 slot 的节点 + firstkey = thiskey; + slot = thisslot; n = server.cluster->slots[slot]; redisAssertWithInfo(c,firstkey,n != NULL); + /* If we are migrating or importing this slot, we need to check + * if we have all the keys in the request (the only way we + * can safely serve the request, otherwise we return a TRYAGAIN + * error). To do so we set the importing/migrating state and + * increment a counter for every missing key. */ + if (n == myself && + server.cluster->migrating_slots_to[slot] != NULL) + { + migrating_slot = 1; + } else if (server.cluster->importing_slots_from[slot] != NULL) { + importing_slot = 1; + } } else { /* If it is not the first key, make sure it is exactly * the same key as the first we saw. 
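The per-key slot computed above comes from keyHashSlot(), which is not part of this hunk. Assuming the usual rule (CRC16 of the key, or of its {...} hash tag when one is present and non-empty, modulo 16384, with the CRC16-CCITT/XModem variant implemented in crc16.c), a self-contained version looks like this:

    #include <stdio.h>
    #include <string.h>

    /* CRC16-CCITT (XModem): polynomial 0x1021, init 0, no reflection. */
    static unsigned int crc16(const char *buf, int len) {
        unsigned int crc = 0;
        for (int i = 0; i < len; i++) {
            crc ^= (unsigned int)(unsigned char)buf[i] << 8;
            for (int j = 0; j < 8; j++)
                crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) & 0xFFFF
                                     : (crc << 1) & 0xFFFF;
        }
        return crc;
    }

    /* Slot rule assumed for keyHashSlot(): hash only the substring between
     * the first '{' and the following '}' when such a non-empty tag exists,
     * otherwise hash the whole key. */
    static unsigned int key_hash_slot(const char *key, int keylen) {
        int s, e;

        for (s = 0; s < keylen; s++) if (key[s] == '{') break;
        if (s == keylen) return crc16(key, keylen) & 16383;
        for (e = s + 1; e < keylen; e++) if (key[e] == '}') break;
        if (e == keylen || e == s + 1) return crc16(key, keylen) & 16383;
        return crc16(key + s + 1, e - s - 1) & 16383;
    }

    int main(void) {
        const char *a = "{user1000}.following", *b = "{user1000}.followers";

        /* Both keys hash to the same slot, so a request naming both of
         * them is not a cross-slot error. */
        printf("%s -> %u\n", a, key_hash_slot(a, (int) strlen(a)));
        printf("%s -> %u\n", b, key_hash_slot(b, (int) strlen(b)));
        return 0;
    }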
*/ - if (!equalStringObjects(firstkey,margv[keyindex[j]])) { - getKeysFreeResult(keyindex); - return NULL; + if (!equalStringObjects(firstkey,thiskey)) { + if (slot != thisslot) { + /* Error: multiple keys from different slots. */ + getKeysFreeResult(keyindex); + if (error_code) + *error_code = REDIS_CLUSTER_REDIR_CROSS_SLOT; + return NULL; + } else { + /* Flag this request as one with multiple different + * keys. */ + multiple_keys = 1; + } } } + + /* Migarting / Improrting slot? Count keys we don't have. */ + if ((migrating_slot || importing_slot) && + lookupKeyRead(&server.db[0],thiskey) == NULL) + { + missing_keys++; + } } getKeysFreeResult(keyindex); } - if (ask) *ask = 0; /* This is the default. Set to 1 if needed later. */ - /* No key at all in command? then we can serve the request - * without redirections. */ - // 这是一个无参数命令,无须转向,直接由本节点处理 - if (n == NULL) return server.cluster->myself; + * without redirections or errors. */ + if (n == NULL) return myself; - // 记录负责处理键的槽 + /* Return the hashslot by reference. */ if (hashslot) *hashslot = slot; /* This request is about a slot we are migrating into another instance? - * Then we need to check if we have the key. If we have it we can reply. - * If instead is a new key, we pass the request to the node that is - * receiving the slot. */ - // 如果负责处理槽 slot 的是本节点 - // 并且这个槽 slot 正在迁移至另一个节点 - // 那么首先检查键 key 是否存在于本节点 - // 如果没有的话,那么键 key 可能已经转移至另一个节点了 - // 要求客户端进行 ASK 临时转向,到另一个节点去查找键 key - if (n == server.cluster->myself && - server.cluster->migrating_slots_to[slot] != NULL) - { - // 在本节点中查找键 key - if (lookupKeyRead(&server.db[0],firstkey) == NULL) { - - // 在本节点没找到键 key - - // 进行 ASK 临时转向 - if (ask) *ask = 1; + * Then if we have all the keys. */ - // 返回转移槽 slot 的目标节点 - return server.cluster->migrating_slots_to[slot]; - } + /* If we don't have all the keys and we are migrating the slot, send + * an ASK redirection. */ + if (migrating_slot && missing_keys) { + if (error_code) *error_code = REDIS_CLUSTER_REDIR_ASK; + return server.cluster->migrating_slots_to[slot]; } - /* Handle the case in which we are receiving this hash slot from - * another instance, so we'll accept the query even if in the table - * it is assigned to a different node, but only if the client - * issued an ASKING command before. */ - // 如果当前客户端正在从另一个节点中导入槽 slot ,并且 - // 1)在接到这个命令之前,客户端先发送了一个 ASKING 命令 - // 2)这个命令是一个带有 REDIS_CMD_ASKING 标识的命令 - // 那么将这个命令的执行者设置为当前节点 - if (server.cluster->importing_slots_from[slot] != NULL && - (c->flags & REDIS_ASKING || cmd->flags & REDIS_CMD_ASKING)) { - return server.cluster->myself; + /* If we are receiving the slot, and the client correctly flagged the + * request as "ASKING", we can serve the request. However if the request + * involves multiple keys and we don't have them all, the only option is + * to send a TRYAGAIN error. */ + if (importing_slot && + (c->flags & REDIS_ASKING || cmd->flags & REDIS_CMD_ASKING)) + { + if (multiple_keys && missing_keys) { + if (error_code) *error_code = REDIS_CLUSTER_REDIR_UNSTABLE; + return NULL; + } else { + return myself; + } } /* Handle the read-only client case reading from a slave: if this @@ -4928,13 +5717,16 @@ clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **arg * is serving, we can reply without redirection. 
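Seen from a cluster-aware client, the ASK and MOVED codes produced here arrive as "-ASK <slot> <ip:port>" and "-MOVED <slot> <ip:port>" error replies. The toy dispatcher below sketches the standard reaction to each; the reply formats and behaviour come from the cluster specification, not from this hunk:

    #include <stdio.h>

    static void handle_redirect(const char *err) {
        int slot;
        char addr[64];

        if (sscanf(err, "MOVED %d %63s", &slot, addr) == 2) {
            /* Permanent move: refresh the cached slot -> node map, then
             * resend the command to the new owner. */
            printf("slot %d moved to %s: update slot map and retry\n",
                   slot, addr);
        } else if (sscanf(err, "ASK %d %63s", &slot, addr) == 2) {
            /* One-shot redirection during resharding: send ASKING to the
             * target, retry the command once, keep the slot map as is. */
            printf("slot %d: ASKING + single retry on %s\n", slot, addr);
        }
    }

    int main(void) {
        handle_redirect("MOVED 3999 127.0.0.1:7002");
        handle_redirect("ASK 3999 127.0.0.1:7005");
        return 0;
    }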
*/ if (c->flags & REDIS_READONLY && cmd->flags & REDIS_CMD_READONLY && - server.cluster->myself->flags & REDIS_NODE_SLAVE && - server.cluster->myself->slaveof == n) + nodeIsSlave(myself) && + myself->slaveof == n) { - return server.cluster->myself; + return myself; } - /* It's not a -ASK case. Base case: just return the right node. */ + /* Base case: just return the right node. However if this node is not + * myself, set error_code to MOVED since we need to issue a rediretion. */ + if (n != myself && error_code) *error_code = REDIS_CLUSTER_REDIR_MOVED; + // 返回负责处理槽 slot 的节点 n return n; } diff --git a/src/cluster.h b/src/cluster.h index f9c5e7a30..9a9c3a35e 100644 --- a/src/cluster.h +++ b/src/cluster.h @@ -15,9 +15,6 @@ #define REDIS_CLUSTER_NAMELEN 40 /* sha1 hex length */ // 集群的实际端口号 = 用户指定的端口号 + REDIS_CLUSTER_PORT_INCR #define REDIS_CLUSTER_PORT_INCR 10000 /* Cluster port = baseport + PORT_INCR */ -// IPv6 地址的长度 -#define REDIS_CLUSTER_IPLEN INET6_ADDRSTRLEN /* IPv6 address string length */ - /* The following defines are amunt of time, sometimes expressed as * multiplicators of the node timeout value (when ending with MULT). @@ -35,10 +32,17 @@ #define REDIS_CLUSTER_FAIL_UNDO_TIME_ADD 10 /* Some additional time. */ // 在检查从节点数据是否有效时使用的乘法因子 #define REDIS_CLUSTER_SLAVE_VALIDITY_MULT 10 /* Slave data validity. */ -// 发送投票请求的间隔时间的乘法因子 -#define REDIS_CLUSTER_FAILOVER_AUTH_RETRY_MULT 4 /* Auth request retry time. */ -// 在执行故障转移之前需要等待的秒数 #define REDIS_CLUSTER_FAILOVER_DELAY 5 /* Seconds */ +#define REDIS_CLUSTER_DEFAULT_MIGRATION_BARRIER 1 +#define REDIS_CLUSTER_MF_TIMEOUT 5000 /* Milliseconds to do a manual failover. */ +#define REDIS_CLUSTER_MF_PAUSE_MULT 2 /* Master pause manual failover mult. */ + +/* Redirection errors returned by getNodeByQuery(). */ +#define REDIS_CLUSTER_REDIR_NONE 0 /* Node can serve the request. */ +#define REDIS_CLUSTER_REDIR_CROSS_SLOT 1 /* Keys in different slots. */ +#define REDIS_CLUSTER_REDIR_UNSTABLE 2 /* Keys in slot resharding. */ +#define REDIS_CLUSTER_REDIR_ASK 3 /* -ASK redirection required. */ +#define REDIS_CLUSTER_REDIR_MOVED 4 /* -MOVED redirection required. */ struct clusterNode; @@ -64,8 +68,7 @@ typedef struct clusterLink { } clusterLink; - -/* Node flags 节点标识*/ +/* Cluster node flags and macros. */ // 该节点为主节点 #define REDIS_NODE_MASTER 1 /* The node is a master */ // 该节点为从节点 @@ -88,6 +91,13 @@ typedef struct clusterLink { // 空名字(在节点为主节点时,用作消息中的 slaveof 属性的值) #define REDIS_NODE_NULL_NAME "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" +#define nodeIsMaster(n) ((n)->flags & REDIS_NODE_MASTER) +#define nodeIsSlave(n) ((n)->flags & REDIS_NODE_SLAVE) +#define nodeInHandshake(n) ((n)->flags & REDIS_NODE_HANDSHAKE) +#define nodeHasAddr(n) (!((n)->flags & REDIS_NODE_NOADDR)) +#define nodeWithoutAddr(n) ((n)->flags & REDIS_NODE_NOADDR) +#define nodeTimedOut(n) ((n)->flags & REDIS_NODE_PFAIL) +#define nodeFailed(n) ((n)->flags & REDIS_NODE_FAIL) /* This structure represent elements of node->fail_reports. */ // 每个 clusterNodeFailReport 结构保存了一条其他节点对目标节点的下线报告 @@ -233,14 +243,20 @@ typedef struct clusterState { // 如果值为 1 ,表示本节点已经向其他节点发送了投票请求 int failover_auth_sent; /* True if we already asked for votes. */ - // 集群当前进行选举的配置纪元 + int failover_auth_rank; /* This slave rank for current auth request. */ uint64_t failover_auth_epoch; /* Epoch of the current election. */ - + /* Manual failover state in common. 
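The nodeIsMaster()/nodeIsSlave()/nodeTimedOut()/nodeFailed() helpers used throughout these hunks are the plain bit tests defined above in cluster.h. A standalone demo, with flag values matching the REDIS_NODE_* constants:

    #include <stdio.h>

    #define REDIS_NODE_MASTER 1
    #define REDIS_NODE_SLAVE  2
    #define REDIS_NODE_PFAIL  4
    #define REDIS_NODE_FAIL   8

    #define nodeIsMaster(n)  ((n)->flags & REDIS_NODE_MASTER)
    #define nodeIsSlave(n)   ((n)->flags & REDIS_NODE_SLAVE)
    #define nodeTimedOut(n)  ((n)->flags & REDIS_NODE_PFAIL)
    #define nodeFailed(n)    ((n)->flags & REDIS_NODE_FAIL)

    struct toy_node { int flags; };  /* stand-in for clusterNode */

    int main(void) {
        struct toy_node n = { REDIS_NODE_MASTER | REDIS_NODE_PFAIL };

        printf("master=%d slave=%d pfail=%d fail=%d\n",
               !!nodeIsMaster(&n), !!nodeIsSlave(&n),
               !!nodeTimedOut(&n), !!nodeFailed(&n));
        return 0;
    }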
*/ + mstime_t mf_end; /* Manual failover time limit (ms unixtime). + It is zero if there is no MF in progress. */ + /* Manual failover state of master. */ + clusterNode *mf_slave; /* Slave performing the manual failover. */ + /* Manual failover state of slave. */ + long long mf_master_offset; /* Master offset the slave needs to start MF + or zero if stil not received. */ + int mf_can_start; /* If non-zero signal that the manual failover + can start requesting masters vote. */ /* The followign fields are uesd by masters to take state on elections. */ - // 以下一个域是主节点在进行故障迁移投票时使用的域 - - // 节点最后投票的配置纪元 - uint64_t last_vote_epoch; /* Epoch of the last vote granted. */ + uint64_t lastVoteEpoch; /* Epoch of the last vote granted. */ // 在进入下个事件循环之前要做的事情,以各个 flag 来记录 int todo_before_sleep; /* Things to do in clusterBeforeSleep(). */ @@ -287,6 +303,7 @@ typedef struct clusterState { #define CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK 6 /* Yes, you have my vote */ // 槽布局已经发生变化,消息发送者要求消息接收者进行相应的更新 #define CLUSTERMSG_TYPE_UPDATE 7 /* Another node slots configuration */ +#define CLUSTERMSG_TYPE_MFSTART 8 /* Pause clients for manual failover */ /* Initially we don't know our "name", but we'll find it once we connect * to the first node, using the getsockname() function. Then we'll use this @@ -306,7 +323,7 @@ typedef struct { uint32_t pong_received; // 节点的 IP 地址 - char ip[16]; /* IP address last time it was seen */ + char ip[REDIS_IP_STR_LEN]; /* IP address last time it was seen */ // 节点的端口号 uint16_t port; /* port last time it was seen */ @@ -381,9 +398,11 @@ union clusterMsgData { // 用来表示集群消息的结构(消息头,header) typedef struct { - + char sig[4]; /* Siganture "RCmb" (Redis Cluster message bus). */ // 消息的长度(包括这个消息头的长度和消息正文的长度) uint32_t totlen; /* Total length of this message */ + uint16_t ver; /* Protocol version, currently set to 0. */ + uint16_t notused0; /* 2 bytes not used. */ // 消息的类型 uint16_t type; /* Message type */ @@ -427,7 +446,7 @@ typedef struct { // 消息发送者所处集群的状态 unsigned char state; /* Cluster state from the POV of the sender */ - unsigned char notused2[3]; /* Reserved for future use. For alignment. */ + unsigned char mflags[3]; /* Message flags: CLUSTERMSG_FLAG[012]_... */ // 消息的正文(或者说,内容) union clusterMsgData data; @@ -436,6 +455,12 @@ typedef struct { #define CLUSTERMSG_MIN_LEN (sizeof(clusterMsg)-sizeof(union clusterMsgData)) +/* Message flags better specify the packet content or are used to + * provide some information about the node state. */ +#define CLUSTERMSG_FLAG0_PAUSED (1<<0) /* Master paused for manual failover. */ +#define CLUSTERMSG_FLAG0_FORCEACK (1<<1) /* Give ACK to AUTH_REQUEST even if + master is up. 
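Every cluster bus packet now starts with the 4-byte signature "RCmb" and carries a protocol version that this code sets to 0. How the receiving side validates these fields is not shown in this hunk, so the check below is only an illustrative sketch on a toy header, not the real clusterMsg layout:

    #include <stdio.h>
    #include <string.h>

    struct toy_hdr {
        char sig[4];        /* "RCmb" (Redis Cluster message bus) */
        unsigned short ver; /* protocol version, currently 0 */
    };

    static int header_looks_sane(const struct toy_hdr *h) {
        return memcmp(h->sig, "RCmb", 4) == 0 && h->ver == 0;
    }

    int main(void) {
        struct toy_hdr good = { {'R','C','m','b'}, 0 };
        struct toy_hdr bad  = { {'X','X','X','X'}, 0 };

        printf("good: %d, bad: %d\n", header_looks_sane(&good),
                                      header_looks_sane(&bad));
        return 0;
    }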
*/ + /* ---------------------- API exported outside cluster.c -------------------- */ clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *ask); diff --git a/src/config.c b/src/config.c index e9ce386c5..96f3628f6 100644 --- a/src/config.c +++ b/src/config.c @@ -134,6 +134,11 @@ void loadServerConfigFromString(char *config) { if (server.port < 0 || server.port > 65535) { err = "Invalid port"; goto loaderr; } + } else if (!strcasecmp(argv[0],"tcp-backlog") && argc == 2) { + server.tcp_backlog = atoi(argv[1]); + if (server.tcp_backlog < 0) { + err = "Invalid backlog value"; goto loaderr; + } } else if (!strcasecmp(argv[0],"bind") && argc >= 2) { int j, addresses = argc-1; @@ -393,6 +398,8 @@ void loadServerConfigFromString(char *config) { server.zset_max_ziplist_entries = memtoll(argv[1], NULL); } else if (!strcasecmp(argv[0],"zset-max-ziplist-value") && argc == 2) { server.zset_max_ziplist_value = memtoll(argv[1], NULL); + } else if (!strcasecmp(argv[0],"hll-sparse-max-bytes") && argc == 2) { + server.hll_sparse_max_bytes = memtoll(argv[1], NULL); } else if (!strcasecmp(argv[0],"rename-command") && argc == 3) { struct redisCommand *cmd = lookupCommand(argv[1]); int retval; @@ -429,6 +436,14 @@ void loadServerConfigFromString(char *config) { if (server.cluster_node_timeout <= 0) { err = "cluster node timeout must be 1 or greater"; goto loaderr; } + } else if (!strcasecmp(argv[0],"cluster-migration-barrier") + && argc == 2) + { + server.cluster_migration_barrier = atoi(argv[1]); + if (server.cluster_migration_barrier < 0) { + err = "cluster migration barrier must be positive"; + goto loaderr; + } } else if (!strcasecmp(argv[0],"lua-time-limit") && argc == 2) { server.lua_time_limit = strtoll(argv[1],NULL,10); } else if (!strcasecmp(argv[0],"slowlog-log-slower-than") && @@ -612,7 +627,7 @@ void configSetCommand(redisClient *c) { } else if (!strcasecmp(c->argv[2]->ptr,"maxclients")) { int orig_value = server.maxclients; - if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll < 0) goto badfmt; + if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll < 1) goto badfmt; /* Try to check if the OS is capable of supporting so many FDs. 
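The new numeric directives follow the same parse-and-validate pattern as the existing ones: convert the value, range-check it, and abort loading with an error string on failure. A self-contained imitation of that pattern, covering only the two directives and limits visible in this hunk:

    #include <stdio.h>
    #include <stdlib.h>
    #include <strings.h>

    /* Returns NULL when the directive is accepted, or the error message
     * that would abort config loading otherwise. */
    static const char *check_directive(const char *name, const char *value) {
        int v = atoi(value);

        if (!strcasecmp(name, "tcp-backlog") && v < 0)
            return "Invalid backlog value";
        if (!strcasecmp(name, "cluster-migration-barrier") && v < 0)
            return "cluster migration barrier must be positive";
        return NULL;
    }

    int main(void) {
        printf("tcp-backlog 511 -> %s\n",
               check_directive("tcp-backlog", "511") ? "rejected"
                                                     : "accepted");
        printf("cluster-migration-barrier -1 -> %s\n",
               check_directive("cluster-migration-barrier", "-1") ? "rejected"
                                                                  : "accepted");
        return 0;
    }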
*/ server.maxclients = ll; @@ -777,6 +792,9 @@ void configSetCommand(redisClient *c) { } else if (!strcasecmp(c->argv[2]->ptr,"zset-max-ziplist-value")) { if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll < 0) goto badfmt; server.zset_max_ziplist_value = ll; + } else if (!strcasecmp(c->argv[2]->ptr,"hll-sparse-max-bytes")) { + if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll < 0) goto badfmt; + server.hll_sparse_max_bytes = ll; } else if (!strcasecmp(c->argv[2]->ptr,"lua-time-limit")) { if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll < 0) goto badfmt; server.lua_time_limit = ll; @@ -900,6 +918,10 @@ void configSetCommand(redisClient *c) { if (getLongLongFromObject(o,&ll) == REDIS_ERR || ll <= 0) goto badfmt; server.cluster_node_timeout = ll; + } else if (!strcasecmp(c->argv[2]->ptr,"cluster-migration-barrier")) { + if (getLongLongFromObject(o,&ll) == REDIS_ERR || + ll < 0) goto badfmt; + server.cluster_migration_barrier = ll; } else { addReplyErrorFormat(c,"Unsupported CONFIG parameter: %s", (char*)c->argv[2]->ptr); @@ -982,12 +1004,15 @@ void configGetCommand(redisClient *c) { server.zset_max_ziplist_entries); config_get_numerical_field("zset-max-ziplist-value", server.zset_max_ziplist_value); + config_get_numerical_field("hll-sparse-max-bytes", + server.hll_sparse_max_bytes); config_get_numerical_field("lua-time-limit",server.lua_time_limit); config_get_numerical_field("slowlog-log-slower-than", server.slowlog_log_slower_than); config_get_numerical_field("slowlog-max-len", server.slowlog_max_len); config_get_numerical_field("port",server.port); + config_get_numerical_field("tcp-backlog",server.tcp_backlog); config_get_numerical_field("databases",server.dbnum); config_get_numerical_field("repl-ping-slave-period",server.repl_ping_slave_period); config_get_numerical_field("repl-timeout",server.repl_timeout); @@ -1000,6 +1025,7 @@ void configGetCommand(redisClient *c) { config_get_numerical_field("min-slaves-max-lag",server.repl_min_slaves_max_lag); config_get_numerical_field("hz",server.hz); config_get_numerical_field("cluster-node-timeout",server.cluster_node_timeout); + config_get_numerical_field("cluster-migration-barrier",server.cluster_migration_barrier); /* Bool (yes/no) values */ config_get_bool_field("no-appendfsync-on-rewrite", @@ -1467,7 +1493,7 @@ void rewriteConfigSaveOption(struct rewriteConfigState *state) { * resulting into no RDB persistence as expected. */ for (j = 0; j < server.saveparamslen; j++) { line = sdscatprintf(sdsempty(),"save %ld %d", - server.saveparams[j].seconds, server.saveparams[j].changes); + (long) server.saveparams[j].seconds, server.saveparams[j].changes); rewriteConfigRewriteLine(state,"save",line,1); } /* Mark "save" as processed in case server.saveparamslen is zero. 
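The (long) cast added to the save-rule rewrite is the portable way to print that field: the seconds value is a time_t, which has no printf conversion of its own, so it is cast before being handed to "%ld". A minimal demonstration of the idiom:

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t seconds = 900;   /* e.g. the "save 900 1" rule */
        int changes = 1;

        /* time_t may be int, long or long long depending on the platform,
         * so cast it explicitly before using the %ld conversion. */
        printf("save %ld %d\n", (long) seconds, changes);
        return 0;
    }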
*/ @@ -1707,6 +1733,7 @@ int rewriteConfig(char *path) { rewriteConfigYesNoOption(state,"daemonize",server.daemonize,0); rewriteConfigStringOption(state,"pidfile",server.pidfile,REDIS_DEFAULT_PID_FILE); rewriteConfigNumericalOption(state,"port",server.port,REDIS_SERVERPORT); + rewriteConfigNumericalOption(state,"tcp-backlog",server.tcp_backlog,REDIS_TCP_BACKLOG); rewriteConfigBindOption(state); rewriteConfigStringOption(state,"unixsocket",server.unixsocket,NULL); rewriteConfigOctalOption(state,"unixsocketperm",server.unixsocketperm,REDIS_DEFAULT_UNIX_SOCKET_PERM); @@ -1767,6 +1794,7 @@ int rewriteConfig(char *path) { rewriteConfigYesNoOption(state,"cluster-enabled",server.cluster_enabled,0); rewriteConfigStringOption(state,"cluster-config-file",server.cluster_configfile,REDIS_DEFAULT_CLUSTER_CONFIG_FILE); rewriteConfigNumericalOption(state,"cluster-node-timeout",server.cluster_node_timeout,REDIS_CLUSTER_DEFAULT_NODE_TIMEOUT); + rewriteConfigNumericalOption(state,"cluster-migration-barrier",server.cluster_migration_barrier,REDIS_CLUSTER_DEFAULT_MIGRATION_BARRIER); rewriteConfigNumericalOption(state,"slowlog-log-slower-than",server.slowlog_log_slower_than,REDIS_SLOWLOG_LOG_SLOWER_THAN); rewriteConfigNumericalOption(state,"slowlog-max-len",server.slowlog_max_len,REDIS_SLOWLOG_MAX_LEN); rewriteConfigNotifykeyspaceeventsOption(state); @@ -1777,6 +1805,7 @@ int rewriteConfig(char *path) { rewriteConfigNumericalOption(state,"set-max-intset-entries",server.set_max_intset_entries,REDIS_SET_MAX_INTSET_ENTRIES); rewriteConfigNumericalOption(state,"zset-max-ziplist-entries",server.zset_max_ziplist_entries,REDIS_ZSET_MAX_ZIPLIST_ENTRIES); rewriteConfigNumericalOption(state,"zset-max-ziplist-value",server.zset_max_ziplist_value,REDIS_ZSET_MAX_ZIPLIST_VALUE); + rewriteConfigNumericalOption(state,"hll-sparse-max-bytes",server.hll_sparse_max_bytes,REDIS_DEFAULT_HLL_SPARSE_MAX_BYTES); rewriteConfigYesNoOption(state,"activerehashing",server.activerehashing,REDIS_DEFAULT_ACTIVE_REHASHING); rewriteConfigClientoutputbufferlimitOption(state); rewriteConfigNumericalOption(state,"hz",server.hz,REDIS_DEFAULT_HZ); @@ -1811,14 +1840,7 @@ void configCommand(redisClient *c) { configGetCommand(c); } else if (!strcasecmp(c->argv[1]->ptr,"resetstat")) { if (c->argc != 2) goto badarity; - server.stat_keyspace_hits = 0; - server.stat_keyspace_misses = 0; - server.stat_numcommands = 0; - server.stat_numconnections = 0; - server.stat_expiredkeys = 0; - server.stat_rejected_conn = 0; - server.stat_fork_time = 0; - server.aof_delayed_fsync = 0; + resetServerStats(); resetCommandTableStats(); addReply(c,shared.ok); } else if (!strcasecmp(c->argv[1]->ptr,"rewrite")) { @@ -1828,8 +1850,10 @@ void configCommand(redisClient *c) { return; } if (rewriteConfig(server.configfile) == -1) { + redisLog(REDIS_WARNING,"CONFIG REWRITE failed: %s", strerror(errno)); addReplyErrorFormat(c,"Rewriting config file: %s", strerror(errno)); } else { + redisLog(REDIS_WARNING,"CONFIG REWRITE executed with success."); addReply(c,shared.ok); } } else { diff --git a/src/config.h b/src/config.h index 9f2baaa1f..8041f7ebe 100644 --- a/src/config.h +++ b/src/config.h @@ -187,7 +187,7 @@ void setproctitle(const char *fmt, ...); #if (__i386 || __amd64) && __GNUC__ #define GNUC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) -#if GNUC_VERSION >= 40100 +#if (GNUC_VERSION >= 40100) || defined(__clang__) #define HAVE_ATOMIC #endif #endif diff --git a/src/db.c b/src/db.c index be96e3b05..a38707c57 100644 --- a/src/db.c +++ b/src/db.c @@ 
-63,7 +63,7 @@ robj *lookupKey(redisDb *db, robj *key) { * a copy on write madness. */ // 更新时间信息(只在不存在子进程时执行,防止破坏 copy-on-write 机制) if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) - val->lru = server.lruclock; + val->lru = LRU_CLOCK(); // 返回值 return val; @@ -194,8 +194,7 @@ void dbAdd(redisDb *db, robj *key, robj *val) { * 如果键不存在,那么函数停止。 */ void dbOverwrite(redisDb *db, robj *key, robj *val) { - - struct dictEntry *de = dictFind(db->dict,key->ptr); + dictEntry *de = dictFind(db->dict,key->ptr); // 节点必须存在,否则中止 redisAssertWithInfo(NULL,key,de != NULL); @@ -257,7 +256,7 @@ int dbExists(redisDb *db, robj *key) { * 这个函数保证被返回的键都是未过期的。 */ robj *dbRandomKey(redisDb *db) { - struct dictEntry *de; + dictEntry *de; while(1) { sds key; @@ -311,6 +310,44 @@ int dbDelete(redisDb *db, robj *key) { } } +/* Prepare the string object stored at 'key' to be modified destructively + * to implement commands like SETBIT or APPEND. + * + * An object is usually ready to be modified unless one of the two conditions + * are true: + * + * 1) The object 'o' is shared (refcount > 1), we don't want to affect + * other users. + * 2) The object encoding is not "RAW". + * + * If the object is found in one of the above conditions (or both) by the + * function, an unshared / not-encoded copy of the string object is stored + * at 'key' in the specified 'db'. Otherwise the object 'o' itself is + * returned. + * + * USAGE: + * + * The object 'o' is what the caller already obtained by looking up 'key' + * in 'db', the usage pattern looks like this: + * + * o = lookupKeyWrite(db,key); + * if (checkType(c,o,REDIS_STRING)) return; + * o = dbUnshareStringValue(db,key,o); + * + * At this point the caller is ready to modify the object, for example + * using an sdscat() call to append some data, or anything else. + */ +robj *dbUnshareStringValue(redisDb *db, robj *key, robj *o) { + redisAssert(o->type == REDIS_STRING); + if (o->refcount != 1 || o->encoding != REDIS_ENCODING_RAW) { + robj *decoded = getDecodedObject(o); + o = createRawStringObject(decoded->ptr, sdslen(decoded->ptr)); + decrRefCount(decoded); + dbOverwrite(db,key,o); + } + return o; +} + /* * 清空服务器的所有数据。 */ @@ -442,6 +479,9 @@ void delCommand(redisClient *c) { // 遍历所有输入键 for (j = 1; j < c->argc; j++) { + // 先删除过期的键 + expireIfNeeded(c->db,c->argv[j]); + // 尝试删除键 if (dbDelete(c->db,c->argv[j])) { @@ -879,11 +919,13 @@ void shutdownCommand(redisClient *c) { } } - /* SHUTDOWN can be called even while the server is in "loading" state. - * When this happens we need to make sure no attempt is performed to save + /* When SHUTDOWN is called while the server is loading a dataset in + * memory we need to make sure no attempt is performed to save * the dataset on shutdown (otherwise it could overwrite the current DB - * with half-read data). */ - if (server.loading) + * with half-read data). + * + * Also when in Sentinel mode clear the SAVE flag and force NOSAVE. 
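dbUnshareStringValue() above is a small copy-on-write step: commands such as APPEND and SETBIT must never write into an object that other references can observe. The toy program below shows the same idea with a stand-in struct rather than the real robj:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        int refcount;
        char *buf;
    } toyobj;

    /* If the object is shared, install a private copy in the keyspace slot
     * so in-place writes cannot be seen by the other users. The real
     * dbUnshareStringValue() additionally re-creates values that are not
     * RAW encoded. */
    static toyobj *unshare(toyobj **slot) {
        toyobj *o = *slot;

        if (o->refcount > 1) {
            toyobj *copy = malloc(sizeof(*copy));
            copy->refcount = 1;
            copy->buf = strdup(o->buf);
            o->refcount--;      /* the original keeps its remaining users */
            *slot = copy;       /* the key now points at the private copy */
            o = copy;
        }
        return o;
    }

    int main(void) {
        toyobj shared = { 2, "hello" };  /* value referenced from two places */
        toyobj *slot = &shared;

        toyobj *writable = unshare(&slot);
        printf("modifiable copy: \"%s\" (refcount=%d)\n",
               writable->buf, writable->refcount);
        return 0;
    }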
*/ + if (server.loading || server.sentinel_mode) flags = (flags & ~REDIS_SHUTDOWN_SAVE) | REDIS_SHUTDOWN_NOSAVE; if (prepareForShutdown(flags) == REDIS_OK) exit(0); @@ -1134,7 +1176,8 @@ void propagateExpire(redisDb *db, robj *key) { int expireIfNeeded(redisDb *db, robj *key) { // 取出键的过期时间 - long long when = getExpire(db,key); + mstime_t when = getExpire(db,key); + mstime_t now; // 没有过期时间 if (when < 0) return 0; /* No expire for this key */ @@ -1143,6 +1186,13 @@ int expireIfNeeded(redisDb *db, robj *key) { // 如果服务器正在进行载入,那么不进行任何过期检查 if (server.loading) return 0; + /* If we are in the context of a Lua script, we claim that time is + * blocked to when the Lua script started. This way a key can expire + * only the first time it is accessed and not in the middle of the + * script execution, making propagation to slaves / AOF consistent. + * See issue #1525 on Github for more information. */ + now = server.lua_caller ? server.lua_time_start : mstime(); + /* If we are running in the context of a slave, return ASAP: * the slave key expiration is controlled by the master that will * send us synthesized DEL operations for expired keys. @@ -1155,15 +1205,13 @@ int expireIfNeeded(redisDb *db, robj *key) { // 它只返回一个逻辑上正确的返回值 // 真正的删除操作要等待主节点发来删除命令时才执行 // 从而保证数据的同步 - if (server.masterhost != NULL) { - return mstime() > when; - } + if (server.masterhost != NULL) return now > when; // 运行到这里,表示键带有过期时间,并且服务器为主节点 /* Return when this key has not expired */ // 如果未过期,返回 0 - if (mstime() <= when) return 0; + if (now <= when) return 0; /* Delete the key */ server.stat_expiredkeys++; @@ -1365,6 +1413,8 @@ void persistCommand(redisClient *c) { * API to get key arguments from commands * ---------------------------------------------------------------------------*/ +/* The base case is to use the keys position as given in the command table + * (firstkey, lastkey, step). */ int *getKeysUsingCommandTable(struct redisCommand *cmd,robj **argv, int argc, int *numkeys) { int j, i = 0, last, *keys; REDIS_NOTUSED(argv); @@ -1384,42 +1434,65 @@ int *getKeysUsingCommandTable(struct redisCommand *cmd,robj **argv, int argc, in return keys; } -int *getKeysFromCommand(struct redisCommand *cmd,robj **argv, int argc, int *numkeys, int flags) { +/* Return all the arguments that are keys in the command passed via argc / argv. + * + * The command returns the positions of all the key arguments inside the array, + * so the actual return value is an heap allocated array of integers. The + * length of the array is returned by reference into *numkeys. + * + * 'cmd' must be point to the corresponding entry into the redisCommand + * table, according to the command name in argv[0]. + * + * This function uses the command table if a command-specific helper function + * is not required, otherwise it calls the command-specific function. */ +int *getKeysFromCommand(struct redisCommand *cmd, robj **argv, int argc, int *numkeys) { if (cmd->getkeys_proc) { - return cmd->getkeys_proc(cmd,argv,argc,numkeys,flags); + return cmd->getkeys_proc(cmd,argv,argc,numkeys); } else { return getKeysUsingCommandTable(cmd,argv,argc,numkeys); } } +/* Free the result of getKeysFromCommand. */ void getKeysFreeResult(int *result) { zfree(result); } -int *noPreloadGetKeys(struct redisCommand *cmd,robj **argv, int argc, int *numkeys, int flags) { - if (flags & REDIS_GETKEYS_PRELOAD) { +/* Helper function to extract keys from following commands: + * ZUNIONSTORE ... + * ZINTERSTORE ... 
*/ +int *zunionInterGetKeys(struct redisCommand *cmd, robj **argv, int argc, int *numkeys) { + int i, num, *keys; + REDIS_NOTUSED(cmd); + + num = atoi(argv[2]->ptr); + /* Sanity check. Don't return any key if the command is going to + * reply with syntax error. */ + if (num > (argc-3)) { *numkeys = 0; return NULL; - } else { - return getKeysUsingCommandTable(cmd,argv,argc,numkeys); } -} -int *renameGetKeys(struct redisCommand *cmd,robj **argv, int argc, int *numkeys, int flags) { - if (flags & REDIS_GETKEYS_PRELOAD) { - int *keys = zmalloc(sizeof(int)); - *numkeys = 1; - keys[0] = 1; - return keys; - } else { - return getKeysUsingCommandTable(cmd,argv,argc,numkeys); - } + /* Keys in z{union,inter}store come from two places: + * argv[1] = storage key, + * argv[3...n] = keys to intersect */ + keys = zmalloc(sizeof(int)*(num+1)); + + /* Add all key positions for argv[3...n] to keys[] */ + for (i = 0; i < num; i++) keys[i] = 3+i; + + /* Finally add the argv[1] key position (the storage key target). */ + keys[num] = 1; + *numkeys = num+1; /* Total keys = {union,inter} keys + storage key */ + return keys; } -int *zunionInterGetKeys(struct redisCommand *cmd,robj **argv, int argc, int *numkeys, int flags) { +/* Helper function to extract keys from the following commands: + * EVAL
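As a worked example of the argument layout handled by zunionInterGetKeys() above, ZUNIONSTORE dest 2 k1 k2 WEIGHTS 1 2 yields the key positions 3 and 4 (the source keys) plus 1 (the destination), while the WEIGHTS arguments are ignored. The stand-alone program below reproduces that extraction:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* ZUNIONSTORE dest 2 k1 k2 WEIGHTS 1 2 */
        const char *argv[] = { "ZUNIONSTORE", "dest", "2", "k1", "k2",
                               "WEIGHTS", "1", "2" };
        int argc = (int)(sizeof(argv)/sizeof(argv[0]));
        int num = atoi(argv[2]);
        int i;

        if (num > argc - 3) return 1;   /* would be a syntax error: no keys */

        printf("key positions:");
        for (i = 0; i < num; i++) printf(" %d", 3 + i);  /* source keys */
        printf(" %d\n", 1);                              /* destination key */
        return 0;
    }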