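Redis Source Code Walkthrough (2): The Object System

Redis implements several data structures and builds its object types on top of them. Every object type, except the recently added stream, has more than one implementation.

The redisObject struct reflects the memory layout of a Redis object: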

typedef struct redisObject {
    unsigned type:4; // object type, 4 bits
    unsigned encoding:4; // underlying data structure, 4 bits
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */ // 24 bits
    int refcount; // 4 bytes
    void *ptr; // pointer to the data structure, 8 bytes
} robj;

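As the layout shows, robj spends 4 bits on the object type and 4 bits on the encoding, that is, the underlying data structure.

Together with the 24-bit lru field these fill 4 bytes; adding the 4-byte refcount and the 8-byte pointer gives robj a fixed size of 16 bytes on a 64-bit build.

The object types are: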
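The 16-byte figure can be checked with a small stand-in struct. This is only a sketch: sketch_robj and sketch_robj_size are hypothetical names, LRU_BITS is assumed to be 24 as the comment in the listing says, and the result depends on the compiler's bit-field layout and on 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LRU_BITS 24 /* assumption: the 24-bit lru field noted in the listing */

/* Hypothetical mirror of robj, used only to measure its size. */
typedef struct sketchObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:SKETCH_LRU_BITS; /* 4 + 4 + 24 bits share one 4-byte unit */
    int refcount;                 /* 4 bytes */
    void *ptr;                    /* 8 bytes on a 64-bit build */
} sketch_robj;

/* Bytes taken by the header on the current platform. */
static size_t sketch_robj_size(void) {
    return sizeof(sketch_robj);
}
```

On a typical 64-bit Linux toolchain the function is expected to return 16; a 32-bit build, with 4-byte pointers, would give a smaller value.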

#define OBJ_STRING 0    /* String object. */
#define OBJ_LIST 1      /* List object. */
#define OBJ_SET 2       /* Set object. */
#define OBJ_ZSET 3      /* Sorted set object. */
#define OBJ_HASH 4      /* Hash object. */
#define OBJ_MODULE 5    /* Module object. */
#define OBJ_STREAM 6    /* Stream object. */ // added in Redis 5

The encodings (underlying data structures) are:

#define OBJ_ENCODING_RAW 0     /* Raw representation */ // plain sds
#define OBJ_ENCODING_INT 1     /* Encoded as integer */ // string stored as an integer
#define OBJ_ENCODING_HT 2      /* Encoded as hash table */ // dict
#define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */ // deprecated
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */ // deprecated
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6  /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */
#define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */

In fact, the objectComputeSize function shows which encodings each object type can use (read "+" as "or"):

OBJ_STRING = OBJ_ENCODING_RAW + OBJ_ENCODING_INT + OBJ_ENCODING_EMBSTR

OBJ_LIST = OBJ_ENCODING_QUICKLIST + OBJ_ENCODING_ZIPLIST

OBJ_SET = OBJ_ENCODING_INTSET + OBJ_ENCODING_HT

OBJ_ZSET = OBJ_ENCODING_SKIPLIST + OBJ_ENCODING_ZIPLIST

OBJ_HASH = OBJ_ENCODING_HT + OBJ_ENCODING_ZIPLIST

OBJ_STREAM = OBJ_ENCODING_STREAM
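The mapping above can also be written as a lookup. The helper below is a hypothetical illustration, not a function from the Redis source; the constants are copied from the listings and carry a SK_ prefix only to keep the sketch self-contained.

```c
#include <assert.h>

/* Values copied from the #define listings in the article. */
enum { SK_STRING = 0, SK_LIST = 1, SK_SET = 2, SK_ZSET = 3, SK_HASH = 4, SK_STREAM = 6 };
enum { SK_RAW = 0, SK_INT = 1, SK_HT = 2, SK_ZIPLIST = 5, SK_INTSET = 6,
       SK_SKIPLIST = 7, SK_EMBSTR = 8, SK_QUICKLIST = 9, SK_ENC_STREAM = 10 };

/* Hypothetical helper: returns 1 if the (type, encoding) pair appears
 * in the mapping listed in the article, 0 otherwise. */
static int sketch_encoding_allowed(int type, int encoding) {
    switch (type) {
    case SK_STRING: return encoding == SK_RAW || encoding == SK_INT || encoding == SK_EMBSTR;
    case SK_LIST:   return encoding == SK_QUICKLIST || encoding == SK_ZIPLIST;
    case SK_SET:    return encoding == SK_INTSET || encoding == SK_HT;
    case SK_ZSET:   return encoding == SK_SKIPLIST || encoding == SK_ZIPLIST;
    case SK_HASH:   return encoding == SK_HT || encoding == SK_ZIPLIST;
    case SK_STREAM: return encoding == SK_ENC_STREAM;
    default:        return 0; /* module objects are not covered by the mapping */
    }
}
```

For example, sketch_encoding_allowed(SK_SET, SK_INTSET) returns 1, while a set paired with the skiplist encoding returns 0.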

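Why build such an elaborate object system? Mainly to save memory.

Take the string object, by far the most common type. It has the most encodings, three of them, and the function tryObjectEncoding gives a good idea of why: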

// Try to shrink a string object:
// 1. Check whether it can be stored directly as an INT, ideally as one of shared.integers
// 2. Check whether it can be stored as an embstr
// 3. If more than 1/10 of the sds is free space, trim the free space
/* Try to encode a string object in order to save space */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;
    ......
    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) { // is it an integer of at most 20 characters?
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. */
        // check whether value falls in the range [0, OBJ_SHARED_INTEGERS)
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            if (o->encoding == OBJ_ENCODING_RAW) sdsfree(o->ptr);
            o->encoding = OBJ_ENCODING_INT;
            o->ptr = (void*) value;
            return o;
        }
    }
    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. */
    // can it be stored as an embstr? check whether the string length is <= 44
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;
        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }
    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
    // try to trim the free space of the sds
    if (o->encoding == OBJ_ENCODING_RAW &&
        sdsavail(s) > len/10)
    {
        o->ptr = sdsRemoveFreeSpace(o->ptr);
    }
    /* Return the original object. */
    return o;
}

This shows how frugal Redis is with memory.

One detail is especially interesting: why does the boundary between embstr and raw sds sit at a length of 44?
Take a look at the sdshdr8 struct:
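The decision order in tryObjectEncoding can be condensed into a standalone sketch. This is not Redis code: sketch_choose_encoding and sketch_is_canonical_long are hypothetical helpers, a strtol round trip stands in for string2l as an approximation, the limit of 44 is the value the article gives for OBJ_ENCODING_EMBSTR_SIZE_LIMIT, and the shared-integer and sds-trimming paths are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_EMBSTR_LIMIT 44 /* assumption: the limit quoted in the article */

enum sketch_encoding { SKETCH_INT, SKETCH_EMBSTR, SKETCH_RAW };

/* Returns 1 if s is the canonical decimal form of a long ("42", "-7"),
 * 0 for anything else ("007", "+1", " 1", "1.5", out-of-range values). */
static int sketch_is_canonical_long(const char *s, size_t len) {
    char buf[32];
    char *end;
    long v;

    if (len == 0 || len > 20) return 0; /* same 20-character cutoff as the original */
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0') return 0;
    /* Printing the value back must reproduce the input exactly. */
    snprintf(buf, sizeof(buf), "%ld", v);
    return strcmp(buf, s) == 0;
}

/* Picks an encoding in the same order as the function above:
 * integer first, then embstr for short strings, otherwise raw. */
static enum sketch_encoding sketch_choose_encoding(const char *s) {
    size_t len = strlen(s);

    if (sketch_is_canonical_long(s, len)) return SKETCH_INT;
    if (len <= SKETCH_EMBSTR_LIMIT) return SKETCH_EMBSTR;
    return SKETCH_RAW;
}
```

With this sketch, "12345" is classified as an integer, "hello" as an embstr, and any non-numeric string longer than 44 characters as raw.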

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */ // 1 byte
    uint8_t alloc; /* excluding the header and null terminator */ // 1 byte
    unsigned char flags; /* 3 lsb of type, 5 unused bits */ // 1 byte
    char buf[];
};

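The three fields len + alloc + flags take 3 bytes.

Redis also appends a '\0' to every stored string, which takes 1 more byte.

So apart from its content, an sdshdr8 string needs at least 4 bytes.

Adding the 16-byte robj header brings the fixed overhead to 20 bytes.

jemalloc hands out memory in fixed size classes such as 8/16/32/64 bytes, so the choice of 44 as the boundary between embstr and raw sds is deliberate: 20 + 44 = 64, which means the largest embstr object exactly fills a 64-byte allocation. (Could this be taken one step further, with 12 as the boundary for an even smaller kind of string, since 20 + 12 = 32?)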
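The byte budget can be worked through in code. The sketch below uses a hypothetical stand-in for sdshdr8 and assumes the 16-byte robj header stated in the text; the packed attribute is a GCC/Clang extension, and sketch_embstr_limit is an illustrative helper, not a Redis function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ROBJ_SIZE 16 /* assumption: robj header size on a 64-bit build */

/* Hypothetical stand-in with the same fields as sdshdr8. */
struct __attribute__ ((__packed__)) sketch_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[]; /* flexible array member, adds nothing to sizeof */
};

/* Longest string payload that still fits in one allocation of
 * size_class bytes: subtract the robj header, the sds header and
 * the trailing '\0'. size_class must be at least 20. */
static size_t sketch_embstr_limit(size_t size_class) {
    size_t overhead = SKETCH_ROBJ_SIZE + sizeof(struct sketch_sdshdr8) + 1;
    return size_class - overhead;
}
```

Here sketch_embstr_limit(64) evaluates to 44, and sketch_embstr_limit(32) to 12, the smaller boundary the parenthetical question asks about.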

Even more interesting, a few versions back this boundary was 39 bytes, because older versions had only one kind of sds header:

struct sdshdr {
    unsigned int len; // 4 bytes
    unsigned int free; // 4 bytes
    char buf[];
};

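The fixed overhead of sdshdr is 4 + 4 + 1 = 9 bytes; adding the 16-byte robj gives 25 bytes, so the boundary could be no higher than 64 - 25 = 39 bytes.

By comparison, the newer sdshdr8 squeezes out 5 more bytes, which is genuinely impressive.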
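The comparison between the two header layouts comes down to a few lines of arithmetic. In this sketch the struct is a hypothetical stand-in for the old sdshdr, the 3-byte size of the new header and the 16-byte robj are taken from the text, and the helper names are made up for illustration; a 4-byte unsigned int is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ROBJ_BYTES 16    /* robj header size stated in the text */
#define SKETCH_NEW_HDR_BYTES 3  /* len + alloc + flags of sdshdr8 */
#define SKETCH_SIZE_CLASS 64    /* the allocation size the boundary targets */

/* Hypothetical stand-in for the old single sds header. */
struct sketch_old_sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[]; /* flexible array member, adds nothing to sizeof */
};

/* Fixed overhead: robj header + sds header + trailing '\0'. */
static size_t sketch_old_overhead(void) {
    return SKETCH_ROBJ_BYTES + sizeof(struct sketch_old_sdshdr) + 1;
}

static size_t sketch_new_overhead(void) {
    return SKETCH_ROBJ_BYTES + SKETCH_NEW_HDR_BYTES + 1;
}

/* Payload bytes left in a 64-byte allocation with the old header. */
static size_t sketch_old_limit(void) {
    return SKETCH_SIZE_CLASS - sketch_old_overhead();
}
```

The old overhead comes out at 25 bytes and the new one at 20, so the old limit is 39 and the difference between the two overheads is the 5 bytes gained.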
