HashMap的源码分析

字数统计: 4.6k字 | 阅读时长: 20分

2019-01-08 java 阅读量

HashMap 内部的结构

HashMap 它可以看作是数组（Node<K,V>[] table）和链表结合组成的复合结构，数组被分为一个个桶（bucket），通过哈希值决定了键值对在这个数组的寻址；哈希值相同的键值对，则以链表形式存储，你可以参考下面的示意图。这里需要注意的是，如果链表大小超过阈值（TREEIFY_THRESHOLD, 8），图中的链表就会被改造为树形结构。

HashMap的类加载

会有一些静态属性进行赋值。

// 初始容量值，必须是2的整数次幂
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// 当我们设置初始化默认值的时候，最大的容量值，必须是2的整数次幂
static final int MAXIMUM_CAPACITY = 1 << 30;

// 扩容的时候需要的加载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// 当链条需要进行树化的阈值
static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;

static final int MIN_TREEIFY_CAPACITY = 64;

从构造方法开始

默认构造方法
1
2
3
4
// 只给初始容量赋值
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted 1<<4 即16
}
从构造方法可以看得出来，HashMap 也许是按照 lazy-load 原则，在首次使用时才被初始化（拷贝构造函数除外）

带初始容量的构造方法

public HashMap(int initialCapacity) {
    // 会调用另一个构造方法，带初始容量和加载因子, DEFAULT_LOAD_FACTOR = 1<<4
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

带初始容量和加载因子的构造方法

public HashMap(int initialCapacity, float loadFactor) {
	// 判断传入的初始容量是否负数
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // 判断初始容量是否大于最大容量值 1<<30 即是int的最大值
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // 判断传入的加载因子是否正规
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // 计算大于等于参数且最小2的幂
    this.threshold = tableSizeFor(initialCapacity);
}

tableSizeFor

可以看一下这个 tableSizeFor 算法，比较有艺术感

/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

这个方法是在找大于等于cap且最小2的幂
比如
cap=1   结果 2 0次方 1
cap=2  2
cap=3 4
cap=9  16
分析下等于9
cap - 1  第一步结果8
00000000000000000000000000001000    8
00000000000000000000000000000100    右移1位 

00000000000000000000000000001100    或运算 结果
00000000000000000000000000000011    右移2位
00000000000000000000000000001111    或运算 结果

00000000000000000000000000001111    右移 4 8 16没用全是0结果还是这个15
最终 +1   16

分析下等于大点 12345678
00000000101111000110000101001110  12345678
00000000101111000110000101001101  -1结果   12345677
00000000010111100011000010100110  右移1位 

00000000111111100111000111101111  或运算 结果
00000000001111111001110001111011  右移2位

00000000111111111111110111111111  差不多了在移0就没了都是1了，+1不是肯定是2的倍数了

再说开始-1原因这是为了防止，cap已经是2的幂。
如果cap已经是2的幂， 又没有执行这个减1操作，则执行完后面的几条无符号右移操作之后，返回的capacity将是这个cap的2倍。

Put操作

先看常用的 put键值对，这个学完了，那么其他的put方法就没什么问题了，比如 putAll、putIfAbsent、putMapEntries，取值就是一个反向就简单了

1
2
3

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

hash

先对key进行hash计算，学一下

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    // 1. 看出key是可以空的  hash为0
    // 2. hashcode是32位的，无符号右移16位，那生成的就是16位0加原高位的16位值， 就是对半了，异或计算也就变成了高16位和低16位进行异或，原高16位不变。
    // 这么干主要用于当hashmap 数组比较小的时候所有bit都参与运算了，防止hash冲突太大，所谓hash冲突是指不同的key计算出的hash是一样的，比如a和97，这个肯定是存在的没毛病,这样做可以有效避免类似情况下的哈希碰撞
}

putVal

/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value //如果为true就不改变原本的value
* @param evict if false, the table is in creation mode. //在hashmap中没用
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // 申明变量，可以在下方看 Node的键值对节点数据结构
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // 如果表格是 null，resize 方法会负责初始化它，下方再介绍resize()他的两个职责
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)//通过计算的hash去映射，并检测数组头是否为空
        tab[i] = newNode(hash, key, value, null);//为空则直接创建新的节点
    else {
        // 如果有数据则hash相同,申明变量进行下方判断
        Node<K,V> e; K k;
        // 如果hash相同和key都相同,则获取此节点到下方判断是否覆盖
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)// 判断一下节点类型，如果是已经树化则调用红黑树的插入
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //对链表进行遍历匹配
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) { //如果下一个节点为空则插入
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st 
                        treeifyBin(tab, hash);// 如果链条的长度超过树化阈值则进行树化
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;//获取hash和key相同的节点
                p = e;
            }
        }
        if (e != null) { // existing mapping for key 判断是否获取到节点
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)// 判断传入参数是否要进行覆盖
                e.value = value;//如果是进行覆盖
            // 此方法是空代码 作用是给LinkedHashMap进行继承重写 
            afterNodeAccess(e);//包括afterNodeAccess、afterNodeRemoval
            return oldValue;
        }
    }
    ++modCount;// 每次操作对modCount维护，因为fail-fast机制
    // threshold = (capacity * load factor)，在resize()里面维护
    if (++size > threshold)
        resize();// 如果在放置新的键值对的过程中，如果发生上面条件，就会发生扩容
    //LinkedHashMap中被覆盖的afterNodeInsertion方法，用来回调移除最早放入Map的对象
    afterNodeInsertion(evict);
    return null;
}

Node键值对的节点数据结构

/**
* Basic hash bin node, used for most entries.  (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

resize()方法

/**
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
* 
* @return the table
*/
//主要肩负两个职责：负责初始化和扩容
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {// 旧容量的长度大于等于最大容量则设置阈值为1<<30 返回旧的table
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&//旧容量的长度小于最大容量和超过阈值 则扩容2倍
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults 如果是0就是初始化 计算默认值
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    //初始化table或者扩容, 实际上都是通过新建一个table来完成的 所以耗能也大
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // 移动到新的数组结构 e 数组结构 
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {//遍历
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;//清除旧表的node，节点还在e
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;//如果是null直接put进去
                else if (e instanceof TreeNode)//判断是节点类型 直接调用红黑树的方法
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // 下面这段代码很精妙, 是链表拆分，单独分一段详细来讲
                    Node<K,V> loHead = null, loTail = null;//一个lo头和lo链
                    Node<K,V> hiHead = null, hiTail = null;//一个hi头和hi链
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {//进行链表拆分
                            if (loTail == null)// 插入lo链表
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)// 插入hi链表
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {//如果lo链表非空, 我们就把整个lo链表放到新table的j位置上
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {//如果hi链表非空, 我们就把整个hi链表放到新table的j+oldCap位置上
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

看到上面的链表拆分，综上我们知道, 这段代码的意义就是将原来的链表拆分成两个链表, 并将这两个链表分别放到新的table的 j 位置和 j+oldCap 上, j位置就是原链表在原table中的位置, 拆分的标准就是:

1	(e.hash & oldCap) == 0

可以参考图片

说明一下 `(e.hash & oldCap) == 0`

首先需要明确的是：

oldCap一定是2的整数次幂, 这里假设是 2^m
newCap是oldCap的两倍, newCap = oldCap << 1 , 则会是 2^(m+1)
hash对数组大小取模 (n - 1) & hash 其实就是取 hash的低m位, 即n最低2的m整数次幂

我们假设 oldCap = 1 <<4, 即 2^4, 即16
二进制为：  00000000 00000000 00000000 00010000 即0x00010000
16 - 1为： 00000000 00000000 00000000 00001111  即0x00001111

若oldCap扩容2倍 oldCap << 1 即 2^5 即32
二进制为：  00000000 00000000 00000000 00100000 即0x00100000
32 - 1为： 00000000 00000000 00000000 00011111  即0x00011111

可见16中除了低4位, 其他位置都是0,（简洁起见，高位的0后面就不写了）则 (16-1) & hash 自然就是取hash值的低4位,我们假设低4位值为 0xabcd,在16即为0x1111.

当我们将oldCap扩大两倍后, 新的index的位置就变成了 (32-1) & hash, 其实就是取 hash值的低5位为 0x1abcd,在32即为0x11111。

那么对于同一个Node, 低5位的值无外乎下面两种情况:

2^m (m=4):0xabcd 高位为0，若取5位:  0x0abcd
2^(m+1) (m=4):0x1abcd              0x1abcd

16:0x01111
32:0x11111

其中, 0xabcd与原来的index值一致, 而 1abcd = 0abcd + 10000 (左移一位) = 0abcd + oldCap

故虽然数组大小扩大了一倍，但是同一个key在新旧table中对应的index却存在一定联系：要么一致，要么相差一个 oldCap。

故得出结论:

如果 (e.hash & oldCap) == 0 则该节点在新表的下标位置与旧表一致都为 j

如果 (e.hash & oldCap) == 1 则该节点在新表的下标位置 j + oldCap

根据这个条件, 我们将原位置的链表拆分成两个链表, 然后一次性将整个链表放到新的Table对应的位置上.

treeifyBin()树化方法

/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    //如果table的长度即容量小于 MIN_TREEIFY_CAPACITY，只会进行简单的扩容。 MIN_TREEIFY_CAPACITY = 64
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    //如果table的长度即容量大于 MIN_TREEIFY_CAPACITY ，则会进行树化改造。
    else if ((e = tab[index = (n - 1) & hash]) != null) {//根据hash值和数组长度进行取模运算后，得到链表的首节点
        TreeNode<K,V> hd = null, tl = null;// 定义首、尾节点
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);// 将该节点转换为 树节点
            if (tl == null)// 如果尾节点为空，说明还没有根节点
                hd = p; // 首节点（根节点）指向 当前节点
            else {// 尾节点不为空，以下两行是一个双向链表结构
                p.prev = tl; // 当前树节点的 前一个节点指向 尾节点
                tl.next = p; // 尾节点的 后一个节点指向 当前节点
            }
            tl = p; // 把当前节点设为尾节点
        } while ((e = e.next) != null);// 继续遍历链表

        // 到目前为止 也只是把Node对象转换成了TreeNode对象，把单向链表转换成了双向链表
        // 把转换后的双向链表，替换原来位置上的单向链表
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

treeify()方法

/**
* Forms tree of the nodes linked from this node.
* @return root of tree
*/
final void treeify(Node<K,V>[] tab) {//参数为HashMap的元素数组
    TreeNode<K,V> root = null;// 定义树的根节点
    for (TreeNode<K,V> x = this, next; x != null; x = next) {// 遍历链表，x指向当前节点、next指向下一个节点
        next = (TreeNode<K,V>)x.next;// 下一个节点
        x.left = x.right = null; // 设置当前节点的左右节点为空
        if (root == null) { // 如果还没有根节点
            x.parent = null; // 当前节点的父节点设为空
            x.red = false; // 当前节点的红色属性设为false（把当前节点设为黑色）
            root = x; // 根节点指向到当前节点
        }
        else { // 如果已经存在根节点了
            K k = x.key; // 取得当前链表节点的key
            int h = x.hash; // 取得当前链表节点的hash值
            Class<?> kc = null; // 定义key所属的Class
            for (TreeNode<K,V> p = root;;) { // 从根节点开始遍历，此遍历没有设置边界，只能从内部跳出
                // GOTO1
                int dir, ph; // dir 标识方向（左右）、ph标识当前树节点的hash值
                K pk = p.key; // 当前树节点的key
                if ((ph = p.hash) > h) // 如果当前树节点hash值 大于 当前链表节点的hash值
                    dir = -1; // 标识当前链表节点会放到当前树节点的左侧
                else if (ph < h)
                    dir = 1; // 右侧
 
                /*
                 * 如果两个节点的key的hash值相等，那么还要通过其他方式再进行比较
                 * 如果当前链表节点的key实现了comparable接口，并且当前树节点和链表节点是相同Class的实例，那么通过comparable的方式再比较两者。
                 * 如果还是相等，最后再通过tieBreakOrder比较一次
                 */
                else if ((kc == null &&
                            (kc = comparableClassFor(k)) == null) ||
                            (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);
 
                TreeNode<K,V> xp = p; // 保存当前树节点
 
                /*
                 * 如果dir 小于等于0 ： 当前链表节点一定放置在当前树节点的左侧，但不一定是该树节点的左孩子，也可能是左孩子的右孩子 或者 更深层次的节点。
                 * 如果dir 大于0 ： 当前链表节点一定放置在当前树节点的右侧，但不一定是该树节点的右孩子，也可能是右孩子的左孩子 或者 更深层次的节点。
                 * 如果当前树节点不是叶子节点，那么最终会以当前树节点的左孩子或者右孩子 为 起始节点  再从GOTO1 处开始 重新寻找自己（当前链表节点）的位置
                 * 如果当前树节点就是叶子节点，那么根据dir的值，就可以把当前链表节点挂载到当前树节点的左或者右侧了。
                 * 挂载之后，还需要重新把树进行平衡。平衡之后，就可以针对下一个链表节点进行处理了。
                 */
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp; // 当前链表节点 作为 当前树节点的子节点
                    if (dir <= 0)
                        xp.left = x; // 作为左孩子
                    else
                        xp.right = x; // 作为右孩子
                    root = balanceInsertion(root, x); // 重新平衡
                    break;
                }
            }
        }
    }
 
    // 把所有的链表节点都遍历完之后，最终构造出来的树可能经历多个平衡操作，根节点目前到底是链表的哪一个节点是不确定的
    // 因为我们要基于树来做查找，所以就应该把 tab[N] 得到的对象一定根节点对象，而目前只是链表的第一个节点对象，所以要做相应的处理。
    moveRootToFront(tab, root);
}

分析： https://me.csdn.net/weixin_42340670
重新平衡 balanceInsertion ： https://blog.csdn.net/weixin_42340670/article/details/80550932
moveRootToFront：https://blog.csdn.net/weixin_42340670/article/details/80555860

Get操作

public V get(Object key) {
    Node<K,V> e;
    // 判断调用getNode获取的节点是否为空，否则返回value
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

getNode()方法

/**
* Implements Map.get and related methods
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && //always check first node判断第一个节点的hash和key是否相同则直接返回
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // 如果节点类型是红黑树则调用红黑树获取节点方法
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 遍历节点判断hash和key是否相同则直接返回
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

解决哈希冲突的常用方法

Hash一些理解

Hash，一般翻译做 “散列”，也有直接音译为 “哈希” 的，就是把任意长度的输入（又叫做预映射， pre-image）

通过散列算法，变换成固定长度的输出，该输出就是散列值。这种转换是一种压缩映射，也就是，散列值的空间通常远小于输入的空间，不同的输入可能会散列成相同的输出，而不可能从散列值来唯一的确定输入值。