# TLAB

TLAB (Thread Local Allocation Buffer) is the mechanism the JVM uses to speed up memory allocation for objects.

TLAB shares much of its design thinking with ID generators.

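Suppose we want to design a memory allocator that hands out space for objects. We can keep a top pointer at the current position in memory; when an object needs space, we add the object's size to top to get the new top, and the range between the old top and the new top belongs to that object. In practice, though, many threads request memory at the same time, so updates to top conflict frequently. To reduce contention between threads we can apply the idea of a local cache: each thread grabs a large block of memory at a time, allocates from it at its own pace, and requests the next block only when the current one is used up. This reduces contention between threads and improves allocation efficiency. This is the core idea of the Thread Local Allocation Buffer, or TLAB for short.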
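
The idea can be sketched in a few lines of C++. This is only an illustrative model, not HotSpot code: SharedArena, LocalBuffer and kNoSpace are hypothetical names, offsets stand in for real addresses, and the buffer always discards the tail of its old chunk when an object does not fit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical marker for "allocation failed".
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Shared arena: one top offset that every thread bumps with CAS.
struct SharedArena {
  std::atomic<std::size_t> top{0};
  std::size_t end;
  explicit SharedArena(std::size_t capacity) : end(capacity) {}

  // Returns the start offset of the block, or kNoSpace if it does not fit.
  std::size_t allocate(std::size_t size) {
    std::size_t old_top = top.load();
    do {
      if (end - old_top < size) return kNoSpace;
      // On failure compare_exchange_weak reloads old_top, so we retry.
    } while (!top.compare_exchange_weak(old_top, old_top + size));
    return old_top;
  }
};

// Per-thread buffer: takes one big chunk from the arena, then bumps a
// private top without any synchronization.
struct LocalBuffer {
  std::size_t top = 0;
  std::size_t end = 0;

  std::size_t allocate(SharedArena& arena, std::size_t size, std::size_t chunk) {
    assert(top <= end);
    if (end - top < size) {                          // object does not fit
      if (size > chunk) return arena.allocate(size); // too big for any chunk
      std::size_t start = arena.allocate(chunk);     // contended, but rare
      if (start == kNoSpace) return arena.allocate(size);
      top = start;                                   // old tail is wasted
      end = start + chunk;
    }
    std::size_t obj = top;                           // uncontended fast path
    top += size;
    return obj;
  }
};
```

In a real program each thread would own one buffer, for example through a thread_local LocalBuffer, so that only chunk requests touch the shared atomic.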

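When it comes to actual allocation, the size of the TLAB and the moment to replace it also need thought. If the TLAB is too small, objects will often fail to fit, and then we must either fall back to the slower allocation directly in eden or switch to a new TLAB. If the TLAB is too large and there are many threads, memory may run short. When to replace a TLAB is also a delicate matter. Once only a small tail of the TLAB is left, never replacing it means many allocations cannot use the TLAB and are downgraded to the slow path; replacing it too readily wastes the memory remaining in the old TLAB. So a trade-off has to be struck between allocation speed and wasted memory. The JDK lets you control the waste ratio through parameters and also has an automatic adjustment mechanism.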

# TLAB configuration

TLAB-related options can be listed with the command java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | grep 'TLAB'.

ResizeTLAB

# TLAB implementation

# UseTLAB

Whether to use TLAB; defaults to true. For most applications, disabling it is not recommended.

# ResizeTLAB

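Whether to adjust the TLAB size dynamically; defaults to true. Disabling it is generally not recommended either: with a fixed TLAB size everything depends on choosing that size well, and the automatic adjustment mechanism can no longer balance allocation efficiency against wasted memory.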

# MinTLABSize

The minimum TLAB size in bytes; the default is 2k.

# TLABSize

The initial TLAB size. The default is 0, in which case the JVM computes the initial size itself.

TLAB size size change

What a JVM word means

dummy object _word_size retire refill inside tlab outside tlab slow alloc fast alloc

TLAB retention and GC reclamation

Relationship with GC

Amount of waste

# Key concepts and flow

Object memory allocation process

collectedHeap.inline.hpp defines the methods that request memory for creating objects in the heap.

inline oop CollectedHeap::obj_allocate(Klass* klass, size_t size, TRAPS) {
  ObjAllocator allocator(klass, size, THREAD);
  return allocator.allocate();
}

inline oop CollectedHeap::array_allocate(Klass* klass, size_t size, int length, bool do_zero, TRAPS) {
  ObjArrayAllocator allocator(klass, size, length, do_zero, THREAD);
  return allocator.allocate();
}

inline oop CollectedHeap::class_allocate(Klass* klass, size_t size, TRAPS) {
  ClassAllocator allocator(klass, size, THREAD);
  return allocator.allocate();
}

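ObjAllocator and the other allocators all inherit from MemAllocator, and the allocate method in memAllocator.cpp is responsible for creating the object. If UseTLAB is enabled, the mem_allocate method first calls allocate_inside_tlab to try allocating in the TLAB; if that fails, it calls allocate_outside_tlab to allocate outside the TLAB.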

oop MemAllocator::allocate() const {
  oop obj = NULL;
  {
    Allocation allocation(*this, &obj);
    HeapWord* mem = mem_allocate(allocation);
    if (mem != NULL) {
      obj = initialize(mem);
    } else {
      // The unhandled oop detector will poison local variable obj,
      // so reset it to NULL if mem is NULL.
      obj = NULL;
    }
  }
  return obj;
}
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
  if (UseTLAB) {
    HeapWord* result = allocate_inside_tlab(allocation);
    if (result != NULL) {
      return result;
    }
  }

  return allocate_outside_tlab(allocation);
}

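allocate_inside_tlab first calls allocate_inside_tlab_fast to try allocating from the current TLAB. If the space remaining in the current TLAB cannot hold the object, it returns NULL; otherwise the allocation succeeds and the address is returned. After a failure, allocate_inside_tlab_slow is called to decide whether the current TLAB can be replaced with a new block of TLAB memory. If the replacement succeeds, the object is allocated in the new TLAB; otherwise NULL is returned.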

HeapWord* MemAllocator::allocate_inside_tlab(Allocation& allocation) const {
  assert(UseTLAB, "should use UseTLAB");

  // Try allocating from an existing TLAB.
  HeapWord* mem = allocate_inside_tlab_fast();
  if (mem != NULL) {
    return mem;
  }

  // Try refilling the TLAB and allocating the object in it.
  return allocate_inside_tlab_slow(allocation);
}

allocate_inside_tlab_fast calls the allocate method of ThreadLocalAllocBuffer.

HeapWord* MemAllocator::allocate_inside_tlab_fast() const {
  return _thread->tlab().allocate(_word_size);
}

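threadLocalAllocBuffer.inline.hpp defines the implementation of allocate. It checks whether the difference between the current end and top, that is, the memory still free, is greater than or equal to size, in other words whether the new object fits. If it does not fit, NULL is returned. If it fits, the space being allocated is filled with badHeapWordVal (the header is skipped, and as the #ifdef ASSERT guard shows, this happens only in debug builds), and then top is advanced by size.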

inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  invariants();
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    // successful thread-local allocation
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
    // This addition is safe because we know that top is
    // at least size below end, so the add can't wrap.
    set_top(obj + size);

    invariants();
    return obj;
  }
  return NULL;
}
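
The fast path is easier to experiment with in a stripped-down model. The sketch below keeps only the top/end arithmetic and drops the debug-only mangling; HeapWordModel, TlabModel and bytes_to_words are hypothetical names, and a word is assumed to be pointer-sized. As in the excerpt above, size is counted in words rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HeapWord: one pointer-sized slot.
using HeapWordModel = std::uintptr_t;

struct TlabModel {
  HeapWordModel* top;  // next free word
  HeapWordModel* end;  // one past the last usable word

  std::size_t free_words() const {
    assert(top <= end);
    return static_cast<std::size_t>(end - top);
  }

  // Bump-pointer allocation; returns nullptr when the object does not fit.
  HeapWordModel* allocate(std::size_t size_in_words) {
    if (free_words() >= size_in_words) {
      HeapWordModel* obj = top;
      top += size_in_words;  // cannot pass end, as checked just above
      return obj;
    }
    return nullptr;
  }
};

// Rounds a byte count up to whole words.
inline std::size_t bytes_to_words(std::size_t bytes) {
  return (bytes + sizeof(HeapWordModel) - 1) / sizeof(HeapWordModel);
}
```

A failed call leaves top untouched, which is what lets the caller move on to the slow path and inspect how much free space remains.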

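allocate_inside_tlab_slow takes over after the fast path has failed (the remaining space cannot hold the new object) and decides whether to replace the TLAB. should_post_sampled_object_alloc handles the JVMTI sampling logic, which we can ignore for now. It then checks whether tlab.free() is greater than tlab.refill_waste_limit(). refill_waste_limit is the limit on how much memory may be wasted when the current TLAB is replaced: the TLAB is replaced only if free is less than or equal to it; otherwise NULL is returned straight away. refill_waste_limit is itself adjusted dynamically, as described later. If free is less than or equal to refill_waste_limit, tlab.compute_size(_word_size) computes the size of the next TLAB, Universe::heap()->allocate_new_tlab creates it, and the object is then allocated in the new TLAB and returned.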

HeapWord* MemAllocator::allocate_inside_tlab_slow(Allocation& allocation) const {
  HeapWord* mem = NULL;
  ThreadLocalAllocBuffer& tlab = _thread->tlab();

  if (JvmtiExport::should_post_sampled_object_alloc()) {
    tlab.set_back_allocation_end();
    mem = tlab.allocate(_word_size);

    // We set back the allocation sample point to try to allocate this, reset it
    // when done.
    allocation._tlab_end_reset_for_sample = true;

    if (mem != NULL) {
      return mem;
    }
  }

  // Retain tlab and allocate object in shared space if
  // the amount free in the tlab is too large to discard.
  if (tlab.free() > tlab.refill_waste_limit()) {
    tlab.record_slow_allocation(_word_size);
    return NULL;
  }

  // Discard tlab and allocate a new one.
  // To minimize fragmentation, the last TLAB may be smaller than the rest.
  size_t new_tlab_size = tlab.compute_size(_word_size);

  tlab.retire_before_allocation();

  if (new_tlab_size == 0) {
    return NULL;
  }

  // Allocate a new TLAB requesting new_tlab_size. Any size
  // between minimal and new_tlab_size is accepted.
  size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size);
  mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size);
  if (mem == NULL) {
    assert(allocation._allocated_tlab_size == 0,
           "Allocation failed, but actual size was updated. min: " SIZE_FORMAT
           ", desired: " SIZE_FORMAT ", actual: " SIZE_FORMAT,
           min_tlab_size, new_tlab_size, allocation._allocated_tlab_size);
    return NULL;
  }
  assert(allocation._allocated_tlab_size != 0, "Allocation succeeded but actual size not updated. mem at: "
         PTR_FORMAT " min: " SIZE_FORMAT ", desired: " SIZE_FORMAT,
         p2i(mem), min_tlab_size, new_tlab_size);

  if (ZeroTLAB) {
    // ..and clear it.
    Copy::zero_to_words(mem, allocation._allocated_tlab_size);
  } else {
    // ...and zap just allocated object.
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(mem + hdr_size, allocation._allocated_tlab_size - hdr_size, badHeapWordVal);
#endif // ASSERT
  }

  tlab.fill(mem, mem + _word_size, allocation._allocated_tlab_size);
  return mem;
}
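
The branches of the slow path can be condensed into a small decision function. This is a simplified model under stated assumptions, not the HotSpot logic itself: SlowPathOutcome and slow_path_outcome are hypothetical names, the JVMTI sampling branch is ignored, and new_tlab_words stands for the size of the new TLAB actually obtained, with 0 meaning none was available.

```cpp
#include <cassert>
#include <cstddef>

enum class SlowPathOutcome {
  KeptTlabAllocateOutside,  // too much free space to discard
  RetiredNoNewTlab,         // old TLAB retired, but no new one obtained
  Refilled                  // old TLAB retired, object goes into the new one
};

// All sizes are in words.
inline SlowPathOutcome slow_path_outcome(std::size_t obj_words,
                                         std::size_t free_words,
                                         std::size_t refill_waste_limit,
                                         std::size_t new_tlab_words) {
  // The slow path is reached only after the fast path failed to fit the object.
  assert(obj_words > free_words);

  if (free_words > refill_waste_limit) {
    return SlowPathOutcome::KeptTlabAllocateOutside;
  }
  // From here on the old TLAB is retired; in the excerpt above the
  // retirement happens before the new size is checked for zero.
  if (new_tlab_words == 0) {
    return SlowPathOutcome::RetiredNoNewTlab;
  }
  return SlowPathOutcome::Refilled;
}
```

Note that the first two outcomes both end with the caller receiving NULL, so from the outside they look the same even though only one of them leaves the old TLAB in place.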

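When tlab.free() > tlab.refill_waste_limit(), record_slow_allocation is called to adjust refill_waste_limit and to count _slow_allocations. On every slow allocation (the TLAB cannot hold the new object and free > refill_waste_limit), refill_waste_limit_increment is added to refill_waste_limit. This keeps a thread from getting stuck here, allocating outside the TLAB every time.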

void ThreadLocalAllocBuffer::record_slow_allocation(size_t obj_size) {
  // Raise size required to bypass TLAB next time. Why? Else there's
  // a risk that a thread that repeatedly allocates objects of one
  // size will get stuck on this slow path.

  set_refill_waste_limit(refill_waste_limit() + refill_waste_limit_increment());

  _slow_allocations++;

  log_develop_trace(gc, tlab)("TLAB: %s thread: " INTPTR_FORMAT " [id: %2d]"
                              " obj: " SIZE_FORMAT
                              " free: " SIZE_FORMAT
                              " waste: " SIZE_FORMAT,
                              "slow", p2i(thread()), thread()->osthread()->thread_id(),
                              obj_size, free(), refill_waste_limit());
}
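
To see why the increment prevents a thread from getting stuck, it helps to count how many slow allocations it takes before the TLAB becomes eligible for replacement. The helper below is a hypothetical model: it assumes the free space stays constant because every request is too large for it, and the numbers in the usage note are made up for illustration rather than taken from JVM defaults.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of slow allocations recorded before
// free_words <= limit holds, i.e. before the TLAB may be retired.
inline std::size_t slow_allocations_until_refill(std::size_t free_words,
                                                 std::size_t limit,
                                                 std::size_t increment) {
  assert(increment > 0);  // otherwise the loop below might never end
  std::size_t slow_allocations = 0;
  while (free_words > limit) {
    limit += increment;   // the adjustment made on each slow allocation
    ++slow_allocations;
  }
  return slow_allocations;
}
```

With 100 free words, a limit of 64 and an increment of 4, the function returns 9: after nine slow allocations the limit has reached 100 and the next failure replaces the TLAB.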

When allocate_inside_tlab returns NULL, allocate_outside_tlab is called to allocate memory outside the TLAB.

HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
  if (UseTLAB) {
    HeapWord* result = allocate_inside_tlab(allocation);
    if (result != NULL) {
      return result;
    }
  }

  return allocate_outside_tlab(allocation);
}

HeapWord* MemAllocator::allocate_outside_tlab(Allocation& allocation) const {
  allocation._allocated_outside_tlab = true;
  HeapWord* mem = Universe::heap()->mem_allocate(_word_size, &allocation._overhead_limit_exceeded);
  if (mem == NULL) {
    return mem;
  }

  NOT_PRODUCT(Universe::heap()->check_for_non_bad_heap_word_value(mem, _word_size));
  size_t size_in_bytes = _word_size * HeapWordSize;
  _thread->incr_allocated_bytes(size_in_bytes);

  return mem;
}

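mem_allocate is the method that allocates memory in the heap. For G1 the implementation is in g1CollectedHeap.cpp. It first checks whether the object is humongous; if so, it performs a humongous allocation, where the humongous object gets a run of contiguous regions to itself.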

HeapWord*
G1CollectedHeap::mem_allocate(size_t word_size,
                              bool*  gc_overhead_limit_was_exceeded) {
  assert_heap_not_locked_and_not_at_safepoint();

  if (is_humongous(word_size)) {
    return attempt_allocation_humongous(word_size);
  }
  size_t dummy = 0;
  return attempt_allocation(word_size, word_size, &dummy);
}
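
The excerpt does not show how is_humongous decides. The sketch below assumes the commonly documented G1 rule that an object is humongous when it is larger than half a region, and that a humongous object occupies as many whole regions as its size rounds up to. Both function names are hypothetical, and sizes are in words.

```cpp
#include <cassert>
#include <cstddef>

// Assumed rule: humongous means larger than half a region.
inline bool is_humongous_model(std::size_t word_size, std::size_t region_words) {
  assert(region_words > 0);
  return word_size > region_words / 2;
}

// Number of contiguous regions a humongous object would occupy,
// rounding up to whole regions.
inline std::size_t humongous_regions_needed(std::size_t word_size,
                                            std::size_t region_words) {
  assert(region_words > 0);
  return (word_size + region_words - 1) / region_words;
}
```

For a 1 MiB region on a 64-bit JVM, region_words would be 131072, so under this assumption an object of 65537 words is already humongous and takes one full region.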

Allocation of ordinary objects eventually reaches heapRegion.inline.hpp, where par_allocate_impl allocates memory by a CAS on the top pointer.

inline HeapWord* HeapRegion::par_allocate_impl(size_t min_word_size,
                                               size_t desired_word_size,
                                               size_t* actual_size) {
  do {
    HeapWord* obj = top();
    size_t available = pointer_delta(end(), obj);
    size_t want_to_allocate = MIN2(available, desired_word_size);
    if (want_to_allocate >= min_word_size) {
      HeapWord* new_top = obj + want_to_allocate;
      HeapWord* result = Atomic::cmpxchg(&_top, obj, new_top);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_object_aligned(obj) && is_object_aligned(new_top), "checking alignment");
        *actual_size = want_to_allocate;
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);
}
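
The same retry loop can be written with std::atomic as a standalone model. This is a sketch rather than the HotSpot implementation: RegionModel is a hypothetical name, offsets replace addresses, and compare_exchange_weak plays the role of Atomic::cmpxchg. The caller states a minimum and a desired size and learns the actual size granted.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

struct RegionModel {
  std::atomic<std::size_t> top{0};
  std::size_t end;
  explicit RegionModel(std::size_t words) : end(words) {}

  // Tries to claim between min_words and desired_words.
  // On success stores the start offset and granted size and returns true.
  bool par_allocate(std::size_t min_words, std::size_t desired_words,
                    std::size_t* start, std::size_t* actual) {
    assert(min_words <= desired_words);
    std::size_t old_top = top.load();
    while (true) {
      std::size_t available = end - old_top;
      std::size_t want = std::min(available, desired_words);
      if (want < min_words) {
        return false;  // not even the minimum fits
      }
      // If another thread moved top first, the call fails, reloads
      // old_top with the current value, and we recompute and retry.
      if (top.compare_exchange_weak(old_top, old_top + want)) {
        *start = old_top;
        *actual = want;
        return true;
      }
    }
  }
};
```

Accepting a range of sizes is what allows the last TLAB carved out of a region to be smaller than the others instead of leaving the tail unused.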

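# How the TLAB size is computed

The TLAB size can be understood as the amount of memory a thread takes from eden in one batch. To balance memory utilization and ...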

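# TLAB structure definition

- _start: start address of the current TLAB
- _top: position up to which memory has already been allocated
- _pf_top: prefetch watermark
- _end: end of the area currently open for allocation (either the sampling end point or _allocation_end)
- _allocation_end: actual end of the current TLAB for allocations, excluding the alignment reserve
- _desired_size: desired size
- _refill_waste_limit: limit on waste at refill time
- _allocated_before_last_gc: total bytes allocated up until the last GC
- _bytes_since_last_sample_point: bytes allocated since the last sample point

- _max_size: maximum size of a TLAB
- _reserve_for_allocation_prefetch: space reserved at the end of the TLAB
- _target_refills: expected number of refills between GCs

- _number_of_refills:
- _refill_waste:
- _gc_waste:
- _slow_allocations: counter incremented in record_slow_allocation
- _allocated_size:
- _allocation_fraction: fraction of eden allocation that goes into TLABs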

class ThreadLocalAllocBuffer: public CHeapObj<mtThread> {
  friend class VMStructs;
  friend class JVMCIVMStructs;
private:
  HeapWord* _start;                              // address of TLAB
  HeapWord* _top;                                // address after last allocation
  HeapWord* _pf_top;                             // allocation prefetch watermark
  HeapWord* _end;                                // allocation end (can be the sampling end point or _allocation_end)
  HeapWord* _allocation_end;                     // end for allocations (actual TLAB end, excluding alignment_reserve)

  size_t    _desired_size;                       // desired size   (including alignment_reserve)
  size_t    _refill_waste_limit;                 // hold onto tlab if free() is larger than this
  size_t    _allocated_before_last_gc;           // total bytes allocated up until the last gc
  size_t    _bytes_since_last_sample_point;      // bytes since last sample point.

  static size_t   _max_size;                          // maximum size of any TLAB
  static int      _reserve_for_allocation_prefetch;   // Reserve at the end of the TLAB
  static unsigned _target_refills;                    // expected number of refills between GCs

  unsigned  _number_of_refills;
  unsigned  _refill_waste;
  unsigned  _gc_waste;
  unsigned  _slow_allocations;
  size_t    _allocated_size;

  AdaptiveWeightedAverage _allocation_fraction;  // fraction of eden allocated in tlabs


# init size

# max size

# resize

# Viewing TLAB logs

# Monitoring TLAB with JFR

# Monitoring TLAB with perf

# PLAB

# Summary

# Other references