# TLAB
TLAB (Thread Local Allocation Buffer) is the mechanism the JVM uses to speed up memory allocation for objects.
Many of its design ideas also show up in ID generators.
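Suppose we were designing a memory allocator that hands out space for objects. We could keep a `top` pointer to the current position in memory; to allocate an object, we add the object's size to `top` to get the new `top`, and the space between the old `top` and the new `top` belongs to that object. In practice, though, many threads request memory at once, so updates to `top` conflict frequently. To reduce contention between threads we can apply the idea of a local cache: each thread grabs a large chunk of memory at a time, allocates from it at its own pace, and requests the next chunk only when the current one is used up. This lowers contention between threads and makes allocation faster. That is the core idea behind the Thread Local Allocation Buffer, or TLAB.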
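The effect of chunking can be sketched with a small single-threaded model. This is not HotSpot code: `SharedArena`, `LocalBuffer`, `kNoMemory`, and the sizes used are hypothetical names and values chosen for illustration. The model only counts how often the shared `top` is modified, since that is the point where real threads would contend.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel returned when an allocation cannot be satisfied (hypothetical).
const std::size_t kNoMemory = static_cast<std::size_t>(-1);

// Shared arena: every allocation bumps the single shared top.
struct SharedArena {
    std::size_t top;             // offset of the next free byte
    std::size_t capacity;        // total size of the arena in bytes
    std::size_t shared_updates;  // how many times the shared top was modified

    explicit SharedArena(std::size_t cap) : top(0), capacity(cap), shared_updates(0) {}

    std::size_t allocate(std::size_t size) {
        assert(size > 0);
        if (capacity - top < size) return kNoMemory;
        std::size_t old_top = top;
        top += size;             // [old_top, top) now belongs to the caller
        ++shared_updates;
        return old_top;
    }
};

// Per-thread buffer: takes one chunk from the arena, then bumps a private top.
struct LocalBuffer {
    SharedArena& arena;
    std::size_t chunk_size;
    std::size_t top;
    std::size_t end;

    LocalBuffer(SharedArena& a, std::size_t chunk)
        : arena(a), chunk_size(chunk), top(0), end(0) {}

    std::size_t allocate(std::size_t size) {
        if (end - top < size) {
            // Objects larger than a chunk go straight to the shared arena.
            if (size > chunk_size) return arena.allocate(size);
            std::size_t chunk = arena.allocate(chunk_size);
            if (chunk == kNoMemory) return kNoMemory;
            top = chunk;         // whatever was left in the old chunk is wasted
            end = chunk + chunk_size;
        }
        std::size_t old_top = top;
        top += size;             // private bump, no shared state touched
        return old_top;
    }
};
```

With 32 allocations of 16 bytes each, allocating directly touches the shared `top` 32 times, while a buffer with 256-byte chunks touches it only twice.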
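When allocating in practice, we also have to decide how large a TLAB should be and when to replace it with a new one. If the TLAB is too small, objects will frequently fail to fit, and then we must either fall back to the slower allocation directly in eden or switch to a new TLAB. If the TLAB is too large and there are many threads, memory may run short. The timing of replacement matters as well. When only a small piece of the TLAB is left, never replacing it means many allocations cannot use the TLAB and fall back to the slow path; replacing it too readily wastes whatever memory remained in the old TLAB. So there is a trade-off between allocation speed and wasted memory. The JDK lets you control the waste ratio through flags and also adjusts it automatically.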
# TLAB configuration
TLAB-related flags can be listed with the command `java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | grep 'TLAB'`.
# TLAB implementation
# UseTLAB
Whether to use TLAB. The default is true, and disabling it is not recommended for most applications.
# ResizeTLAB
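Whether to resize TLABs dynamically. The default is true, and disabling it is generally not recommended either: with a fixed TLAB size everything depends on picking the right size by hand, and you lose the adaptive mechanism that balances allocation speed against wasted memory.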
# MinTLABSize
The minimum TLAB size in bytes. The default is 2 KB.
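This flag is given in bytes, while the HotSpot allocation code quoted elsewhere in these notes works in heap words (for example `_word_size * HeapWordSize`). The conversion can be sketched as below. It assumes that one heap word is the size of a pointer, which gives 8 bytes on a typical 64-bit JVM; `kHeapWordSize`, `bytes_to_words`, and `words_to_bytes` are hypothetical helper names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a heap word is pointer-sized (8 bytes on a typical 64-bit build).
const std::size_t kHeapWordSize = sizeof(void*);

// Number of whole words needed to hold `bytes`, rounding up.
inline std::size_t bytes_to_words(std::size_t bytes) {
    return (bytes + kHeapWordSize - 1) / kHeapWordSize;
}

// Size in bytes of `words` heap words.
inline std::size_t words_to_bytes(std::size_t words) {
    assert(words <= static_cast<std::size_t>(-1) / kHeapWordSize);  // no overflow
    return words * kHeapWordSize;
}
```

Under that assumption, the 2 KB default corresponds to 256 words on a 64-bit build.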
# TLABSize
The initial TLAB size. The default is 0, in which case the JVM chooses the initial size itself.
TLAB size and how the size changes
What a JVM word means
dummy object, `_word_size`, retire, refill, inside TLAB, outside TLAB, slow alloc, fast alloc
How TLABs are retained and reclaimed by GC
Relationship with GC
Amount of waste
# Key concepts and flow
Object memory allocation process
collectedHeap.inline.hpp defines the methods that request memory when an object is created in the heap.
inline oop CollectedHeap::obj_allocate(Klass* klass, size_t size, TRAPS) {
ObjAllocator allocator(klass, size, THREAD);
return allocator.allocate();
}
inline oop CollectedHeap::array_allocate(Klass* klass, size_t size, int length, bool do_zero, TRAPS) {
ObjArrayAllocator allocator(klass, size, length, do_zero, THREAD);
return allocator.allocate();
}
inline oop CollectedHeap::class_allocate(Klass* klass, size_t size, TRAPS) {
ClassAllocator allocator(klass, size, THREAD);
return allocator.allocate();
}
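`ObjAllocator` and the other allocators all inherit from `MemAllocator`, and object creation goes through the `allocate` method in memAllocator.cpp. `mem_allocate` checks whether `UseTLAB` is enabled. If it is, it first calls `allocate_inside_tlab` to try allocating inside the TLAB; if that fails, it calls `allocate_outside_tlab` to allocate outside the TLAB.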
oop MemAllocator::allocate() const {
oop obj = NULL;
{
Allocation allocation(*this, &obj);
HeapWord* mem = mem_allocate(allocation);
if (mem != NULL) {
obj = initialize(mem);
} else {
// The unhandled oop detector will poison local variable obj,
// so reset it to NULL if mem is NULL.
obj = NULL;
}
}
return obj;
}
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
if (UseTLAB) {
HeapWord* result = allocate_inside_tlab(allocation);
if (result != NULL) {
return result;
}
}
return allocate_outside_tlab(allocation);
}
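`allocate_inside_tlab` first calls `allocate_inside_tlab_fast` to try allocating from the current TLAB. If the space left in the current TLAB cannot hold the object, it returns NULL; otherwise the allocation succeeds and the address is returned. After a failure, `allocate_inside_tlab_slow` is called to decide whether the current TLAB may be replaced with a freshly allocated one. If the replacement succeeds, the object is allocated in the new TLAB; otherwise NULL is returned.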
HeapWord* MemAllocator::allocate_inside_tlab(Allocation& allocation) const {
assert(UseTLAB, "should use UseTLAB");
// Try allocating from an existing TLAB.
HeapWord* mem = allocate_inside_tlab_fast();
if (mem != NULL) {
return mem;
}
// Try refilling the TLAB and allocating the object in it.
return allocate_inside_tlab_slow(allocation);
}
`allocate_inside_tlab_fast` calls the `allocate` method of `ThreadLocalAllocBuffer`.
HeapWord* MemAllocator::allocate_inside_tlab_fast() const {
return _thread->tlab().allocate(_word_size);
}
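threadLocalAllocBuffer.inline.hpp defines `allocate`. It checks whether the difference between `end` and `top`, i.e. the memory still free in the TLAB, is greater than or equal to `size`, that is, whether the new object fits. If it does not fit, NULL is returned. If it fits, debug builds (`ASSERT`) fill the allocated space with `badHeapWordVal`, skipping the object header, and then `top` is advanced by `size`.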
inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
invariants();
HeapWord* obj = top();
if (pointer_delta(end(), obj) >= size) {
// successful thread-local allocation
#ifdef ASSERT
// Skip mangling the space corresponding to the object header to
// ensure that the returned space is not considered parsable by
// any concurrent GC thread.
size_t hdr_size = oopDesc::header_size();
Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
// This addition is safe because we know that top is
// at least size below end, so the add can't wrap.
set_top(obj + size);
invariants();
return obj;
}
return NULL;
}
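To make the pointer arithmetic concrete, here is a minimal standalone model of the same check, a sketch rather than the HotSpot implementation. `MiniTlab`, `Word`, `kBadWord`, and `kHeaderWords` are hypothetical; in particular the poison value and the two-word header are assumptions made for illustration, and the `mangle` flag stands in for the `#ifdef ASSERT` block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uintptr_t Word;            // stand-in for HotSpot's HeapWord
const Word kBadWord = 0xBAADBABE;       // hypothetical poison pattern
const std::size_t kHeaderWords = 2;     // assumed object header size in words

class MiniTlab {
public:
    MiniTlab(Word* start, std::size_t size_in_words, bool mangle)
        : start_(start), top_(start), end_(start + size_in_words), mangle_(mangle) {}

    // Bump-pointer allocation of `size` words; nullptr if it does not fit.
    Word* allocate(std::size_t size) {
        assert(start_ <= top_ && top_ <= end_);   // plays the role of invariants()
        Word* obj = top_;
        if (static_cast<std::size_t>(end_ - obj) < size) {
            return nullptr;                       // top is left untouched
        }
        if (mangle_ && size > kHeaderWords) {
            // Debug-only step: poison the body but leave the header words alone.
            for (std::size_t i = kHeaderWords; i < size; ++i) obj[i] = kBadWord;
        }
        top_ = obj + size;                        // cannot wrap: top_ + size <= end_
        return obj;
    }

    std::size_t free_words() const { return static_cast<std::size_t>(end_ - top_); }
    std::size_t used_words() const { return static_cast<std::size_t>(top_ - start_); }

private:
    Word* start_;
    Word* top_;
    Word* end_;
    bool mangle_;
};
```

A failed allocation leaves the buffer unchanged, so a smaller object can still be placed in the remaining space afterwards.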
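`allocate_inside_tlab_slow` runs after the fast path has failed (the space left in the current TLAB is too small for the new object) and decides whether to replace the TLAB.
`should_post_sampled_object_alloc` handles the JVMTI sampling logic, which we can ignore for now.
Next it checks whether `tlab.free()` is greater than `tlab.refill_waste_limit()`. `refill_waste_limit` is the upper bound on how much memory may be wasted when the TLAB is replaced: the TLAB is replaced only if `free` is less than or equal to it, otherwise the method returns NULL right away. `refill_waste_limit` is itself adjusted dynamically, as described later.
If `free` is less than or equal to `refill_waste_limit`, `tlab.compute_size(_word_size)` computes the size of the next TLAB, `Universe::heap()->allocate_new_tlab` creates it, and the object is then allocated in the new TLAB and returned.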
HeapWord* MemAllocator::allocate_inside_tlab_slow(Allocation& allocation) const {
HeapWord* mem = NULL;
ThreadLocalAllocBuffer& tlab = _thread->tlab();
if (JvmtiExport::should_post_sampled_object_alloc()) {
tlab.set_back_allocation_end();
mem = tlab.allocate(_word_size);
// We set back the allocation sample point to try to allocate this, reset it
// when done.
allocation._tlab_end_reset_for_sample = true;
if (mem != NULL) {
return mem;
}
}
// Retain tlab and allocate object in shared space if
// the amount free in the tlab is too large to discard.
if (tlab.free() > tlab.refill_waste_limit()) {
tlab.record_slow_allocation(_word_size);
return NULL;
}
// Discard tlab and allocate a new one.
// To minimize fragmentation, the last TLAB may be smaller than the rest.
size_t new_tlab_size = tlab.compute_size(_word_size);
tlab.retire_before_allocation();
if (new_tlab_size == 0) {
return NULL;
}
// Allocate a new TLAB requesting new_tlab_size. Any size
// between minimal and new_tlab_size is accepted.
size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size);
mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size);
if (mem == NULL) {
assert(allocation._allocated_tlab_size == 0,
"Allocation failed, but actual size was updated. min: " SIZE_FORMAT
", desired: " SIZE_FORMAT ", actual: " SIZE_FORMAT,
min_tlab_size, new_tlab_size, allocation._allocated_tlab_size);
return NULL;
}
assert(allocation._allocated_tlab_size != 0, "Allocation succeeded but actual size not updated. mem at: "
PTR_FORMAT " min: " SIZE_FORMAT ", desired: " SIZE_FORMAT,
p2i(mem), min_tlab_size, new_tlab_size);
if (ZeroTLAB) {
// ..and clear it.
Copy::zero_to_words(mem, allocation._allocated_tlab_size);
} else {
// ...and zap just allocated object.
#ifdef ASSERT
// Skip mangling the space corresponding to the object header to
// ensure that the returned space is not considered parsable by
// any concurrent GC thread.
size_t hdr_size = oopDesc::header_size();
Copy::fill_to_words(mem + hdr_size, allocation._allocated_tlab_size - hdr_size, badHeapWordVal);
#endif // ASSERT
}
tlab.fill(mem, mem + _word_size, allocation._allocated_tlab_size);
return mem;
}
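Stripped of sampling, zeroing, and the actual refill, the decision at the heart of this method can be sketched as follows. `SlowPathAction`, `SlowPathResult`, and `decide_slow_path` are hypothetical names introduced here; the sketch only mirrors the `free() > refill_waste_limit()` comparison and reports how many words would be thrown away if the TLAB is retired.

```cpp
#include <cassert>
#include <cstddef>

enum class SlowPathAction {
    KeepTlabAllocateOutside,  // too much free space to discard: keep the TLAB
    RetireAndRefill           // remaining space is cheap enough to waste
};

struct SlowPathResult {
    SlowPathAction action;
    std::size_t wasted_words;  // words discarded if the old TLAB is retired
};

// Mirrors the comparison made after the fast path has failed.
inline SlowPathResult decide_slow_path(std::size_t free_words,
                                       std::size_t refill_waste_limit) {
    SlowPathResult result;
    if (free_words > refill_waste_limit) {
        result.action = SlowPathAction::KeepTlabAllocateOutside;
        result.wasted_words = 0;
    } else {
        result.action = SlowPathAction::RetireAndRefill;
        result.wasted_words = free_words;
    }
    return result;
}
```

Note that the boundary case `free == refill_waste_limit` retires the TLAB, matching the strict `>` in the original code.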
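When `tlab.free() > tlab.refill_waste_limit()`, `record_slow_allocation` is called to adjust `refill_waste_limit` and to count `_slow_allocations`. On every slow allocation (the TLAB cannot fit the new object and `free > refill_waste_limit`), `refill_waste_limit` grows by `refill_waste_limit_increment`. This keeps a thread from getting stuck in this state and allocating outside the TLAB forever.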
void ThreadLocalAllocBuffer::record_slow_allocation(size_t obj_size) {
// Raise size required to bypass TLAB next time. Why? Else there's
// a risk that a thread that repeatedly allocates objects of one
// size will get stuck on this slow path.
set_refill_waste_limit(refill_waste_limit() + refill_waste_limit_increment());
_slow_allocations++;
log_develop_trace(gc, tlab)("TLAB: %s thread: " INTPTR_FORMAT " [id: %2d]"
" obj: " SIZE_FORMAT
" free: " SIZE_FORMAT
" waste: " SIZE_FORMAT,
"slow", p2i(thread()), thread()->osthread()->thread_id(),
obj_size, free(), refill_waste_limit());
}
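A small worked model shows why the increment guarantees progress. It assumes the worst case named in the code comment: a thread keeps allocating objects of one size that never fit, so the TLAB's free space stays constant while the limit grows. `slow_allocations_until_refill` is a hypothetical helper, and the numbers in the usage note are illustrative only, not claimed JVM defaults.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many slow allocations happen before free <= limit, i.e. before
// the TLAB may be retired. Assumes free_words does not change in between.
inline std::size_t slow_allocations_until_refill(std::size_t free_words,
                                                 std::size_t limit,
                                                 std::size_t increment) {
    assert(increment > 0);  // with a zero increment the thread could stay stuck
    std::size_t slow_allocations = 0;
    while (free_words > limit) {
        limit += increment;   // same effect as record_slow_allocation
        ++slow_allocations;
    }
    return slow_allocations;
}
```

For example, with 100 words free, a limit of 64, and an increment of 4, nine allocations take the slow path before the limit reaches 100 and the TLAB can be replaced.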
When `allocate_inside_tlab` returns NULL, `allocate_outside_tlab` is called to allocate memory outside the TLAB.
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
if (UseTLAB) {
HeapWord* result = allocate_inside_tlab(allocation);
if (result != NULL) {
return result;
}
}
return allocate_outside_tlab(allocation);
}
HeapWord* MemAllocator::allocate_outside_tlab(Allocation& allocation) const {
allocation._allocated_outside_tlab = true;
HeapWord* mem = Universe::heap()->mem_allocate(_word_size, &allocation._overhead_limit_exceeded);
if (mem == NULL) {
return mem;
}
NOT_PRODUCT(Universe::heap()->check_for_non_bad_heap_word_value(mem, _word_size));
size_t size_in_bytes = _word_size * HeapWordSize;
_thread->incr_allocated_bytes(size_in_bytes);
return mem;
}
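`mem_allocate` is the method that allocates memory directly in the heap. For G1 it is implemented in g1CollectedHeap.cpp. It first checks whether the object is humongous; if so, it performs a humongous allocation, in which the object gets a run of contiguous regions to itself.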
HeapWord*
G1CollectedHeap::mem_allocate(size_t word_size,
bool* gc_overhead_limit_was_exceeded) {
assert_heap_not_locked_and_not_at_safepoint();
if (is_humongous(word_size)) {
return attempt_allocation_humongous(word_size);
}
size_t dummy = 0;
return attempt_allocation(word_size, word_size, &dummy);
}
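The humongous check and the number of regions involved can be sketched as below. The threshold is an assumption: the sketch uses the commonly described G1 rule that an object larger than half a region counts as humongous, and the exact boundary comparison in HotSpot may differ. `is_humongous_sketch` and `regions_needed` are hypothetical helpers, not G1 functions.

```cpp
#include <cassert>
#include <cstddef>

// Assumed rule: an object is humongous when it exceeds half a region.
inline bool is_humongous_sketch(std::size_t word_size, std::size_t region_words) {
    assert(region_words > 0);
    return word_size > region_words / 2;
}

// Contiguous regions a humongous object occupies, rounding up.
inline std::size_t regions_needed(std::size_t word_size, std::size_t region_words) {
    assert(region_words > 0);
    return (word_size + region_words - 1) / region_words;
}
```

With 1 MB regions and 8-byte words, a region holds 131072 words, so under this assumption anything above 65536 words takes the humongous path.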
Ordinary object allocation eventually reaches heapRegion.inline.hpp, where `par_allocate_impl` allocates memory by advancing the `top` pointer with a CAS.
inline HeapWord* HeapRegion::par_allocate_impl(size_t min_word_size,
size_t desired_word_size,
size_t* actual_size) {
do {
HeapWord* obj = top();
size_t available = pointer_delta(end(), obj);
size_t want_to_allocate = MIN2(available, desired_word_size);
if (want_to_allocate >= min_word_size) {
HeapWord* new_top = obj + want_to_allocate;
HeapWord* result = Atomic::cmpxchg(&_top, obj, new_top);
// result can be one of two:
// the old top value: the exchange succeeded
// otherwise: the new value of the top is returned.
if (result == obj) {
assert(is_object_aligned(obj) && is_object_aligned(new_top), "checking alignment");
*actual_size = want_to_allocate;
return obj;
}
} else {
return NULL;
}
} while (true);
}
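The same retry loop can be written with standard C++ atomics. This is a sketch under stated assumptions, not the HotSpot code: `MiniRegion` is a hypothetical class that tracks word offsets instead of real addresses, and `std::atomic::compare_exchange_weak` takes the place of `Atomic::cmpxchg`. As in the original, a caller asks for between `min_words` and `desired_words` and is told how much it actually received.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

class MiniRegion {
public:
    explicit MiniRegion(std::size_t capacity_words) : top_(0), end_(capacity_words) {}

    // Tries to claim between min_words and desired_words. On success stores
    // the start offset and the granted size, and returns true.
    bool par_allocate(std::size_t min_words, std::size_t desired_words,
                      std::size_t* start, std::size_t* actual_words) {
        assert(min_words <= desired_words);
        std::size_t old_top = top_.load();
        for (;;) {
            std::size_t available = end_ - old_top;
            std::size_t want = std::min(available, desired_words);
            if (want < min_words) {
                return false;              // not even the minimum fits
            }
            // On failure, compare_exchange_weak reloads old_top with the
            // value another thread installed, and the loop retries.
            if (top_.compare_exchange_weak(old_top, old_top + want)) {
                *start = old_top;
                *actual_words = want;
                return true;
            }
        }
    }

    std::size_t used_words() const { return top_.load(); }

private:
    std::atomic<std::size_t> top_;
    const std::size_t end_;
};
```

Because the granted size may be smaller than requested, the last TLAB carved out of a region can be shorter than the others, which is what the comment about minimizing fragmentation in `allocate_inside_tlab_slow` refers to.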
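# How the TLAB size is computed
The TLAB size can be thought of as the amount of memory a thread takes from eden in one batch. To strike a balance between memory utilization and ...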
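# TLAB structure definition
- `_start`: start address of the current TLAB
- `_top`: the position up to which memory has already been allocated
- `_pf_top`: allocation prefetch watermark
- `_end`: the end used for allocation, which is either the sampling end point or `_allocation_end`
- `_allocation_end`: the actual end address of the current TLAB, excluding the alignment reserve
- `_desired_size`: the target size
- `_refill_waste_limit`: the limit on waste when refilling
- `_allocated_before_last_gc`: total bytes allocated up until the last GC
- `_bytes_since_last_sample_point`: bytes allocated since the last sample point
- `_max_size`: the maximum size of a TLAB
- `_reserve_for_allocation_prefetch`: space reserved at the end of the TLAB
- `_target_refills`: expected number of refills between GCs
- `_number_of_refills`, `_refill_waste`, `_gc_waste`, `_slow_allocations`, `_allocated_size`: statistics counters
- `_allocation_fraction`: the fraction of eden allocation that goes through TLABs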
class ThreadLocalAllocBuffer: public CHeapObj<mtThread> {
friend class VMStructs;
friend class JVMCIVMStructs;
private:
HeapWord* _start; // address of TLAB
HeapWord* _top; // address after last allocation
HeapWord* _pf_top; // allocation prefetch watermark
HeapWord* _end; // allocation end (can be the sampling end point or _allocation_end)
HeapWord* _allocation_end; // end for allocations (actual TLAB end, excluding alignment_reserve)
size_t _desired_size; // desired size (including alignment_reserve)
size_t _refill_waste_limit; // hold onto tlab if free() is larger than this
size_t _allocated_before_last_gc; // total bytes allocated up until the last gc
size_t _bytes_since_last_sample_point; // bytes since last sample point.
static size_t _max_size; // maximum size of any TLAB
static int _reserve_for_allocation_prefetch; // Reserve at the end of the TLAB
static unsigned _target_refills; // expected number of refills between GCs
unsigned _number_of_refills;
unsigned _refill_waste;
unsigned _gc_waste;
unsigned _slow_allocations;
size_t _allocated_size;
AdaptiveWeightedAverage _allocation_fraction; // fraction of eden allocated in tlabs