# jstack实现原理分析
jstack是大家经常使用的一个工具,通常用来观察服务状态、检查是否有死锁、查看热点代码等。但是在大家使用之后,是否有思考过jdk是如何实现这个命令的呢?
# jstack使用
Prints Java thread stack traces for a Java process, core file, or remote debug server.
jstack [ options ] pid
-F
Force a stack dump when jstack [-l] pid does not respond.
-l
Long listing. Prints additional information about locks such as a list of owned java.util.concurrent ownable synchronizers. See the AbstractOwnableSynchronizer class description at
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/AbstractOwnableSynchronizer.html
-m
Prints a mixed mode stack trace that has both Java and native C/C++ frames.
2
3
4
5
6
7
8
例如 jstack -l 900
# jstack如何实现的
# jstack入口实现
入口sun.tools.jstack.JStack 如果没有加-F参数 VirtualMachine.attach到目标进程后,调用HotSpotVirtualMachine.remoteDataDump方法,remoteDataDump方法调用了executeCommand("threaddump", ...)方法
// Attach to pid and perform a thread dump
private static void runThreadDump(String pid, String args[]) throws Exception {
VirtualMachine vm = null;
try {
vm = VirtualMachine.attach(pid);
} catch (Exception x) {
String msg = x.getMessage();
if (msg != null) {
System.err.println(pid + ": " + msg);
} else {
x.printStackTrace();
}
System.exit(1);
}
// Cast to HotSpotVirtualMachine as this is implementation specific
// method.
InputStream in = ((HotSpotVirtualMachine)vm).remoteDataDump((Object[])args);
// read to EOF and just print output
PrintStreamPrinter.drainUTF8(in, System.out);
vm.detach();
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public InputStream remoteDataDump(Object... var1) throws IOException {
return this.executeCommand("threaddump", var1);
}
2
3
# executeCommand如何实现的
execute的实现在VirtualMachineImpl中,每个平台有一个单独的实现,以linux为例。(src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java中) VirtualMachineImpl的构造函数如下。
- findSocketFile找到目标进程的socket文件,如果没有则通过发送QUIT信息触发创建
- 创建一个unix socket并连接上,提前检查验证权限等问题
VirtualMachineImpl(AttachProvider provider, String vmid)
throws AttachNotSupportedException, IOException
{
super(provider, vmid);
// This provider only understands pids
int pid;
try {
pid = Integer.parseInt(vmid);
if (pid < 1) {
throw new NumberFormatException();
}
} catch (NumberFormatException x) {
throw new AttachNotSupportedException("Invalid process identifier: " + vmid);
}
// Try to resolve to the "inner most" pid namespace
int ns_pid = getNamespacePid(pid);
// Find the socket file. If not found then we attempt to start the
// attach mechanism in the target VM by sending it a QUIT signal.
// Then we attempt to find the socket file again.
File socket_file = findSocketFile(pid, ns_pid);
socket_path = socket_file.getPath();
if (!socket_file.exists()) {
File f = createAttachFile(pid, ns_pid);
try {
sendQuitTo(pid);
// give the target VM time to start the attach mechanism
final int delay_step = 100;
final long timeout = attachTimeout();
long time_spend = 0;
long delay = 0;
do {
// Increase timeout on each attempt to reduce polling
delay += delay_step;
try {
Thread.sleep(delay);
} catch (InterruptedException x) { }
time_spend += delay;
if (time_spend > timeout/2 && !socket_file.exists()) {
// Send QUIT again to give target VM the last chance to react
sendQuitTo(pid);
}
} while (time_spend <= timeout && !socket_file.exists());
if (!socket_file.exists()) {
throw new AttachNotSupportedException(
String.format("Unable to open socket file %s: " +
"target process %d doesn't respond within %dms " +
"or HotSpot VM not loaded", socket_path, pid,
time_spend));
}
} finally {
f.delete();
}
}
// Check that the file owner/permission to avoid attaching to
// bogus process
checkPermissions(socket_path);
// Check that we can connect to the process
// - this ensures we throw the permission denied error now rather than
// later when we attempt to enqueue a command.
int s = socket();
try {
connect(s, socket_path);
} finally {
close(s);
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
# execute如何实现的
- 再次创建unix socket并连接,socket file文件为 /tmp/.java_pid$pid,例如 /tmp/.java_pid123124
- 通过类似RPC的方式通讯,在socket上发送请求读取结果
/**
* Execute the given command in the target VM.
*/
InputStream execute(String cmd, Object ... args) throws AgentLoadException, IOException {
assert args.length <= 3; // includes null
// did we detach?
synchronized (this) {
if (socket_path == null) {
throw new IOException("Detached from target VM");
}
}
// create UNIX socket
int s = socket();
// connect to target VM
try {
connect(s, socket_path);
} catch (IOException x) {
close(s);
throw x;
}
IOException ioe = null;
// connected - write request
// <ver> <cmd> <args...>
try {
writeString(s, PROTOCOL_VERSION);
writeString(s, cmd);
for (int i = 0; i < 3; i++) {
if (i < args.length && args[i] != null) {
writeString(s, (String)args[i]);
} else {
writeString(s, "");
}
}
} catch (IOException x) {
ioe = x;
}
// Create an input stream to read reply
SocketInputStream sis = new SocketInputStream(s);
// Read the command completion status
int completionStatus;
try {
completionStatus = readInt(sis);
} catch (IOException x) {
sis.close();
if (ioe != null) {
throw ioe;
} else {
throw x;
}
}
if (completionStatus != 0) {
// read from the stream and use that as the error message
String message = readErrorMessage(sis);
sis.close();
// In the event of a protocol mismatch then the target VM
// returns a known error so that we can throw a reasonable
// error.
if (completionStatus == ATTACH_ERROR_BADVERSION) {
throw new IOException("Protocol mismatch with target VM");
}
// Special-case the "load" command so that the right exception is
// thrown.
if (cmd.equals("load")) {
String msg = "Failed to load agent library";
if (!message.isEmpty())
msg += ": " + message;
throw new AgentLoadException(msg);
} else {
if (message.isEmpty())
message = "Command failed in target VM";
throw new AttachOperationFailedException(message);
}
}
// Return the input stream so that the command output can be read
return sis;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
# jvm内部是如何与jstack客户端通信的
jvm中有一个Attach Listener监听attach事件
"Attach Listener" #185398 daemon prio=9 os_prio=0 tid=0x00007fb11c15b8d0 nid=0xd9f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
2
不过这个Attach Listener默认是lazy启动的(除非加了+ReduseSignalUsage参数),当我们第一次attach的时候,通过上面提到的QUIT信号触发JVM启动Attach Listener。 jvm启动创建的时候,会启动一个线程,来监听SIGBREAK等信号事件
Threads::create_vm
// Signal Dispatcher needs to be started before VMInit event is posted
os::initialize_jdk_signal_support(CHECK_JNI_ERR);
2
3
接收到信号时,如果AttachListener没有初始化好,则会触发初始化
初始化调用栈 signal_thread_entry -> AttachListener::is_init_trigger -> AttachListener::init
AttachListener会创建AttachListener线程,监听一个socket请求队列,接收请求调用对应方法处理,返回结果。
- AttachListener::dequeue()解析出AttachOperation
- 通过AttachOperation.name()得到操作名称,例如对于jstack就是threaddump
- 操作对应的函数保存在一个二维数组中,找到对应的函数后执行,返回结果
static void attach_listener_thread_entry(JavaThread* thread, TRAPS) {
os::set_priority(thread, NearMaxPriority);
assert(thread == Thread::current(), "Must be");
assert(thread->stack_base() != NULL && thread->stack_size() > 0,
"Should already be setup");
if (AttachListener::pd_init() != 0) {
AttachListener::set_state(AL_NOT_INITIALIZED);
return;
}
AttachListener::set_initialized();
for (;;) {
AttachOperation* op = AttachListener::dequeue();
if (op == NULL) {
AttachListener::set_state(AL_NOT_INITIALIZED);
return; // dequeue failed or shutdown
}
ResourceMark rm;
bufferedStream st;
jint res = JNI_OK;
// handle special detachall operation
if (strcmp(op->name(), AttachOperation::detachall_operation_name()) == 0) {
AttachListener::detachall();
} else if (!EnableDynamicAgentLoading && strcmp(op->name(), "load") == 0) {
st.print("Dynamic agent loading is not enabled. "
"Use -XX:+EnableDynamicAgentLoading to launch target VM.");
res = JNI_ERR;
} else {
// find the function to dispatch too
AttachOperationFunctionInfo* info = NULL;
for (int i=0; funcs[i].name != NULL; i++) {
const char* name = funcs[i].name;
assert(strlen(name) <= AttachOperation::name_length_max, "operation <= name_length_max");
if (strcmp(op->name(), name) == 0) {
info = &(funcs[i]);
break;
}
}
// check for platform dependent attach operation
if (info == NULL) {
info = AttachListener::pd_find_operation(op->name());
}
if (info != NULL) {
// dispatch to the function that implements this operation
res = (info->func)(op, &st);
} else {
st.print("Operation %s not recognized!", op->name());
res = JNI_ERR;
}
}
// operation complete - send result and output to client
op->complete(res, &st);
}
ShouldNotReachHere();
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
static AttachOperationFunctionInfo funcs[] = {
{ "agentProperties", get_agent_properties },
{ "datadump", data_dump },
{ "dumpheap", dump_heap },
{ "load", load_agent },
{ "properties", get_system_properties },
{ "threaddump", thread_dump },
{ "inspectheap", heap_inspection },
{ "setflag", set_flag },
{ "printflag", print_flag },
{ "jcmd", jcmd },
{ NULL, NULL }
};
2
3
4
5
6
7
8
9
10
11
12
13
14
# 再看下"threaddump"对应的函数实现
- 处理-l -e参数
- 调用VMThread::execute(&op)来打印线程栈、死锁检测等
// Implementation of "threaddump" command - essentially a remote ctrl-break
// See also: ThreadDumpDCmd class
//
static jint thread_dump(AttachOperation* op, outputStream* out) {
bool print_concurrent_locks = false;
bool print_extended_info = false;
if (op->arg(0) != NULL) {
for (int i = 0; op->arg(0)[i] != 0; ++i) {
if (op->arg(0)[i] == 'l') {
print_concurrent_locks = true;
}
if (op->arg(0)[i] == 'e') {
print_extended_info = true;
}
}
}
// thread stacks
VM_PrintThreads op1(out, print_concurrent_locks, print_extended_info);
VMThread::execute(&op1);
// JNI global handles
VM_PrintJNI op2(out);
VMThread::execute(&op2);
// Deadlock detection
VM_FindDeadlocks op3(out);
VMThread::execute(&op3);
return JNI_OK;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# VMThread里是如何执行的
VMThread中保存一个操作队列,调用VMThread::execute就是向队列中添加VM_Operation对象,VMThread不断轮询操作队列执行操作。 很大一部分VM_Operation都是需要在safepooint内执行的(更熟悉的名字是StopTheWord),这里的threaddump就是。 如果需要在safepoint内执行,则需要先触发VM进入safepoint。
void VMThread::loop() {
SafepointSynchronize::init(_vm_thread);
while(true) {
VM_Operation* safepoint_ops = NULL;
//
// Wait for VM operation
//
...
while (!should_terminate() && _cur_vm_operation == NULL) {
// wait with a timeout to guarantee safepoints at regular intervals
// (if there is cleanup work to do)
(void)mu_queue.wait(GuaranteedSafepointInterval);
...
//
// Execute VM operation
//
{ HandleMark hm(VMThread::vm_thread());
// If we are at a safepoint we will evaluate all the operations that
// follow that also require a safepoint
if (_cur_vm_operation->evaluate_at_safepoint()) {
log_debug(vmthread)("Evaluating safepoint VM operation: %s", _cur_vm_operation->name());
_vm_queue->set_drain_list(safepoint_ops); // ensure ops can be scanned
SafepointSynchronize::begin();
if (_timeout_task != NULL) {
_timeout_task->arm();
}
evaluate_operation(_cur_vm_operation);
// now process all queued safepoint ops, iteratively draining
// the queue until there are none left
do {
_cur_vm_operation = safepoint_ops;
if (_cur_vm_operation != NULL) {
do {
EventMark em("Executing coalesced safepoint VM operation: %s", _cur_vm_operation->name());
log_debug(vmthread)("Evaluating coalesced safepoint VM operation: %s", _cur_vm_operation->name());
// evaluate_operation deletes the op object so we have
// to grab the next op now
VM_Operation* next = _cur_vm_operation->next();
_vm_queue->set_drain_list(next);
evaluate_operation(_cur_vm_operation);
_cur_vm_operation = next;
_coalesced_count++;
} while (_cur_vm_operation != NULL);
}
if (_vm_queue->peek_at_safepoint_priority()) {
// must hold lock while draining queue
MutexLocker mu_queue(VMOperationQueue_lock,
Mutex::_no_safepoint_check_flag);
safepoint_ops = _vm_queue->drain_at_safepoint_priority();
} else {
safepoint_ops = NULL;
}
} while(safepoint_ops != NULL);
_vm_queue->set_drain_list(NULL);
_cur_vm_operation = NULL;
}
}
//
// Notify (potential) waiting Java thread(s)
{ MonitorLocker mu(VMOperationRequest_lock, Mutex::_no_safepoint_check_flag);
mu.notify_all();
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
threaddump方法的具体实现在vmOperations.cpp内void VM_ThreadDump::doit()中 获取到所有的线程列表,挨个打印快照信息
void VM_ThreadDump::doit() {
ResourceMark rm;
_result->set_t_list();
ConcurrentLocksDump concurrent_locks(true);
if (_with_locked_synchronizers) {
concurrent_locks.dump_at_safepoint();
}
if (_num_threads == 0) {
// Snapshot all live threads
for (uint i = 0; i < _result->t_list()->length(); i++) {
JavaThread* jt = _result->t_list()->thread_at(i);
if (jt->is_exiting() ||
jt->is_hidden_from_external_view()) {
// skip terminating threads and hidden threads
continue;
}
ThreadConcurrentLocks* tcl = NULL;
if (_with_locked_synchronizers) {
tcl = concurrent_locks.thread_concurrent_locks(jt);
}
snapshot_thread(jt, tcl);
}
}
...
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# 如果jstack加上了-F参数
jdk8中会使用sa-jdi(service ability)工具获取,这个运行机制略有不同,借助的是SA的功能。
# jstack命令失败的一些情况
Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
2
- /tmp/.java_pidNNN文件被删。比如有一些定期清理脚本
- JVM启动设置了-XX:+DisableAttachMechanism
- 目标进程太忙,没法进入safepoint(例如长时间的gc停顿)
# 还得到了什么启发
threaddump会导致进入safepoint,频繁safepoint会影响进程吞吐。 触发threaddump的入口有jstack,java.lang.Thread#getAllStackTraces, java.lang.management.ThreadMXBean#getAllThreadIds等
# Resources
- https://github.com/jvm-profiling-tools/async-profiler/blob/master/README.md troubleshooting
- https://hongjiang.info/jdk1-6-0-24-jstack-unavailable/