# jstack实现原理分析

jstack是大家经常使用的一个工具,通常用来观察服务状态、检查是否有死锁、查看热点代码等。但是在大家使用之后,是否有思考过jdk是如何实现这个命令的呢?

# jstack使用

Prints Java thread stack traces for a Java process, core file, or remote debug server.
1
jstack [ options ] pid
-F
        Force a stack dump when jstack [-l] pid does not respond.
-l
        Long listing. Prints additional information about locks such as a list of owned java.util.concurrent ownable synchronizers. See the AbstractOwnableSynchronizer class description at
        http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/AbstractOwnableSynchronizer.html
-m
        Prints a mixed mode stack trace that has both Java and native C/C++ frames.
1
2
3
4
5
6
7
8

例如 jstack -l 900

# jstack如何实现的

# jstack入口实现

入口sun.tools.jstack.JStack 如果没有加-F参数 VirtualMachine.attach到目标进程后,调用HotSpotVirtualMachine.remoteDataDump方法,remoteDataDump方法调用了executeCommand("threaddump", ...)方法

// Attach to pid and perform a thread dump
private static void runThreadDump(String pid, String args[]) throws Exception {
    VirtualMachine vm = null;
    try {
        vm = VirtualMachine.attach(pid);
    } catch (Exception x) {
        String msg = x.getMessage();
        if (msg != null) {
            System.err.println(pid + ": " + msg);
        } else {
            x.printStackTrace();
        }
        System.exit(1);
    }

    // Cast to HotSpotVirtualMachine as this is implementation specific
    // method.
    InputStream in = ((HotSpotVirtualMachine)vm).remoteDataDump((Object[])args);
    // read to EOF and just print output
    PrintStreamPrinter.drainUTF8(in, System.out);
    vm.detach();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public InputStream remoteDataDump(Object... var1) throws IOException {
    return this.executeCommand("threaddump", var1);
}
1
2
3

# executeCommand如何实现的

execute的实现在VirtualMachineImpl中,每个平台有一个单独的实现,以linux为例。(src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java中) VirtualMachineImpl的构造函数如下。

  1. findSocketFile找到目标进程的socket文件,如果没有则通过发送QUIT信息触发创建
  2. 创建一个unix socket并连接上,提前检查验证权限等问题
VirtualMachineImpl(AttachProvider provider, String vmid)
        throws AttachNotSupportedException, IOException
{
    super(provider, vmid);

    // This provider only understands pids
    int pid;
    try {
        pid = Integer.parseInt(vmid);
        if (pid < 1) {
            throw new NumberFormatException();
        }
    } catch (NumberFormatException x) {
        throw new AttachNotSupportedException("Invalid process identifier: " + vmid);
    }

    // Try to resolve to the "inner most" pid namespace
    int ns_pid = getNamespacePid(pid);

    // Find the socket file. If not found then we attempt to start the
    // attach mechanism in the target VM by sending it a QUIT signal.
    // Then we attempt to find the socket file again.
    File socket_file = findSocketFile(pid, ns_pid);
    socket_path = socket_file.getPath();
    if (!socket_file.exists()) {
        File f = createAttachFile(pid, ns_pid);
        try {
            sendQuitTo(pid);

            // give the target VM time to start the attach mechanism
            final int delay_step = 100;
            final long timeout = attachTimeout();
            long time_spend = 0;
            long delay = 0;
            do {
                // Increase timeout on each attempt to reduce polling
                delay += delay_step;
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException x) { }

                time_spend += delay;
                if (time_spend > timeout/2 && !socket_file.exists()) {
                    // Send QUIT again to give target VM the last chance to react
                    sendQuitTo(pid);
                }
            } while (time_spend <= timeout && !socket_file.exists());
            if (!socket_file.exists()) {
                throw new AttachNotSupportedException(
                    String.format("Unable to open socket file %s: " +
                        "target process %d doesn't respond within %dms " +
                        "or HotSpot VM not loaded", socket_path, pid,
                                    time_spend));
            }
        } finally {
            f.delete();
        }
    }

    // Check that the file owner/permission to avoid attaching to
    // bogus process
    checkPermissions(socket_path);

    // Check that we can connect to the process
    // - this ensures we throw the permission denied error now rather than
    // later when we attempt to enqueue a command.
    int s = socket();
    try {
        connect(s, socket_path);
    } finally {
        close(s);
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73

# execute如何实现的

  1. 再次创建unix socket并连接,socket file文件为 /tmp/.java_pid$pid,例如 /tmp/.java_pid123124
  2. 通过类似RPC的方式通讯,在socket上发送请求读取结果
/**
* Execute the given command in the target VM.
*/
InputStream execute(String cmd, Object ... args) throws AgentLoadException, IOException {
    assert args.length <= 3;                // includes null

    // did we detach?
    synchronized (this) {
        if (socket_path == null) {
            throw new IOException("Detached from target VM");
        }
    }

    // create UNIX socket
    int s = socket();

    // connect to target VM
    try {
        connect(s, socket_path);
    } catch (IOException x) {
        close(s);
        throw x;
    }

    IOException ioe = null;

    // connected - write request
    // <ver> <cmd> <args...>
    try {
        writeString(s, PROTOCOL_VERSION);
        writeString(s, cmd);

        for (int i = 0; i < 3; i++) {
            if (i < args.length && args[i] != null) {
                writeString(s, (String)args[i]);
            } else {
                writeString(s, "");
            }
        }
    } catch (IOException x) {
        ioe = x;
    }


    // Create an input stream to read reply
    SocketInputStream sis = new SocketInputStream(s);

    // Read the command completion status
    int completionStatus;
    try {
        completionStatus = readInt(sis);
    } catch (IOException x) {
        sis.close();
        if (ioe != null) {
            throw ioe;
        } else {
            throw x;
        }
    }

    if (completionStatus != 0) {
        // read from the stream and use that as the error message
        String message = readErrorMessage(sis);
        sis.close();

        // In the event of a protocol mismatch then the target VM
        // returns a known error so that we can throw a reasonable
        // error.
        if (completionStatus == ATTACH_ERROR_BADVERSION) {
            throw new IOException("Protocol mismatch with target VM");
        }

        // Special-case the "load" command so that the right exception is
        // thrown.
        if (cmd.equals("load")) {
            String msg = "Failed to load agent library";
            if (!message.isEmpty())
                msg += ": " + message;
            throw new AgentLoadException(msg);
        } else {
            if (message.isEmpty())
                message = "Command failed in target VM";
            throw new AttachOperationFailedException(message);
        }
    }

    // Return the input stream so that the command output can be read
    return sis;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89

# jvm内部是如何与jstack客户端通信的

jvm中有一个Attach Listener监听attach事件

"Attach Listener" #185398 daemon prio=9 os_prio=0 tid=0x00007fb11c15b8d0 nid=0xd9f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
1
2

不过这个Attach Listener默认是lazy启动的(除非加了+ReduseSignalUsage参数),当我们第一次attach的时候,通过上面提到的QUIT信号触发JVM启动Attach Listener。 jvm启动创建的时候,会启动一个线程,来监听SIGBREAK等信号事件

Threads::create_vm
// Signal Dispatcher needs to be started before VMInit event is posted
os::initialize_jdk_signal_support(CHECK_JNI_ERR);
1
2
3

接收到信号时,如果AttachListener没有初始化好,则会触发初始化

初始化调用栈 signal_thread_entry -> AttachListener::is_init_trigger -> AttachListener::init

AttachListener会创建AttachListener线程,监听一个socket请求队列,接收请求调用对应方法处理,返回结果。

  1. AttachListener::dequeue()解析出AttachOperation
  2. 通过AttachOperation.name()得到操作名称,例如对于jstack就是threaddump
  3. 操作对应的函数保存在一个二维数组中,找到对应的函数后执行,返回结果
static void attach_listener_thread_entry(JavaThread* thread, TRAPS) {
  os::set_priority(thread, NearMaxPriority);

  assert(thread == Thread::current(), "Must be");
  assert(thread->stack_base() != NULL && thread->stack_size() > 0,
         "Should already be setup");

  if (AttachListener::pd_init() != 0) {
    AttachListener::set_state(AL_NOT_INITIALIZED);
    return;
  }
  AttachListener::set_initialized();

  for (;;) {
    AttachOperation* op = AttachListener::dequeue();
    if (op == NULL) {
      AttachListener::set_state(AL_NOT_INITIALIZED);
      return;   // dequeue failed or shutdown
    }

    ResourceMark rm;
    bufferedStream st;
    jint res = JNI_OK;

    // handle special detachall operation
    if (strcmp(op->name(), AttachOperation::detachall_operation_name()) == 0) {
      AttachListener::detachall();
    } else if (!EnableDynamicAgentLoading && strcmp(op->name(), "load") == 0) {
      st.print("Dynamic agent loading is not enabled. "
               "Use -XX:+EnableDynamicAgentLoading to launch target VM.");
      res = JNI_ERR;
    } else {
      // find the function to dispatch too
      AttachOperationFunctionInfo* info = NULL;
      for (int i=0; funcs[i].name != NULL; i++) {
        const char* name = funcs[i].name;
        assert(strlen(name) <= AttachOperation::name_length_max, "operation <= name_length_max");
        if (strcmp(op->name(), name) == 0) {
          info = &(funcs[i]);
          break;
        }
      }

      // check for platform dependent attach operation
      if (info == NULL) {
        info = AttachListener::pd_find_operation(op->name());
      }

      if (info != NULL) {
        // dispatch to the function that implements this operation
        res = (info->func)(op, &st);
      } else {
        st.print("Operation %s not recognized!", op->name());
        res = JNI_ERR;
      }
    }

    // operation complete - send result and output to client
    op->complete(res, &st);
  }

  ShouldNotReachHere();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
static AttachOperationFunctionInfo funcs[] = {
  { "agentProperties",  get_agent_properties },
  { "datadump",         data_dump },
  { "dumpheap",         dump_heap },
  { "load",             load_agent },
  { "properties",       get_system_properties },
  { "threaddump",       thread_dump },
  { "inspectheap",      heap_inspection },
  { "setflag",          set_flag },
  { "printflag",        print_flag },
  { "jcmd",             jcmd },
  { NULL,               NULL }
};

1
2
3
4
5
6
7
8
9
10
11
12
13
14

# 再看下"threaddump"对应的函数实现

  1. 处理-l -e参数
  2. 调用VMThread::execute(&op)来打印线程栈、死锁检测等
// Implementation of "threaddump" command - essentially a remote ctrl-break
// See also: ThreadDumpDCmd class
//
static jint thread_dump(AttachOperation* op, outputStream* out) {
  bool print_concurrent_locks = false;
  bool print_extended_info = false;
  if (op->arg(0) != NULL) {
    for (int i = 0; op->arg(0)[i] != 0; ++i) {
      if (op->arg(0)[i] == 'l') {
        print_concurrent_locks = true;
      }
      if (op->arg(0)[i] == 'e') {
        print_extended_info = true;
      }
    }
  }

  // thread stacks
  VM_PrintThreads op1(out, print_concurrent_locks, print_extended_info);
  VMThread::execute(&op1);

  // JNI global handles
  VM_PrintJNI op2(out);
  VMThread::execute(&op2);

  // Deadlock detection
  VM_FindDeadlocks op3(out);
  VMThread::execute(&op3);

  return JNI_OK;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

# VMThread里是如何执行的

VMThread中保存一个操作队列,调用VMThread::execute就是向队列中添加VM_Operation对象,VMThread不断轮询操作队列执行操作。 很大一部分VM_Operation都是需要在safepooint内执行的(更熟悉的名字是StopTheWord),这里的threaddump就是。 如果需要在safepoint内执行,则需要先触发VM进入safepoint。

void VMThread::loop() {
    SafepointSynchronize::init(_vm_thread);
    while(true) {
    VM_Operation* safepoint_ops = NULL;
    //
    // Wait for VM operation
    //
    ...
      while (!should_terminate() && _cur_vm_operation == NULL) {
        // wait with a timeout to guarantee safepoints at regular intervals
        // (if there is cleanup work to do)
        (void)mu_queue.wait(GuaranteedSafepointInterval);
        ...
    //
    // Execute VM operation
    //
    { HandleMark hm(VMThread::vm_thread());
      // If we are at a safepoint we will evaluate all the operations that
      // follow that also require a safepoint
      if (_cur_vm_operation->evaluate_at_safepoint()) {
        log_debug(vmthread)("Evaluating safepoint VM operation: %s", _cur_vm_operation->name());

        _vm_queue->set_drain_list(safepoint_ops); // ensure ops can be scanned

        SafepointSynchronize::begin();

        if (_timeout_task != NULL) {
          _timeout_task->arm();
        }

        evaluate_operation(_cur_vm_operation);
        // now process all queued safepoint ops, iteratively draining
        // the queue until there are none left
        do {
          _cur_vm_operation = safepoint_ops;
          if (_cur_vm_operation != NULL) {
            do {
              EventMark em("Executing coalesced safepoint VM operation: %s", _cur_vm_operation->name());
              log_debug(vmthread)("Evaluating coalesced safepoint VM operation: %s", _cur_vm_operation->name());
              // evaluate_operation deletes the op object so we have
              // to grab the next op now
              VM_Operation* next = _cur_vm_operation->next();
              _vm_queue->set_drain_list(next);
              evaluate_operation(_cur_vm_operation);
              _cur_vm_operation = next;
              _coalesced_count++;
            } while (_cur_vm_operation != NULL);
          }
          if (_vm_queue->peek_at_safepoint_priority()) {
            // must hold lock while draining queue
            MutexLocker mu_queue(VMOperationQueue_lock,
                                 Mutex::_no_safepoint_check_flag);
            safepoint_ops = _vm_queue->drain_at_safepoint_priority();
          } else {
            safepoint_ops = NULL;
          }
        } while(safepoint_ops != NULL);

        _vm_queue->set_drain_list(NULL);

        _cur_vm_operation = NULL;
      }
    }

    //
    //  Notify (potential) waiting Java thread(s)
    { MonitorLocker mu(VMOperationRequest_lock, Mutex::_no_safepoint_check_flag);
      mu.notify_all();
    }
  }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71

threaddump方法的具体实现在vmOperations.cpp内void VM_ThreadDump::doit()中 获取到所有的线程列表,挨个打印快照信息

void VM_ThreadDump::doit() {
  ResourceMark rm;

  _result->set_t_list();

  ConcurrentLocksDump concurrent_locks(true);
  if (_with_locked_synchronizers) {
    concurrent_locks.dump_at_safepoint();
  }

  if (_num_threads == 0) {
    // Snapshot all live threads

    for (uint i = 0; i < _result->t_list()->length(); i++) {
      JavaThread* jt = _result->t_list()->thread_at(i);
      if (jt->is_exiting() ||
          jt->is_hidden_from_external_view())  {
        // skip terminating threads and hidden threads
        continue;
      }
      ThreadConcurrentLocks* tcl = NULL;
      if (_with_locked_synchronizers) {
        tcl = concurrent_locks.thread_concurrent_locks(jt);
      }
      snapshot_thread(jt, tcl);
    }
  } 
  ...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

# 如果jstack加上了-F参数

jdk8中会使用sa-jdi(service ability)工具获取,这个运行机制略有不同,借助的是SA的功能。

# jstack命令失败的一些情况

Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding 
1
2
  1. /tmp/.java_pidNNN文件被删。比如有一些定期清理脚本
  2. JVM启动设置了-XX:+DisableAttachMechanism
  3. 目标进程太忙,没法进入safepoint(例如长时间的gc停顿)

# 还得到了什么启发

threaddump会导致进入safepoint,频繁safepoint会影响进程吞吐。 触发threaddump的入口有jstack,java.lang.Thread#getAllStackTraces, java.lang.management.ThreadMXBean#getAllThreadIds等

# Resources

  • https://github.com/jvm-profiling-tools/async-profiler/blob/master/README.md troubleshooting
  • https://hongjiang.info/jdk1-6-0-24-jstack-unavailable/