Mar 20, 2010

Pre-python gdb scripting: dissecting a JVM core dump (part 1/2)

1. Introduction

This is going to be a 2-part article describing how one can develop a gdb script for examining tree-like structures in a core file. It also gives some insight on how to orient yourself in a core file / disassembly when you have no debugging symbols.

In the part 1, we will start by reproducing some random crash of Sun's JVM (taken from bugs.sun.com), exploring core file a little bit, and then proceed to indroducing a set of gdb script functions to semi-automate the process.

The target audience for this are gdb beginners, since explanations tend to be rather verbose. Also, as the title suggests, automating part describes pre-python gdb scripting (recently, it has become possible to write gdb scripts with Python, and it's NOT what is described here).

2. Getting a core to experiment with

Let's start by picking the bug. This one looks good enough - it has a rather small reproducer, and fails on the pretty much latest 1.6.0_16 JVM release (exactly what I happen to have installed on my machine).

Example 1. CrashTest.java - the failing test

public class CrashTest {
 
 public static void main(String[] args) {
   for (int i = 0; i < 2; i++)
     NestedLoop();
 }
 
 public static long NestedLoop() {
   final int n = 50;
 
   long startTime = System.currentTimeMillis();
 
   int x = 0;
   for (int a = 0; a < n; a++)
     for (int b = 0; b < n; b++)
       for (int c = 0; c < n; c++)
         for (int d = 0; d < n; d++)
           for (int e = 0; e < n; e++)
             for (int f = 0; f < n; f++)
               x++;
 
   long stopTime = System.currentTimeMillis();
 
   return stopTime - startTime;
 }
 
}


Now, I run it and get the core file:

Example 2. Reproducing the failure

$ javac CrashTest.java 
$ ulimit -c unlimited
$ java -server -Xcomp CrashTest 
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb7062c2c, pid=9322, tid=2873416560
#
# JRE version: 6.0_16-b01
# Java VM: Java HotSpot(TM) Server VM (14.2-b01 compiled mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bfc2c]
#
# An error report file with more information is saved as:
# .../hs_err_pid9322.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted (core dumped)
$


3. A bit of manual debugging

Ok, now we have a core file to attach the gdb to:

$ gdb java core

Let's see what the failing thead has been doing:

Example 3. Backtrace of the failing thread

(gdb) backtrace
         /  #0  0xb7863424 in __kernel_vsyscall ()
        |   #1  0xb770b8e0 in raise () from /lib/i686/cmov/libc.so.6
 error  |   #2  0xb770ee15 in abort () from /lib/i686/cmov/libc.so.6
repor-  |   #3  0xb70cfb9f in os::abort(bool) () from /usr/lib/jvm/java-6-sun-1.6.0.16/jre/lib/i386/server/libjvm.so
 ting   |   #4  0xb71f2b81 in VMError::report_and_die() () from ...
        |   #5  0xb70d63ac in JVM_handle_linux_signal () from ...
         \  #6  0xb70d2624 in signalHandler(int, siginfo*, void*) () from ...
            #7  <signal handler called>
         /  #8  0xb7062c2c in PhaseIdealLoop::get_early_ctrl(Node*) () from ...
        |   #9  0xb70680ea in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
        |  #10 0xb7067ed6 in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
 compi- |  #11 0xb7067ed6 in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
lation  |  #12 0xb7067ed6 in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
of java |  #13 0xb7067ed6 in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
method  |  #14 0xb7067ed6 in IdealLoopTree::counted_loop(PhaseIdealLoop*) () from ...
        |  #15 0xb706a31c in PhaseIdealLoop::PhaseIdealLoop(PhaseIterGVN&, PhaseIdealLoop const*, bool) () from ...
        |  #16 0xb6dfe34b in Compile::Optimize() () from ...
        |  #17 0xb6dfb5ff in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool) () from ...
        |  #18 0xb6d95c4f in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from ...
         \ #19 0xb6e03d0b in CompileBroker::invoke_compiler_on_method(CompileTask*) () from ...
           #20 0xb6e03629 in CompileBroker::compiler_thread_loop() () from ...
           #21 0xb71ba306 in compiler_thread_entry(JavaThread*, Thread*) () from ...
           #22 0xb71b397e in JavaThread::run() () from ...
           #23 0xb70d4cce in java_start(Thread*) () from ...
           #24 0xb783c585 in start_thread () from /lib/i686/cmov/libpthread.so.0
           #25 0xb77b029e in clone () from /lib/i686/cmov/libc.so.6
(gdb)


So, JVM crashed while trying to compile some Java method. We can obtain the name of that method from 3rd parameter passed to Compile::Compile() at frame #17 (and we will), but for now, an easier way is to look it up in the error report hs_err_pid9322.log (though it's kind of obvious here which method causes the failure):

...
Current CompileTask:
C2:306   b  CrashTest.NestedLoop()J (115 bytes)
...

3.1. Look up the sources

Obviously, we don't have sources from which JVM 1.6.0_16 was compiled. But, luckily for us, OpenJDK project exists, which is an open-sourced version of Java. What's important here is that such fundamental data structures as classes representing compilation, java methods, and stuff like that are stable enough and rarely change with respect to fields layout and control flow in general.

One can check out the sources from OpenJDK using Mercurial (we will need hotspot subdirectory only):

$hg clone http://hg.openjdk.java.net/jdk6/jdk6/hotspot hotspot

For example, now we could look up sources corresponding to frame #16 from the backtrace we obtained:

Example 4. Compile::Optimize() call, hotspot/src/share/vm/opto/compile.cpp, line 667

  ...
  // Now optimize
  Optimize();
  if (failing())  return;
  ...


3.2. Get hold of 'this'

First thing we need (to be able to examine object's fields), is to determine where the object itself resides in memory. This is easy if we have some code calling member method of the object, like Compile::Optimize() case discussed above - all we need to know is the value of this pointer. According to the x86 calling conventions, this pointer is pushed to stack last just before the call.

Let's look then at the disassembly of frame #17, where Optimize() was called:

(gdb) set print asm-demangle on
(gdb) frame 17
#17 0xb6dfb5ff in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool) () from ...
(gdb) info reg eip
eip            0xb6dfb5ff       0xb6dfb5ff <Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool)+1967>
(gdb) disassemble $eip-9 $eip
Dump of assembler code from 0xb6dfb5f6 to 0xb6dfb5ff:
0xb6dfb5f6 <Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool)+1958>:    mov    0x8(%ebp),%eax
0xb6dfb5f9 <Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool)+1961>:    push   %eax
0xb6dfb5fa <Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool)+1962>:    call   0xb6dfdf80 <Compile::Optimize()>
End of assembler dump.
(gdb)

As you can see, the last thing pushed is a value taken from address $ebp+0x8, and it's the value of this pointer. We can check that both [$ebp+0x8] and [$esp] are same at this point:

(gdb) print/x *(void**)($ebp+0x8)
$7 = 0xab44c900
(gdb) print/x *(void**)($esp+0x0)
$8 = 0xab44c900
(gdb)

Why it's taken precisely from 0x8 offset from $ebp is not very important for now, but that's simply because call instruction puts some stuff (like return address) on stack, and also in the called method prologue they store old value of bsp just before storing esp to ebp:

(gdb) disassemble 'Compile::Optimize()'
Dump of assembler code for function _ZN7Compile8OptimizeEv:
0xb6dfdf80 <Compile::Optimize()+0>:     push   %ebp
0xb6dfdf81 <Compile::Optimize()+1>:     mov    %esp,%ebp
...

Let's double check we got this right. From class Compile and class Phase sources we see that there is a field 'Compile * C' (hotspot/src/share/vm/opto/phase.hpp, line 105) which must be located in the beginning of any Compile instance in memory, and Compile::Compile() stores this to the field in the very beginning of the method. Indeed:

(gdb) print/x *0xab44c900@10        // see Compile class instance in memory
$14 = {0x0, 0xab44c900, 0x132, 0x100, 0x937e288, 0xffffffff, 0x93d19d4, 0x9304798, 0x0, 0x0}
(gdb)          ^^^ here it is, Compile *C == this

3.3. Examining data structures

Our ultimate goal is to examine 'InlineTree * _ilt' field of a Compile class instance. This is a tree-like structure holding information about what methods the compiler decided to inline during the compilation. The class is inherently recursive (children are listed in the _subtrees field):

Example 5. class InlineTree source code (hotspot/src/share/vm/opto/parse.hpp, line 32)

class InlineTree : public ResourceObj {
  Compile*    C;                  // cache
  JVMState*   _caller_jvms;       // state of caller
  ciMethod*   _method;            // method being called by the caller_jvms
  InlineTree* _caller_tree;
  uint        _count_inline_bcs;  // Accumulated count of inlined bytecodes
  // ... skipped ...
  GrowableArray<InlineTree*> _subtrees;
  ...


We will return to this InlineTree structure in the part 2, but for now let's concentrate on printing ciMethods, and subsequently, ciSymbol, ciInstanceKlass, and ciSignature:

Example 6. ciMethod class sources (hotspot/src/share/vm/ci/ciMethod.hpp, line 42)

 ...
 private:
  // General method information.
  ciFlags          _flags;
  ciSymbol*        _name;
  ciInstanceKlass* _holder;
  ciSignature*     _signature;
  ciMethodData*    _method_data;
  ...


4. Automating with gdb

We will now obtain a pointer to some ciMethod instance, figure out its layout in memory, obtain pointers to method's signature, name, klass, and figure out how to print all of them. The we will proceed to creating gsb script functions to print all of them.

4.1. ciMethod class layout

If we look at our backtrace, frame #17, we can see that some ciMethod* was passed to Compile() constructor as 3rd parameter. Knowing that 'this' pointer is at $ebp+0x8 address in memory, it's easy to tell where passed ciMethod* resides on stack:

(gdb) frame 17
#17 0xb6dfb5ff in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool) () from ...
(gdb) print/x *(void**)($ebp+0x8)@7
$18 = {0xab44c900, 0xab44ce00,   0x9343d50,   0x937e288,  0xffffffff,  0x1,  0x0  }
             this      ciEnv*  C2Compiler*    ciMethod*          int  bool  bool
(gdb)

So, 0x937e288 seems to be the address of some ciMethod object. We can verify it by checking out its virtual table:

(gdb) p/x *0x937e288       // get virtual table address
$31 = 0xb7297248
(gdb) p/x *0xb7297248@4    // get addresses of first 4 virtual methods
$32 = {0xb6db99d0, 0xb6db9ab0, 0xb6da8d70, 0xb6da8d80}
(gdb) x 0xb6db99d0         // check out a method at 1st address
0xb6db99d0 <ciMethod::print_impl(outputStream*)>:       0x57e58955
(gdb) x 0xb6db9ab0         // check out a method at 2nd address
0xb6db9ab0 <ciMethod::type_string()>:   0x53e58955
(gdb)

Now let's look at ciMethod object layout in memory:

Example 7. ciMethod layout, hotspot/src/share/vm/ci/ciMethod.hpp

(gdb) p/x *0x937e288@30
$33 = {0xb7297248,       // virtual table address
                         // ================ ciObject fields begin
       0x0938e3ec,       // jobject  _handle
              0x0,       // ciKlass* _klass
            0x4af,       // uint     _ident
                         // ================ ciObject fields end
                         // ================ ciFlags fields begin
       0xc1000009,       // jint     _flags
                         // ================ ciFlags fields end
        0x937e2e8,       // ciSymbol*         _name
        0x937e2f8,       // ciInstanceKlass*  _holder
        0x937e360,       // ciSignature*      _signature
        0x937e3a8,       // ciMethodData*     _method_data
              0x0,       // BCEscapeAnalyzer* _bcea
        0x937ef18,       // ciMethodBlocks*   _method_blocks
             0x73,       // int               _code_size
              0x4,       // int               _max_stack
              ...


Once again, we can check we deduced this correctly by examining virtual tables of 'ciSymbol* _name' and 'ciInstanceKlass* _holder':

(gdb) x *0x937e2e8       // ciSymbol*         _name
0xb7297d68 <vtable for ciSymbol+8>:             0xb6dc1b00
(gdb) x *0x937e2f8       // ciInstanceKlass*  _holder
0xb72970e8 <vtable for ciInstanceKlass+8>:      0xb6db0280
(gdb)

4.2. ciSymbol layout and printing

This is how a ciSymbol is printed in C++ code:

void ciSymbol::print_symbol_on(outputStream *st) {
  GUARDED_VM_ENTRY(get_symbolOop()->print_symbol_on(st);)
}

symbolOop ciSymbol::get_symbolOop() { return (symbolOop)get_oop(); }

oop ciObject::get_oop() const {
  assert(_handle != NULL, "null oop");
  return JNIHandles::resolve_non_null(_handle);
}

inline oop JNIHandles::resolve_non_null(jobject handle) {
  // ... skipped ...
  oop result = *(oop*)handle;
  // ... skipped ...
  return result;
};

And this is symbolOopDesc sources:

Example 8. hotspot/src/share/vm/oops/symbolOops.hpp

class symbolOopDesc : public oopDesc {
  friend class VMStructs;
 private:
  unsigned short _length; // number of UTF8 characters in the symbol
  jbyte _body[1];
  ...


So, if we could obtain symbolOopDesc object address from our ciSymbol, we will be able to print its _body. As seen from the snippet above, symbolOopDesc is just a dereferenced ciSymbol::_handle, so let's obtain symbolOopDesc for known ciSymbol* (0x937e2e8) and examine it:

(gdb) p/x *0x937e2e8@4
$40 = {0xb7297d68, 0x938e3f0,      0x0,     0x4a9}
           vtable    _handle    _klass     _ident
(gdb) p/x *0x938e3f0          // de-reference _handle, getting oop
$41 = 0xafb8c538
(gdb) p/x *0xafb8c538@20      // examine symbolOopDesc in memory
$42 = {                       // ================= oopDesc fields begin
                 0x1,         // markOop   _mark
          0xaf9d0230,         // _metadata union
                              // ================= oopDesc fields end
                              // this is where the actual symbol data starts
          0x654e000a,         // 000a is 'unsigned short _length' (offset 8 bytes from symbolOopDesc start)
          0x64657473,         // and jbyte _body offset is 10 bytes from the start
          0x706f6f4c, 
          ...

Knowing the offset to the first char, we can easily print the string with gdb:

(gdb) x/s (char*)0xafb8c538+10
0xafb8c542:      "NestedLoop"
(gdb)

So, finally, our first result - the method on which compilation failed, is "NestedLoop". Not a huge achievement, since we knew it right from the beginning - it's right there, in the generated error file hs_err_pid9322.log.

4.3. Gdb script for printing full name of a ciMethod

At this point we are ready to create first gdb script function, which will print a string for a given symbolOop. To do that, we create a file (let's call it ~/gdb_script) with the following content:

Example 9. Gdb script function for printing symbolOop*

define dump_symbol
  set $symboloop_length_offset = 8
  printf "%s", ($arg0)+$symboloop_length_offset+2
end
document dump_symbol
  Prints symbolOop at address arg0
end


Once loaded into gdb session, dump_symbol function can be used to print out any symbolOop automatically:

(gdb) source ~/gdb_script 
(gdb) help dump_symbol
  Prints symbolOop at address arg0
(gdb) dump_symbol 0xafb8c538
NestedLoop(gdb)

To be able to print a string having ciSymbol* as a starting point, we'll need a couple of extra functions:

Example 10. ciSymbol printing with gdb script

# offset of _handle field in class ciSymbol in bytes
set $cisymbol_handle_offset = 4

define read_4
  set $$arg0 = *($arg1)
end
document read_4
  Reads 4 bytes from address $arg1 to a variable named by $arg0
end

define dump_cisymbol
  read_4 tmp (($arg0)+$cisymbol_handle_offset)
  read_4 tmp $tmp
  dump_symbol $tmp
end
document dump_cisymbol
  Dumps ciSymbol* given as arg0
end

There is no concept of return variable in gdb script functions, so we need tricks like the one used in read_4 function - initializing a global variable with a returned value. What read_4 does is ultimately a de-referencing of some pointer:

(gdb) p/x *0x937e2e8             // de-reference by gdb
$51 = 0xb7297d68
(gdb) read_4 testvar 0x937e2e8   // de-reference by read_4
(gdb) p/x $testvar
$52 = 0xb7297d68                 // check result
(gdb)


Now we can print ciSymbol, but for printing ciMethod's full name, we'd also need to know how to print ciInstanceKlass (let's skip the signature for now). I'll omit the details here, since ciInstanceKlass has ciSymbol* somewhere inside its body, so it's only a matter of getting correct offsets:

Example 11. ciMethod printing

set $cimethod_holder_offset = 24
set $ciklass_name_offset    = 20
set $cimethod_name_offset   = 20

define dump_cimethod
  read_4 klass (($arg0)+$cimethod_holder_offset)
  read_4 name ($klass+$ciklass_name_offset)
  dump_cisymbol $name
  printf "."

  read_4 name (($arg0)+$cimethod_name_offset)
  dump_cisymbol $name
  printf "\n"
end
document dump_cimethod
  Dumps ciMethod* given as arg0
end


Note

All offsets, of course, are subject to change depending on a platform (size of void*, and other types), compiler (paddings) and JVM flavour (debug/product).

5. Conclusion

We have created several gdb script functions, which make digging through JVM's cores a little bit easier. Let's test:

(gdb) source ~/gdb_script 
(gdb) apropos dump_
dump_cimethod --   Dumps ciMethod* given as arg0
dump_cisymbol --   Dumps ciSymbol* given as arg0
dump_symbol --   Prints symbolOop at address arg0
(gdb) dump_cimethod 0x937e288
CrashTest.NestedLoop
(gdb)

Stay tuned for part 2 of this, where we will introduce stack-aware gdb script functions, and automate walking of tree-like structures such as InlineTree.

6. Further reading

No comments:

Post a Comment