instanceof and casting in Java.

Monday, March 24th, 2008

I write a lot of Java code. I like the fact that Java is statically typed and that the compiler can point out my mistakes early in the development process. What I don’t like is having to repeat myself. Let me say that again: I don’t like to repeat myself. But the Java language makes me do it a lot, even in many cases where it should be unnecessary. Here is a pattern that you will see a lot in Java code:


  if (object instanceof SomeClass) {
      SomeClass someValue = (SomeClass) object;
      ...
  }

Without getting into the whole “instanceof is evil” debate, let’s look at what this code is doing. First, it tests whether object is an instance of SomeClass. If it is, the code then asserts, via the cast, that object is an instance of SomeClass, and creates a second local variable named someValue to hold the exact same value already stored in object.

Within the scope of the if statement, object is necessarily an instance of SomeClass. Consequently, having to cast object to SomeClass is a form of “repeating oneself”. Having two variables hold the same runtime value is another form of repeating oneself. Ideally, the Java language should simply allow us to treat object as an instance of SomeClass in any scope where it can be proven that the type constraint holds.

Is this a limitation of the Java language, or of the Java Virtual Machine, or both? Let’s take a closer look at what is going on under the hood. Given the following, rather contrived class:


public class Test {
    public void test(Object obj) {
        if (obj instanceof Test) {
            Test test = (Test) obj;
            test.doSomething();
        }
    }

    public void doSomething() {
        ...
    }
}

The Java compiler compiles the test method down to the following byte code:


public void test(java.lang.Object);
  Code:
   0:  aload_1
   1:  instanceof #2; //class Test
   4:  ifeq  16
   7:  aload_1
   8:  checkcast #2; //class Test
   11: astore_2
   12: aload_2
   13: invokevirtual #3; //Method doSomething:()V
   16: return

For those not versed in the language of the JVM, here is an English translation:

0: push variable 1 (obj) onto the stack
1: pop a value off the stack (obj), and push 1 on the stack if the value is an instance of Test, otherwise push 0
4: pop a value off the stack, and if the value is 0, goto instruction 16, otherwise continue to the next instruction
7: push variable 1 (obj) onto the stack
8: check if a cast of the value on the top of the stack to Test is valid (i.e., is the value null or an instance of Test), and if not valid, throw a ClassCastException
11: pop a value off the stack and store it at variable location 2
12: push variable 2 onto the stack
13: pop a value off the stack and call doSomething() on it
16: return

We can see that this is a fairly literal translation of the original source code into the stack oriented JVM instruction set, complete with both the instanceof test and the cast, plus the use of two local variables to store one value.
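One detail in the translation of instruction 8 is easy to miss: checkcast lets null through, while instanceof answers false for null. The two checks are therefore not quite the same test. Here is a minimal sketch of that difference at the source level. The class NullCastDemo and its method names are made up for illustration and are not part of the example above:

```java
// Hypothetical demo class, not part of the original Test example.
class NullCastDemo {
    // Mirrors the instanceof instruction: the answer is false for null.
    static boolean isString(Object value) {
        return value instanceof String;
    }

    // Mirrors the checkcast instruction: null passes, a wrong type throws.
    static boolean castSucceeds(Object value) {
        try {
            String s = (String) value;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isString(null));     // false
        System.out.println(castSucceeds(null)); // true
        System.out.println(isString("x"));      // true
        System.out.println(castSucceeds(42));   // false
    }
}
```

Because the cast in our pattern only ever runs after a successful instanceof test, the null case never arises there, and the cast can never fail.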

What would happen if we were to write our own byte code that omits the cast and the second variable? Fortunately, there is a very cool tool called Jasmin that allows us to write byte code instructions using an assembler-like syntax, and then compile them directly to Java byte code, thus skipping the Java language completely.

Here is my new definition of the test method written in Jasmin syntax:


.method public test(Ljava/lang/Object;)V
    .limit stack 1
    .limit locals 2
    aload_1
    instanceof Test
    ifeq Done
    aload_1
    invokevirtual Test/doSomething()V
    Done:
    return
.end method

In the test method, I push variable 1 onto the stack, test if it is an instance of Test, and if it is, I call doSomething() on it. What happens when I try to call this method?


Exception in thread "main" java.lang.VerifyError: (class: Test, method: test signature: (Ljava/lang/Object;)V) Incompatible object argument for function call

The JVM generates a VerifyError because it doesn’t believe that variable 1 is necessarily an instance of Test. If we read the Java Virtual Machine Specification and think about the way byte code is organized, this makes sense. At the byte code level, the original structured, lexical scoping of the Java language is not preserved. Instead, we have a flat instruction list with branches and gotos. The instanceof instruction itself is not even a branching instruction; it is simply a boolean predicate that pushes a 1 or 0 onto the stack. This means that at every instruction following the instanceof instruction, the value in variable 1 may or may not be an instance of Test, so the instanceof test alone is not sufficient for the verifier to believe that subsequent uses of the variable can treat the value as an instance of Test. The checkcast instruction, on the other hand, will only continue to the next instruction if the value can be cast to the indicated type.

We can now see that the cast is required by the JVM. But does that mean that the cast must trickle up and be required in the Java source code as well? And what about the use of two local variables to store just one value?

Ideally, the Java language would allow us to write the following code:


public class Test {
    public void test(Object obj) {
        if (obj instanceof Test) {
            obj.doSomething();
        }
    }

    public void doSomething() {
        ...
    }
}

And the Java compiler would generate the following valid Java byte code:


public void test(java.lang.Object);
  Code:
   0:  aload_1
   1:  instanceof #12; //class Test
   4:  ifeq 16
   7:  aload_1
   8:  checkcast #12; //class Test
   11: astore_1
   12: aload_1
   13: invokevirtual #15; //Method doSomething:()V
   16: return

In this byte code, the theoretical compiler automatically generated the checkcast instruction and eliminated the use of a second local variable. This is great, because we did not have to repeat ourselves in the source file. The only annoyance is that variable 1 is checked twice for being an instance of Test (once via instanceof and once via checkcast), so we incur a minor performance hit. Let’s look at a slightly more complicated example and see if this pattern holds up.


import java.io.IOException;

public class Test {
    public void test(Runnable r) throws IOException {
        if (r instanceof Appendable) {
            Appendable a = (Appendable) r;
            a.append('x');
        }
        r.run();
    }
}

If we compiled this code using the same pattern as above, of calling checkcast and then writing back to the same variable location, we would get:


public void test(java.lang.Runnable);
  Code:
   0:  aload_1
   1:  instanceof #21; //class java/lang/Appendable
   4:  ifeq 19
   7:  aload_1
   8:  checkcast #21; //class java/lang/Appendable
   11: astore_1
   12: aload_1
   13: bipush 120
   15: invokevirtual #12; //Method java/lang/Appendable.append:(C)Ljava/lang/Appendable;
   18: pop
   19: aload_1
   20: invokevirtual #13; //Method java/lang/Runnable.run:()V
   23: return

What is interesting here is that at instructions 8 through 11 (the checkcast followed by astore_1), we change the type associated with variable 1 from java.lang.Runnable to java.lang.Appendable, but at instruction 20, we make a method call that assumes that variable 1 is an instance of java.lang.Runnable. In fact, we can reach instruction 20 via two different routes: one that changes the type of variable 1 to java.lang.Appendable, and one that retains the type of java.lang.Runnable. In this example, the value in variable 1 at instruction 20 is necessarily an instance of java.lang.Runnable, even if we followed the branch in which we changed the type associated with variable location 1 to java.lang.Appendable. So the call to java.lang.Runnable.run() should be safe. But what does the JVM verifier think? Strangely, it doesn’t give us a VerifyError, but it does give us an IncompatibleClassChangeError (I’ll leave it as an exercise for the reader to figure out why).
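The claim that both routes leave a valid Runnable behind can be checked at the source level with ordinary javac output. The sketch below is only an illustration: the classes Both and TwoRoutesDemo are hypothetical names, and the test method is a static copy of the one shown earlier. One argument takes the Appendable branch and the other skips it, and run() is reached either way:

```java
import java.io.IOException;

// Hypothetical class whose instances are both a Runnable and an Appendable.
class Both implements Runnable, Appendable {
    final StringBuilder log = new StringBuilder();

    @Override
    public void run() {
        log.append("run;");
    }

    @Override
    public Appendable append(char c) {
        log.append("append(").append(c).append(");");
        return this;
    }

    @Override
    public Appendable append(CharSequence csq) {
        log.append(csq);
        return this;
    }

    @Override
    public Appendable append(CharSequence csq, int start, int end) {
        log.append(csq, start, end);
        return this;
    }
}

class TwoRoutesDemo {
    // Same shape as the test method in the example above, made static for convenience.
    static void test(Runnable r) throws IOException {
        if (r instanceof Appendable) {
            Appendable a = (Appendable) r;
            a.append('x');
        }
        r.run();
    }

    public static void main(String[] args) throws IOException {
        // Route 1: the instanceof test succeeds, then run() is called.
        Both both = new Both();
        test(both);
        System.out.println(both.log); // append(x);run;

        // Route 2: a plain Runnable skips the branch and goes straight to run().
        StringBuilder plain = new StringBuilder();
        test(() -> plain.append("run;"));
        System.out.println(plain); // run;
    }
}
```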

Sadly, this means that the general pattern of having the compiler emit byte code that changes the type associated with a variable, rather than creating a separate variable to hold the same value with a different compile-time type, just won’t work. But that doesn’t mean that the two-variable, one-value pattern needs to be exposed at the Java language level. The Java Language Specification and the Java compiler could be changed to allow us to treat a single variable as having different types in different scopes, and to leave the details of how to represent that in byte code to the compiler.
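Until the language changes, the repetition can at least be pushed into a library method so that the type is named only once at each call site. This is one possible workaround and not something the compiler does for us. The class Casts and the method as are hypothetical names chosen for this sketch, and it assumes Java 8 or later for Optional and lambdas:

```java
import java.util.Optional;

// Hypothetical helper class, not part of any standard library.
class Casts {
    // Performs the instanceof test and the cast in one step.
    // Class.isInstance returns false for null, so Optional.of never sees null.
    static <T> Optional<T> as(Object value, Class<T> type) {
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    public static void main(String[] args) {
        Object obj = "hello";

        // The type is written once, and s is already a String inside the lambda.
        as(obj, String.class).ifPresent(s -> System.out.println(s.length())); // 5

        // A failed test yields an empty Optional instead of a ClassCastException.
        System.out.println(as(obj, Integer.class).isPresent()); // false
    }
}
```

The second variable has not gone away. It has become the lambda parameter, which is a reminder that the underlying limitation is still there.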
