Java Basics, Part 2: The Constant Pool

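However, the Java virtual machine cannot always see through this kind of code, so avoiding unnecessary boxing is key to good performance. This is also why specialized types such as OptionalInt and IntStream exist. In this article I outline one reason the JVM has trouble eliminating autoboxing.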
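To illustrate why types such as IntStream and OptionalInt exist, here is a minimal sketch that computes the same results once through boxed Integer values and once through primitive-specialized types. The class and method names (PrimitiveStreamDemo, sumBoxed, sumPrimitive, maxPrimitive) are made up for this illustration and do not come from the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical demo class; the names are illustrative only.
class PrimitiveStreamDemo {
    // Sums boxed Integers: each element is auto-unboxed before it is added.
    static int sumBoxed(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v; // auto-unboxing via Integer.intValue()
        }
        return total;
    }

    // Sums a range with IntStream: the values stay primitive ints throughout.
    static int sumPrimitive(int fromInclusive, int toExclusive) {
        return IntStream.range(fromInclusive, toExclusive).sum();
    }

    // OptionalInt holds an int result without wrapping it in an Integer.
    static OptionalInt maxPrimitive(int[] values) {
        return IntStream.of(values).max();
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(Arrays.asList(1, 2, 3, 4)));
        System.out.println(sumPrimitive(1, 5));
        System.out.println(maxPrimitive(new int[] {3, 9, 4}).getAsInt());
    }
}
```

Both sums produce the same value; the difference is only in whether Integer objects are involved along the way.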

public class AutoUnboxingTest {
    public static void main(String[] args) {
        Integer a = new Integer(3);
        Integer b = 3;                  // autoboxes 3 into an Integer
        int c = 3;
        System.out.println(a == b);     // false: the two references point to different objects
        System.out.println(a == c);     // true: a is auto-unboxed to int, then compared with c

        Integer f1 = 3, f2 = 3, f3 = 150, f4 = 150;
        System.out.println(f1 == f2);   // true
        System.out.println(f3 == f4);   // false

        Integer p1 = new Integer(3);
        Integer p2 = new Integer(3);
        Integer p3 = new Integer(0);
        System.out.println(p1 == p2);       // false: two different objects
        System.out.println(p1 == p2 + p3);  // true: p2 and p3 are auto-unboxed to int, p1 is too, so this compares primitives
    }
}
Character boxed = Character.valueOf('a');
char unboxed = boxed.charValue();

Contents:

The compiler automatically converts it to

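  The key point in this source: the first time autoboxing happens, the program creates the Integer objects for the range [-128, 127] and stores them in the cache array; the array and its 256 Integer objects all live on the heap. Any == comparison between autoboxed values inside this range yields true, so (f1, f2) is true; outside the range a new object is created each time, so (f3, f4) is false. This is where the notion of a constant pool comes in.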
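The caching behavior described above can be checked with a small sketch. IntegerCacheDemo and sameBoxedObject are hypothetical names chosen for this example. Only the range [-128, 127] is guaranteed to be cached; the result for larger values depends on JVM settings such as -XX:AutoBoxCacheMax, so the sketch does not rely on it.

```java
// Hypothetical demo class; the names are illustrative only.
class IntegerCacheDemo {
    // True when autoboxing the same int twice yields the very same object.
    static boolean sameBoxedObject(int value) {
        Integer first = value;   // the compiler emits Integer.valueOf(value)
        Integer second = value;  // the compiler emits Integer.valueOf(value)
        return first == second;  // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        // Inside [-128, 127] the cached object is always reused.
        System.out.println(sameBoxedObject(3));
        System.out.println(sameBoxedObject(-128));
        System.out.println(sameBoxedObject(127));

        // Outside the guaranteed range: usually false with default settings,
        // but it may be true if the cache was enlarged.
        System.out.println(sameBoxedObject(150));

        // equals() compares values and is reliable for any int.
        Integer f3 = 150, f4 = 150;
        System.out.println(f3.equals(f4));
    }
}
```

Whenever the numeric value is what matters, comparing with equals() (or unboxing first) sidesteps the cache boundary entirely.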

Benchmark

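To measure the performance of distance(), we need a benchmark. Microbenchmarks in Java are hard to get right, but fortunately OpenJDK provides JMH (Java Microbenchmark Harness), which takes care of most of the hard parts. If you are interested, I recommend reading its documentation and samples; they are fascinating. Here is the benchmark: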

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class MyBenchmark {
    private Levenshtein<String, Character> lev = new Levenshtein<>(StringAsList::new);

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public int timeLevenshtein() {
        return lev.distance("autoboxing is fast", "autoboxing is slow");
    }
}

(The method returns its result so that JMH can make the value appear to be used, which keeps dead-code elimination from skewing the measurement.)

Here are the results:

$ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8
# JMH 1.10.2 (released 3 days ago)
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options:
# Warmup: 8 iterations, 1 s each
# Measurement: 8 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.tavianator.boxperf.MyBenchmark.timeLevenshtein

# Run progress: 0.00% complete, ETA 00:00:16
# Fork: 1 of 1
# Warmup Iteration 1: 1517.495 ns/op
# Warmup Iteration 2: 1503.096 ns/op
# Warmup Iteration 3: 1402.069 ns/op
# Warmup Iteration 4: 1480.584 ns/op
# Warmup Iteration 5: 1385.345 ns/op
# Warmup Iteration 6: 1474.657 ns/op
# Warmup Iteration 7: 1436.749 ns/op
# Warmup Iteration 8: 1463.526 ns/op
Iteration 1: 1446.033 ns/op
Iteration 2: 1420.199 ns/op
Iteration 3: 1383.017 ns/op
Iteration 4: 1443.775 ns/op
Iteration 5: 1393.142 ns/op
Iteration 6: 1393.313 ns/op
Iteration 7: 1459.974 ns/op
Iteration 8: 1456.233 ns/op

Result "timeLevenshtein":
1424.461 ±(99.9%) 59.574 ns/op [Average]
(min, avg, max) = (1383.017, 1424.461, 1459.974), stdev = 31.158
CI (99.9%): [1364.887, 1484.034] (assumes normal distribution)

# Run complete. Total time: 00:00:16

Benchmark Mode Cnt Score Error Units
MyBenchmark.timeLevenshtein avgt 8 1424.461 ± 59.574 ns/op

 

The fix

The fix is simple:

@@ -11,7 +11,7 @@ public class StringAsList extends AbstractList<Character> {

@Override
public Character get(int index) {
- return str.charAt(index); // Autoboxing!
+ return new Character(str.charAt(index));
}

@Override

Replacing autoboxing with explicit boxing avoids the call to Character.valueOf(), which makes the code much easier for the JVM to understand:

private final char value;

public Character(char value) {
this.value = value;
}

public char charValue() {
return value;
}

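Although this adds an allocation to the code, the JVM can now see what the code means and fetches the char straight from the String. The performance gain is clear: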

$ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8
...
# Run complete. Total time: 00:00:16

Benchmark Mode Cnt Score Error Units
MyBenchmark.timeLevenshtein avgt 8 1221.151 ± 58.878 ns/op

That is a 14% speedup. Running with -prof perfasm shows that after the change the char values are read directly from the String and compared in registers:

movzwl 0x10(%rsi,%rdx,2),%r11d ;*caload
; - java.lang.String::charAt@27 (line 648)
; - com.tavianator.boxperf.StringAsList::get@9 (line 14)
; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
cmp %r11d,%r10d
je 0x00007faa8d404792 ;*if_icmpne
; - java.lang.Character::equals@18 (line 4621)
; - com.tavianator.boxperf.Levenshtein::distance@137 (line 33)

  The equals() method in Object:

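Summary

Boxing is a weak spot for HotSpot, and I hope it keeps improving. It should take more advantage of the semantics of the box types to eliminate boxing, so that workarounds like the one above become unnecessary.

All of the benchmark code above is available on GitHub.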

  3. The difference between == and equals()

Character boxed = 'a';
char unboxed = boxed;

  The next post will cover String and the constant pool.

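Java's primitive types (int, double, char) are not objects. But because much Java code works with objects (Object), Java provides a wrapper class for every primitive type (Integer, Double, Character). With autoboxing, you can write code like this: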

/**
 * Compares this string to the specified object.  The result is {@code
 * true} if and only if the argument is not {@code null} and is a {@code
 * String} object that represents the same sequence of characters as this
 * object.
 *
 * @param  anObject
 *         The object to compare this {@code String} against
 *
 * @return  {@code true} if the given object represents a {@code String}
 *          equivalent to this string, {@code false} otherwise
 *
 * @see  #compareTo(String)
 * @see  #equalsIgnoreCase(String)
 */
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

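Analysis

To see what is happening on the hot path, JMH integrates with the Linux tool perf and can show the JIT-compiled code for the hottest regions. (Viewing the assembly requires the hsdis plugin. I have made it available on the AUR, so Arch users can install it directly.) Adding -prof perfasm to the JMH command line shows the result: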

$ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8 -prof perfasm
...
cmp $0x7f,%eax
jg 0x00007fde989a6148 ;*if_icmpgt
; - java.lang.Character::valueOf@3 (line 4570)
; - com.tavianator.boxperf.StringAsList::get@8 (line 14)
; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
cmp $0x80,%eax
jae 0x00007fde989a6103 ;*aaload
; - java.lang.Character::valueOf@10 (line 4571)
; - com.tavianator.boxperf.StringAsList::get@8 (line 14)
; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
...

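There is a lot of output, but the small excerpt above is enough to show that the boxing has not been optimized away. Why is the value compared against 0x7f and 0x80? The answer lies in the source of Character.valueOf():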

private static class CharacterCache {
    private CharacterCache(){}

    static final Character cache[] = new Character[127 + 1];

    static {
        for (int i = 0; i < cache.length; i++)
            cache[i] = new Character((char)i);
    }
}

public static Character valueOf(char c) {
    if (c <= 127) { // must cache
        return CharacterCache.cache[(int)c];
    }
    return new Character(c);
}

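As you can see, the Java Language Specification requires the Character objects for the first 128 char values (\u0000 to \u007f) to be cached; when the argument to Character.valueOf() falls in that range, the cached object is returned directly. The goal is to reduce allocation and garbage collection, but to me it looks like premature optimization, and it gets in the way of other optimizations. The JVM cannot prove that Character.valueOf(c).charValue() == c, because it does not know the contents of the cache. So it fetches a Character object from the cache and reads its value, only to end up with exactly the same thing as c.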
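The cache behavior can be observed from plain Java. The sketch below uses hypothetical names (CharacterCacheDemo, isShared, roundTrip); it shows that boxing a char in the cached range returns a shared object, and it spells out the round trip that is obvious to a reader but that the JIT has to prove for itself.

```java
// Hypothetical demo class; the names are illustrative only.
class CharacterCacheDemo {
    // True when Character.valueOf() hands back one shared object for this char.
    static boolean isShared(char c) {
        return Character.valueOf(c) == Character.valueOf(c);
    }

    // Box and immediately unbox. The result always equals c, but the JIT
    // can only simplify this if it can reason about what the cache holds.
    static char roundTrip(char c) {
        return Character.valueOf(c).charValue();
    }

    public static void main(String[] args) {
        // Chars up to \u007f must come from the cache.
        System.out.println(isShared('a'));
        System.out.println(isShared((char) 127));

        // Above the cached range the answer is implementation-dependent.
        System.out.println(isShared((char) 0x4e2d));

        // The round trip preserves the value in every case.
        System.out.println(roundTrip('a') == 'a');
    }
}
```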

  1. Autoboxing and unboxing

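Example

For example, suppose we want to compute the edit distance (Levenshtein distance) between values of any type, as long as they can be viewed as a sequence: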

import java.util.List;
import java.util.function.Function;

public class Levenshtein<T, U> {
    private final Function<T, List<U>> asList;

    public Levenshtein(Function<T, List<U>> asList) {
        this.asList = asList;
    }

    public int distance(T a, T b) {
        // Wagner-Fischer algorithm, with two active rows

        List<U> aList = asList.apply(a);
        List<U> bList = asList.apply(b);

        int aSize = aList.size();
        int bSize = bList.size();
        int[] row0 = new int[bSize + 1];
        int[] row1 = new int[bSize + 1];

        for (int i = 0; i <= bSize; ++i) {
            row0[i] = i;
        }

        // One row per element of a, so inputs of different lengths also work
        for (int i = 0; i < aSize; ++i) {
            U ua = aList.get(i);
            row1[0] = row0[0] + 1;

            for (int j = 0; j < bSize; ++j) {
                U ub = bList.get(j);
                int subCost = row0[j] + (ua.equals(ub) ? 0 : 1);
                int delCost = row0[j + 1] + 1;
                int insCost = row1[j] + 1;
                row1[j + 1] = Math.min(subCost, Math.min(delCost, insCost));
            }

            int[] temp = row0;
            row0 = row1;
            row1 = temp;
        }

        return row0[bSize];
    }
}

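As long as two objects can be viewed as a List, this class can compute the edit distance between them. To compute the distance between Strings, we first need a way to view a String as a List: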

import java.util.AbstractList;

public class StringAsList extends AbstractList<Character> {
    private final String str;

    public StringAsList(String str) {
        this.str = str;
    }

    @Override
    public Character get(int index) {
        return str.charAt(index); // Autoboxing!
    }

    @Override
    public int size() {
        return str.length();
    }
}

...

Levenshtein<String, Character> lev = new Levenshtein<>(StringAsList::new);
lev.distance("autoboxing is fast", "autoboxing is slow"); // 4

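Because of the way Java generics are implemented, there is no such thing as a List<char>, so we have to provide a List<Character> and box every element. (Note: this restriction may be lifted in Java 10.)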
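For comparison, here is a sketch of what the same two-row algorithm looks like when it is specialized to String and works on char values directly, so no Character objects are involved at all. This is not the approach the article takes (it gives up the generic List view), and the class name StringLevenshtein is made up for this illustration.

```java
// Hypothetical String-only variant; not part of the article's code.
class StringLevenshtein {
    // Wagner-Fischer with two active rows, comparing primitive chars.
    static int distance(String a, String b) {
        int bSize = b.length();
        int[] row0 = new int[bSize + 1];
        int[] row1 = new int[bSize + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= bSize; ++j) {
            row0[j] = j;
        }

        for (int i = 0; i < a.length(); ++i) {
            char ca = a.charAt(i);
            row1[0] = row0[0] + 1;

            for (int j = 0; j < bSize; ++j) {
                // Primitive comparison: no boxing and no equals() call.
                int subCost = row0[j] + (ca == b.charAt(j) ? 0 : 1);
                int delCost = row0[j + 1] + 1;
                int insCost = row1[j] + 1;
                row1[j + 1] = Math.min(subCost, Math.min(delCost, insCost));
            }

            // Swap the rows so row0 always holds the latest completed row.
            int[] temp = row0;
            row0 = row1;
            row1 = temp;
        }

        return row0[bSize];
    }

    public static void main(String[] args) {
        System.out.println(distance("autoboxing is fast", "autoboxing is slow")); // 4
    }
}
```

The trade-off is generality: this version only handles String, whereas the generic class accepts anything that can be presented as a List.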

/**
 * Cache to support the object identity semantics of autoboxing for values between
 * -128 and 127 (inclusive) as required by JLS.
 *
 * The cache is initialized on first usage.  The size of the cache
 * may be controlled by the -XX:AutoBoxCacheMax=<size> option.
 * During VM initialization, java.lang.Integer.IntegerCache.high property
 * may be set and saved in the private system properties in the
 * sun.misc.VM class.
 */
private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];
    static {
        // high value may be configured by property
        int h = 127;
        String integerCacheHighPropValue =
            sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null) {
            int i = parseInt(integerCacheHighPropValue);
            i = Math.max(i, 127);
            // Maximum array size is Integer.MAX_VALUE
            h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
        }
        high = h;
        cache = new Integer[(high - low) + 1];
        int j = low;
        for(int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);
    }
    private IntegerCache() {}
}

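   The new keyword always creates a new object; two such objects may hold the same content, but they live at different addresses in memory. For objects, == compares references, so distinct objects are never equal, which is why (a, b) and (p1, p2) compare as false. But f1, f2, f3 and f4 are all Integer references, so why does one comparison yield true and the other false? This comes down to how boxing works: assigning an int value to an Integer variable calls the static method Integer.valueOf. To see what happens there, go straight to the source: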

  It uses Integer's nested class IntegerCache, whose source is as follows:

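  Every class in Java inherits from the Object base class, which defines equals(); its default behavior is to compare object references. Some library classes, such as String, Integer and Date, override it so that equals() compares values rather than where the objects sit on the heap; for classes that do not override it, equals() still compares references and is equivalent to ==. The source: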

   2. The constant pool

  

/**
 * Returns an {@code Integer} instance representing the specified
 * {@code int} value.  If a new {@code Integer} instance is not
 * required, this method should generally be used in preference to
 * the constructor {@link #Integer(int)}, as this method is likely
 * to yield significantly better space and time performance by
 * caching frequently requested values.
 *
 * This method will always cache values in the range -128 to 127,
 * inclusive, and may cache other values outside of this range.
 *
 * @param  i an {@code int} value.
 * @return an {@code Integer} instance representing {@code i}.
 * @since  1.5
 */
public static Integer valueOf(int i) {
    assert IntegerCache.high >= 127;
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
public boolean equals(Object obj) {
    return (this == obj);
}
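To make the difference between == and equals() concrete, the following sketch compares a class that inherits Object's equals() with one that overrides it, alongside String. The names EqualsDemo, PlainPoint and ValuePoint are hypothetical and exist only for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class EqualsDemo {
    // Does not override equals(): inherits Object's reference comparison.
    static class PlainPoint {
        final int x;
        PlainPoint(int x) { this.x = x; }
    }

    // Overrides equals() (and hashCode()) to compare by value.
    static class ValuePoint {
        final int x;
        ValuePoint(int x) { this.x = x; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValuePoint && ((ValuePoint) o).x == x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    public static void main(String[] args) {
        String s1 = new String("abc");
        String s2 = new String("abc");
        System.out.println(s1 == s2);      // false: two distinct objects
        System.out.println(s1.equals(s2)); // true: same character sequence

        // Without an override, equals() behaves exactly like ==.
        System.out.println(new PlainPoint(1).equals(new PlainPoint(1))); // false

        // With an override, equal content means equal objects.
        System.out.println(new ValuePoint(1).equals(new ValuePoint(1))); // true
    }
}
```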