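Recently our application's backend kept hanging every so often. When an alert fired we logged into the server: CPU usage was spiking from time to time, and a quick curl showed the service was no longer accepting requests. So we took a thread dump to see what the threads were actually doing.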
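First, send SIGQUIT to the Tomcat JVM:

kill -3 <pid>

The JVM writes the thread dump to standard output, which Tomcat redirects to logs/catalina.out under its installation directory. Sure enough, it looked like a deadlock: a large number of Tomcat HTTP threads were all WAITING at the same place, like this: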
"http-nio-9086-exec-163" daemon prio=10 tid=0x00007f8ab00ab800 nid=0x18e7 waiting on condition [0x00007f8a48644000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000d3643028> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at com.fasterxml.jackson.databind.util.LRUMap.get(LRUMap.java:56)
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:707)
    at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)
    at com.fasterxml.jackson.databind.type.TypeFactory.constructType(TypeFactory.java:367)

Note the get method of LRUMap: it acquires a read lock, so I went looking for the thread holding the write lock and found this one:

"http-nio-9086-exec-31" daemon prio=10 tid=0x00007f8ab0026000 nid=0x795c runnable [0x00007f8a5b471000]
   java.lang.Thread.State: RUNNABLE
    at java.util.LinkedHashMap.transfer(LinkedHashMap.java:253)
    at java.util.HashMap.resize(HashMap.java:581)
    at java.util.HashMap.addEntry(HashMap.java:879)
    at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
    at java.util.HashMap.put(HashMap.java:505)
    at com.fasterxml.jackson.databind.util.LRUMap.put(LRUMap.java:68)
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:738)
    at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)
    at com.fasterxml.jackson.databind.type.TypeFactory.constructType(TypeFactory.java:358)
    at com.fasterxml.jackson.databind.cfg.MapperConfig.constructType(MapperConfig.java:268)
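Seeing a Map in that stack set off alarm bells right away, because HashMap and LinkedHashMap are not thread-safe and can misbehave badly under concurrent access. Note the put method of LRUMap here. This thread is RUNNABLE, not blocked, and sits inside LinkedHashMap.transfer called from HashMap.resize; together with the CPU spikes, that suggests it is spinning in a loop while holding the write lock, so none of the readers can ever get in.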
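Why the readers pile up can be checked with a small experiment on java.util.concurrent.locks.ReentrantReadWriteLock alone, with no Jackson involved: the read lock is shared, so a second thread can take it while the first still holds it, but no reader gets in while another thread holds the write lock. ReadWriteLockDemo and its method names are made up for this sketch.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockDemo {

    // True if another thread can take the read lock while this thread
    // already holds the read lock (readers share the lock).
    public static boolean readersShareLock() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        try {
            return tryFromOtherThread(rwl.readLock());
        } finally {
            rwl.readLock().unlock();
        }
    }

    // True if another thread can take the read lock while this thread
    // holds the write lock (it cannot: the writer excludes all readers).
    public static boolean readerPassesWriter() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.writeLock().lock();
        try {
            return tryFromOtherThread(rwl.readLock());
        } finally {
            rwl.writeLock().unlock();
        }
    }

    // Calls tryLock() on a fresh thread, releases the lock if acquired,
    // and reports whether the acquisition succeeded.
    private static boolean tryFromOtherThread(Lock lock) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            if (lock.tryLock()) {
                acquired[0] = true;
                lock.unlock();
            }
        });
        t.start();
        try {
            t.join(); // join() also makes acquired[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("readersShareLock = " + readersShareLock());   // true
        System.out.println("readerPassesWriter = " + readerPassesWriter()); // false
    }
}
```

Readers do not exclude one another, which matters for LRUMap.get, while a writer that never releases the write lock shuts out every reader, which matches the pile-up of parked threads in the first trace.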
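Looking at the source (version 2.4.1), LRUMap extends LinkedHashMap, so I examined its put and get methods closely. It turns out that get changes the map's internal state: LinkedHashMap.get calls recordAccess, and because LRUMap keeps its entries in access order (that is what makes it LRU), recordAccess moves the entry to the tail of the internal linked list. LRUMap does override get and wrap it in a lock, but that lock is the read lock, which any number of threads can hold at once. So concurrent gets can relink the list at the same time and corrupt it, and the next put that triggers a resize walks the corrupted list in LinkedHashMap.transfer and can end up looping forever while holding the write lock.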
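The claim that get changes internal state is easy to see with a plain java.util.LinkedHashMap: constructed with accessOrder = true, a read moves the entry to the end of the iteration order. This sketch assumes, as the LRU behaviour implies, that LRUMap in 2.4.1 passes accessOrder = true to its LinkedHashMap constructor; AccessOrderDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessOrderDemo {

    // Inserts a, b, c, reads "a", and returns the resulting key order.
    public static List<String> keysAfterGet(boolean accessOrder) {
        // 16 and 0.75f are the usual defaults; the third argument selects
        // access order (true) instead of insertion order (false).
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // in access order, this relinks "a" to the tail of the list
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterGet(false)); // [a, b, c]
        System.out.println(keysAfterGet(true));  // [b, c, a]
    }
}
```

In insertion order the read leaves [a, b, c] untouched; in access order it yields [b, c, a]. That reordering is a write to the map's linked list, and performing it while holding only a shared read lock is the race described above.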
I then went through LRUMap's commit history on Jackson's GitHub and found a related issue: (link)
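For code that keeps its own LRU cache on top of LinkedHashMap, one generic way to avoid this pattern is to treat get as a write and guard both get and put with the same exclusive lock. This is a minimal sketch, not the change made in Jackson for the issue above; ExclusiveLruCache is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class for illustration; not part of Jackson.
public final class ExclusiveLruCache<K, V> {

    private final ReentrantLock lock = new ReentrantLock();
    private final LinkedHashMap<K, V> map;

    public ExclusiveLruCache(final int maxEntries) {
        // accessOrder = true: get() relinks entries, so get() is a structural write.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // drop the least recently used entry
            }
        };
    }

    public V get(K key) {
        lock.lock(); // exclusive, because get() mutates the access-order list
        try {
            return map.get(key);
        } finally {
            lock.unlock();
        }
    }

    public V put(K key, V value) {
        lock.lock();
        try {
            return map.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public int size() {
        lock.lock();
        try {
            return map.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ExclusiveLruCache<String, Integer> cache = new ExclusiveLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes most recently used
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.get("b")); // null
    }
}
```

The cost is that reads are serialized. If strict LRU eviction is not required, another option is an insertion-ordered LinkedHashMap (accessOrder = false), whose get does not touch the linked list, so a shared read lock is sufficient again.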
