Alibaba Java troubleshooting tool list!

We often run into difficult problems in our daily work, and some tools have played a considerable role in solving them. I am writing them down here for two reasons. First, as notes, so that when I forget something later I can look it up quickly. Second, to share: I hope readers of this article will bring out the tools they find most helpful in their own daily work, so that we can all improve together.

Linux commands


The most commonly used form is tail -f

tail -300f shopbase.log # Show the last 300 lines, then keep following the file as it is written


grep forest f.txt     # Search in one file
grep forest f.txt cpf.txt # Search in several files
grep 'log' /home/admin -r -n # Search every file under the directory for the keyword
cat f.txt | grep -i shopbase    # Case-insensitive match
grep 'shopbase' /home/admin -r -n --include='*.vm' --include='*.java' # Only files with these suffixes
grep 'shopbase' /home/admin -r -n --exclude='*.vm' --exclude='*.java' # Skip files with these suffixes
seq 10 | grep 5 -A 3    # Also print the 3 lines after the match
seq 10 | grep 5 -B 3    # Also print the 3 lines before the match
seq 10 | grep 5 -C 3    # Print 3 lines before and after the match; the one to use day to day
cat f.txt | grep -c 'SHOPBASE'    # Count the matching lines


1. Basic awk commands

awk '{print $4,$6}' f.txt
awk '{print NR,$0}' f.txt cpf.txt    
awk '{print FNR,$0}' f.txt cpf.txt
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'

2. Matching

awk '/ldb/ {print}' f.txt   # Lines that match ldb
awk '!/ldb/ {print}' f.txt  # Lines that do not match ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt   # Lines that match both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt # Lines whose fifth column matches ldb

3. Built-in variables

  • NR: the number of records awk has read so far, split by the record separator. The default record separator is a newline, so by default NR is the number of lines read. NR stands for Number of Records.

  • FNR: when awk processes several input files, NR does not restart from 1 after the first file but keeps accumulating. FNR exists for that reason: it restarts from 1 each time a new file begins. FNR stands for File Number of Records.

  • NF: the number of fields the current record is split into. NF stands for Number of Fields.
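
The difference between NR, FNR and NF is easiest to see on two small files. The following is a minimal sketch; the file contents and the helper name show_counters are made up for illustration.

```shell
# Build two small sample files in a temporary directory (hypothetical contents).
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/f.txt"
printf 'x y z w\n' > "$dir/cpf.txt"

# For every line print: base file name, global record number,
# per-file record number, and the number of fields on that line.
show_counters() {
  awk '{ n = split(FILENAME, p, "/"); print p[n], "NR=" NR, "FNR=" FNR, "NF=" NF }' "$@"
}

show_counters "$dir/f.txt" "$dir/cpf.txt"
rm -rf "$dir"
```

NR keeps counting across both files, while FNR falls back to 1 on the first line of cpf.txt.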


sudo -u admin find /home/admin /tmp /usr -name \*.log   # Search several directories
find . -iname \*.txt   # Case-insensitive name match
find . -type d   # All subdirectories under the current directory
find /usr -type l   # All symbolic links under /usr
find /usr -type l -name "z*" -ls   # Details of symbolic links, e.g. inode and directory
find /home/admin -size +250000k   # Larger than 250000k; change + to - for smaller
find /home/admin -type f -perm 777 -exec ls -l {} \;   # Find files by permission
find /home/admin -atime -1   # Files accessed within the last day
find /home/admin -ctime -1   # Files whose status changed within the last day
find /home/admin -mtime -1   # Files modified within the last day
find /home/admin -amin -1   # Files accessed within the last minute
find /home/admin -cmin -1   # Files whose status changed within the last minute
find /home/admin -mmin -1   # Files modified within the last minute
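
These tests can be combined in a single call. As a rough sketch, the hypothetical helper recent_logs below lists only log files that changed within a given number of minutes; the directory and file names are invented for the demo.

```shell
# List regular *.log files under directory $1 that were modified less than $2 minutes ago.
recent_logs() {
  find "$1" -type f -name '*.log' -mmin "-$2"
}

# Demo on a throwaway directory; the file names are made up.
demo=$(mktemp -d)
touch "$demo/fresh.log" "$demo/notes.txt"
touch -t 200001010000 "$demo/stale.log"   # back-date this one to the year 2000
recent_logs "$demo" 10                    # prints only the path of fresh.log
rm -rf "$demo"
```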


Query the shopbase logs on a batch of VMs for entries that match a condition

pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'


tsar is our company's own collection tool. It is easy to use and persists the collected historical data to disk, so we can quickly query past system metrics. It can also query live data. It is installed on most machines.

tsar  ### View the metrics for the last day
tsar --live ### View real-time metrics; refreshes every five seconds by default
tsar -d 20161218 ### View the data for a specific day; it seems to go back four months at most
tsar --mem
tsar --load
tsar --cpu
### Each of these can also be combined with the -d parameter to query a single metric for a specific day


Besides showing some basic information, top is mostly used together with other tools to investigate JVM problems

ps -ef | grep java
top -H -p pid

Convert the ID of the busy thread from decimal to hexadecimal, then search the jstack output for it to see what the thread is doing
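
top -H shows thread IDs in decimal, while jstack prints them in hexadecimal in the nid field. A small sketch of the conversion follows; the helper name tid_to_nid is made up, and 2815 is just a sample number.

```shell
# Convert a decimal thread ID (as shown by `top -H`) to the hex form of jstack's nid field.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

tid_to_nid 2815   # prints 0xaff

# With a live JVM (placeholders: <pid> is the Java process, <tid> the busy thread):
#   jstack <pid> | grep -A 20 "nid=$(tid_to_nid <tid>)"
```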


netstat -nat|awk  '{print $6}'|sort|uniq -c|sort -rn # Count current connections by state; watch for a high CLOSE_WAIT count
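
The one-liner also counts words from netstat's header lines. A slightly stricter variant, sketched here with a hypothetical helper name and made-up sample rows, keeps only rows whose first column starts with tcp.

```shell
# Count TCP connections per state, reading `netstat -nat` output on stdin.
# Header lines are skipped by keeping only rows that start with "tcp".
count_states() {
  awk '$1 ~ /^tcp/ { print $6 }' | sort | uniq -c | sort -rn
}

# Hypothetical sample in netstat's column layout.
# On a real host run: netstat -nat | count_states
count_states <<'EOF'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:8080           10.0.0.2:51000          ESTABLISHED
tcp        0      0 10.0.0.1:8080           10.0.0.3:51001          CLOSE_WAIT
tcp        0      0 10.0.0.1:8080           10.0.0.4:51002          CLOSE_WAIT
EOF
```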

Troubleshooting power tools


The first thing to mention is btrace. It really is a problem killer in the production environment. I will skip the introduction and go straight to the code

1. Find out who calls the add method of ArrayList, printing the thread call stack only when the size of the current ArrayList is greater than 500

@OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
    // Only report lists that already hold more than 500 elements.
    if (getInt(field("java.util.ArrayList", "size"), instance) > 500) {
        println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
        jstack(); // print the call stack of the current thread
    }
}

2. Monitor the request parameters and the return value when the current service method is called

// clazz is blank in the source; fill in the fully qualified name of the service class.
@OnMethod(clazz = "", method = "nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
    println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}

Tools from other teams cover the remaining features more or less, so I won't go into them. If you are interested, see the btrace project.

Notes:

  1. In my experience, the output of release 1.3.9 is unstable. You may need to trigger the probe several times to see the correct result
  2. When matching trace classes with a regular expression, keep the range narrow, otherwise the application may hang because the CPU is saturated
  3. Because btrace works by bytecode injection, the application must be restarted to return to its normal state.


Greys is @Du Kun's masterpiece. Here are some of its great features (some overlap with btrace):

  • sc -df xxx: prints the details of the current class, including where its source was loaded from and its classloader structure
  • trace: I like this feature very much! JProfiler offered it long ago. It prints the time spent in the current method call, broken down by each method it calls, which is very helpful when investigating method performance.

The other features overlap with btrace, so use whichever you prefer. If you are interested, see the Greys project.

Arthas is also related: it is based on Greys. If you are interested, see the Arthas project.


For javOSize I will mention just one feature, classes: by modifying the bytecode, it changes the content of a class and the change takes effect immediately. That lets you quickly add a log statement somewhere and see the output. The drawback is that it is very intrusive to the code. But if you know what you are doing, it is a good tool.

Greys and btrace can easily cover its other features, so I won't go into them.

For an introduction to javOSize, see its official website.


This is a powerful troubleshooting tool that Alibaba open-sourced recently. It is very convenient to use. More operations and steps are described in its documentation.


In the past, many problems had to be diagnosed with JProfiler, but now Greys and btrace can handle most of them. Besides, the problems mostly occur in the production environment (which is network-isolated), so I do not use it much. It still deserves a mention. See the official website for details.

The big guns


MAT can be used as an Eclipse plug-in or as a standalone program. See its documentation for details.


Everyone developing inside the group should know this one. In a nutshell: with zprofiler, who still needs MAT? See zprofiler for details.

Java's three trusty tools (oh wait, make that seven)


For jps, I only use one command:

sudo -u admin /opt/taobao/java/bin/jps -mlvV


Common jstack usage:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815

native+java stack:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815


jinfo shows the system startup parameters:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815


jmap has three uses

1. Check the heap

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815

2. Dump the heap

Live objects only:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815

All objects:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815

3. See what is occupying the heap. Combined with zprofiler and btrace, this makes troubleshooting far more powerful

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10


There are many jstat parameters, but one is enough

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000 
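
In the -gcutil output, the column headed O is the old-generation usage as a percentage. As a rough sketch, the hypothetical filter old_gen_over below flags samples above a threshold. It finds the column by its header name rather than by position, and the sample figures are invented.

```shell
# Print a warning for every jstat -gcutil sample whose old-generation usage
# (column "O") exceeds $1 percent. The column is located by its header name.
old_gen_over() {
  awk -v limit="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
    col && $col + 0 > limit + 0 { print "old gen at " $col "%" }
  '
}

# Hypothetical sample; against a live JVM: jstat -gcutil <pid> 1000 | old_gen_over 90
old_gen_over 90 <<'EOF'
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  40.00  91.20  95.00  90.00    100    1.234      3    0.456    1.690
  0.00  12.50  55.00  60.00  95.00  90.00    101    1.240      3    0.456    1.696
EOF
```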


Even today, jdb is still used often. It can be used to debug in the pre-release environment. Assuming the pre-release JAVA_HOME is /opt/taobao/java/ and the remote debugging port is 8000, then:

sudo -u admin /opt/taobao/java/bin/jdb -attach 8000

After attaching, you can set breakpoints and debug. For the specific parameters, see Oracle's official documentation.
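
For jdb -attach to connect, the target JVM must have been started with a debug agent listening on that port. A minimal sketch of the standard JDWP option, assuming port 8000 as in the text; app.jar is a placeholder.

```shell
# Port the debug agent listens on; it must match the port passed to `jdb -attach`.
DEBUG_PORT=8000

# Standard JDWP agent option: listen on a socket, do not pause the JVM at startup.
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${DEBUG_PORT}"
echo "$JDWP_OPTS"

# Start the target JVM with it, for example (app.jar is a placeholder):
#   java $JDWP_OPTS -jar app.jar
```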


CLHSDB often lets you see more interesting things. I won't describe it in detail. Tools such as jstack and jmap are said to be built on it.

sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB

For more detail, see this post by R大.

VM options

Which file is your class loaded from?

-XX:+TraceClassLoading

The output looks like this:

[Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]

Write a dump file when the application dies

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof

Most of the group's VM parameter sets already include this option.

jar package conflict

Giving this its own heading is not overkill, is it? Everyone has had to deal with this annoying case at some point. With all the approaches below, I refuse to believe it cannot be solved.

mvn dependency:tree > ~/dependency.txt

Prints all dependencies

mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId

Prints only the dependency paths for the specified groupId and artifactId


Add -XX:+TraceClassLoading to the VM startup script. The details of the loaded classes then appear in the Tomcat startup output




The sc command of Greys also shows clearly where the current class was loaded from


The following URL tells you where the current class was loaded from

curl 'http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObject'

A pleasant surprise from ALI-TOMCAT (thanks to @Wu Guan)

List the JARs loaded by the container

curl http://localhost:8006/classloader/jars

Show the actual JAR location a given class was loaded from, which is useful for resolving class conflicts

curl 'http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObject'




If you find that your java process has quietly disappeared without leaving any clues, then dmesg is likely to have what you want.

sudo dmesg|grep -i kill|less

Look for the oom-killer entries. The output looks similar to the following:

[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[6710782.070639] [<ffffffff81118898>] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Killed process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB

The above shows that the corresponding java process was killed by the system's OOM Killer, with a score of 854.

A word on the OOM (out-of-memory) killer: it monitors the memory consumption of the machine. Before the machine runs out of memory, it scans all processes, scores each one according to certain rules (memory usage, running time and so on), selects the process with the highest score, and kills it to protect the machine.
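
To pull the victim out of a long dmesg log, the "Killed process" line can be reduced to its key fields. This is a sketch that assumes the line format shown in the sample above (other kernel versions word it differently); the helper name oom_victims is made up.

```shell
# Read kernel log lines on stdin and print "pid name anon-rss-kB" for each
# "Killed process" line. Assumes the line layout of the sample log above.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\),.*(\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}

# The last line of the sample log above:
echo '[6710784.707978] Killed process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB' | oom_victims
# prints: 215701 java 7152432
```

On a real machine the input would come from sudo dmesg piped into oom_victims.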

To convert a dmesg timestamp to wall-clock time: actual log time = 1970-01-01 UTC + (current time in seconds - seconds since system boot + timestamp printed by dmesg) seconds:

date -d "1970-01-01 UTC `echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194"|bc ` seconds"
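
The same arithmetic can be split into two small steps, which makes it easier to check. This is a sketch using whole seconds only; the helper names and the boot time of 1600000000 in the worked example are made up.

```shell
# Boot time in epoch seconds: now minus uptime (whole seconds).
# Reads /proc/uptime, so this helper is Linux only.
boot_epoch() {
  up=$(cut -d' ' -f1 /proc/uptime)
  echo $(( $(date +%s) - ${up%.*} ))
}

# Epoch seconds of a dmesg entry: boot time ($1) plus the dmesg timestamp ($2),
# with the fractional part of the timestamp dropped.
dmesg_to_epoch() {
  echo $(( $1 + ${2%.*} ))
}

# Worked example with a made-up boot time of 1600000000 and the timestamp used above.
dmesg_to_epoch 1600000000 12288812.926194   # prints 1612288812
```

On the affected machine, passing the result of dmesg_to_epoch "$(boot_epoch)" 12288812.926194 to GNU date as date -d "@<seconds>" prints the wall-clock time.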

Tags: Java Linux

Posted by gte806e on Tue, 24 May 2022 02:09:42 +0300