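Building a Simple Java Bytecode Decompiler
Introduction
This article demonstrates one way to decompile Java bytecode: first parse the class file, then turn the parsed result into Java source code. It does not cover every class-file feature or instruction; only part of the specification is handled.
All code here is for demonstration. It aims to get the job done and pays little attention to software-engineering concerns.
Parsing the Class File
This section defines a set of data structures that describe the binary class data in Java code, and briefly introduces some basic concepts. The class file format defines a great many items; for the details, see The Java Virtual Machine Specification [https://docs.oracle.com/javase/specs/jvms/se8/html/].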
ClassFile
public class ClassFile {
private int magic;
private int minorVersion;
private int majorVersion;
private ConstantPool constantPool;
private AccessFlags accessFlags;
private int thisClass;
private int superClass;
private int interfacesCount;
private int interfaces[];
private int fieldsCount;
private FieldInfo fields[];
private int methodsCount;
private MethodInfo methods[];
private int attributesCount;
private Attribute attributes[];
}
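To make the layout above concrete, here is a minimal sketch that reads only the first three fields of a class file (magic, minor version, major version) with DataInputStream. The class name ClassHeaderSketch and the method readVersion are hypothetical names chosen for this illustration; they are not part of the article's code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class ClassHeaderSketch {

    /** Validates the magic number and returns {minorVersion, majorVersion}. */
    static int[] readVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();            // u4 magic, always 0xCAFEBABE
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException(
                        "Not a class file: 0x" + Integer.toHexString(magic));
            }
            int minor = in.readUnsignedShort();  // u2 minor_version
            int major = in.readUnsignedShort();  // u2 major_version
            return new int[] {minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first 8 bytes of a class file with major version 52 (Java 8).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + ", major=" + version[1]);
    }
}
```

The remaining fields of ClassFile are read in the same order in which they are declared, each with the matching readUnsignedShort or nested parser call.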
ConstantPool
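The constant pool holds all of the class's symbolic information, including class names, method names, and constants. Pool entries come in several types, and each entry carries a tag that identifies which kind of constant it is. The base class is defined as follows: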
public abstract class CPInfo {
protected ConstantPool constantPool;
}
Concrete constant pool entries are defined as follows:
public class ConstantUtf8Info extends CPInfo {
private String value;
}
public class ConstantStringInfo extends CPInfo {
private int stringIndex;
}
public class ConstantClassInfo extends CPInfo {
private int nameIndex;
}
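There are more than ten constant pool entry types, so they are not all listed here; see The Java Virtual Machine Specification for details. A ConstantPool class simplifies access to the pool: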
public class ConstantPool {
private int poolCount;
private CPInfo[] pool;
public ConstantPool(DataInputStream dataInputStream) throws IOException {
this.poolCount = dataInputStream.readUnsignedShort();
this.pool = new CPInfo[this.poolCount];
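//Note: constant pool indexes start at 1; entry 0 is unused
for (int i = 1; i < this.poolCount; i++) {
int tag = dataInputStream.readUnsignedByte();
this.pool[i] = CPInfoFactory.getInstance().createCPInfo(tag, dataInputStream, this);
//CONSTANT_Long (tag 5) and CONSTANT_Double (tag 6) occupy two pool slots, so skip the next index
if (tag == 5 || tag == 6) {
i++;
}
}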
}
public int getPoolCount() {
return poolCount;
}
public <T extends CPInfo> T getCPInfo(int index) {
return (T) pool[index];
}
public ConstantUtf8Info getUtf8Info(int index) {
return (ConstantUtf8Info) pool[index];
}
}
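The following sketch shows the same idea in a self-contained form: it parses a tiny constant pool, resolves a Class entry to its name through the Utf8 entry it points to, and skips the extra slot taken by a Long entry. It is a simplified stand-in, not the article's CPInfo hierarchy: the class ConstantPoolSketch, its methods, and the choice to store entries as plain String, Integer, and Long objects are all assumptions made for this example, and only three tags are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, heavily simplified constant pool reader (Utf8, Long and Class only).
final class ConstantPoolSketch {
    static final int TAG_UTF8 = 1;
    static final int TAG_LONG = 5;
    static final int TAG_CLASS = 7;

    /** Utf8 entries become String, Class entries become their Integer name index, Long entries become Long. */
    static Object[] parse(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readUnsignedShort();  // constant_pool_count
            Object[] pool = new Object[count];   // index 0 stays unused
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case TAG_UTF8:
                        pool[i] = in.readUTF();            // u2 length + modified UTF-8 bytes
                        break;
                    case TAG_CLASS:
                        pool[i] = in.readUnsignedShort();  // name_index
                        break;
                    case TAG_LONG:
                        pool[i] = in.readLong();
                        i++;                               // Long takes two pool slots
                        break;
                    default:
                        throw new IllegalArgumentException("Tag not handled in this sketch: " + tag);
                }
            }
            return pool;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Follows a Class entry to the Utf8 entry holding its internal name. */
    static String className(Object[] pool, int classIndex) {
        int nameIndex = (Integer) pool[classIndex];
        return (String) pool[nameIndex];
    }

    /** Builds a sample pool: #1 Utf8, #2 Class -> #1, #3 Long (also occupying #4). */
    static byte[] sampleBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(5);
            out.writeByte(TAG_UTF8);
            out.writeUTF("com/mypackage/Test");
            out.writeByte(TAG_CLASS);
            out.writeShort(1);
            out.writeByte(TAG_LONG);
            out.writeLong(42L);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Object[] pool = parse(sampleBytes());
        System.out.println(className(pool, 2));
    }
}
```

Storing typed CPInfo subclasses, as the article does, scales better than raw objects once all entry types are supported; the raw form is used here only to keep the example short.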
FieldInfo
FieldInfo describes a field of the class. It is defined as follows:
class FieldInfo {
private int accessFlags; //access modifiers
private int nameIndex; //constant pool index of the field name
private int descriptorIndex;
private int attributesCount;
private Attribute attributeInfo[];
}
MethodInfo
The data structure that describes a method of the class is defined as follows:
class MethodInfo {
private AccessFlags accessFlags;
private int nameIndex;
private int descriptorIndex;
private int attributesCount;
private Attribute attributes[];
}
Attribute
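ClassFile, FieldInfo, and MethodInfo each hold an array of Attribute. There are many attribute types; this article only looks at the Code attribute (CodeAttribute) found in MethodInfo. It records the method's operand stack size, local variable table size, and bytecode instructions: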
public class CodeAttribute extends Attribute {
private int maxStack;
private int maxLocals;
private int codeLength;
private byte code[];
private int exceptionTableLength;
private ExceptionData exceptionTable[];
private int attributeCount;
private Attribute attributes[];
}
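As an illustration of how the first fields of a Code attribute could be read, here is a minimal sketch. It assumes the input starts right after the attribute's name index and length (that is, at max_stack) and stops after the code array, ignoring the exception table and nested attributes. CodeAttributeSketch is a hypothetical name, not the article's CodeAttribute class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reader for the leading fields of a Code attribute body.
final class CodeAttributeSketch {
    final int maxStack;
    final int maxLocals;
    final byte[] code;

    private CodeAttributeSketch(int maxStack, int maxLocals, byte[] code) {
        this.maxStack = maxStack;
        this.maxLocals = maxLocals;
        this.code = code;
    }

    /** Reads max_stack (u2), max_locals (u2), code_length (u4) and the code bytes. */
    static CodeAttributeSketch read(byte[] body) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int maxStack = in.readUnsignedShort();
            int maxLocals = in.readUnsignedShort();
            int codeLength = in.readInt();
            byte[] code = new byte[codeLength];
            in.readFully(code);  // fails with EOFException if the input is truncated
            return new CodeAttributeSketch(maxStack, maxLocals, code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // stack=2, locals=3, code = iload_1 (0x1b), iload_2 (0x1c), iadd (0x60), ireturn (0xac)
        byte[] body = {0, 2, 0, 3, 0, 0, 0, 4, 0x1b, 0x1c, 0x60, (byte) 0xac};
        CodeAttributeSketch attr = read(body);
        System.out.println("stack=" + attr.maxStack + ", locals=" + attr.maxLocals
                + ", codeLength=" + attr.code.length);
    }
}
```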
Descriptor
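A descriptor is a string that describes a method's parameter types and return type. For example:
(Ljava/lang/Object;[Ljava/lang/Object;)I
The part inside the parentheses lists the parameters, and the part after them is the return type. This descriptor corresponds to:
int XXX(Object, Object[]);
L introduces a reference type (terminated by ;), [ introduces an array, and I stands for int. The primitive type codes map as follows: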
char2TypeMap.put('B', "byte");
char2TypeMap.put('C', "char");
char2TypeMap.put('D', "double");
char2TypeMap.put('F', "float");
char2TypeMap.put('I', "int");
char2TypeMap.put('J', "long");
char2TypeMap.put('S', "short");
char2TypeMap.put('Z', "boolean");
A DescriptorParser can be defined to parse a descriptor:
private List<TypeDescriptor> parameterTypeDescriptors;
private TypeDescriptor returnTypeDescriptor;
public DescriptorParser(String descriptor) {
this.parameterTypeDescriptors = new ArrayList<>();
char ch;
boolean isParsingParameter = true;
boolean isParsingArray = false;
boolean isParsingReference = false;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < descriptor.length(); i++) {
ch = descriptor.charAt(i);
if (isParsingParameter) {
switch (ch) {
case 'B':
case 'C':
case 'D':
case 'F':
case 'I':
case 'S':
case 'Z':
case 'J':
if (isParsingReference) {
sb.append(ch);
} else {
parameterTypeDescriptors.add(new TypeDescriptor(char2TypeMap.get(ch), isParsingArray));
isParsingArray = false;
}
break;
case 'L':
if (isParsingReference) {
sb.append(ch);
} else {
isParsingReference = true;
}
break;
case '/':
sb.append(".");
break;
case '[':
isParsingArray = true;
break;
case ';':
parameterTypeDescriptors.add(new TypeDescriptor(sb.toString(), isParsingArray));
isParsingArray = false;
isParsingReference = false;
sb = new StringBuilder();
break;
case '(':
break;
case ')':
isParsingParameter = false;
isParsingArray = false;
isParsingReference = false;
break;
default:
sb.append(ch);
}
} else {
switch (ch) {
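case 'V':
if (isParsingReference) {
//a 'V' inside a class name (e.g. java/util/Vector) is part of the name, not void
sb.append(ch);
} else {
returnTypeDescriptor = new TypeDescriptor("void", false);
}
break;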
case 'B':
case 'C':
case 'D':
case 'F':
case 'I':
case 'S':
case 'Z':
case 'J':
if (isParsingReference) {
sb.append(ch);
} else {
returnTypeDescriptor = new TypeDescriptor(char2TypeMap.get(ch), isParsingArray);
isParsingArray = false;
}
break;
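case 'L':
if (isParsingReference) {
//an 'L' inside a class name (e.g. java/lang/Long) is part of the name
sb.append(ch);
} else {
isParsingReference = true;
}
break;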
case '/':
sb.append(".");
break;
case '[':
isParsingArray = true;
break;
case ';':
//a reference type ends here; its name was accumulated in sb
returnTypeDescriptor = new TypeDescriptor(sb.toString(), isParsingArray);
break;
default:
sb.append(ch);
}
}
}
}
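The flag-driven loop above can also be written as a small recursive-descent parser, which handles nested arrays and class names containing type letters without extra state. The sketch below is an alternative formulation, not the article's DescriptorParser: the class DescriptorSketch and the method toJavaSignature are hypothetical, and the result is a plain string rather than TypeDescriptor objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent descriptor parser, for illustration only.
final class DescriptorSketch {
    private final String descriptor;
    private int pos;

    private DescriptorSketch(String descriptor) {
        this.descriptor = descriptor;
    }

    /** Converts a method descriptor such as "(II)I" into "int (int, int)". */
    static String toJavaSignature(String descriptor) {
        DescriptorSketch parser = new DescriptorSketch(descriptor);
        parser.expect('(');
        List<String> parameters = new ArrayList<>();
        while (parser.descriptor.charAt(parser.pos) != ')') {
            parameters.add(parser.nextType());
        }
        parser.expect(')');
        String returnType = parser.nextType();
        return returnType + " (" + String.join(", ", parameters) + ")";
    }

    private void expect(char expected) {
        if (descriptor.charAt(pos++) != expected) {
            throw new IllegalArgumentException("Expected '" + expected + "' in " + descriptor);
        }
    }

    /** Parses one field type (or V) starting at pos and advances past it. */
    private String nextType() {
        char c = descriptor.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return nextType() + "[]";  // one array dimension per '['
            case 'L': {
                int end = descriptor.indexOf(';', pos);
                String name = descriptor.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("Unexpected character '" + c + "' in " + descriptor);
        }
    }

    public static void main(String[] args) {
        System.out.println(toJavaSignature("(Ljava/lang/Object;[Ljava/lang/Object;)I"));
    }
}
```

Because each call to nextType consumes exactly one type, the class name is read as a whole up to the semicolon, so letters such as L, I, or V inside a name are never mistaken for type codes.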
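Parsing a class file is not difficult, but it is tedious, so not everything is listed here. For the remaining class-file definitions, refer to The Java Virtual Machine Specification.
Turning the ClassFile into Java Code
Once the ClassFile object has been parsed, Java code generation can begin. Start by building the class header:
//Build the class header, e.g.: public class Test (the extends clause is not emitted here)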
private String generateClassHead() {
StringBuilder javaCode = new StringBuilder();
//A top-level class is either public or package-private; ClassFile.access_flags never carries private or protected
if (classFile.getAccessFlags().hasFlag(AccessFlags.ACC_PUBLIC)) {
javaCode.append("public ");
}
if (classFile.getAccessFlags().hasFlag(AccessFlags.ACC_INTERFACE)) {
javaCode.append("interface ");
} else {
javaCode.append("class ");
}
javaCode.append(this.className).append(" ");
//Resolve the names of the implemented interfaces
if (classFile.getInterfaces().length > 0) {
javaCode.append(" implements ");
}
for (int i = 0; i < classFile.getInterfaces().length; i++) {
ConstantClassInfo interfaceClassInfo = classFile.getConstantPool().getCPInfo(classFile.getInterfaces()[i]);
javaCode.append(interfaceClassInfo.getName());
boolean isLast = i == (classFile.getInterfaces().length - 1);
if (!isLast) {
javaCode.append(",");
}
}
return javaCode.toString();
}
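The class header is easy to build; the hard part is the class body. This article only handles MethodInfo, so only methods are generated. Before building the class body, let's look at how to generate code for a single MethodInfo.
Start with the method header, which is straightforward: just assemble the modifiers, method name, parameters, and return type: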
private String generateMethodHeadCode(MethodInfo methodInfo, List<TypeDescriptor> parametersTypeDescriptors,
TypeDescriptor returnTypeDescriptor) {
StringBuilder javaCode = new StringBuilder();
AccessFlags accessFlags = methodInfo.getAccessFlags();
String methodName = classFile.getConstantPool().getUtf8Info(methodInfo.getNameIndex()).getValue();
javaCode.append(" ");
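if (accessFlags.hasFlag(AccessFlags.ACC_PUBLIC)) {
javaCode.append("public ");
} else if (accessFlags.hasFlag(AccessFlags.ACC_PRIVATE)) {
javaCode.append("private ");
} else if (accessFlags.hasFlag(AccessFlags.ACC_PROTECTED)) {
//assumes AccessFlags also defines ACC_PROTECTED (0x0004); package-private methods get no modifier
javaCode.append("protected ");
}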
if (accessFlags.hasFlag(AccessFlags.ACC_STATIC)) {
javaCode.append("static ");
}
if (accessFlags.hasFlag(AccessFlags.ACC_FINAL)) {
javaCode.append("final ");
}
if (methodName.equals("<init>")) {
javaCode.append(className);
} else {
javaCode.append(returnTypeDescriptor.getName())
.append(" ")
.append(methodName);
}
javaCode.append("(");
for (int j = 0; j < parametersTypeDescriptors.size(); j++) {
TypeDescriptor parameterTypeDescriptor = parametersTypeDescriptors.get(j);
javaCode.append(parameterTypeDescriptor.getName());
if (parameterTypeDescriptor.isArray()) {
javaCode.append("[]");
}
javaCode.append(" var").append(j + 1);
boolean isLast = j == (parametersTypeDescriptors.size() - 1);
if (!isLast) {
javaCode.append(", ");
}
}
javaCode.append((") "));
return javaCode.toString();
}
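The article does not show the AccessFlags class, so the following is only a guess at how a hasFlag check might work: each flag is one bit in the access_flags value, and a bitwise AND tests it. AccessFlagsSketch and its methods are hypothetical; the mask values are the ones defined for methods in the JVM specification.

```java
// Hypothetical stand-in for the article's AccessFlags class.
final class AccessFlagsSketch {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_PRIVATE = 0x0002;
    static final int ACC_PROTECTED = 0x0004;
    static final int ACC_STATIC = 0x0008;
    static final int ACC_FINAL = 0x0010;

    /** A flag is set when its bit is present in the access_flags value. */
    static boolean hasFlag(int flags, int mask) {
        return (flags & mask) != 0;
    }

    /** Renders the modifiers in source order; package-private yields no visibility keyword. */
    static String modifiers(int flags) {
        StringBuilder sb = new StringBuilder();
        if (hasFlag(flags, ACC_PUBLIC)) {
            sb.append("public ");
        } else if (hasFlag(flags, ACC_PRIVATE)) {
            sb.append("private ");
        } else if (hasFlag(flags, ACC_PROTECTED)) {
            sb.append("protected ");
        }
        if (hasFlag(flags, ACC_STATIC)) {
            sb.append("static ");
        }
        if (hasFlag(flags, ACC_FINAL)) {
            sb.append("final ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // main in the test class has flags ACC_PUBLIC | ACC_STATIC = 0x0009
        System.out.println(modifiers(0x0009) + "void main(...)");
    }
}
```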
Next, generate the method body, which contains the actual instructions. This requires parsing the CodeAttribute:
private String generateMethodBodyCode(MethodInfo methodInfo, List<TypeDescriptor> parametersTypeDescriptors,
TypeDescriptor returnTypeDescriptor) throws IOException {
StringBuilder javaCode = new StringBuilder();
String currentMethodName = classFile.getConstantPool().getUtf8Info(methodInfo.getNameIndex()).getValue();
//Locate the CodeAttribute
CodeAttribute codeAttribute = findAttribute(methodInfo, Attribute.Code);
if (codeAttribute == null) {
throw new RuntimeException("Cannot find a CodeAttribute in the method");
}
Stack<Object> opStack = new Stack<>(/*codeAttribute.getMaxStack()*/);
List<String> localVariableNames = new ArrayList<>(codeAttribute.getMaxLocals());
//Initialize the local variable names: for an instance method, slot 0 is this, followed by the method parameters in order
boolean isStaticMethod = methodInfo.getAccessFlags().hasFlag(AccessFlags.ACC_STATIC);
if (!isStaticMethod) {
localVariableNames.add("this");
}
for (int x = 0; x < parametersTypeDescriptors.size(); x++) {
localVariableNames.add("var" + (x + 1));
}
DataInputStream byteCodeInputStream = new DataInputStream(new ByteArrayInputStream(codeAttribute.getCode()));
while (byteCodeInputStream.available() > 0) {
int opCode = byteCodeInputStream.readByte() & 0xff;
switch (opCode) {
case OP_aload_0:
System.out.println("aload_0");
opStack.push(localVariableNames.get(0));
break;
case OP_invokevirtual: {
int methodRefIndex = byteCodeInputStream.readUnsignedShort();
System.out.println("invokevirtual #" + methodRefIndex);
ConstantMethodRefInfo methodRefInfo = classFile.getConstantPool().getCPInfo(methodRefIndex);
ConstantNameAndTypeInfo nameAndTypeInfo = classFile.getConstantPool().getCPInfo(methodRefInfo.getNameAndTypeIndex());
String methodName = classFile.getConstantPool().getUtf8Info(nameAndTypeInfo.getNameIndex()).getValue();
String typeDescriptor = classFile.getConstantPool().getUtf8Info(nameAndTypeInfo.getDescriptorIndex()).getValue();
int methodParameterSize = new DescriptorParser(typeDescriptor).getParameterTypeDescriptors().size();
Object targetClassName;
Object parameterNames[] = new Object[methodParameterSize];
for (int x = 0; x < methodParameterSize; x++) {
parameterNames[methodParameterSize - x - 1] = opStack.pop();
}
targetClassName = opStack.pop();
StringBuilder line = new StringBuilder();
line.append(targetClassName).append(".").append(methodName).append("(");
for (int x = 0; x < methodParameterSize; x++) {
line.append(parameterNames[x]);
if ((x != methodParameterSize - 1)) {
line.append(",");
}
}
line.append(");");
opStack.push(line.toString());
break;
}
case OP_invokespecial: {
int methodRefIndex = byteCodeInputStream.readUnsignedShort();
System.out.println("invokespecial #" + methodRefIndex);
ConstantMethodRefInfo methodRefInfo = classFile.getConstantPool().getCPInfo(methodRefIndex);
ConstantNameAndTypeInfo nameAndTypeInfo = classFile.getConstantPool().getCPInfo(methodRefInfo.getNameAndTypeIndex());
String typeDescriptor = classFile.getConstantPool().getUtf8Info(nameAndTypeInfo.getDescriptorIndex()).getValue();
int methodParameterSize = new DescriptorParser(typeDescriptor).getParameterTypeDescriptors().size();
Object targetClassName;
Object parameterNames[] = new Object[methodParameterSize];
if (methodParameterSize > 0) {
for (int x = 0; x < methodParameterSize; x++) {
parameterNames[methodParameterSize - x - 1] = opStack.pop();
}
}
targetClassName = opStack.pop();
StringBuilder line = new StringBuilder();
if (currentMethodName.equals("<init>") && targetClassName.equals("this")) {
line.append("super");
} else {
line.append("new ").append(targetClassName);
}
line.append("(");
for (int x = 0; x < methodParameterSize; x++) {
line.append(parameterNames[x]);
if ((x != methodParameterSize - 1)) {
line.append(",");
}
}
line.append(");");
opStack.push(line.toString());
break;
}
case OP_getstatic:
System.out.println("getstatic");
break;
case OP_return:
System.out.println("return");
break;
case OP_new: {
int classIndex = byteCodeInputStream.readUnsignedShort();
System.out.println("new #" + classIndex);
ConstantClassInfo classInfo = classFile.getConstantPool().getCPInfo(classIndex);
opStack.push(classInfo.getName());
break;
}
case OP_dup:
System.out.println("dup");
Object top = opStack.pop();
opStack.push(top);
opStack.push(top);
break;
case OP_ldc:
int stringIndex = byteCodeInputStream.readByte() & 0xff;
System.out.println("ldc #" + stringIndex);
ConstantStringInfo stringInfo = classFile.getConstantPool().getCPInfo(stringIndex);
String value = classFile.getConstantPool().getUtf8Info(stringInfo.getStringIndex()).getValue();
opStack.push(value);
break;
case OP_iload_1:
System.out.println("iload_1");
opStack.push(localVariableNames.get(1));
break;
case OP_iload_2:
System.out.println("iload_2");
opStack.push(localVariableNames.get(2));
break;
case OP_iadd:
System.out.println("iadd");
opStack.push(opStack.pop() + "+" + opStack.pop());
break;
case OP_ireturn:
System.out.println("ireturn");
opStack.push("return " + opStack.pop());
break;
case OP_iconst_0:
System.out.println("iconst_0");
opStack.push("0");
break;
case OP_iconst_1:
System.out.println("iconst_1");
opStack.push("1");
break;
case OP_iconst_2:
System.out.println("iconst_2");
opStack.push("2");
break;
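case OP_astore_1: {
System.out.println("astore_1");
String obj = opStack.pop().toString();
String className = opStack.pop().toString();
//write the name into slot 1 (growing the list if needed) instead of inserting and shifting later slots
while (localVariableNames.size() <= 1) {
localVariableNames.add(null);
}
localVariableNames.set(1, "localVar1");
opStack.push(className + " localVar1=" + obj);
break;
}
case OP_astore_2: {
System.out.println("astore_2");
String obj = opStack.pop().toString();
String className = opStack.pop().toString();
//slot 2, not slot 1
while (localVariableNames.size() <= 2) {
localVariableNames.add(null);
}
localVariableNames.set(2, "localVar2");
opStack.push(className + " localVar2=" + obj);
break;
}
case OP_astore_3: {
System.out.println("astore_3");
String obj = opStack.pop().toString();
String className = opStack.pop().toString();
//slot 3, not slot 1
while (localVariableNames.size() <= 3) {
localVariableNames.add(null);
}
localVariableNames.set(3, "localVar3");
opStack.push(className + " localVar3=" + obj);
break;
}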
case OP_aload_1:
System.out.println("aload_1");
opStack.push(localVariableNames.get(1));
break;
case OP_pop:
System.out.println("pop");
//opStack.pop();
break;
default:
throw new RuntimeException("Unknown opCode: 0x" + Integer.toHexString(opCode) + " in " + currentMethodName);
}
}
for (Object s : opStack) {
javaCode.append(" ").append(s).append("\r\n");
}
return javaCode.toString();
}
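The core technique in the method above is symbolic execution of the operand stack: instead of values, the stack holds source-code fragments, and each instruction combines the fragments it pops. The sketch below isolates that idea for a handful of operand-free instructions. It is a simplified illustration with hypothetical names (ExpressionStackSketch, decompile), not a replacement for the article's method. Unlike the iadd case above, it pops the right-hand operand first and then the left-hand one, which matters for non-commutative operations such as isub.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical miniature of the symbolic operand stack, for illustration only.
final class ExpressionStackSketch {
    static final int ICONST_0 = 0x03;
    static final int ILOAD_1 = 0x1b;
    static final int ILOAD_2 = 0x1c;
    static final int IADD = 0x60;
    static final int ISUB = 0x64;
    static final int IRETURN = 0xac;

    /** Rebuilds a return statement from a sequence of operand-free opcodes. */
    static String decompile(int[] code, String[] localNames) {
        Deque<String> stack = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        for (int op : code) {
            switch (op) {
                case ICONST_0:
                    stack.push("0");
                    break;
                case ILOAD_1:
                    stack.push(localNames[1]);
                    break;
                case ILOAD_2:
                    stack.push(localNames[2]);
                    break;
                case IADD:
                case ISUB: {
                    String right = stack.pop();  // pushed last, so it is the right operand
                    String left = stack.pop();
                    stack.push(left + (op == IADD ? " + " : " - ") + right);
                    break;
                }
                case IRETURN:
                    out.append("return ").append(stack.pop()).append(";");
                    break;
                default:
                    throw new IllegalArgumentException("Unknown opcode: 0x" + Integer.toHexString(op));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] locals = {"this", "var1", "var2"};
        // iload_1, iload_2, isub, ireturn
        System.out.println(decompile(new int[] {ILOAD_1, ILOAD_2, ISUB, IRETURN}, locals));
    }
}
```

A fuller implementation would also need to read instruction operands from the code array and wrap nested expressions in parentheses according to operator precedence.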
Finally, generate the class body:
private String generateClassBody() throws IOException {
StringBuilder javaCode = new StringBuilder();
javaCode.append(" { ").append(System.lineSeparator());
for (MethodInfo methodInfo : classFile.getMethods()) {
DescriptorParser descriptorParser = new DescriptorParser(methodInfo.getDescriptor());
javaCode.append(generateMethodHeadCode(methodInfo, descriptorParser.getParameterTypeDescriptors(), descriptorParser.getReturnTypeDescriptor()));
javaCode.append("{ ").append(System.lineSeparator());
javaCode.append(generateMethodBodyCode(methodInfo, descriptorParser.getParameterTypeDescriptors(), descriptorParser.getReturnTypeDescriptor()));
javaCode.append(System.lineSeparator()).append(" }").append(System.lineSeparator());
}
javaCode.append(System.lineSeparator()).append("}").append(System.lineSeparator());
return javaCode.toString();
}
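Summary
The code in this article implements only part of the class-file specification and cannot decompile every class file (the unrecognized instructions would need to be filled in). It has only been tested against the following class: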
public class Test {
public Test(String s) {
}
public int sum(int i, int j) {
return i + j;
}
public int search(Object o, Object[] objects) {
return 0;
}
public static void main(String[] args) {
Test test = new Test("hello");
test.sum(1, 2);
Object o = new Object();
}
}
The corresponding bytecode, as printed by javap -v, is as follows:
public class com.mypackage.Test
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #6.#34 // java/lang/Object."<init>":()V
#2 = Class #35 // com/mypackage/Test
#3 = String #36 // hello
#4 = Methodref #2.#37 // com/mypackage/Test."<init>":(Ljava/lang/String;)V
#5 = Methodref #2.#38 // com/mypackage/Test.sum:(II)I
#6 = Class #39 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 (Ljava/lang/String;)V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 LocalVariableTable
#12 = Utf8 this
#13 = Utf8 Lcom/mypackage/Test;
#14 = Utf8 s
#15 = Utf8 Ljava/lang/String;
#16 = Utf8 sum
#17 = Utf8 (II)I
#18 = Utf8 i
#19 = Utf8 I
#20 = Utf8 j
#21 = Utf8 search
#22 = Utf8 (Ljava/lang/Object;[Ljava/lang/Object;)I
#23 = Utf8 o
#24 = Utf8 Ljava/lang/Object;
#25 = Utf8 objects
#26 = Utf8 [Ljava/lang/Object;
#27 = Utf8 main
#28 = Utf8 ([Ljava/lang/String;)V
#29 = Utf8 args
#30 = Utf8 [Ljava/lang/String;
#31 = Utf8 test
#32 = Utf8 SourceFile
#33 = Utf8 Test.java
#34 = NameAndType #7:#40 // "<init>":()V
#35 = Utf8 com/mypackage/Test
#36 = Utf8 hello
#37 = NameAndType #7:#8 // "<init>":(Ljava/lang/String;)V
#38 = NameAndType #16:#17 // sum:(II)I
#39 = Utf8 java/lang/Object
#40 = Utf8 ()V
{
public com.mypackage.Test(java.lang.String);
descriptor: (Ljava/lang/String;)V
flags: ACC_PUBLIC
Code:
stack=1, locals=2, args_size=2
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 8: 0
line 10: 4
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lcom/mypackage/Test;
0 5 1 s Ljava/lang/String;
public int sum(int, int);
descriptor: (II)I
flags: ACC_PUBLIC
Code:
stack=2, locals=3, args_size=3
0: iload_1
1: iload_2
2: iadd
3: ireturn
LineNumberTable:
line 13: 0
LocalVariableTable:
Start Length Slot Name Signature
0 4 0 this Lcom/mypackage/Test;
0 4 1 i I
0 4 2 j I
public int search(java.lang.Object, java.lang.Object[]);
descriptor: (Ljava/lang/Object;[Ljava/lang/Object;)I
flags: ACC_PUBLIC
Code:
stack=1, locals=3, args_size=3
0: iconst_0
1: ireturn
LineNumberTable:
line 17: 0
LocalVariableTable:
Start Length Slot Name Signature
0 2 0 this Lcom/mypackage/Test;
0 2 1 o Ljava/lang/Object;
0 2 2 objects [Ljava/lang/Object;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=3, locals=3, args_size=1
0: new #2 // class com/mypackage/Test
3: dup
4: ldc #3 // String hello
6: invokespecial #4 // Method "<init>":(Ljava/lang/String;)V
9: astore_1
10: aload_1
11: iconst_1
12: iconst_2
13: invokevirtual #5 // Method sum:(II)I
16: pop
17: new #6 // class java/lang/Object
20: dup
21: invokespecial #1 // Method java/lang/Object."<init>":()V
24: astore_2
25: return
LineNumberTable:
line 21: 0
line 22: 10
line 23: 17
line 24: 25
LocalVariableTable:
Start Length Slot Name Signature
0 26 0 args [Ljava/lang/String;
10 16 1 test Lcom/mypackage/Test;
25 1 2 o Ljava/lang/Object;
}
SourceFile: "Test.java"
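The decompiled result is shown below. It is not yet compilable Java: the return statements have no trailing semicolon, the string literal hello has lost its quotes, and the operands of the addition come out reversed (var2+var1) because the iadd case pops the right-hand operand first: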
public class Test {
public Test(java.lang.String var1) {
super();
}
public int sum(int var1, int var2) {
return var2+var1
}
public int search(java.lang.Object var1, java.lang.Object[] var2) {
return 0
}
public static void main(java.lang.String[] var1) {
com.mypackage.Test localVar1=new com.mypackage.Test(hello);
localVar1.sum(1,2);
java.lang.Object localVar2=new java.lang.Object();
}
}