Internals of the My Virtual Machine
===================================

MyVM is intended to be a loader for MS-DOS executables that runs on
GNU/Linux 2.6 kernels. MS-DOS format files are binary executables,
so the VM is designed at the machine code level. It simulates the 16-bit
real mode of x86. Currently only MZ-format files are supported.

        |-------------------attention-----------------|
        |   Some functions of the VM have not been    |
        |   implemented yet due to time constraints.  |
        |-------------------attention-----------------|

Tested platform:        Intel Celeron M processor, little endian
Operating system:       Debian GNU/Linux 2.6.26
Compiler:               gcc version 4.3.2 (Debian 4.3.2-1)

The little-endian assumption is enforced.

The VM is composed of the modules listed below. The interfaces between
them look like this:

                +---cmdline.c---+
                |               |
         main.c-+----init.c-----+---dbgdata.c---+
                |               |
                +-----run.c-----+---decode.c----+---mp/*.h---+

main.c
------
The main control.

cmdline.c
---------
The command line parser is not fully implemented. Currently the VM only
accepts one argument, which specifies the MS-DOS executable to be loaded.

init.c
------
This is the VM initializer. It validates the MS-DOS executable file
specified by the command line parser. It then reads the MZ header and
the relocation table, if any, and allocates memory according to the MZ
header. Finally, the initializer gets the CPU state ready to run the MZ
file, i.e., it initializes the general registers and points the cs and ip
registers to the code entry.

It also sets the word at the bottom of the stack plus 1 to "0xCD20" to
emulate the first word of the PSP in MS-DOS, and then sets the data
segment "ds" to this word. I use this trick so that some programs exit
smoothly. If the executable file does not use the stack, it will not
exit in this way.

run.c
-----
The main execution and exit of the MS-DOS file happen here. It checks the
exit flag "halt", which is manipulated by the decode engine, and increases
the CPU ticks.

decode.c
--------
This is the main decoding engine.
Every binary opcode is read from memory and decoded. The engine is
implemented in a switch-like style. This increases the code size, but it
is the most obvious way. The structure of the engine is fairly clear.

The purpose of the engine is to interpret the meaning of each machine
instruction, so it has to behave like a disassembler; only then does
emulating the execution of the code become possible. At present, only
part of the instruction set can be decoded by MyVM, and only part of the
decodable instructions are actually simulated. So only some simple DOS
programs written in Intel-style assembly have been tested. For more
details on the decode engine, see the sections below.

Header files
============

cmdline.h
---------
Interface of cmdline.c, currently not entirely fulfilled.

dbgdata.h
---------
Interface of dbgdata.c.

decode.h
--------
Interface of decode.c, with the data structure definitions of the decode
engine. The global hlt flag is defined here.

enenv.h
-------
Execution environment definition. The CPU structure is defined here:
eight 16-bit general-purpose registers, six 16-bit segment registers,
the instruction pointer ip, the 16-bit flags, a pointer to memory, etc.
The CPU is defined as a global variable here.

init.h
------
Interface of init.c.

run.h
-----
Interface of run.c.

mp/arithmetic.h
---------------
Macro definitions of arithmetic operations. Currently implemented
instructions: add, or, adc, sbb, and, sub, xor, cmp, inc, dec.

mp/interrupt.h
--------------
Macro definitions of interrupts. Currently only software interrupts are
implemented: 20H, and subfunctions 00H, 01H, 02H, 09H, 0AH, and 4CH of 21H.

mp/logicalshift.h
-----------------
Macro definitions of shift and rotate operations.
Implemented: rol, ror, rcl, rcr, shl, shr, sal, sar.

mp/move.h
---------
Data transfer instructions.

mp/pushpop.h
------------
Stack manipulation is implemented here.

Most of the operations implemented in the mp directory are in macro form.
Some are coded in gcc inline assembly.

Some technical issues
=====================

MS-DOS MZ file header
---------------------
Note: all multi-byte values are stored LSB first.
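MyVM itself relies on a little-endian host, so it can read these words
directly. For illustration only, a loader that does not want to depend on
the host byte order could assemble each word by hand. This is a minimal
sketch; the helper name read_le16 is hypothetical and is not part of the
MyVM sources.

```c
#include <stdint.h>

/* Hypothetical helper (not from the MyVM sources): build a 16-bit word
 * from two bytes that are stored least-significant byte first. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Applied to the first two bytes of an MZ file (0x4d, 0x5a), the helper
yields the word 0x5a4d.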
One block is 512 bytes, one paragraph is 16 bytes.

Offset (hex)    Value           Meaning
00-01           0x4d,0x5a       This is the "magic number" of an EXE file.
02-03                           The number of bytes in the last block of the
                                program that are actually used.
04-05                           Number of blocks in the file that are part
                                of the EXE file.
06-07                           Number of relocation entries (the "relocs"
                                field of the structure below).
08-09                           Number of paragraphs in the header.
0A-0B                           Minimum number of paragraphs of additional
                                memory that the program will need.
0C-0D                           Maximum number of paragraphs of additional
                                memory.
0E-0F                           Relative value of the stack segment.
10-11                           Initial value of the SP register.
12-13                           Word checksum.
14-15                           Initial value of the IP register.
16-17                           Initial value of the CS register, relative to
                                the segment the program was loaded at.
18-19                           Offset of the first relocation item in the
                                file.
1A-1B                           Overlay number. Normally zero, meaning that
                                it's the main program.

Here is a structure that can be used to represent the EXE header and
relocation entries, assuming a 16-bit LSB machine:

        struct HeaderRec
        {
                word magic;
                word lpbytes;
                word pages;
                word relocs;
                word hdsize;
                word minparas;
                word maxparas;
                word ss;
                word sp;
                word chksum;
                word ip;
                word cs;
                word roffset;
                word overlay;
        };

        struct RelocRec
        {
                word offset;
                word segment;
        };

The offset of the beginning of the executable data is computed like this:

        bytecodeStart = HeaderRec.hdsize * 16;

About CPU structure
-------------------
The CPU is the core hardware of a computer; its capabilities determine the
performance of the computer. Normally, a CPU is composed of a control unit
and an arithmetic unit. In addition, a CPU includes some special-purpose
and general-purpose registers.
In MyVM, the 16-bit real mode of x86 is simulated.
The CPU structure is represented as follows:

        typedef unsigned short word;

        struct general_reg
        {
                word    ax;
                word    cx;
                word    dx;
                word    bx;
                word    sp;
                word    bp;
                word    si;
                word    di;
        };

        struct segment_reg
        {
                word    es;
                word    cs;
                word    ss;
                word    ds;
                word    fs;
                word    gs;
        };

        struct CPU
        {
                struct general_reg greg;
                struct segment_reg sreg;
                word    ip;
                int     flags;
                bool    status;
                byte*   ram;
        };

Here ip is the instruction pointer, flags holds the 16-bit flags, and ram
points to the memory allocated for the VM.

Details about decode engine
===========================

MyVM is designed to work at the machine code level, so decoding is
something like disassembling. Put simply, the decode engine has to
translate the machine code into understandable assembly mnemonics; then
the VM is able to know what an instruction does and what it operates on
(memory, registers, etc.). For the Intel instruction opcode format, refer
to [1], "Chapter 2 INSTRUCTION FORMAT".

Introduction
------------
Instruction format: see [1], Page 2-1, Figure 2-1.

For the data structure describing an instruction, I use the obvious
struct definition:

        struct INSTRUCTION
        {
                /* prefixes */
                char RepeatPrefix; /* rep, repz...*/
                char SegmentPrefix;
                char OperandPrefix; /* byte, word */
                char AddressPrefix; /* ptr word, ptr byte */

                unsigned int    Opcode;
                char    ModRM;
                char    SIB;
                unsigned int    Displacement;
                unsigned int    Immediate;
                unsigned int    LinearAddress;

                /* dFlag: direction flag, indicating source or destination operand */
                /* wFlag: bit width flag, indicating byte or word */
                /* sFlag: */
                char dFlag, wFlag, sFlag;
        };

Prefix decode
-------------
To be added.

Opcode decode
-------------
A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit
opcode field is sometimes encoded in the ModR/M byte.

ModR/M decode
-------------
To be added.

References
==========
[1]. Intel® 64 and IA-32 Architectures Software Developer's Manual,
     Volume 2A: Instruction Set Reference, A-M.
[2]. Intel® 64 and IA-32 Architectures Software Developer's Manual,
     Volume 2B: Instruction Set Reference, N-Z.
[3]. Gao Wenqiang (高文强). "Implementation of a Small Virtual Machine."
     Programmer (程序员), 2003, issues 06-07.
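
As a supplement to the MZ header section above, the size arithmetic can be
sketched in C. This is only an illustration under stated assumptions: the
struct mz_sizes and the three function names are hypothetical and do not
come from the MyVM sources, and the image size follows the conventional
reading of the "pages" and "lpbytes" fields (a non-zero lpbytes means the
last 512-byte block is only partly used). The last function shows the
usual real-mode translation, segment * 16 + offset, which is presumably
what a field such as LinearAddress would hold.

```c
#include <stdint.h>

typedef uint16_t word;

/* Hypothetical subset of HeaderRec: only the fields used below. */
struct mz_sizes {
    word lpbytes;  /* bytes used in the last 512-byte block (0 = all) */
    word pages;    /* number of 512-byte blocks in the file */
    word hdsize;   /* header size in 16-byte paragraphs */
};

/* File offset where the executable data begins (hdsize * 16). */
static uint32_t mz_image_offset(const struct mz_sizes *h)
{
    return (uint32_t)h->hdsize * 16u;
}

/* Size in bytes of the executable data that follows the header.
 * Assumes a well-formed header (pages >= 1, header fits in the file). */
static uint32_t mz_image_size(const struct mz_sizes *h)
{
    uint32_t total = (uint32_t)h->pages * 512u;
    if (h->lpbytes != 0)
        total -= 512u - h->lpbytes;  /* last block is only partly used */
    return total - mz_image_offset(h);
}

/* Real-mode address translation: segment * 16 + offset. */
static uint32_t linear_address(word segment, word offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, a file with pages = 3, lpbytes = 0x50, and hdsize = 2 has its
executable data at file offset 32 and an image size of 1072 bytes.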
