because it is known at translation time.
<P>In order to increase performance, a backward pass is performed on the generated simple instructions (see <CODE>target-i386/translate.c:optimize_flags()</CODE>). When it can be proved that the condition codes are not needed by the next instructions, no condition codes are computed at all.
<H2><A NAME="SEC12" HREF="qemu-tech.html#TOC12">2.5 CPU state optimisations</A></H2>
<P>The x86 CPU has many internal states which change the way it evaluates instructions. In order to achieve a good speed, the translation phase assumes that some state information of the virtual x86 CPU cannot change within the translated code. For example, if the SS, DS and ES segments have a zero base, then the translator does not even generate an addition for the segment base.
<P>[The FPU stack pointer register is not handled that way yet].
<H2><A NAME="SEC13" HREF="qemu-tech.html#TOC13">2.6 Translation cache</A></H2>
<P>A 16 MByte cache holds the most recently used translations. For simplicity, it is completely flushed when it is full. A translation unit contains just a single basic block (a block of x86 instructions terminated by a jump or by a virtual CPU state change which the translator cannot deduce statically).
<H2><A NAME="SEC14" HREF="qemu-tech.html#TOC14">2.7 Direct block chaining</A></H2>
<P>After each translated basic block is executed, QEMU uses the simulated Program Counter (PC) and other CPU state information (such as the CS segment base value) to find the next basic block.
<P>In order to accelerate the most common cases where the new simulated PC is known, QEMU can patch a basic block so that it jumps directly to the next one.
<P>The most portable code uses an indirect jump. An indirect jump makes it easier to make the jump target modification atomic.
On some host architectures (such as x86 or PowerPC), the <CODE>JUMP</CODE> opcode is directly patched so that the block chaining has no overhead.
<H2><A NAME="SEC15" HREF="qemu-tech.html#TOC15">2.8 Self-modifying code and translated code invalidation</A></H2>
<P>Self-modifying code is a special challenge in x86 emulation because no instruction cache invalidation is signaled by the application when code is modified.
<P>When translated code is generated for a basic block, the corresponding host page is write protected if it is not already read-only (with the system call <CODE>mprotect()</CODE>). Then, if a write access is done to the page, Linux raises a SEGV signal. QEMU then invalidates all the translated code in the page and enables write accesses to the page.
<P>Correct translated code invalidation is done efficiently by maintaining a linked list of every translated block contained in a given page. Other linked lists are also maintained to undo direct block chaining.
<P>Although the overhead of doing <CODE>mprotect()</CODE> calls is significant, most MSDOS programs can be emulated at reasonable speed with QEMU and DOSEMU.
<P>Note that QEMU also invalidates pages of translated code when it detects that memory mappings are modified with <CODE>mmap()</CODE> or <CODE>munmap()</CODE>.
<P>When using a software MMU, the code invalidation is more efficient: if a given code page is invalidated too often because of write accesses, then a bitmap representing all the code inside the page is built. Every store into that page checks the bitmap to see if the code really needs to be invalidated. This avoids invalidating the code when only data is modified in the page.
<H2><A NAME="SEC16" HREF="qemu-tech.html#TOC16">2.9 Exception support</A></H2>
<P>longjmp() is used when an exception such as division by zero is encountered.
<P>The host SIGSEGV and SIGBUS signal handlers are used to catch invalid memory accesses.
The exact CPU state can be retrieved because all the x86 registers are stored in fixed host registers. The simulated program counter is found by retranslating the corresponding basic block and by looking where the host program counter was at the exception point.
<P>The virtual CPU cannot retrieve the exact <CODE>EFLAGS</CODE> register because in some cases it is not computed because of condition code optimisations. This is not a big concern because the emulated code can still be restarted in any case.
<H2><A NAME="SEC17" HREF="qemu-tech.html#TOC17">2.10 MMU emulation</A></H2>
<P>For system emulation, QEMU uses the mmap() system call to emulate the target CPU MMU. It works as long as the emulated OS does not use an area reserved by the host OS (such as the area above 0xc0000000 on x86 Linux).
<P>In order to be able to launch any OS, QEMU also supports a soft MMU. In that mode, the MMU virtual to physical address translation is done at every memory access. QEMU uses an address translation cache to speed up the translation.
<P>In order to avoid flushing the translated code each time the MMU mappings change, QEMU uses a physically indexed translation cache. It means that each basic block is indexed with its physical address.
<P>When MMU mappings change, only the chaining of the basic blocks is reset (i.e. a basic block can no longer jump directly to another one).
<H2><A NAME="SEC18" HREF="qemu-tech.html#TOC18">2.11 Hardware interrupts</A></H2>
<P>In order to be faster, QEMU does not check at every basic block if a hardware interrupt is pending. Instead, the user must asynchronously call a specific function to signal that an interrupt is pending. This function resets the chaining of the currently executing basic block. It ensures that execution will soon return to the main loop of the CPU emulator.
Then the main loop can test if the interrupt is pending and handle it.
<H2><A NAME="SEC19" HREF="qemu-tech.html#TOC19">2.12 User emulation specific details</A></H2>
<H3><A NAME="SEC20" HREF="qemu-tech.html#TOC20">2.12.1 Linux system call translation</A></H3>
<P>QEMU includes a generic system call translator for Linux. It means that the parameters of the system calls can be converted to fix the endianness and 32/64 bit issues. The IOCTLs are converted with a generic type description system (see <TT>`ioctls.h'</TT> and <TT>`thunk.c'</TT>).
<P>QEMU supports host CPUs which have pages bigger than 4KB. It records all the mappings the process does and tries to emulate the <CODE>mmap()</CODE> system calls in cases where the host <CODE>mmap()</CODE> call would fail because of bad page alignment.
<H3><A NAME="SEC21" HREF="qemu-tech.html#TOC21">2.12.2 Linux signals</A></H3>
<P>Normal and real-time signals are queued along with their information (<CODE>siginfo_t</CODE>) as it is done in the Linux kernel. Then an interrupt request is done to the virtual CPU. When it is interrupted, one queued signal is handled by generating a stack frame in the virtual CPU as the Linux kernel does. The <CODE>sigreturn()</CODE> system call is emulated to return from the virtual signal handler.
<P>Some signals (such as SIGALRM) directly come from the host. Other signals are synthesized from the virtual CPU exceptions such as SIGFPE when a division by zero is done (see <CODE>main.c:cpu_loop()</CODE>).
<P>The blocked signal mask is still handled by the host Linux kernel so that most signal system calls can be redirected directly to the host Linux kernel. Only the <CODE>sigaction()</CODE> and <CODE>sigreturn()</CODE> system calls need to be fully emulated (see <TT>`signal.c'</TT>).
<H3><A NAME="SEC22" HREF="qemu-tech.html#TOC22">2.12.3 clone() system call and threads</A></H3>
<P>The Linux clone() system call is usually used to create a thread.
QEMU uses the host clone() system call so that real host threads are created for each emulated thread. One virtual CPU instance is created for each thread.
<P>The virtual x86 CPU atomic operations are emulated with a global lock so that their semantics are preserved.
<P>Note that currently there are still some locking issues in QEMU. In particular, the translation cache flush is not protected yet against reentrancy.
<H3><A NAME="SEC23" HREF="qemu-tech.html#TOC23">2.12.4 Self-virtualization</A></H3>
<P>QEMU was conceived so that ultimately it can emulate itself. Although it is not very useful, it is an important test to show the power of the emulator.
<P>Achieving self-virtualization is not easy because there may be address space conflicts. QEMU solves this problem by being an executable ELF shared object, like the ld-linux.so ELF interpreter. That way, it can be relocated at load time.
<H2><A NAME="SEC24" HREF="qemu-tech.html#TOC24">2.13 Bibliography</A></H2>
<DL COMPACT>
<DT><A NAME="BIB1">[1]</A>
<DD><A HREF="http://citeseer.nj.nec.com/piumarta98optimizing.html">http://citeseer.nj.nec.com/piumarta98optimizing.html</A>, Optimizing direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio Riccardi.
<DT><A NAME="BIB2">[2]</A>
<DD><A HREF="http://developer.kde.org/~sewardj/">http://developer.kde.org/~sewardj/</A>, Valgrind, an open-source memory debugger for x86-GNU/Linux, by Julian Seward.
<DT><A NAME="BIB3">[3]</A>
<DD><A HREF="http://bochs.sourceforge.net/">http://bochs.sourceforge.net/</A>, the Bochs IA-32 Emulator Project, by Kevin Lawton et al.
<DT><A NAME="BIB4">[4]</A>
<DD><A HREF="http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html">http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html</A>, the EM86 x86 emulator on Alpha-Linux.
<DT><A NAME="BIB5">[5]</A>
<DD><A
HREF="http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf">http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf</A>, DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton Chernoff and Ray Hookway.
<DT><A NAME="BIB6">[6]</A>
<DD><A HREF="http://www.willows.com/">http://www.willows.com/</A>, Windows API library emulation from Willows Software.
<DT><A NAME="BIB7">[7]</A>
<DD><A HREF="http://user-mode-linux.sourceforge.net/">http://user-mode-linux.sourceforge.net/</A>, The User-mode Linux Kernel.
<DT><A NAME="BIB8">[8]</A>
<DD><A HREF="http://www.plex86.org/">http://www.plex86.org/</A>, The new Plex86 project.
<DT><A NAME="BIB9">[9]</A>
<DD><A HREF="http://www.vmware.com/">http://www.vmware.com/</A>, The VMWare PC virtualizer.
<DT><A NAME="BIB10">[10]</A>
<DD><A HREF="http://www.microsoft.com/windowsxp/virtualpc/">http://www.microsoft.com/windowsxp/virtualpc/</A>, The VirtualPC PC virtualizer.
<DT><A NAME="BIB11">[11]</A>
<DD><A HREF="http://www.twoostwo.org/">http://www.twoostwo.org/</A>, The TwoOStwo PC virtualizer.
</DL>
<H1><A NAME="SEC25" HREF="qemu-tech.html#TOC25">3. Regression Tests</A></H1>
<P>In the directory <TT>`tests/'</TT>, various interesting testing programs are available. They are used for regression testing.
<H2><A NAME="SEC26" HREF="qemu-tech.html#TOC26">3.1 <TT>`test-i386'</TT></A></H2>
<P>This program executes most of the 16 bit and 32 bit x86 instructions and generates a text output. It can be compared with the output obtained with a real CPU or another emulator.
The target <CODE>make test</CODE> runs this program and a <CODE>diff</CODE> on the generated output.
<P>The Linux system call <CODE>modify_ldt()</CODE> is used to create x86 selectors to test some 16 bit addressing and 32 bit with segmentation cases.
<P>The Linux system call <CODE>vm86()</CODE> is used to test vm86 emulation.
<P>Various exceptions are raised to test most of the x86 user space exception reporting.
<H2><A NAME="SEC27" HREF="qemu-tech.html#TOC27">3.2 <TT>`linux-test'</TT></A></H2>
<P>This program tests various Linux system calls. It is used to verify that the system call parameters are correctly converted between target and host CPUs.
<H2><A NAME="SEC28" HREF="qemu-tech.html#TOC28">3.3 <TT>`qruncom.c'</TT></A></H2>
<P>Example of usage of <CODE>libqemu</CODE> to emulate a user mode i386 CPU.
<H1><A NAME="SEC29" HREF="qemu-tech.html#TOC29">4. Index</A></H1>
<P>Jump to:
<P>
<P><HR><P>
This document was generated on 3 May 2006 using <A HREF="http://wwwinfo.cern.ch/dis/texi2html/">texi2html</A> 1.56k.
</BODY>
</HTML>