It gets better. <i>Every single memory reference</i> will be passed through the MMU. So we'll want it to operate in nanoseconds. Faster, if possible.<br>In reality, it is somewhat easier, as most typical machines don't have enough memory to fill the entire addressing space; indeed, many are unlikely to get close for technical reasons (the RiscPC can have 258Mb maximum RAM, or 514Mb with Kinetic - the extra 2Mb is the VRAM). Even so, the page tables will get large.<p>So there are three options:<ul>  <li> Have a huge array of fast registers in the MMU. Costly. Very.  <li> Hold the page tables in main memory. Slow. Very.  <li> Compromise. Cache the active pages in the MMU, and store the rest on disc.</ul>An example. A RiscPC with 64Mb of RAM, 2Mb of VRAM, 4Mb of ROM and hardware I/O (double mapped). That's 73400320 bytes, or 17920 pages. It would take 71680 bytes to store an address for each of those pages. But an address on its own isn't much use. Seven words comprise an entry in the ARM's MMU, so our 17920 pages would require 501760 bytes in order to fully index the memory.<br>You just can't store that lot in the MMU. So you'll store a snippet - say, 16K worth - and keep the rest in RAM.<p>&nbsp;<p><h2>The TLB</h2>The Translation Lookaside Buffer is a way to make paging even more responsive. Typically, a program will make heavy use of a few pages and barely touch the rest. Even if you plan to byte-read the entire memory map, you will be making four thousand hits in one page before going to the next.<br>A solution to this is to fit a little bit of logic in the MMU that can map virtual addresses to their physical counterparts without traversing the page table. This is the TLB. It lives within the MMU and contains details of a small number of pages (usually between four and sixty four - the ARM610 MMU TLB has thirty two entries).<br>Now, when we have a page lookup, we first pass our virtual address to the TLB, which will check all of the addresses stored, and the protection level.
If a match is found, the TLB will spit out the physical address and the page table isn't touched.<br>If a miss is encountered, then the TLB will evict one of its entries and load in the page information looked up in the page table. The TLB then knows the newly requested page and can quickly satisfy the next memory access - chances are that access will fall in the page just requested.<p>So far we have figured on the hardware doing all of this, as in the ARM processor. Some RISC processors (such as the Alpha and the MIPS) will pass the TLB miss problem to the operating system. This may allow the OS to use some intelligence to preload certain pages into the TLB.<p>&nbsp;<p><h2>Page size</h2>Users of a RISC OS 3.5 system running on an ARM610 with two or more large (say, 20Mb) applications running will know the value of a 4K page. Because it's bloody slow. To be fair, this isn't the fault of the hardware, but more the WIMP doing stuff the kernel should do (as happens in RISC OS 3.7) and doing it slower!<p>As with harddisc LFAUs, what you need is a sensible trade-off between page granularity and page size. You could reduce the wastage in memory by making pages small, say 256 bytes. But then you would need a lot of memory to store the page table - and the bigger the page table, the slower it is to scan through. Or you could have 64K pages, which make the page table small, but can waste huge amounts of memory.<br>To consider: a 32K program would require eight 4K pages, or <i>sixty four</i> 512 byte pages. If your system remaps memory when shuffling pages around, it is quicker to move a smaller number of large pages than a larger number of small pages.<p>The MEMC in older RISC OS machines had a fixed page table.
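The page-count arithmetic above can be checked with a few lines of code (a sketch; the helper name is made up, and the seven-word entry size is the ARM MMU figure quoted earlier):

```python
# Pages needed, and bytes of page-table entries to describe them, for a
# given page size. Entry size: seven 32-bit words per page, as quoted
# for the ARM's MMU above.

def page_table_cost(region_bytes, page_bytes, entry_words=7):
    """Return (number of pages, bytes of page-table entries)."""
    pages = -(-region_bytes // page_bytes)     # ceiling division
    return pages, pages * entry_words * 4      # 4 bytes per word

# The 32K program: eight 4K pages versus sixty-four 512 byte pages.
print(page_table_cost(32 * 1024, 4096))        # (8, 224)
print(page_table_cost(32 * 1024, 512))         # (64, 1792)

# The 70Mb RiscPC example: 17920 pages, 501760 bytes of entries.
print(page_table_cost(70 * 1024 * 1024, 4096))
```

Note how the small pages need eight times the table space for the same program - the granularity trade-off in miniature.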
So the size of a page depended upon how much memory was fitted.<center><table border="1" cellspacing="1" cellpadding="1">  <tr> <td><b>MEMORY</b><br> <td><b>PAGE SIZE</b><br>  <tr> <td>0.5Mb<br> <td>8K<br>  <tr> <td>1Mb<br> <td>8K<br>  <tr> <td>2Mb<br> <td>16K<br>  <tr> <td>4Mb<br> <td>32K</table></center><br>3Mb wasn't a valid option, and 4Mb is the limit. You can increase this by fitting a slave MEMC, in which case you are looking at two lots of 4Mb (invisible to the OS/user).<br>In a RiscPC, the MMU accesses a number of 4K pages. The limits are due, I suspect, to the system bus or memory system, not the MMU itself.<p>Most commercial systems use page sizes in the order of 512 bytes to 64K.<br>The later ARM processors (ARM6 onwards) and the Intel Pentium both use page sizes of 4K.<p>&nbsp;<p><h2>Page replacement algorithms</h2>When a page fault occurs, the operating system has to pick a page to dump, to allow the required page to be loaded. There are several ways this may be achieved. None of these is perfect; each is a compromise made in the name of efficiency.<p><b>Not Recently Used</b><br>This requires two bits to be reserved in the page table: a bit for read/write and a bit for page reference. Upon each access, the paging hardware (and it must be done in hardware, for speed) will set the bits as necessary. Then, on a fixed interval - either when idling or upon a clock interrupt - the operating system will clear these bits. This allows you to track recent page accesses, so when flushing out a page you can spot those that have not recently been read/written or referenced; NRU removes one such page at random. While it is not the best way of sorting out which pages to remove, it is simple and gives reasonably good results.<p><b>First-In First-Out</b><br>It is hoped you are familiar with the concept of FIFO, from buffering and the like. If you are not, consider the lame analogy of the hose pipe, in which the first water in will be the first water to come out the other end.
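Applied to pages, the idea can be sketched in a few lines (a toy model: three page frames and a made-up reference string). The sketch also hints at FIFO's flaw - the oldest page is evicted even when it has just been re-used:

```python
from collections import deque

# FIFO page replacement: the resident pages form a queue; on a fault the
# oldest resident page is evicted, however heavily it is being used.

def fifo_reference(resident, page, capacity):
    """Handle one page reference; return the evicted page, or None."""
    if page in resident:
        return None                # hit: the queue order is NOT updated
    evicted = resident.popleft() if len(resident) >= capacity else None
    resident.append(page)          # the newest page joins the back
    return evicted

frames = deque()
evictions = [fifo_reference(frames, p, 3) for p in [1, 2, 3, 1, 4]]
# page 1 was re-referenced, yet it is still the first to be evicted
```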
It is rarely used; I'll leave the whys and wherefores as an exercise for the bemused reader. <tt>:-)</tt><p><b>Second Chance</b><br>A simple modification to the FIFO arrangement is to look at the access bit, and if it is zero then we know the page is not in current use and can be thrown out. If the bit is set, then the page is shifted to the end of the page list as if it were a new arrival, and the page search continues.<br>What we are doing here is looking for a page unused since the last period (clock tick). If by some miracle ALL the pages are current and active, then Second Chance will revert to FIFO.<p><b>Clock</b><br>Although Second Chance is good, all that page shuffling is inefficient, so the pages are instead referenced in a circular list (ie, a clock). If the page being examined is in use, we move on and look at the next page. With no concept of the start and end of the list, we just keep going until we come to a usable page.<p><b>Least Recently Used</b><br>LRU is possible, but it isn't cheap. You maintain a list of all the pages, sorted from the most recently used at the front to the least recently used at the back. When you need a page, you pull the last entry and use it. For speed reasons, this is only really possible in hardware, as the list must be updated on each memory access.<p><b>Not Frequently Used</b><br>In an attempt to simulate LRU in software, the OS scans the available pages on each clock tick and increments a counter (held in memory, one for each page) depending on the read/written bit.<br>Unfortunately, it doesn't forget. So code heavily used and then no longer necessary (such as a rendering core) will keep a high count for quite a while.
Meanwhile, code that is not called often but should be all the more responsive, such as redraw code, will have a lower count and thus stands a chance of being kicked out - even though the renderer is no longer needed, it survives simply because its count is higher.<br>But this can be fixed, and the fix emulates LRU quite well. It is called aging. Just before the count is incremented, it is shifted one bit to the right. So after a number of shifts the count will be zero unless a bit is added. Here you might be wondering how adding a bit can work, if you've just shifted a bit off. The answer is simple: the added bit goes in at the leftmost position, ie, the most significant.<br>To make this clearer...<pre>   Once upon a time:     0 0 1 0 1 1   Clock tick      :     0 0 0 1 0 1   Clock tick      :     0 0 0 0 1 0   Memory accessed :     1 0 0 0 0 1   Clock tick      :     0 1 0 0 0 0   Memory accessed :     1 0 1 0 0 0</pre><p>&nbsp;<p><h2>Multitasking</h2>There is no such thing as <i>true</i> multitasking (despite what they may claim in the advocacy newsgroups). To multitask properly, you would need a processor per process, with all the relevant bits, so processes are never kept waiting. Effectively, a separate computer for each task.<p>However, it is possible to provide the illusion of running several things at once. In the old days, things happened in the background under interrupt control. Keyboards were scanned, clocks were updated. As computers became more powerful, more stuff happened in the background. Hugo Fiennes wrote a soundtracker player that runs on interrupts, so it works in the background. You set it going, and it carries on independent of your code.<p>So people began to think of applying this to applications. After all, most of an application's time is spent waiting for user input.
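Before moving on: the aging table above can be reproduced in a few lines (a sketch, using the six-bit counter width from the table):

```python
BITS = 6  # counter width used in the table above

# Aging: each clock tick shifts the counter right (history fades); if the
# page was referenced during the interval, the new bit is inserted at the
# most significant end, so recent use outweighs old use.

def age(counter, referenced):
    counter >>= 1                   # forget a little of the history
    if referenced:
        counter |= 1 << (BITS - 1)  # record this interval's access on top
    return counter

c = 0b001011                        # "Once upon a time"
for ref in (False, False, True, False, True):
    c = age(c, ref)
# the counter now reads 101000, matching the last row of the table
```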
In fact, the application may easily do sweet sod all for almost 100% of the time - measured by an event counter in Emily's polling loop, I type around one character a second, while the RiscPC polls a few hundred times a second. And that was measured in a multitasking application, using polling speed as a yardstick; imagine what we would record in a single-tasking program. So the idea was arrived at: we can load several programs into memory, provide them with some standard facilities and messaging systems, and then let them run for a predefined duration. When the duration is up, we pass control to the next program. When that has used its time, we go to the next program, and so on.<br>As a brief aside, I wish to point out Schr&ouml;dinger's cat. A rather cute little moggy, but an extremely important one. It is physically impossible to measure system polling speed in software, and pretty difficult to measure it in hardware. You see, the very act of performing your measurement will affect the results. And you cannot easily 'account' for the time taken to make your measurements, because measuring yourself is subject to the same artefacts as measuring other things. You can only say 'to hell with it', have your program report your polling rate as being 379 polls/sec, knowing that your measuring code may be eating around 20% of the available time, and use the figures in a relative form rather than trying to state &quot;My computer achieves 379 polls every second&quot;. While there is no untruth in that, your computer might do 450 if you weren't so busy watching! You simply can't be JAFO.<p>&nbsp;<p><h2>Co-operative multitasking</h2>One way of multitasking is relatively clean and simple. The application, once control has passed to it, has full control for as long as it needs.
When it has finished, control is explicitly passed back to the operating system.<br>This is the multitasking scheme used in RISC OS.<p>&nbsp;<p><h2>Pre-emptive multitasking</h2>Seen as the cure to all the world's ills by many advocates who have seen Linux (not Windows!), this works differently. Your application is given a timeslice. You can process whatever you want in your timeslice. When your timeslice is up, control is wrested away and given to another process. You have no say in the matter, peon.<p>&nbsp;<p>&nbsp;<p>I don't wish to get into an advocacy war here. My personal preference is co-operative; however, I don't feel that either is the answer. Rather, a hybrid using both technologies could make for a clean system. The major drawback of CMT is that if an application dies and goes into a never-ending loop, control won't come back. The application needs to be forcibly killed off.<br>Niall Douglas wrote a <a href="http://www.armature.net.au/~tornado">pre-emption system</a> <font color = "red" size = "-1">[EXTERNAL LINK]</font> for RISC OS applications. Surprisingly, you didn't really notice anything much until an application entered some heavy processing (say, ChangeFSI), at which point life carried right on as normal while the task that would otherwise have stalled the machine chugged away in the background.<p><hr size = 3><a href="index.html#03">Return to assembler index</a><hr size = 3><address>Copyright &copy; 2001 Richard Murray</address></body></html>
