This is the base of a paper I wrote with Horst.
It provides a good introduction to Dillo's internals.
(Highly recommended if you plan to patch or develop in Dillo)

--Jcid


-----------------------------------------------------
Parallel network programming of the Dillo web browser
-----------------------------------------------------

 Jorge Arellano-Cid <jcid@inf.utfsm.cl>
 Horst H. von Brand <vonbrand@inf.utfsm.cl>


--------
Abstract
--------

   Network programs face several delay sources when sending or retrieving data. This is particularly problematic in programs that interact directly with the user, most notably web browsers. We present a hybrid approach using threads that communicate through pipes, together with signal-driven I/O, which allows a non-blocking main thread and overlapping waiting times.


------------
Introduction
------------

   The Dillo project didn't start from scratch; it was mainly built on the code base of gzilla (a light web browser written by Raph Levien). As the project progressed, the code of the whole source was standardized, and the networking engine was replaced with a new, faster design. The source code is currently in alpha test, and is available at <http://dillo.sourceforge.net> under the GNU General Public License.

   This paper covers basic design aspects of the hybrid approach that the Dillo web browser uses to solve several latency problems. After introducing the main delay sources, the main points of the hybrid design will be addressed.


-------------
Delay sources
-------------

   Network programs face several delay sources while sending or retrieving data. In the particular case of a web browser, they are found in:

  * DNS querying: the time required to resolve a name.

  * Initiating the TCP connection: the three-way handshake of the TCP protocol.

  * Sending the query: the time spent uploading queries to the remote server.

  * Retrieving data: the time spent waiting for and receiving the query answer.

  * Closing the TCP connection: the four-packet closing sequence of the TCP protocol.
   In a WAN context, every single item of this list has an associated delay that is non-deterministic and often measured in seconds. If we add several connections per browsed page (each one requiring at least the last four steps), the total latency can be considerable.


-----------------------------------
The traditional (blocking) approach
-----------------------------------

   The main problems with the blocking approach are:

  * When issuing an operation that can't be completed immediately, the process is put to sleep waiting for completion, and the program doesn't do any other processing in the meantime.

  * When waiting for a specific socket operation to complete, packets that belong to other connections may be arriving, and they have to wait for service.

  * Web browsers handle many small transactions; if waiting times are not overlapped, the latency perceived by the user can be very annoying.

  * If the user interface is just put to sleep during network operations, the program becomes unresponsive, confusing and perhaps alarming the user.

  * Not overlapping waiting times and processing makes graphical rendering (which is arguably the central function of a browser) unnecessarily slow.


---------------------
Dillo's hybrid design
---------------------

   Dillo uses threads and signal-driven I/O extensively to overlap waiting times and computation. Handling the user interface in a thread that never blocks gives a good interactive ``feel.'' The use of GTK+, a sophisticated widget framework for graphical user interfaces, helped greatly in accomplishing this goal. The whole interface, the rendering engine and the I/O engine were built upon its facilities.

   The design is said to be ``hybrid'' because it uses threads for DNS querying and reading local files, and signal-driven I/O for TCP connections. The threaded DNS scheme is potentially concurrent (depending on the underlying hardware), while the I/O handling (both local files and remote connections) is definitely parallel.
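   For readers unfamiliar with signal-driven I/O, here is a minimal sketch of the mechanism at the system-call level. It is not taken from Dillo's source and assumes a POSIX system with the BSD/Linux `O_ASYNC` extension. The descriptor gets an owner (`F_SETOWN`) and the `O_ASYNC` flag, so the kernel raises `SIGIO` whenever data can be transferred. The handler only records the event, and the main thread does the actual read. The helper names `enable_signal_io` and `wait_and_read` are hypothetical. A pipe stands in for the TCP socket so that the example runs without a network.

```c
#define _GNU_SOURCE                    /* exposes O_ASYNC and F_SETOWN on glibc */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t io_ready;   /* set only by the SIGIO handler */

static void on_sigio(int sig)
{
   (void)sig;
   io_ready = 1;          /* just record the event; the main loop does the I/O */
}

/* Have the kernel send SIGIO to this process whenever fd becomes ready,
   and make fd non-blocking. Returns 0 on success, -1 on error. */
int enable_signal_io(int fd)
{
   struct sigaction sa;
   assert(fd >= 0);
   memset(&sa, 0, sizeof sa);
   sa.sa_handler = on_sigio;
   sigemptyset(&sa.sa_mask);
   if (sigaction(SIGIO, &sa, NULL) != 0)      /* install before enabling */
      return -1;
   if (fcntl(fd, F_SETOWN, getpid()) != 0)    /* who receives the signal */
      return -1;
   int flags = fcntl(fd, F_GETFL);
   if (flags < 0)
      return -1;
   return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Sleep until SIGIO has arrived, then read from fd. SIGIO is blocked while
   io_ready is tested, so a signal cannot slip in between test and sleep. */
ssize_t wait_and_read(int fd, char *buf, size_t len)
{
   sigset_t block, old;
   sigemptyset(&block);
   sigaddset(&block, SIGIO);
   sigprocmask(SIG_BLOCK, &block, &old);
   while (!io_ready)
      sigsuspend(&old);                /* atomically unblock SIGIO and wait */
   io_ready = 0;
   sigprocmask(SIG_SETMASK, &old, NULL);
   return read(fd, buf, len);          /* non-blocking, so this never sleeps */
}
```

   With a real connection, the same call would be made on the connected socket. Standard signals are not queued, so a single `SIGIO` may stand for several ready descriptors. A complete I/O engine would therefore check every watched descriptor (for example with `poll()`) rather than read from just one.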
   To simplify the structure of the browser, local files are encapsulated into HTTP streams and presented to the rest of the browser as such, in exactly the same way a remote connection is handled. To create this illusion, a thread is launched. This thread opens a pipe to the browser, synthesizes an appropriate HTTP header, and sends it, together with the file, to the browser proper. In this way, all the browser sees is a handle; the data on it can come from a remote connection or from a local file.

   Handling a remote connection is more complex. In this case, the browser asks the cache manager for the URL. The name in the URL has to be resolved through the DNS engine, a TCP socket connection must be established, the HTTP request has to be sent, and finally the result has to be retrieved. Each of these steps could give rise to errors, which have to be handled and somehow communicated to the rest of the program. For performance reasons, it is critical that responses are cached locally, so the remote connection doesn't hand the data directly to the browser; the response is passed to the cache manager, which then relays it to the rest of the browser. The DNS engine caches DNS responses, and answers queries either from the cache or by querying the DNS server. Querying is done in a separate thread, so that the rest of the browser isn't blocked by long waits here.

   The activities mentioned do not happen strictly in the order stated above. Several URLs may even be handled at the same time, in order to overlap waiting and downloading. The functions called directly from the user interface have to return quickly to maintain interactive response. Sometimes they return connection handlers that haven't been completely set up yet. As stated, I/O is signal-driven: when one of the descriptors is ready for data transfer (reading or writing), it wakes up the I/O engine.

   Data transfer between threads inside the browser is handled by pipes; shared memory is rarely used.
This almost obviates the need for explicit synchronization, which is one of the main sources of complexity and bugs in concurrent programs. Dillo handles its threads in such a way that its developers can think of it as running on a single thread of control. This is accomplished by making the DNS engine call-backs happen within the main thread, and by isolating file loading with pipes.

   Using threads in this way has three big advantages:

  * The browser doesn't block when one of its child threads blocks. In particular, the user interface stays responsive even while resolving a name or downloading a file.

  * Developers don't need to deal with complex concurrency concerns. Concurrency is hard to handle, and few developers are adept at it. This gives access to a much larger pool of potential developers, which can be critical in an open-source development project.

  * Because the code is mostly sequential, it can be debugged with traditional tools like gdb. Debugging parallel programs is very hard, and appropriate tools are hard to come by.

   Because of simplicity and portability concerns, DNS querying is done in a separate thread. The standard C library doesn't provide a function for making DNS queries that don't block. The alternative is to implement a new, custom DNS querying function that doesn't block. That is certainly a complex task; running the standard blocking query inside the thread structure of the program is much simpler.

   Using a thread and a pipe to read a local file adds a buffering step to the process (and a certain latency), but it has a couple of significant advantages:

  * By handling local files in the same way as remote connections, a significant amount of code is reused.

  * A preprocessing step for the file data can easily be added if needed. In fact, the file is encapsulated into an HTTP data stream.


-----------
DNS queries
-----------

   Dillo handles DNS queries with threads, letting a child thread wait until the DNS server answers the request.
When the answer arrives, a call-back function is called, and the program resumes what it was doing at DNS-request time. The interesting thing is that the call-back happens in the main thread, while the child thread simply exits when done. This is implemented through a server-channel design.


The server channel
------------------

   There is one thread for each channel, and each channel can have multiple clients. When the program requests an IP address, the server first looks for a cached match. On a hit, the client call-back is invoked immediately. On a miss, the client is put into a queue, a thread is spawned to query the DNS, and a GTK+ idle client is set to poll the channel 5 times per second for completion. When the query finally succeeds, every client of that channel is serviced.

   This scheme allows all further processing to continue on the same thread where it began: the main thread.
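   The server-channel scheme can be sketched in C with POSIX threads. This is a hypothetical illustration, not Dillo's code. The names `dns_request`, `dns_poll`, `dns_wait_all` and `store_answer` are invented here. Only IPv4 is resolved, the tables have a fixed size, and, for brevity, a finished channel doubles as the cache entry for its name. Each child thread blocks in `getaddrinfo()`, stores the address and sets an atomic `done` flag. `dns_poll()` plays the role of the GTK+ idle client, and `dns_wait_all()` simply calls it every 200 ms (5 times per second). All client call-backs run inside `dns_poll()` or, on a cache hit, inside `dns_request()`. Both are called from the main thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

enum { MAX_CH = 32, MAX_CL = 8 };
enum { FREE, PENDING, CACHED };              /* life cycle of a channel */
typedef void (*DnsCallback)(const char *host, const char *ip, void *data);
typedef struct { DnsCallback cb; void *data; } DnsClient;
typedef struct {
   int state, n_clients;
   atomic_int done;                          /* set by the child thread */
   pthread_t thread;
   char host[256], ip[INET_ADDRSTRLEN];      /* ip "" means lookup failed */
   DnsClient clients[MAX_CL];                /* clients waiting on this name */
} DnsChannel;

static DnsChannel chan[MAX_CH];              /* CACHED channels act as the cache */

/* Child thread: blocks in getaddrinfo(), flags completion, then exits. */
static void *dns_server(void *arg)
{
   DnsChannel *ch = arg;
   struct addrinfo hints = {0}, *res;
   hints.ai_family = AF_INET;                /* IPv4 only, for brevity */
   ch->ip[0] = '\0';
   if (getaddrinfo(ch->host, NULL, &hints, &res) == 0) {
      struct sockaddr_in *sa = (struct sockaddr_in *)res->ai_addr;
      inet_ntop(AF_INET, &sa->sin_addr, ch->ip, sizeof ch->ip);
      freeaddrinfo(res);
   }
   atomic_store(&ch->done, 1);
   return NULL;
}

/* Main thread only. Returns 0 if the request was accepted, -1 otherwise. */
int dns_request(const char *host, DnsCallback cb, void *data)
{
   DnsChannel *spare = NULL;
   for (int i = 0; i < MAX_CH; i++) {
      DnsChannel *ch = &chan[i];
      if (ch->state != FREE && strcmp(ch->host, host) == 0) {
         if (ch->state == CACHED) { cb(host, ch->ip, data); return 0; }  /* hit */
         if (ch->n_clients == MAX_CL) return -1;
         ch->clients[ch->n_clients++] = (DnsClient){cb, data};  /* queue it */
         return 0;
      }
      if (ch->state == FREE && !spare) spare = ch;
   }
   if (!spare || strlen(host) >= sizeof spare->host) return -1;
   strcpy(spare->host, host);                /* miss: new channel + thread */
   spare->clients[0] = (DnsClient){cb, data};
   spare->n_clients = 1;
   atomic_store(&spare->done, 0);
   if (pthread_create(&spare->thread, NULL, dns_server, spare) != 0) return -1;
   spare->state = PENDING;
   return 0;
}

/* Main thread, called periodically: serve finished channels, count pending. */
int dns_poll(void)
{
   int pending = 0;
   for (int i = 0; i < MAX_CH; i++) {
      DnsChannel *ch = &chan[i];
      if (ch->state != PENDING || !atomic_load(&ch->done)) continue;
      pthread_join(ch->thread, NULL);        /* reap the finished child */
      assert(ch->n_clients > 0);
      for (int c = 0; c < ch->n_clients; c++)   /* call-backs run right here */
         ch->clients[c].cb(ch->host, ch->ip[0] ? ch->ip : NULL, ch->clients[c].data);
      ch->state = ch->ip[0] ? CACHED : FREE; /* failures are not cached */
   }
   for (int i = 0; i < MAX_CH; i++)
      pending += chan[i].state == PENDING;
   return pending;
}

/* Stand-in for the GTK+ idle client: poll 5 times per second until idle. */
void dns_wait_all(void)
{
   const struct timespec tick = {0, 200000000L};   /* 200 ms */
   while (dns_poll() > 0)
      nanosleep(&tick, NULL);
}

/* Example client: copies the answer (or "?" on failure) into a char[64]. */
void store_answer(const char *host, const char *ip, void *data)
{
   (void)host;
   snprintf(data, 64, "%s", ip ? ip : "?");
}
```

   For example, two requests for the same name made before the first poll share one channel and one thread, and both call-backs fire from the same `dns_poll()` call. A later request for that name is answered from the cache without spawning a thread. Failed lookups are reported with a NULL address and are not cached, which is a choice of this sketch. Build with `-pthread`.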