📄 tcp.c

/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system.  INET is implemented using the  BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		Implementation of the Transmission Control Protocol(TCP).
 *
 * Version:	$Id: tcp.c,v 1.215 2001/10/31 08:17:58 davem Exp $
 *
 * Authors:	Ross Biro, <bir7@leland.Stanford.Edu>
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Mark Evans, <evansmp@uhura.aston.ac.uk>
 *		Corey Minyard <wf-rch!minyard@relay.EU.net>
 *		Florian La Roche, <flla@stud.uni-sb.de>
 *		Charles Hedrick, <hedrick@klinzhai.rutgers.edu>
 *		Linus Torvalds, <torvalds@cs.helsinki.fi>
 *		Alan Cox, <gw4pts@gw4pts.ampr.org>
 *		Matthew Dillon, <dillon@apollo.west.oic.com>
 *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
 *		Jorge Cwik, <jorge@laser.satlink.net>
 *
 * Fixes:
 *		Alan Cox	:	Numerous verify_area() calls
 *		Alan Cox	:	Set the ACK bit on a reset
 *		Alan Cox	:	Stopped it crashing if it closed while
 *					sk->inuse=1 and was trying to connect
 *					(tcp_err()).
 *		Alan Cox	:	All icmp error handling was broken
 *					pointers passed where wrong and the
 *					socket was looked up backwards. Nobody
 *					tested any icmp error code obviously.
 *		Alan Cox	:	tcp_err() now handled properly. It
 *					wakes people on errors. poll
 *					behaves and the icmp error race
 *					has gone by moving it into sock.c
 *		Alan Cox	:	tcp_send_reset() fixed to work for
 *					everything not just packets for
 *					unknown sockets.
 *		Alan Cox	:	tcp option processing.
 *		Alan Cox	:	Reset tweaked (still not 100%) [Had
 *					syn rule wrong]
 *		Herp Rosmanith  :	More reset fixes
 *		Alan Cox	:	No longer acks invalid rst frames.
 *					Acking any kind of RST is right out.
 *		Alan Cox	:	Sets an ignore me flag on an rst
 *					receive otherwise odd bits of prattle
 *					escape still
 *		Alan Cox	:	Fixed another acking RST frame bug.
 *					Should stop LAN workplace lockups.
 *		Alan Cox	: 	Some tidyups using the new skb list
 *					facilities
 *		Alan Cox	:	sk->keepopen now seems to work
 *		Alan Cox	:	Pulls options out correctly on accepts
 *		Alan Cox	:	Fixed assorted sk->rqueue->next errors
 *		Alan Cox	:	PSH doesn't end a TCP read. Switched a
 *					bit to skb ops.
 *		Alan Cox	:	Tidied tcp_data to avoid a potential
 *					nasty.
 *		Alan Cox	:	Added some better commenting, as the
 *					tcp is hard to follow
 *		Alan Cox	:	Removed incorrect check for 20 * psh
 *	Michael O'Reilly	:	ack < copied bug fix.
 *	Johannes Stille		:	Misc tcp fixes (not all in yet).
 *		Alan Cox	:	FIN with no memory -> CRASH
 *		Alan Cox	:	Added socket option proto entries.
 *					Also added awareness of them to accept.
 *		Alan Cox	:	Added TCP options (SOL_TCP)
 *		Alan Cox	:	Switched wakeup calls to callbacks,
 *					so the kernel can layer network
 *					sockets.
 *		Alan Cox	:	Use ip_tos/ip_ttl settings.
 *		Alan Cox	:	Handle FIN (more) properly (we hope).
 *		Alan Cox	:	RST frames sent on unsynchronised
 *					state ack error.
 *		Alan Cox	:	Put in missing check for SYN bit.
 *		Alan Cox	:	Added tcp_select_window() aka NET2E
 *					window non shrink trick.
 *		Alan Cox	:	Added a couple of small NET2E timer
 *					fixes
 *		Charles Hedrick :	TCP fixes
 *		Toomas Tamm	:	TCP window fixes
 *		Alan Cox	:	Small URG fix to rlogin ^C ack fight
 *		Charles Hedrick	:	Rewrote most of it to actually work
 *		Linus		:	Rewrote tcp_read() and URG handling
 *					completely
 *		Gerhard Koerting:	Fixed some missing timer handling
 *		Matthew Dillon  :	Reworked TCP machine states as per RFC
 *		Gerhard Koerting:	PC/TCP workarounds
 *		Adam Caldwell	:	Assorted timer/timing errors
 *		Matthew Dillon	:	Fixed another RST bug
 *		Alan Cox	:	Move to kernel side addressing changes.
 *		Alan Cox	:	Beginning work on TCP fastpathing
 *					(not yet usable)
 *		Arnt Gulbrandsen:	Turbocharged tcp_check() routine.
 *		Alan Cox	:	TCP fast path debugging
 *		Alan Cox	:	Window clamping
 *		Michael Riepe	:	Bug in tcp_check()
 *		Matt Dillon	:	More TCP improvements and RST bug fixes
 *		Matt Dillon	:	Yet more small nasties remove from the
 *					TCP code (Be very nice to this man if
 *					tcp finally works 100%) 8)
 *		Alan Cox	:	BSD accept semantics.
 *		Alan Cox	:	Reset on closedown bug.
 *	Peter De Schrijver	:	ENOTCONN check missing in tcp_sendto().
 *		Michael Pall	:	Handle poll() after URG properly in
 *					all cases.
 *		Michael Pall	:	Undo the last fix in tcp_read_urg()
 *					(multi URG PUSH broke rlogin).
 *		Michael Pall	:	Fix the multi URG PUSH problem in
 *					tcp_readable(), poll() after URG
 *					works now.
 *		Michael Pall	:	recv(...,MSG_OOB) never blocks in the
 *					BSD api.
 *		Alan Cox	:	Changed the semantics of sk->socket to
 *					fix a race and a signal problem with
 *					accept() and async I/O.
 *		Alan Cox	:	Relaxed the rules on tcp_sendto().
 *		Yury Shevchuk	:	Really fixed accept() blocking problem.
 *		Craig I. Hagan  :	Allow for BSD compatible TIME_WAIT for
 *					clients/servers which listen in on
 *					fixed ports.
 *		Alan Cox	:	Cleaned the above up and shrank it to
 *					a sensible code size.
 *		Alan Cox	:	Self connect lockup fix.
 *		Alan Cox	:	No connect to multicast.
 *		Ross Biro	:	Close unaccepted children on master
 *					socket close.
 *		Alan Cox	:	Reset tracing code.
 *		Alan Cox	:	Spurious resets on shutdown.
 *		Alan Cox	:	Giant 15 minute/60 second timer error
 *		Alan Cox	:	Small whoops in polling before an
 *					accept.
 *		Alan Cox	:	Kept the state trace facility since
 *					it's handy for debugging.
 *		Alan Cox	:	More reset handler fixes.
 *		Alan Cox	:	Started rewriting the code based on
 *					the RFC's for other useful protocol
 *					references see: Comer, KA9Q NOS, and
 *					for a reference on the difference
 *					between specifications and how BSD
 *					works see the 4.4lite source.
 *		A.N.Kuznetsov	:	Don't time wait on completion of tidy
 *					close.
 *		Linus Torvalds	:	Fin/Shutdown & copied_seq changes.
 *		Linus Torvalds	:	Fixed BSD port reuse to work first syn
 *		Alan Cox	:	Reimplemented timers as per the RFC
 *					and using multiple timers for sanity.
 *		Alan Cox	:	Small bug fixes, and a lot of new
 *					comments.
 *		Alan Cox	:	Fixed dual reader crash by locking
 *					the buffers (much like datagram.c)
 *		Alan Cox	:	Fixed stuck sockets in probe. A probe
 *					now gets fed up of retrying without
 *					(even a no space) answer.
 *		Alan Cox	:	Extracted closing code better
 *		Alan Cox	:	Fixed the closing state machine to
 *					resemble the RFC.
 *		Alan Cox	:	More 'per spec' fixes.
 *		Jorge Cwik	:	Even faster checksumming.
 *		Alan Cox	:	tcp_data() doesn't ack illegal PSH
 *					only frames. At least one pc tcp stack
 *					generates them.
 *		Alan Cox	:	Cache last socket.
 *		Alan Cox	:	Per route irtt.
 *		Matt Day	:	poll()->select() match BSD precisely on error
 *		Alan Cox	:	New buffers
 *		Marc Tamsky	:	Various sk->prot->retransmits and
 *					sk->retransmits misupdating fixed.
 *					Fixed tcp_write_timeout: stuck close,
 *					and TCP syn retries gets used now.
 *		Mark Yarvis	:	In tcp_read_wakeup(), don't send an
 *					ack if state is TCP_CLOSED.
 *		Alan Cox	:	Look up device on a retransmit - routes may
 *					change. Doesn't yet cope with MSS shrink right
 *					but its a start!
 *		Marc Tamsky	:	Closing in closing fixes.
 *		Mike Shaver	:	RFC1122 verifications.
 *		Alan Cox	:	rcv_saddr errors.
 *		Alan Cox	:	Block double connect().
 *		Alan Cox	:	Small hooks for enSKIP.
 *		Alexey Kuznetsov:	Path MTU discovery.
 *		Alan Cox	:	Support soft errors.
 *		Alan Cox	:	Fix MTU discovery pathological case
 *					when the remote claims no mtu!
 *		Marc Tamsky	:	TCP_CLOSE fix.
 *		Colin (G3TNE)	:	Send a reset on syn ack replies in
 *					window but wrong (fixes NT lpd problems)
 *		Pedro Roque	:	Better TCP window handling, delayed ack.
 *		Joerg Reuter	:	No modification of locked buffers in
 *					tcp_do_retransmit()
 *		Eric Schenk	:	Changed receiver side silly window
 *					avoidance algorithm to BSD style
 *					algorithm. This doubles throughput
 *					against machines running Solaris,
 *					and seems to result in general
 *					improvement.
 *	Stefan Magdalinski	:	adjusted tcp_readable() to fix FIONREAD
 *	Willy Konynenberg	:	Transparent proxying support.
 *	Mike McLagan		:	Routing by source
 *		Keith Owens	:	Do proper merging with partial SKB's in
 *					tcp_do_sendmsg to avoid burstiness.
 *		Eric Schenk	:	Fix fast close down bug with
 *					shutdown() followed by close().
 *		Andi Kleen 	:	Make poll agree with SIGIO
 *	Salvatore Sanfilippo	:	Support SO_LINGER with linger == 1 and
 *					lingertime == 0 (RFC 793 ABORT Call)
 *
 *		This program is free software; you can redistribute it and/or
 *		modify it under the terms of the GNU General Public License
 *		as published by the Free Software Foundation; either version
 *		2 of the License, or(at your option) any later version.
 *
 * Description of States:
 *
 *	TCP_SYN_SENT		sent a connection request, waiting for ack
 *
 *	TCP_SYN_RECV		received a connection request, sent ack,
 *				waiting for final ack in three-way handshake.
 *
 *	TCP_ESTABLISHED		connection established
 *
 *	TCP_FIN_WAIT1		our side has shutdown, waiting to complete
 *				transmission of remaining buffered data
 *
 *	TCP_FIN_WAIT2		all buffered data sent, waiting for remote
 *				to shutdown
 *
 *	TCP_CLOSING		both sides have shutdown but we still have
 *				data we have to finish sending
 *
 *	TCP_TIME_WAIT		timeout to catch resent junk before entering
 *				closed, can only be entered from FIN_WAIT2
 *				or CLOSING.
 *				Required because the other end
 *				may not have gotten our last ACK causing it
 *				to retransmit the data packet (which we ignore)
 *
 *	TCP_CLOSE_WAIT		remote side has shutdown and is waiting for
 *				us to finish writing our data and to shutdown
 *				(we have to close() to move on to LAST_ACK)
 *
 *	TCP_LAST_ACK		our side has shutdown after remote has
 *				shutdown.  There may still be data in our
 *				buffer that we have to finish sending
 *
 *	TCP_CLOSE		socket is finished
 */

#include <linux/config.h>
#include <linux/types.h>
#include <linux/fcntl.h>
#include <linux/poll.h>
#include <linux/init.h>
#include <linux/smp_lock.h>

#include <net/icmp.h>
#include <net/tcp.h>

#include <asm/uaccess.h>
#include <asm/ioctls.h>

int sysctl_tcp_fin_timeout = TCP_FIN_TIMEOUT;

struct tcp_mib	tcp_statistics[NR_CPUS*2];

kmem_cache_t *tcp_openreq_cachep;
kmem_cache_t *tcp_bucket_cachep;
kmem_cache_t *tcp_timewait_cachep;

atomic_t tcp_orphan_count = ATOMIC_INIT(0);

int sysctl_tcp_mem[3];
int sysctl_tcp_wmem[3] = { 4*1024, 16*1024, 128*1024 };
int sysctl_tcp_rmem[3] = { 4*1024, 87380, 87380*2 };

atomic_t tcp_memory_allocated;	/* Current allocated memory. */
atomic_t tcp_sockets_allocated;	/* Current number of TCP sockets. */

/* Pressure flag: try to collapse.
 * Technical note: it is used by multiple contexts non atomically.
 * All the tcp_mem_schedule() is of this nature: accounting
 * is strict, actions are advisory and have some latency. */
int tcp_memory_pressure;

#define TCP_PAGES(amt) (((amt)+TCP_MEM_QUANTUM-1)/TCP_MEM_QUANTUM)

int tcp_mem_schedule(struct sock *sk, int size, int kind)
{
	int amt = TCP_PAGES(size);

	sk->forward_alloc += amt*TCP_MEM_QUANTUM;
	atomic_add(amt, &tcp_memory_allocated);

	/* Under limit. */
	if (atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
		if (tcp_memory_pressure)
			tcp_memory_pressure = 0;
		return 1;
	}

	/* Over hard limit. */
	if (atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]) {
		tcp_enter_memory_pressure();
		goto suppress_allocation;
	}

	/* Under pressure.
	 */
	if (atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[1])
		tcp_enter_memory_pressure();

	if (kind) {
		if (atomic_read(&sk->rmem_alloc) < sysctl_tcp_rmem[0])
			return 1;
	} else {
		if (sk->wmem_queued < sysctl_tcp_wmem[0])
			return 1;
	}

	if (!tcp_memory_pressure ||
	    sysctl_tcp_mem[2] > atomic_read(&tcp_sockets_allocated)
	    * TCP_PAGES(sk->wmem_queued+atomic_read(&sk->rmem_alloc)+
			sk->forward_alloc))
		return 1;

suppress_allocation:

	if (kind == 0) {
		tcp_moderate_sndbuf(sk);

		/* Fail only if socket is _under_ its sndbuf.
		 * In this case we cannot block, so that we have to fail.
		 */
		if (sk->wmem_queued+size >= sk->sndbuf)
			return 1;
	}

	/* Alas. Undo changes. */
	sk->forward_alloc -= amt*TCP_MEM_QUANTUM;
	atomic_sub(amt, &tcp_memory_allocated);
	return 0;
}

void __tcp_mem_reclaim(struct sock *sk)
{
	if (sk->forward_alloc >= TCP_MEM_QUANTUM) {
		atomic_sub(sk->forward_alloc/TCP_MEM_QUANTUM, &tcp_memory_allocated);
		sk->forward_alloc &= (TCP_MEM_QUANTUM-1);
		if (tcp_memory_pressure &&
		    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0])
			tcp_memory_pressure = 0;
	}
}

void tcp_rfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	atomic_sub(skb->truesize, &sk->rmem_alloc);
	sk->forward_alloc += skb->truesize;
}

/*
 * LISTEN is a special case for poll..
 */
static __inline__ unsigned int tcp_listen_poll(struct sock *sk, poll_table *wait)
{
	return sk->tp_pinfo.af_tcp.accept_queue ? (POLLIN | POLLRDNORM) : 0;
}

/*
 *	Wait for a TCP event.
 *
 *	Note that we don't need to lock the socket, as the upper poll layers
 *	take care of normal races (between the test and the event) and we don't
 *	go look at any of the socket buffers directly.
 */
unsigned int tcp_poll(struct file * file, struct socket *sock, poll_table *wait)
{
	unsigned int mask;
	struct sock *sk = sock->sk;
	struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);

	poll_wait(file, sk->sleep, wait);
	if (sk->state == TCP_LISTEN)
		return tcp_listen_poll(sk, wait);

	/* Socket is not locked.
	   We are protected from async events
	   by poll logic and correct handling of state changes
	   made by other threads is impossible in any case.
	 */

	mask = 0;
	if (sk->err)
		mask = POLLERR;

	/*
	 * POLLHUP is certainly not done right. But poll() doesn't
	 * have a notion of HUP in just one direction, and for a
	 * socket the read side is more interesting.
	 *
	 * Some poll() documentation says that POLLHUP is incompatible
	 * with the POLLOUT/POLLWR flags, so somebody should check this
	 * all. But careful, it tends to be safer to return too many
	 * bits than too few, and you can easily break real applications
	 * if you don't tell them that something has hung up!
	 *
	 * Check-me.
	 *
	 * Check number 1. POLLHUP is _UNMASKABLE_ event (see UNIX98 and
	 * our fs/select.c). It means that after we received EOF,
	 * poll always returns immediately, making impossible poll() on write()
	 * in state CLOSE_WAIT. One solution is evident --- to set POLLHUP
	 * if and only if shutdown has been made in both directions.
	 * Actually, it is interesting to look how Solaris and DUX
	 * solve this dilemma. I would prefer, if POLLHUP were maskable,
	 * then we could set it on SND_SHUTDOWN. BTW examples given
	 * in Stevens' books assume exactly this behaviour, it explains
	 * why POLLHUP is incompatible with POLLOUT.	--ANK
	 *
	 * NOTE. Check for TCP_CLOSE is added. The goal is to prevent
	 * blocking on fresh not-connected or disconnected socket. --ANK
	 */
	if (sk->shutdown == SHUTDOWN_MASK || sk->state == TCP_CLOSE)
		mask |= POLLHUP;
	if (sk->shutdown & RCV_SHUTDOWN)
		mask |= POLLIN | POLLRDNORM;

	/* Connected? */
	if ((1 << sk->state) & ~(TCPF_SYN_SENT|TCPF_SYN_RECV)) {
		/* Potential race condition. If read of tp below will
		 * escape above sk->state, we can be illegally awakened
		 * in SYN_* states.
		 */
		if ((tp->rcv_nxt != tp->copied_seq) &&
		    (tp->urg_seq != tp->copied_seq ||
		     tp->rcv_nxt != tp->copied_seq+1 ||
		     sk->urginline || !tp->urg_data))
			mask |= POLLIN | POLLRDNORM;

		if (!(sk->shutdown & SEND_SHUTDOWN)) {
			if (tcp_wspace(sk) >= tcp_min_write_space(sk)) {
				mask |= POLLOUT | POLLWRNORM;
			} else {  /* send SIGIO later */
				set_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
				set_bit(SOCK_NOSPACE, &sk->socket->flags);

				/* Race breaker. If space is freed after
				 * wspace test but before the flags are set,
				 * IO signal will be lost.
				 */
				if (tcp_wspace(sk) >= tcp_min_write_space(sk))
					mask |= POLLOUT | POLLWRNORM;
			}
		}

		if (tp->urg_data & TCP_URG_VALID)
			mask |= POLLPRI;
	}
	return mask;
}

/*
 *	TCP socket write_space callback.
 */
void tcp_write_space(struct sock *sk)
{
	struct socket *sock = sk->socket;

	if (tcp_wspace(sk) >= tcp_min_write_space(sk) && sock) {
		clear_bit(SOCK_NOSPACE, &sock->flags);

		if (sk->sleep && waitqueue_active(sk->sleep))
			wake_up_interruptible(sk->sleep);

		if (sock->fasync_list && !(sk->shutdown&SEND_SHUTDOWN))
			sock_wake_async(sock, 2, POLL_OUT);
	}
}

int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
{
	struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
	int answ;

	switch(cmd) {
	case SIOCINQ:
		if (sk->state == TCP_LISTEN)
			return(-EINVAL);

		lock_sock(sk);
		if ((1<<sk->state) & (TCPF_SYN_SENT|TCPF_SYN_RECV))
			answ = 0;
		else if (sk->urginline || !tp->urg_data ||
			 before(tp->urg_seq,tp->copied_seq) ||
			 !before(tp->urg_seq,tp->rcv_nxt)) {
			answ = tp->rcv_nxt - tp->copied_seq;

			/* Subtract 1, if FIN is in queue. */
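The three-entry `sysctl_tcp_mem[]` array splits total TCP memory use into zones, and `tcp_mem_schedule()` compares the running total against each entry in a fixed order. The user-space sketch below models only that rounding and comparison order; it is not kernel code. `MODEL_QUANTUM`, `model_pages()`, `model_classify()` and the `enum model_zone` names are hypothetical, and the quantum value of 4096 is an assumption standing in for `TCP_MEM_QUANTUM`.

```c
#include <assert.h>

/* Assumed stand-in for TCP_MEM_QUANTUM; the real value is defined by the kernel. */
#define MODEL_QUANTUM 4096

/* Hypothetical labels for the zones that tcp_mem_schedule() distinguishes. */
enum model_zone {
	ZONE_UNDER_LOW,   /* total <  limits[0]: allocation always granted   */
	ZONE_BETWEEN,     /* limits[0] <= total <= limits[1]                 */
	ZONE_PRESSURE,    /* limits[1] <  total <= limits[2]: pressure set   */
	ZONE_OVER_HARD    /* total >  limits[2]: allocation suppressed       */
};

/* Round a byte count up to whole quanta, mirroring the TCP_PAGES() macro. */
static int model_pages(int bytes)
{
	assert(bytes >= 0);
	return (bytes + MODEL_QUANTUM - 1) / MODEL_QUANTUM;
}

/*
 * Classify a total allocation (in quanta) against limits[0..2], using the
 * same comparisons, in the same order, as tcp_mem_schedule().
 */
static enum model_zone model_classify(int allocated, const int limits[3])
{
	if (allocated < limits[0])
		return ZONE_UNDER_LOW;
	if (allocated > limits[2])
		return ZONE_OVER_HARD;
	if (allocated > limits[1])
		return ZONE_PRESSURE;
	return ZONE_BETWEEN;
}
```

Note that the comparisons are strict, so a total exactly equal to `limits[1]` or `limits[2]` still falls in the lower of the two neighbouring zones. The sketch leaves out the per-socket checks against `sysctl_tcp_rmem[0]` and `sysctl_tcp_wmem[0]` and the fair-share test that the kernel function applies afterwards.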