概述

TCP协议是整个Linux内核中比较重要的部分之一,我们基于Linux目前最稳定的版本4.17.13来对TCP协议进行比较系统的分析,读者可以参照博文来对源码进行分析学习,每个代码片段都附有在整个内核中的路径,方便大家查看。TCP实现复杂且庞大,我们会分几篇来介绍。

通常我们在进行网络编程的时候很少关注接口的具体实现,对于TCP连接的建立,关闭和阻塞控制等等也停留在具体的概念上,这里我们首先从接口开始,采用自顶向下的方法,从用户层面慢慢的深入了解TCP的实现。

我们通过分析源码具体要学习什么?

  • TCP代码实现

  • TCP参数优化

  • TCP网络分析工具

背景知识

TCP接口

我们通常在网络编程中使用以下几个接口:

  • socket() :返回一个套接字
  • bind():将套接字与指定的端口相连
  • listen():监听socket
  • connect():连接服务器套接字
  • accept():接受来自客户端的连接
  • recv():读取数据
  • send():发送数据
  • close():关闭连接

socket的数据结构

通常我们在使用socket()的接口来创建一个套接字时,需要指定具体的协议,比如AF_INET (IPv4 protocol) , AF_INET6 (IPv6 protocol),这里我们只来看一下IPv4的socket结构:

include/net/inet_sock.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/** struct inet_sock - representation of INET sockets
*
* @sk - ancestor class
* @pinet6 - pointer to IPv6 control block
* @inet_daddr - Foreign IPv4 addr
* @inet_rcv_saddr - Bound local IPv4 addr
* @inet_dport - Destination port
* @inet_num - Local port
* @inet_saddr - Sending source
* @uc_ttl - Unicast TTL
* @inet_sport - Source port
* @inet_id - ID counter for DF pkts
* @tos - TOS
* @mc_ttl - Multicasting TTL
* @is_icsk - is this an inet_connection_sock?
* @uc_index - Unicast outgoing device index
* @mc_index - Multicast device index
* @mc_list - Group array
* @cork - info to build ip hdr on each ip frag while socket is corked
*/

这里定义了IPv4协议族的socket结构,每个字段有其注释方便大家理解,其中的sk为通用的socket结构定义,与协议无关。

三次握手

Imgur

这是被大家提及最多的三次握手,而其中的具体的实现我们从Client和Server慢慢来探究。

Client主动打开连接

一般情况下,TCP中Client主动建立连接的三个过程为:

  • Client发送一个SYN段
  • Client接受一个SYN段以及对Server端对SYN段的ACK
  • Client对Server的回复ACK

第一次握手: Client发送SYN段

net/ipv4/tcp_ipv4.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
/* This will initiate an outgoing connection. */
/* 参数:
sk: 传输控制块,即socket
uaddr: 地址结构
addr_len: 地址长度
*/
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
struct inet_sock *inet = inet_sk(sk);
struct tcp_sock *tp = tcp_sk(sk);
__be16 orig_sport, orig_dport;
__be32 daddr, nexthop;
struct flowi4 *fl4;
struct rtable *rt;
int err;
struct ip_options_rcu *inet_opt;
struct inet_timewait_death_row *tcp_death_row = &sock_net(sk)->ipv4.tcp_death_row;

/* 判断地址长度是否有效 */
if (addr_len < sizeof(struct sockaddr_in))
return -EINVAL;

/* 判断协议族是否为IPv4 */
if (usin->sin_family != AF_INET)
return -EAFNOSUPPORT;

nexthop = daddr = usin->sin_addr.s_addr;
inet_opt = rcu_dereference_protected(inet->inet_opt,
lockdep_sock_is_held(sk));
if (inet_opt && inet_opt->opt.srr) {
if (!daddr)
return -EINVAL;
nexthop = inet_opt->opt.faddr;
}

/* 这段代码是关于IP路由,我们后面会慢慢介绍 */
orig_sport = inet->inet_sport;
orig_dport = usin->sin_port;
fl4 = &inet->cork.fl.u.ip4;
rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
IPPROTO_TCP,
orig_sport, orig_dport, sk);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
if (err == -ENETUNREACH)
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
return err;
}

if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {
ip_rt_put(rt);
return -ENETUNREACH;
}

if (!inet_opt || !inet_opt->opt.srr)
daddr = fl4->daddr;

if (!inet->inet_saddr)
inet->inet_saddr = fl4->saddr;
sk_rcv_saddr_set(sk, inet->inet_saddr);

if (tp->rx_opt.ts_recent_stamp && inet->inet_daddr != daddr) {
/* Reset inherited state */
tp->rx_opt.ts_recent = 0;
tp->rx_opt.ts_recent_stamp = 0;

/* 将初始序号write_seq初始化为0 */
if (likely(!tp->repair))
tp->write_seq = 0;
}

inet->inet_dport = usin->sin_port;
sk_daddr_set(sk, daddr);

inet_csk(sk)->icsk_ext_hdr_len = 0;
if (inet_opt)
inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;

tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;

/* Socket identity is still unknown (sport may be zero).
* However we set state to SYN-SENT and not releasing socket
* lock select source port, enter ourselves into the hash tables and
* complete initialization after this.
*/

/* 将TCP的状态设为TCP_SYN_SENT,动态绑定一个本地端口 */
tcp_set_state(sk, TCP_SYN_SENT);
err = inet_hash_connect(tcp_death_row, sk);
if (err)
goto failure;

sk_set_txhash(sk);

rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
inet->inet_sport, inet->inet_dport, sk);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
rt = NULL;
goto failure;
}
/* OK, now commit destination to socket. */
sk->sk_gso_type = SKB_GSO_TCPV4;
sk_setup_caps(sk, &rt->dst);
rt = NULL;

if (likely(!tp->repair)) {
if (!tp->write_seq)
/* hash计算write_seq的值,与源地址、端口和目的地址、端口有关 */
tp->write_seq = secure_tcp_seq(inet->inet_saddr,
inet->inet_daddr,
inet->inet_sport,
usin->sin_port);
tp->tsoffset = secure_tcp_ts_off(sock_net(sk),
inet->inet_saddr,
inet->inet_daddr);
}

inet->inet_id = tp->write_seq ^ jiffies;

if (tcp_fastopen_defer_connect(sk, &err))
return err;
if (err)
goto failure;

/* 调用tcp_connect()来构造SYN字段并发送 */
err = tcp_connect(sk);

if (err)
goto failure;

return 0;

failure:
/*
* This unhashes the socket and releases the local port,
* if necessary.
*/
/* 如果出错会goto至这里,将TCP的状态置为TCP_CLOSE */
tcp_set_state(sk, TCP_CLOSE);
ip_rt_put(rt);
sk->sk_route_caps = 0;
inet->inet_dport = 0;
return err;
}

这里主要的函数是tcp_connect()来构造SYN段并发送的过程,我们详细来看下这段代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
/* Build a SYN and send it off. */
int tcp_connect(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *buff;
int err;

/* 判断是否调用Berkeley Packet Filter */
tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_CONNECT_CB, 0, NULL);

if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
return -EHOSTUNREACH; /* Routing failure or similar. */

/* 初始化socket中与连接相关的字段 */
tcp_connect_init(sk);

if (unlikely(tp->repair)) {
tcp_finish_connect(sk, NULL);
return 0;
}

/* 初始化sk_buff */
buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation, true);
if (unlikely(!buff))
return -ENOBUFS;
/* 利用socket中的seq构造sk_buff中的字段 */
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
/* 设置RTT时间 */
tcp_mstamp_refresh(tp);
tp->retrans_stamp = tcp_time_stamp(tp);
/* 将sk_buff放入发送队列 */
tcp_connect_queue_skb(sk, buff);
tcp_ecn_send_syn(sk, buff);
tcp_rbtree_insert(&sk->tcp_rtx_queue, buff);

/* Send off SYN; include data in Fast Open. */
/* 发送SYN段, 并在这里判断是否为Fast Open, 即是否在建立连接的时候就发送数据 */
err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) :
tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
if (err == -ECONNREFUSED)
return err;

/* We change tp->snd_nxt after the tcp_transmit_skb() call
* in order to make this packet get counted in tcpOutSegs.
*/
/* 更新下一个seq */
tp->snd_nxt = tp->write_seq;
tp->pushed_seq = tp->write_seq;
buff = tcp_send_head(sk);
if (unlikely(buff)) {
tp->snd_nxt = TCP_SKB_CB(buff)->seq;
tp->pushed_seq = TCP_SKB_CB(buff)->seq;
}
TCP_INC_STATS(sock_net(sk), TCP_MIB_ACTIVEOPENS);

/* Timer for repeating the SYN until an answer. */
/* 这里设置了定时器用来重发SYN段,超时时间icsk_rto根据RFC6298被设为1s */
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
inet_csk(sk)->icsk_rto, TCP_RTO_MAX);
return 0;
}
EXPORT_SYMBOL(tcp_connect);

关于第一次握手,我们梳理了最关键的SYN段的逻辑,其中SYN的seq序号我们可以得知与双方的地址、端口有关,通过hash计算得出。

第二次握手:接收SYN和ACK段

TCP是一个典型的状态机实现,根据不同的状态,状态机作出不同的回应:

net/ipv4/tcp_input.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/*
* This function implements the receiving procedure of RFC 793 for
* all states except ESTABLISHED and TIME_WAIT.
* It's called from both tcp_v4_rcv and tcp_v6_rcv and should be
* address independent.
*/
/* tcp_rev_state_process可以处理除了ESTABLISHED和TIME_WAIT的其他状态。第一次握手结束后TCP的状态被
置为SYN_SENT, 之后再收到信息即二次握手时,tcp_rcv_synsent_state_process会进行处理
*/
int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
const struct tcphdr *th = tcp_hdr(skb);
struct request_sock *req;
int queued = 0;
bool acceptable;

switch (sk->sk_state) {
case TCP_CLOSE: /* TCP_CLOSE状态的处理 */
goto discard;

case TCP_LISTEN: /* TCP_LISTEN状态的处理 */
/* ... */

case TCP_SYN_SENT: /* TCP_SYN_SENT状态的处理 */
tp->rx_opt.saw_tstamp = 0;
tcp_mstamp_refresh(tp);
queued = tcp_rcv_synsent_state_process(sk, skb, th);
if (queued >= 0)
return queued;

/* Do step6 onward by hand. */
tcp_urg(sk, skb, th);
__kfree_skb(skb);
tcp_data_snd_check(sk);
return 0;
}

tcp_rcv_synsent_state_process是二次握手的核心处理逻辑,我们来梳理一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
/* 参数:
sk: 传输控制块
skb: 缓冲区
th: tcp报文

根据RFC793的定义,TCP报文如下:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP的报文数据结构:/include/uapi/linux/tcp.h
*/
static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
const struct tcphdr *th)
{
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_fastopen_cookie foc = { .len = -1 };
int saved_clamp = tp->rx_opt.mss_clamp;
bool fastopen_fail;

/* 解析TCP选项,并保存在传输控制块中 */
tcp_parse_options(sock_net(sk), skb, &tp->rx_opt, 0, &foc);
if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr)
tp->rx_opt.rcv_tsecr -= tp->tsoffset;

if (th->ack) { /* 检查TCP报文中的ACK字段部分 */
/* rfc793:
* "If the state is SYN-SENT then
* first check the ACK bit
* If the ACK bit is set
* If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send
* a reset (unless the RST bit is set, if so drop
* the segment and return)"
*/

/* 根据RFC793定义:
If SND.UNA =< SEG.ACK =< SND.NXT then enter ESTABLISHED state
and continue processing.
snd_una: First byte we want an ack for
ack_seq: Sequence number ACK'd
snd_nex: Next sequence we send
下面这段逻辑就是对ACK的seq进行判断:ACK的值是否在初始发送序号和下一个序号之间
*/
if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
goto reset_and_undo; /* 如果不是,就发送一个RST重置 */

if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
!between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
tcp_time_stamp(tp))) {
NET_INC_STATS(sock_net(sk),
LINUX_MIB_PAWSACTIVEREJECTED);
goto reset_and_undo;
}

/* Now ACK is acceptable.
*
* "If the RST bit is set
* If the ACK was acceptable then signal the user "error:
* connection reset", drop the segment, enter CLOSED state,
* delete TCB, and return."
*/

/* 这时ACK的seq已经被接受了 */

/* 检查TCP报文中的RST字段,如果被设置了就进行tcp_reset */
if (th->rst) {
tcp_reset(sk);
goto discard;
}

/* rfc793:
* "fifth, if neither of the SYN or RST bits is set then
* drop the segment and return."
*
* See note below!
* --ANK(990513)
*/
/* 检查TCP报文中的是否存在SYN字段 */
if (!th->syn)
goto discard_and_undo;

/* rfc793:
* "If the SYN bit is on ...
* are acceptable then ...
* (our SYN has been ACKed), change the connection
* state to ESTABLISHED..."
*/

/* 判断是否支持显示拥塞通知的特性, 即Explicit Congestion Notification */
tcp_ecn_rcv_synack(tp, th);

/* 初始化与窗口有关的参数 */
tcp_init_wl(tp, TCP_SKB_CB(skb)->seq);
/* 处理ACK, 检查ack_seq的值是否有效 */
tcp_ack(sk, skb, FLAG_SLOWPATH);

/* Ok.. it's good. Set up sequence numbers and
* move to established.
*/
tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;

/* RFC1323: The window in SYN & SYN/ACK segments is
* never scaled.
*/
tp->snd_wnd = ntohs(th->window);

if (!tp->rx_opt.wscale_ok) {
tp->rx_opt.snd_wscale = tp->rx_opt.rcv_wscale = 0;
tp->window_clamp = min(tp->window_clamp, 65535U);
}

/* 根据时间戳选项,设定相关字段及 TCP 头部长度 */
if (tp->rx_opt.saw_tstamp) {
tp->rx_opt.tstamp_ok = 1;
tp->tcp_header_len =
sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
tp->advmss -= TCPOLEN_TSTAMP_ALIGNED;
tcp_store_ts_recent(tp);
} else {
tp->tcp_header_len = sizeof(struct tcphdr);
}

/* 初始化 MTU、MSS等参数 */
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
tcp_initialize_rcv_mss(sk);

/* Remember, tcp_poll() does not lock socket!
* Change state from SYN-SENT only after copied_seq
* is initialized. */
tp->copied_seq = tp->rcv_nxt;

smc_check_reset_syn(tp);

smp_mb();

/* 完成连接,将TCP的状态置为TCP_ESTABLISHED */
tcp_finish_connect(sk, skb);

/* 处理FAST OPEN的情况 */
fastopen_fail = (tp->syn_fastopen || tp->syn_data) &&
tcp_rcv_fastopen_synack(sk, skb, &foc);

if (!sock_flag(sk, SOCK_DEAD)) {
sk->sk_state_change(sk);
sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT);
}
if (fastopen_fail)
return -1;
if (sk->sk_write_pending ||
icsk->icsk_accept_queue.rskq_defer_accept ||
icsk->icsk_ack.pingpong) {
/* Save one ACK. Data will be ready after
* several ticks, if write_pending is set.
*
* It may be deleted, but with this feature tcpdumps
* look so _wonderfully_ clever, that I was not able
* to stand against the temptation 8) --ANK
*/
inet_csk_schedule_ack(sk);
tcp_enter_quickack_mode(sk);
inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
TCP_DELACK_MAX, TCP_RTO_MAX);

discard:
tcp_drop(sk, skb);
return 0;
} else {
/* 回复ACK包,即第三次握手 */
tcp_send_ack(sk);
}
return -1;
}

/* No ACK in the segment */
/* 以下为ACK字段不存在的处理逻辑 */
if (th->rst) {
/* rfc793:
* "If the RST bit is set
*
* Otherwise (no ACK) drop the segment and return."
*/

goto discard_and_undo;
}

/* PAWS check. */
if (tp->rx_opt.ts_recent_stamp && tp->rx_opt.saw_tstamp &&
tcp_paws_reject(&tp->rx_opt, 0))
goto discard_and_undo;

/* 仅有SYN段而没有ACK */
if (th->syn) {
/* We see SYN without ACK. It is attempt of
* simultaneous connect with crossed SYNs.
* Particularly, it can be connect to self.
*/
/* 将TCP状态设为TCP_SYN_RECV */
tcp_set_state(sk, TCP_SYN_RECV);

if (tp->rx_opt.saw_tstamp) {
tp->rx_opt.tstamp_ok = 1;
tcp_store_ts_recent(tp);
tp->tcp_header_len =
sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
} else {
tp->tcp_header_len = sizeof(struct tcphdr);
}

tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
tp->copied_seq = tp->rcv_nxt;
tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;

/* RFC1323: The window in SYN & SYN/ACK segments is
* never scaled.
*/
tp->snd_wnd = ntohs(th->window);
tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
tp->max_window = tp->snd_wnd;

tcp_ecn_rcv_syn(tp, th);

tcp_mtup_init(sk);
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
tcp_initialize_rcv_mss(sk);

tcp_send_synack(sk);
#if 0
/* Note, we could accept data and URG from this segment.
* There are no obstacles to make this (except that we must
* either change tcp_recvmsg() to prevent it from returning data
* before 3WHS completes per RFC793, or employ TCP Fast Open).
*
* However, if we ignore data in ACKless segments sometimes,
* we have no reasons to accept it sometimes.
* Also, seems the code doing it in step6 of tcp_rcv_state_process
* is not flawless. So, discard packet for sanity.
* Uncomment this return to process the data.
*/
return -1;
#else
goto discard;
#endif
}
/* "fifth, if neither of the SYN or RST bits is set then
* drop the segment and return."
*/

discard_and_undo:
tcp_clear_options(&tp->rx_opt);
tp->rx_opt.mss_clamp = saved_clamp;
goto discard;

reset_and_undo:
tcp_clear_options(&tp->rx_opt);
tp->rx_opt.mss_clamp = saved_clamp;
return 1;
}

第三次握手:发送ACK段

net/ipv4/tcp_output.c

__tcp_send_ack()是发送ACK的具体实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/* This routine sends an ack and also updates the window. */
void __tcp_send_ack(struct sock *sk, u32 rcv_nxt)
{
struct sk_buff *buff;

/* If we have been reset, we may not send again. */
/* 如果TCP的状态为TCP_CLOSE,直接返回 */
if (sk->sk_state == TCP_CLOSE)
return;

/* We are not putting this on the write queue, so
* tcp_transmit_skb() will set the ownership to this
* sock.
*/
/* 初始化数据包的缓冲区 */
buff = alloc_skb(MAX_TCP_HEADER,
sk_gfp_mask(sk, GFP_ATOMIC | __GFP_NOWARN));
if (unlikely(!buff)) {
inet_csk_schedule_ack(sk);
inet_csk(sk)->icsk_ack.ato = TCP_ATO_MIN;
inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
TCP_DELACK_MAX, TCP_RTO_MAX);
return;
}

/* Reserve space for headers and prepare control bits. */
skb_reserve(buff, MAX_TCP_HEADER);

/* tcp_init_nondata_skb()初始化一个ACK回复的TCP报文, tcp_acceptable_seq构造该报文的seq,
TCPHDR_ACK设置TCP报文的标志位
*/
tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK);

/* We do not want pure acks influencing TCP Small Queues or fq/pacing
* too much.
* SKB_TRUESIZE(max(1 .. 66, MAX_TCP_HEADER)) is unfortunately ~784
*/
skb_set_tcp_pure_ack(buff);

/* Send it off, this clears delayed acks for us. */
/* 发送ACK包 */
__tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt);
}

这就是整个TCP三次握手中Client主动连接的实现部分,后面我们会从Server的角度来分析被动连接的实现。

主动打开的三次握手总结

  • Client发送SYN段,SYN的seq与双方的地址、端口有关,完成后TCP状态为TCP_SYN_SENT
  • Client收到Server的SYN和ACK段,对Server的ACK的ack_seq进行检查,完成后TCP状态为TCP_ESTABLISHED
  • Client回复ACK段,其中Client的ACK的ack_seq为Server发送的TCP报文中的seq+1