Date: Tue,  9 Jul 2024 23:08:15 +0000
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240709230815.2717872-1-edumazet@google.com>
Subject: [PATCH net-next] net: do not inline rtnl_calcit()
From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, 
	Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, eric.dumazet@gmail.com, 
	Eric Dumazet <edumazet@google.com>
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355996
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

IFLA_MAX is increasing slowly but surely.

Use noinline_for_stack attribute to not inline rtnl_calcit()
in its unique caller (rtnetlink_rcv_msg()) to save stack space.
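
As an illustration only (user-space C, not kernel code), the sketch below
shows the pattern: a helper that needs a large on-stack table is marked
noinline so that its frame exists only while the helper runs, instead of
being merged into its caller's frame. All sketch_* names and the table size
are hypothetical; the macro assumes the GCC/Clang noinline attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IFLA_MAX + 1; the real value differs. */
#define SKETCH_TABLE_SLOTS 64

/* Assumption: GCC/Clang attribute syntax. */
#define sketch_noinline_for_stack __attribute__((noinline))

/* The large pointer table lives in this function's own stack frame. */
static sketch_noinline_for_stack uint32_t sketch_calc(const uint32_t *vals,
						      size_t n)
{
	const uint32_t *table[SKETCH_TABLE_SLOTS] = { NULL };
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n && i < SKETCH_TABLE_SLOTS; i++)
		table[i] = &vals[i];

	for (i = 0; i < SKETCH_TABLE_SLOTS; i++)
		if (table[i])
			sum += *table[i];

	return sum;
}

/* The single caller: it pays for the table only when it takes this path. */
uint32_t sketch_dispatch(int needs_calc, const uint32_t *vals, size_t n)
{
	if (!needs_calc)
		return 0;

	return sketch_calc(vals, n);
}
```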

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eabfc8290f5e29f2ef3e5c1481715ae9056ea689..842d315675d5c749a0a1b62fd67afdc1d8046812 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3969,7 +3969,8 @@ static int rtnl_dellinkprop(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return rtnl_linkprop(RTM_DELLINKPROP, skb, nlh, extack);
 }
 
-static u32 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
+static noinline_for_stack u32 rtnl_calcit(struct sk_buff *skb,
+					  struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	size_t min_ifinfo_dump_size = 0;
-- 
2.45.2.993.g49e7a77208-goog

.

Date: Wed, 10 Jul 2024 00:14:01 +0000
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240710001402.2758273-1-edumazet@google.com>
Subject: [PATCH net] tcp: avoid too many retransmit packets
From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, 
	Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, Kuniyuki Iwashima <kuniyu@amazon.com>, eric.dumazet@gmail.com, 
	Eric Dumazet <edumazet@google.com>, Jon Maxwell <jmaxwell37@gmail.com>, 
	Neal Cardwell <ncardwell@google.com>
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355997
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.

The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.

Before the blamed commit, the socket would not time out after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.

Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.
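
A simplified user-space model of the patched decision may help when reading
the diff. It is a sketch under stated assumptions, not the kernel function:
all times are in milliseconds (HZ=1000, so one jiffy is 1 ms), the sketch_*
names are hypothetical, and the final comparison after the rcv_delta check
is assumed because it is outside the hunk shown in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP_RTO_MAX is 120 seconds; expressed here in ms (HZ=1000 assumed). */
#define SKETCH_RTO_MAX_MS 120000u

/*
 * user_timeout_ms: TCP_USER_TIMEOUT value, 0 if unset.
 * rtx_delta_ms:    time spent retransmitting without progress.
 * rcv_delta_ms:    timer expiry minus last receive timestamp (may be < 0).
 */
bool sketch_probe0_timed_out(uint32_t user_timeout_ms, uint32_t rtx_delta_ms,
			     int32_t rcv_delta_ms)
{
	uint32_t timeout = SKETCH_RTO_MAX_MS * 2;

	if (user_timeout_ms) {
		/* Zero-window ACKs must not keep the socket alive forever. */
		if (rtx_delta_ms > user_timeout_ms)
			return true;
		if (user_timeout_ms < timeout)
			timeout = user_timeout_ms;
	}

	/* The peer was heard from recently enough: keep probing. */
	if (rcv_delta_ms <= (int32_t)timeout)
		return false;

	/* Assumption: tail of the function, not visible in this hunk. */
	return rtx_delta_ms > timeout;
}
```

With a 5 second user timeout, the sketch gives up as soon as rtx_delta
exceeds 5000 ms, whatever rcv_delta is. Without a user timeout it keeps the
TCP_RTO_MAX * 2 behaviour.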

Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_timer.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index db9d826560e57caf8274d1b7253c7af4dd7821a0..892c86657fbc243ce53a939157b77f1fe0410097 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -483,15 +483,26 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
 				     const struct sk_buff *skb,
 				     u32 rtx_delta)
 {
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+	u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
 	const struct tcp_sock *tp = tcp_sk(sk);
-	const int timeout = TCP_RTO_MAX * 2;
+	int timeout = TCP_RTO_MAX * 2;
 	s32 rcv_delta;
 
+	if (user_timeout) {
+		/* If user application specified a TCP_USER_TIMEOUT,
+		 * it does not want win 0 packets to 'reset the timer'
+		 * while retransmits are not making progress.
+		 */
+		if (rtx_delta > user_timeout)
+			return true;
+		timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
+	}
 	/* Note: timer interrupt might have been delayed by at least one jiffy,
 	 * and tp->rcv_tstamp might very well have been written recently.
 	 * rcv_delta can thus be negative.
 	 */
-	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
+	rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
 	if (rcv_delta <= timeout)
 		return false;
 
-- 
2.45.2.993.g49e7a77208-goog

.

From: Gaosheng Cui <cuigaosheng1@huawei.com>
To: <dhowells@redhat.com>, <marc.dionne@auristor.com>, <davem@davemloft.net>,
	<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
	<cuigaosheng1@huawei.com>
CC: <linux-afs@lists.infradead.org>, <netdev@vger.kernel.org>
Subject: [PATCH -next] rxrpc: Remove the BUG in rxkad_init_connection_security
Date: Wed, 10 Jul 2024 10:00:55 +0800
Message-ID: <20240710020055.4116034-1-cuigaosheng1@huawei.com>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.netdev:355999
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

If crypto_sync_skcipher_setkey() fails, we only need to return the
error code; it is not necessary to trigger a BUG().
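
The general shape of the change, as a user-space sketch with hypothetical
sketch_* names (this is not the rxkad code): a failing setkey step unwinds
through a cleanup label and propagates its error code, instead of aborting.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cipher handle. */
struct sketch_cipher {
	unsigned char key[8];
	int keyed;
};

/* Hypothetical setkey: rejects keys of the wrong length. */
static int sketch_setkey(struct sketch_cipher *c, const unsigned char *key,
			 size_t len)
{
	if (len != sizeof(c->key))
		return -EINVAL;
	memcpy(c->key, key, len);
	c->keyed = 1;
	return 0;
}

int sketch_init_security(const unsigned char *key, size_t len,
			 struct sketch_cipher **out)
{
	struct sketch_cipher *ci;
	int ret;

	ci = calloc(1, sizeof(*ci));
	if (!ci) {
		ret = -ENOMEM;
		goto error;
	}

	/* Propagate the failure instead of crashing. */
	ret = sketch_setkey(ci, key, len);
	if (ret < 0)
		goto error_ci;

	*out = ci;
	return 0;

error_ci:
	free(ci);
error:
	*out = NULL;
	return ret;
}
```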

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
---
 net/rxrpc/rxkad.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 104bb1ec9002..75d291ada9e8 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -114,9 +114,10 @@ static int rxkad_init_connection_security(struct rxrpc_connection *conn,
 		goto error;
 	}
 
-	if (crypto_sync_skcipher_setkey(ci, token->kad->session_key,
-				   sizeof(token->kad->session_key)) < 0)
-		BUG();
+	ret = crypto_sync_skcipher_setkey(ci, token->kad->session_key,
+					  sizeof(token->kad->session_key));
+	if (ret < 0)
+		goto error_ci;
 
 	switch (conn->security_level) {
 	case RXRPC_SECURITY_PLAIN:
-- 
2.25.1

.

From: Michal Switala <michal.switala@infogain.com>
To: revest@google.com,
	bpf@vger.kernel.org,
	netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Michal Switala <michal.switala@infogain.com>,
	syzbot+cca39e6e84a367a7e6f6@syzkaller.appspotmail.com
Subject: [PATCH] bpf: Ensure BPF programs testing skb context initialization
Date: Wed, 10 Jul 2024 10:46:33 +0200
Message-ID: <20240710084633.2229015-1-michal.switala@infogain.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1273315 org.kernel.vger.netdev:356003
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.bpf,org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

This commit addresses an issue where a netdevice was found to be uninitialized.
To mitigate this case, the change ensures that BPF programs designed to test
skb context initialization thoroughly verify the availability of a fully
initialized context before execution.

The root cause of a NULL ctx stems from the initialization process in
bpf_ctx_init(). This function returns NULL if the user initializes the
bpf_attr variables ctx_in and ctx_out with invalid pointers or sets them to
NULL. These variables are directly controlled by user input, and if both are
NULL, the context cannot be initialized, resulting in a NULL ctx.

Reported-by: syzbot+cca39e6e84a367a7e6f6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cca39e6e84a367a7e6f6
Link: https://lore.kernel.org/all/000000000000b95d41061cbf302a@google.com/
Signed-off-by: Michal Switala <michal.switala@infogain.com>
---
 net/bpf/test_run.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 36ae54f57bf5..8b2efcee059f 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -970,7 +970,7 @@ static struct proto bpf_dummy_proto = {
 int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
-	bool is_l2 = false, is_direct_pkt_access = false;
+	bool is_l2 = false, is_direct_pkt_access = false, ctx_needed = false;
 	struct net *net = current->nsproxy->net_ns;
 	struct net_device *dev = net->loopback_dev;
 	u32 size = kattr->test.data_size_in;
@@ -998,6 +998,34 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		return PTR_ERR(ctx);
 	}
 
+	switch (prog->type) {
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_SCHED_CLS:
+	case BPF_PROG_TYPE_SCHED_ACT:
+	case BPF_PROG_TYPE_XDP:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_SK_SKB:
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_LWT_SEG6LOCAL:
+	case BPF_PROG_TYPE_SK_REUSEPORT:
+	case BPF_PROG_TYPE_NETFILTER:
+	case BPF_PROG_TYPE_LWT_IN:
+	case BPF_PROG_TYPE_LWT_OUT:
+	case BPF_PROG_TYPE_LWT_XMIT:
+		ctx_needed = true;
+		break;
+	default:
+		break;
+	}
+
+	if (!ctx && ctx_needed) {
+		kfree(data);
+		return -EINVAL;
+	}
+
 	switch (prog->type) {
 	case BPF_PROG_TYPE_SCHED_CLS:
 	case BPF_PROG_TYPE_SCHED_ACT:
-- 
2.43.0

.

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
From: ayaka <ayaka@soulik.info>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0 (1.0)
Date: Wed, 10 Jul 2024 17:40:46 +0800
Subject: tun: need an ioctl() cmd to get multi_queue index?
Message-Id: <FABA3A61-3062-4AC6-94D8-7DF602E09EC3@soulik.info>
To: netdev@vger.kernel.org
Xref: photonic.trudheim.com org.kernel.vger.netdev:356006
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hello All

I have read some examples that filter packets with tc qdisc. It could be a very
useful feature for a dispatcher in a VPN program.
But I didn’t find an ioctl() to fetch the queue index, which I believe is the
queue number used in tc qdisc.
There is an ioctl() which sets the ifindex, which would affect the queue_index
stored in the same union. But I don’t think there is an ioctl() to fetch it.

If I am right, could I add a new ioctl() cmd for that?

Sincerely
Randy
.

From: Kory Maincent <kory.maincent@bootlin.com>
To: Andrew Lunn <andrew@lunn.ch>,
	"Kory Maincent (Dent Project)" <kory.maincent@bootlin.com>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: thomas.petazzoni@bootlin.com,
	Oleksij Rempel <o.rempel@pengutronix.de>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>
Subject: [PATCH v2 1/2] net: pse-pd: Do not return EOPNOSUPP if config is null
Date: Wed, 10 Jul 2024 13:42:30 +0200
Message-Id: <20240710114232.257190-1-kory.maincent@bootlin.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1273556 org.kernel.vger.netdev:356008
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

For a PSE supporting both c33 and PoDL, setting config for one type of PoE
leaves the other type's config null. Currently, this case returns
EOPNOTSUPP, which is incorrect. Instead, we should do nothing if the
configuration is empty.
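
A user-space sketch of the intended behaviour, with hypothetical sketch_*
names: each PoE type is configured only when its control value was actually
supplied (non-zero), so an unset value no longer reaches a setter that
rejects it. That the per-type setter returns EOPNOTSUPP for an unset value
is an assumption made for the illustration.

```c
#include <errno.h>

/* Hypothetical config: 0 means "not supplied by the user". */
struct sketch_config {
	int c33_admin_control;
	int podl_admin_control;
};

/* Hypothetical controller supporting one or both PoE types. */
struct sketch_pse {
	int has_c33;
	int has_podl;
	int c33_state;
	int podl_state;
};

/* Assumption: the per-type setter rejects an unset (zero) control value. */
static int sketch_apply(int *state, int control)
{
	if (!control)
		return -EOPNOTSUPP;
	*state = control;
	return 0;
}

int sketch_set_config(struct sketch_pse *pse, const struct sketch_config *cfg)
{
	int err = 0;

	/* Skip a type whose config is null instead of failing on it. */
	if (pse->has_c33 && cfg->c33_admin_control) {
		err = sketch_apply(&pse->c33_state, cfg->c33_admin_control);
		if (err)
			return err;
	}

	if (pse->has_podl && cfg->podl_admin_control)
		err = sketch_apply(&pse->podl_state, cfg->podl_admin_control);

	return err;
}
```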

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
---

Changes in v2:
- New patch to fix dealing with a null config.
---
 drivers/net/pse-pd/pse_core.c | 4 ++--
 net/ethtool/pse-pd.c          | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 795ab264eaf2..513cd7f85933 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -719,13 +719,13 @@ int pse_ethtool_set_config(struct pse_control *psec,
 {
 	int err = 0;
 
-	if (pse_has_c33(psec)) {
+	if (pse_has_c33(psec) && config->c33_admin_control) {
 		err = pse_ethtool_c33_set_config(psec, config);
 		if (err)
 			return err;
 	}
 
-	if (pse_has_podl(psec))
+	if (pse_has_podl(psec) && config->podl_admin_control)
 		err = pse_ethtool_podl_set_config(psec, config);
 
 	return err;
diff --git a/net/ethtool/pse-pd.c b/net/ethtool/pse-pd.c
index 2c981d443f27..982995ff1628 100644
--- a/net/ethtool/pse-pd.c
+++ b/net/ethtool/pse-pd.c
@@ -183,7 +183,9 @@ ethnl_set_pse(struct ethnl_req_info *req_info, struct genl_info *info)
 	if (pse_has_c33(phydev->psec))
 		config.c33_admin_control = nla_get_u32(tb[ETHTOOL_A_C33_PSE_ADMIN_CONTROL]);
 
-	/* Return errno directly - PSE has no notification */
+	/* Return errno directly - PSE has no notification
+	 * pse_ethtool_set_config() will do nothing if the config is null
+	 */
 	return pse_ethtool_set_config(phydev->psec, info->extack, &config);
 }
 
-- 
2.34.1

.

Date: Wed, 10 Jul 2024 19:16:50 +0800
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240710111654.4085575-1-yumike@google.com>
Subject: [PATCH ipsec v3 0/4] Support IPsec crypto offload for IPv6 ESP and
 IPv4 UDP-encapsulated ESP data paths
From: Mike Yu <yumike@google.com>
To: netdev@vger.kernel.org, steffen.klassert@secunet.com
Cc: yumike@google.com, stanleyjhu@google.com, martinwu@google.com, 
	chiachangwang@google.com
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:356010
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Currently, IPsec crypto offload is enabled for the GRO code path. However,
there are other code paths where the XFRM stack is involved; for example, IPv6
ESP packets handled by xfrm6_esp_rcv() in the ESP layer, and IPv4
UDP-encapsulated ESP packets handled by udp_rcv() in the UDP layer.

This patchset extends the crypto offload support to cover these two cases.
This is useful for devices with traffic accounting (e.g., Android), where GRO
can lead to inaccurate accounting on the underlying network. For example, VPN
traffic might not be counted on the wifi network interface wlan0 if the packets
are handled in the GRO code path before entering the network stack for
accounting.

Below is the RX data path scenario the crypto offload can be applied to.

  +-----------+   +-------+
  | HW Driver |-->| wlan0 |--------+
  +-----------+   +-------+        |
                                   v
                             +---------------+   +------+
                     +------>| Network Stack |-->| Apps |
                     |       +---------------+   +------+
                     |             |
                     |             v
                 +--------+   +------------+
                 | ipsec1 |<--| XFRM Stack |
                 +--------+   +------------+

v2 -> v3:
- Correct ESP seq in esp_xmit().
v1 -> v2:
- Fix comment style.

Mike Yu (4):
  xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO
    path
  xfrm: Allow UDP encapsulation in crypto offload control path
  xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP
    packet
  xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP
    packet

 net/ipv4/esp4.c         |  8 +++++++-
 net/ipv4/esp4_offload.c | 17 ++++++++++++++++-
 net/xfrm/xfrm_device.c  |  6 +++---
 net/xfrm/xfrm_input.c   |  3 ++-
 net/xfrm/xfrm_policy.c  |  5 ++++-
 5 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.45.2.803.g4e1b14247a-goog

.

From: Aleksandr Mishin <amishin@t-argos.ru>
To: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
CC: Aleksandr Mishin <amishin@t-argos.ru>, Jesse Brandeburg
	<jesse.brandeburg@intel.com>, Tony Nguyen <anthony.l.nguyen@intel.com>,
	"David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <lvc-project@linuxtesting.org>, Przemek
 Kitszel <przemyslaw.kitszel@intel.com>
Subject: [PATCH net-next v4] ice: Adjust over allocation of memory in ice_sched_add_root_node() and ice_sched_add_node()
Date: Wed, 10 Jul 2024 15:39:49 +0300
Message-ID: <20240710123949.9265-1-amishin@t-argos.ru>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1273620 org.kernel.vger.netdev:356014
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev,org.osuosl.intel-wired-lan
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

In ice_sched_add_root_node() and ice_sched_add_node() there are calls to
devm_kcalloc() in order to allocate memory for an array of pointers to
'ice_sched_node' structures. But incorrect types are used as sizeof()
arguments in these calls (structures instead of pointers), which leads to
over-allocation of memory.

Fix the over-allocation by correcting the types in the devm_kcalloc()
sizeof() arguments.
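
The difference is easy to see in a user-space sketch (hypothetical sketch_*
names, calloc() in place of devm_kcalloc(), and a made-up struct layout).
Sizing each slot as sizeof(*node->children) allocates one pointer per child,
while sizeof(*node) would allocate one whole structure per child.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node; only the children member mirrors the patch. */
struct sketch_node {
	struct sketch_node *parent;
	struct sketch_node **children;	/* array of pointers to children */
	unsigned char info[64];
	unsigned int num_children;
};

struct sketch_node *sketch_node_alloc(size_t max_children)
{
	struct sketch_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;

	if (max_children) {
		/* One pointer per slot, not one struct per slot. */
		node->children = calloc(max_children,
					sizeof(*node->children));
		if (!node->children) {
			free(node);
			return NULL;
		}
	}

	return node;
}

/* Bytes wasted per allocation when sizeof(*node) is used by mistake. */
size_t sketch_overalloc_bytes(size_t max_children)
{
	return max_children *
	       (sizeof(struct sketch_node) - sizeof(struct sketch_node *));
}
```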

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
---
v4:
  - Remove Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
  - Add Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    (https://lore.kernel.org/all/6d8ac0cf-b954-4c12-8b5b-e172c850e529@intel.com/)
v3: https://lore.kernel.org/all/20240708182736.8514-1-amishin@t-argos.ru/
  - Update comment and use the correct entities as suggested by Przemek
v2: https://lore.kernel.org/all/20240706140518.9214-1-amishin@t-argos.ru/
  - Update comment, remove 'Fixes' tag and change the tree from 'net' to
    'net-next' as suggested by Simon
    (https://lore.kernel.org/all/20240706095258.GB1481495@kernel.org/)
v1: https://lore.kernel.org/all/20240705163620.12429-1-amishin@t-argos.ru/

 drivers/net/ethernet/intel/ice/ice_sched.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index ecf8f5d60292..6ca13c5dcb14 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -28,9 +28,8 @@ ice_sched_add_root_node(struct ice_port_info *pi,
 	if (!root)
 		return -ENOMEM;
 
-	/* coverity[suspicious_sizeof] */
 	root->children = devm_kcalloc(ice_hw_to_dev(hw), hw->max_children[0],
-				      sizeof(*root), GFP_KERNEL);
+				      sizeof(*root->children), GFP_KERNEL);
 	if (!root->children) {
 		devm_kfree(ice_hw_to_dev(hw), root);
 		return -ENOMEM;
@@ -186,10 +185,9 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
 	if (!node)
 		return -ENOMEM;
 	if (hw->max_children[layer]) {
-		/* coverity[suspicious_sizeof] */
 		node->children = devm_kcalloc(ice_hw_to_dev(hw),
 					      hw->max_children[layer],
-					      sizeof(*node), GFP_KERNEL);
+					      sizeof(*node->children), GFP_KERNEL);
 		if (!node->children) {
 			devm_kfree(ice_hw_to_dev(hw), node);
 			return -ENOMEM;
-- 
2.30.2

.

Date: Wed, 10 Jul 2024 15:16:53 +0000
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240710151653.3786604-1-edumazet@google.com>
Subject: [PATCH v2 net-next] net: reduce rtnetlink_rcv_msg() stack usage
From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, 
	Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, eric.dumazet@gmail.com, 
	Eric Dumazet <edumazet@google.com>
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:356022
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

IFLA_MAX is increasing slowly but surely.

Some compilers use more than 512 bytes of stack in rtnetlink_rcv_msg()
because it calls rtnl_calcit() for RTM_GETLINK messages.

Use the noinline_for_stack attribute to not inline rtnl_calcit(),
and directly use nla_for_each_attr_type() (Jakub's suggestion)
because we only care about IFLA_EXT_MASK at this stage.
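
For readers unfamiliar with the approach, here is a user-space sketch of the
idea: rather than parsing every attribute into a table indexed by type, walk
the attribute stream once and pick out the single type of interest. The
sketch_* names and the header layout (16-bit length, 16-bit type, 4-byte
alignment, host byte order) are assumptions for illustration; this is not
the kernel's netlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: total length (header included), then type. */
struct sketch_attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define SKETCH_HDRLEN sizeof(struct sketch_attr_hdr)

static size_t sketch_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Test helper: append one u32 attribute at offset off, return new offset. */
size_t sketch_put_u32(unsigned char *buf, size_t off, uint16_t type,
		      uint32_t v)
{
	struct sketch_attr_hdr hdr = {
		(uint16_t)(SKETCH_HDRLEN + sizeof(v)), type
	};

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + SKETCH_HDRLEN, &v, sizeof(v));
	return off + sketch_align(SKETCH_HDRLEN + sizeof(v));
}

/* Scan the stream; no per-type table is needed on the stack. */
uint32_t sketch_find_u32(const unsigned char *buf, size_t buflen,
			 uint16_t wanted, uint32_t fallback)
{
	uint32_t value = fallback;
	size_t off = 0;

	while (buflen - off >= SKETCH_HDRLEN) {
		struct sketch_attr_hdr hdr;
		size_t step;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < SKETCH_HDRLEN || hdr.len > buflen - off)
			break;	/* malformed or truncated attribute */

		/* Accept the attribute only if its payload is a u32. */
		if (hdr.type == wanted &&
		    hdr.len - SKETCH_HDRLEN == sizeof(uint32_t))
			memcpy(&value, buf + off + SKETCH_HDRLEN,
			       sizeof(value));

		step = sketch_align(hdr.len);
		if (step > buflen - off)
			break;
		off += step;
	}

	return value;
}
```

The stack cost is a few scalars regardless of how many attribute types
exist, which is the property the patch is after.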

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eabfc8290f5e29f2ef3e5c1481715ae9056ea689..87e67194f24046a8420bbb51c19fb0a686b9b06b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3969,22 +3969,28 @@ static int rtnl_dellinkprop(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return rtnl_linkprop(RTM_DELLINKPROP, skb, nlh, extack);
 }
 
-static u32 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
+static noinline_for_stack u32 rtnl_calcit(struct sk_buff *skb,
+					  struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	size_t min_ifinfo_dump_size = 0;
-	struct nlattr *tb[IFLA_MAX+1];
 	u32 ext_filter_mask = 0;
 	struct net_device *dev;
-	int hdrlen;
+	struct nlattr *nla;
+	int hdrlen, rem;
 
 	/* Same kernel<->userspace interface hack as in rtnl_dump_ifinfo. */
 	hdrlen = nlmsg_len(nlh) < sizeof(struct ifinfomsg) ?
 		 sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg);
 
-	if (nlmsg_parse_deprecated(nlh, hdrlen, tb, IFLA_MAX, ifla_policy, NULL) >= 0) {
-		if (tb[IFLA_EXT_MASK])
-			ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
+	if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen))
+		return NLMSG_GOODSIZE;
+
+	nla_for_each_attr_type(nla, IFLA_EXT_MASK,
+			       nlmsg_attrdata(nlh, hdrlen),
+			       nlmsg_attrlen(nlh, hdrlen), rem) {
+		if (nla_len(nla) == sizeof(u32))
+			ext_filter_mask = nla_get_u32(nla);
 	}
 
 	if (!ext_filter_mask)
-- 
2.45.2.993.g49e7a77208-goog

.

From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet
	<edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni
	<pabeni@redhat.com>, David Ahern <dsahern@kernel.org>
CC: Kuniyuki Iwashima <kuniyu@amazon.com>, Kuniyuki Iwashima
	<kuni1840@gmail.com>, <netdev@vger.kernel.org>
Subject: [PATCH v3 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant.
Date: Wed, 10 Jul 2024 10:12:44 -0700
Message-ID: <20240710171246.87533-1-kuniyu@amazon.com>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.netdev:356029
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Patch 1 fixes an issue where the BPF TCP option parser is triggered for ACK
instead of SYN+ACK in the case of simultaneous connect().

Patch 2 removes a wrong assumption in the tcp_ao/self-connect tests.

v3:
  * Use (sk->sk_state == TCP_SYN_RECV && sk->sk_socket) to detect cross SYN case

v2: https://lore.kernel.org/netdev/20240708180852.92919-1-kuniyu@amazon.com/
  * Target net-next and remove Fixes: tag
  * Don't skip bpf_skops_parse_hdr() to centralise sk_state check
  * Remove unnecessary ACK after SYN+ACK
  * Add patch 2

v1: https://lore.kernel.org/netdev/20240704035703.95065-1-kuniyu@amazon.com/


Kuniyuki Iwashima (2):
  tcp: Don't drop SYN+ACK for simultaneous connect().
  selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect
    tests.

 net/ipv4/tcp_input.c                           |  9 +++++++++
 .../selftests/net/tcp_ao/self-connect.c        | 18 ------------------
 2 files changed, 9 insertions(+), 18 deletions(-)

-- 
2.30.2

.

X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: Jason Zhou <jasonzhou@x.com>
Date: Wed, 10 Jul 2024 16:45:55 -0400
Message-ID: <CAHXsExy+zm+twpC9Qrs9myBre+5s_ApGzOYU45Pt=sw-FyOn1w@mail.gmail.com>
Subject: PROBLEM: Issue with setting veth MAC address being unreliable.
To: netdev@vger.kernel.org
Cc: Benjamin Mahler <bmahler@x.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: photonic.trudheim.com org.kernel.vger.netdev:356032
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

[1.] One line summary of the problem:

Issue with setting veth address being unreliable.

[2.] Full description of the problem/report:

Hello!

We have been investigating a strange behavior within Apache Mesos
where after setting the MAC address on a veth device to the same
address as our eth0 MAC address, the change is sometimes not reflected
appropriately despite the ioctl call succeeding (~4% of the time in
our testing). Note that we also tried using libnl to set the MAC
address but the issue still persists.

Included below is the github link to the section where we set the veth
address, to clarify what we were trying to do. We first create the
veth pair [1] using a libnl function [2], then we set the veth device
MAC addresses to that of our host public interface (eth0) [3] using a
function called setMAC. Inside the setMAC [4] is where we are
observing the aforementioned issue with unreliable setting of veth
addresses.

This behavior was observed when re-fetching the MAC address on said
veth device after we made the function call to set its MAC address. We
have observed this issue on CentOS 9 only, not on CentOS 7. On CentOS 9
we have tried Linux kernels 5.15.147, 5.15.160 and 5.15.161. CentOS 7
was using 5.10; we also tried upgrading the CentOS 7 host to 5.15.160
but could not reproduce the bug.

We were re-fetching the addresses via the ioctl SIOCGIFHWADDR syscall
as well as via getifaddr (which appears to use netlink under the
covers), and, in problematic cases, both functions reported
discrepancies from the target MAC address we were initially setting
to. We also performed a fetch before we set the MAC addresses and
found that there are instances where getifaddr and ioctl results do
not match for our veth device *even before we perform any setting of
the MAC address*. It's also worth noting that after setting the MAC
address: there are no cases where ioctl or getifaddr come back with
the same MAC address as before we set the address. So, the set
operation always seems to have an effect.

Observed scenarios with incorrectly assigned MAC addresses:

(1) After setting the MAC address: ioctl returns the correct MAC
address, but getifaddr returns an incorrect MAC address (different
from the original value before setting as well!)

(2) After setting the MAC address: both ioctl and getifaddr return the
same MAC address, but are both wrong (and different from the original
one!)

(3) There is a possibility that the MAC address we set ends up
overwritten by a garbage value *after* we have already updated the MAC
address, and checked that the MAC address was set correctly. Since
this error happens after this function has finished, we cannot log nor
detect it in the function where we set the MAC address because we have
not yet studied at what point this late overwriting of MAC address
occurs. It’s worth noting that this is the rarest scenario that we
have encountered, and we were only able to reproduce it in our testing
cluster machine, not in any of the production cluster machines.

[3.] Keywords:

networking, veth, kernel, MAC, netlink

[X.] Other notes, patches, fixes, workarounds:

Notes:

For security reasons, more specific kernel and environment information
is available only on request. Please let us know if you are interested
and we will be happy to provide you with the necessary information.

We have observed this behavior only on CentOS 9 systems at the moment.
CentOS 7 systems under various kernels do not seem to have the issue
(which is quite strange if this were purely a kernel bug).

We have tried kernels 5.15.147, 5.15.160, 5.15.161, all of these have
this issue on CentOS 9.

We have also tried rewriting our function for setting MAC address to
use libnl rather than ioctl to perform the MAC address setting, but it
did not eliminate the issue.

To work around this bug, we check that the MAC address is set
correctly after the ioctl set call, and retry the address setting if
necessary. In our testing, this workaround appears to remedy scenarios
(1) and (2) above, but it does not address scenario (3). You can see
it here:

https://github.com/apache/mesos/commit/8b202bbebdc89429ad82c6983aa1c514eb1b8d95
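
The shape of that set, verify and retry loop is roughly as follows. This is
a sketch with hypothetical sketch_* names, not the Mesos code: the link is
an in-memory stand-in whose first few writes silently fail to stick, where a
real implementation would wrap the set and get ioctls.

```c
#include <string.h>

#define SKETCH_MAC_LEN 6

/* Hypothetical in-memory link whose first few writes are lost. */
struct sketch_link {
	unsigned char mac[SKETCH_MAC_LEN];
	int writes_to_drop;
};

/* Reports success even when the write does not stick. */
static int sketch_link_set(struct sketch_link *l, const unsigned char *mac)
{
	if (l->writes_to_drop > 0) {
		l->writes_to_drop--;
		return 0;
	}
	memcpy(l->mac, mac, SKETCH_MAC_LEN);
	return 0;
}

static int sketch_link_get(const struct sketch_link *l, unsigned char *mac)
{
	memcpy(mac, l->mac, SKETCH_MAC_LEN);
	return 0;
}

/* Returns the number of attempts used (>= 1), or -1 on failure. */
int sketch_set_mac_verified(struct sketch_link *l, const unsigned char *want,
			    int max_attempts)
{
	unsigned char seen[SKETCH_MAC_LEN];
	int attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (sketch_link_set(l, want) < 0)
			return -1;
		/* Do not trust the set call: read the address back. */
		if (sketch_link_get(l, seen) < 0)
			return -1;
		if (memcmp(seen, want, SKETCH_MAC_LEN) == 0)
			return attempt;
	}

	return -1;
}
```

A loop like this can only catch a mismatch that is visible at read-back
time, which is consistent with it not helping in scenario (3).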

We would greatly appreciate any insights or guidance on this matter.
Please let me know if you need further information or if there are any
specific tests we should run to assist in diagnosing the issue. Again,
specific details for the production machines on which we encountered
this error can be provided upon request, so please let us know if
there is anything we can provide to help.

Thank you for your time and assistance.

Best regards,
Jason Zhou
Software Engineering Intern
jasonzhou@x.com

embedded links:
[1] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L3599
[2] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/linux/routing/link/veth.cpp#L45
[3] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L3628
[4] https://github.com/apache/mesos/blob/8cf287778371c13ee7e88fa428424b3c0fbc7ff0/src/linux/routing/link/link.cpp#L283
.