From: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
To: netdev@vger.kernel.org
Cc: stable@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	hiraku.toyooka@miraclelinux.com,
	Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Subject: [PATCH 5.15 0/3] ipv6/v4: Fix data races around sk->sk_prot and icsk->icsk_af_ops
Date: Mon, 17 Apr 2023 16:53:45 +0000
Message-Id: <20230417165348.26189-1-kazunori.kobayashi@miraclelinux.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1261726 org.kernel.vger.netdev:355246
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev,org.kernel.vger.stable
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

For the 5.15 kernel, this series backports the fixes for CVE-2022-3566 and
CVE-2022-3567.

Eric Dumazet (1):
  ipv6: annotate some data-races around sk->sk_prot

Kuniyuki Iwashima (2):
  ipv6: Fix data races around sk->sk_prot.
  tcp: Fix data races around icsk->icsk_af_ops.

 net/core/sock.c          |  6 ++++--
 net/ipv4/af_inet.c       | 23 ++++++++++++++++-------
 net/ipv4/tcp.c           | 10 ++++++----
 net/ipv6/af_inet6.c      | 24 ++++++++++++++++++------
 net/ipv6/ipv6_sockglue.c |  9 ++++++---
 net/ipv6/tcp_ipv6.c      |  6 ++++--
 6 files changed, 54 insertions(+), 24 deletions(-)

-- 
2.39.2

.

From: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
To: netdev@vger.kernel.org
Cc: stable@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	hiraku.toyooka@miraclelinux.com,
	Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Subject: [PATCH 5.4 0/3] ipv6/v4: Fix data races around sk->sk_prot and icsk->icsk_af_ops
Date: Mon, 17 Apr 2023 16:54:03 +0000
Message-Id: <20230417165406.26237-1-kazunori.kobayashi@miraclelinux.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1261730 org.kernel.vger.netdev:355250
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev,org.kernel.vger.stable
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

For the 5.4 kernel, this series backports the fixes for CVE-2022-3566 and
CVE-2022-3567.

Eric Dumazet (1):
  ipv6: annotate some data-races around sk->sk_prot

Kuniyuki Iwashima (2):
  ipv6: Fix data races around sk->sk_prot.
  tcp: Fix data races around icsk->icsk_af_ops.

 net/core/sock.c          |  6 ++++--
 net/ipv4/af_inet.c       | 23 ++++++++++++++++-------
 net/ipv4/tcp.c           | 10 ++++++----
 net/ipv6/af_inet6.c      | 24 ++++++++++++++++++------
 net/ipv6/ipv6_sockglue.c |  9 ++++++---
 net/ipv6/tcp_ipv6.c      |  6 ++++--
 6 files changed, 54 insertions(+), 24 deletions(-)

-- 
2.39.2

.

From: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
To: netdev@vger.kernel.org
Cc: stable@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	hiraku.toyooka@miraclelinux.com,
	Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Subject: [PATCH 4.19 0/3] ipv6/v4: Fix data races around sk->sk_prot and icsk->icsk_af_ops
Date: Mon, 17 Apr 2023 16:54:25 +0000
Message-Id: <20230417165428.26284-1-kazunori.kobayashi@miraclelinux.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1261734 org.kernel.vger.netdev:355254
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev,org.kernel.vger.stable
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

For the 4.19 kernel, this series backports the fixes for CVE-2022-3566 and
CVE-2022-3567.

Eric Dumazet (1):
  ipv6: annotate some data-races around sk->sk_prot

Kuniyuki Iwashima (2):
  ipv6: Fix data races around sk->sk_prot.
  tcp: Fix data races around icsk->icsk_af_ops.

 net/core/sock.c          |  6 ++++--
 net/ipv4/af_inet.c       | 38 +++++++++++++++++++++++++++-----------
 net/ipv4/tcp.c           | 10 ++++++----
 net/ipv6/af_inet6.c      | 14 ++++++++++----
 net/ipv6/ipv6_sockglue.c |  9 ++++++---
 net/ipv6/tcp_ipv6.c      |  6 ++++--
 6 files changed, 57 insertions(+), 26 deletions(-)

-- 
2.39.2

.

From: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
To: netdev@vger.kernel.org
Cc: stable@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	hiraku.toyooka@miraclelinux.com,
	Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Subject: [PATCH 5.10 0/3] ipv6/v4: Fix data races around sk->sk_prot and icsk->icsk_af_ops
Date: Mon, 17 Apr 2023 16:50:31 +0000
Message-Id: <20230417165034.26123-1-kazunori.kobayashi@miraclelinux.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1261744 org.kernel.vger.netdev:355258
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.netdev,org.kernel.vger.stable
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

For the 5.10 kernel, this series backports the fixes for CVE-2022-3566 and
CVE-2022-3567.

Eric Dumazet (1):
  ipv6: annotate some data-races around sk->sk_prot

Kuniyuki Iwashima (2):
  ipv6: Fix data races around sk->sk_prot.
  tcp: Fix data races around icsk->icsk_af_ops.

 net/core/sock.c          |  6 ++++--
 net/ipv4/af_inet.c       | 23 ++++++++++++++++-------
 net/ipv4/tcp.c           | 10 ++++++----
 net/ipv6/af_inet6.c      | 24 ++++++++++++++++++------
 net/ipv6/ipv6_sockglue.c |  9 ++++++---
 net/ipv6/tcp_ipv6.c      |  6 ++++--
 6 files changed, 54 insertions(+), 24 deletions(-)

-- 
2.39.2

.

From: Kory Maincent <kory.maincent@bootlin.com>
Subject: [PATCH net-next v5 0/7] net: pse-pd: Add new PSE c33 features
Date: Fri, 28 Jun 2024 10:31:53 +0200
Message-Id: <20240628-feature_poe_power_cap-v5-0-5e1375d3817a@bootlin.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: "David S. Miller" <davem@davemloft.net>, 
 Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, 
 Paolo Abeni <pabeni@redhat.com>, Donald Hunter <donald.hunter@gmail.com>, 
 Oleksij Rempel <o.rempel@pengutronix.de>, Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>, 
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, 
 Dent Project <dentproject@linuxfoundation.org>, kernel@pengutronix.de, 
 linux-doc@vger.kernel.org, Kory Maincent <kory.maincent@bootlin.com>, 
 Sai Krishna <saikrishnag@marvell.com>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1261912 org.kernel.vger.netdev:355260
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-doc,org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>

This patch series adds new c33 features to the PSE API.
- Expand the PSE PI status information with power, class and failure
  reason
- Add the possibility to get and set the PSE PI power limit

Changes in v5:
- Fix a few nitpicks.
- Link to v4: https://lore.kernel.org/r/20240625-feature_poe_power_cap-v4-0-b0813aad57d5@bootlin.com

Changes in v4:
- Made a few updates to the PSE extended state and substate.
- Add support for c33 pse power limit ranges.
- A few changes in the specs and the documentation.
- Link to v3: https://lore.kernel.org/r/20240614-feature_poe_power_cap-v3-0-a26784e78311@bootlin.com

Changes in v3:
- Use u32 instead of u8 size for c33 pse extended state and substate.
- Reformat the state and substate enumeration to follow Oleksij's proposal,
  which is more compliant with the IEEE 802.3 standard
- Sent the first patch standalone in net.
- Link to v2: https://lore.kernel.org/r/20240607-feature_poe_power_cap-v2-0-c03c2deb83ab@bootlin.com

Changes in v2:
- Use uA and uV instead of mA and mV to have more precision in the power
  calculation. Need to use 64bit variables for the calculation.
- Modify the pd692x0 behavior when the requested current is outside the
  available ranges: report an error now.
- Link to v1: https://lore.kernel.org/r/20240529-feature_poe_power_cap-v1-0-0c4b1d5953b8@bootlin.com

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---
Kory Maincent (7):
      net: ethtool: pse-pd: Expand C33 PSE status with class, power and extended state
      netlink: specs: Expand the PSE netlink command with C33 new features
      net: pse-pd: pd692x0: Expand ethtool status message
      net: pse-pd: Add new power limit get and set c33 features
      net: ethtool: Add new power limit get and set features
      netlink: specs: Expand the PSE netlink command with C33 pw-limit attributes
      net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks

 Documentation/netlink/specs/ethtool.yaml     |  58 +++++
 Documentation/networking/ethtool-netlink.rst |  87 +++++++-
 drivers/net/pse-pd/pd692x0.c                 | 317 ++++++++++++++++++++++++++-
 drivers/net/pse-pd/pse_core.c                | 172 ++++++++++++++-
 include/linux/ethtool.h                      |  20 ++
 include/linux/pse-pd/pse.h                   |  51 +++++
 include/uapi/linux/ethtool.h                 | 191 ++++++++++++++++
 include/uapi/linux/ethtool_netlink.h         |  12 +
 net/ethtool/pse-pd.c                         | 119 +++++++++-
 9 files changed, 997 insertions(+), 30 deletions(-)
---
base-commit: f203f9086d3b3718bc63782a56218c7122f07db3
change-id: 20240425-feature_poe_power_cap-18e90ba7294b

Best regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

.

From: Jakub Sitnicki <jakub@cloudflare.com>
To: netdev@vger.kernel.org
Cc: kernel-team@cloudflare.com
Subject: [FYI] Input route ref count underflow since probably 6.6.22
Date: Fri, 28 Jun 2024 13:10:53 +0200
Message-ID: <87ikxtfhky.fsf@cloudflare.com>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
Xref: photonic.trudheim.com org.kernel.vger.netdev:355277
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi,

We've observed an unbalanced dst_release() on an input route in v6.6.y.
First noticed in 6.6.22. Or at least that is how far back our logs go.

We have just started looking into it and don't have much context yet,
except that:

1. the issue is architecture agnostic, seen both on x86_64 and arm64;
2. the backtrace, we realize, doesn't point to the source of the problem,
   it's just where the ref count underflow manifests itself;
3. while we have out-of-tree modules, they are for the crypto subsystem.

We will follow up as we collect more info on this, but we would
appreciate any hints or pointers to potential suspects, if anything
comes to mind.

Decoded warning reports follow.

Thanks,
-jkbs

* arm64

------------[ cut here ]------------
rcuref - imbalanced put()
WARNING: CPU: 20 PID: 180350 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Modules linked in: overlay mptcp_diag xsk_diag raw_diag unix_diag af_packet_diag netlink_diag nft_compat esp4 xt_hashlimit ip_set_hash_netport xt_length nf_conntrack_netlink nft_fwd_netdev nf_dup_netdev xfrm_interface xfrm6_tunnel nft_numgen nft_log nft_limit dummy ip_gre gre cls_bpf xfrm_user xfrm_algo fou6 ip6_tunnel tunnel6 ipip mpls_gso mpls_iptunnel mpls_router sit tunnel4 fou ip_tunnel ip6_udp_tunnel udp_tunnel nft_ct nf_tables zstd zram zsmalloc xgene_edac sch_ingress tcp_diag udp_diag inet_diag veth tun tcp_bbr sch_fq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6table_security ip6table_nat ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_tcpmss iptable_filter xt_TCPMSS xt_bpf xt_limit xt_multiport xt_NFLOG nfnetlink_log xt_connbytes xt_connlabel xt_statistic xt_mark xt_connmark xt_conntrack iptable_mangle xt_nat iptable_nat nf_nat xt_owner xt_set xt_comment xt_tcpudp xt_CT nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 iptable_raw ip_set_hash_ip ip_set_hash_net ip_set raid0 md_mod dm_crypt trusted asn1_encoder tee algif_skcipher af_alg 8021q garp mrp stp llc nvme_fabrics acpi_ipmi mlx5_core crct10dif_ce ghash_ce ipmi_ssif mlxfw sha2_ce sha256_arm64 ipmi_devintf nvme sha1_ce xhci_pci tls tiny_power_button arm_spe_pmu ipmi_msghandler xhci_hcd nvme_core psample button i2c_designware_platform i2c_designware_core cppc_cpufreq arm_dsu_pmu tpm_tis tpm_tis_core fuse dm_mod dax nfnetlink efivarfs ip_tables x_tables bcmcrypt(O) aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: kheaders]
CPU: 20 PID: 180350 Comm: napi/iconduit-g Tainted: G           O       6.6.32-cloudflare-2024.5.16 #1
Hardware name: GIGABYTE R152-P30-CD/MP32-AR1-00, BIOS F33e (SCP: 2.10.20230517) 02/21/2024
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
lr : rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
sp : ffff8000cb6eb760
x29: ffff8000cb6eb760 x28: 00000000f0d32f50 x27: ffff080223064cf0
x26: 0000000000000001 x25: ffffbea2e405703c x24: 00000000ce087c9f
x23: 0000000000000000 x22: ffff080223064cf0 x21: ffff0831525c1e00
x20: ffff07ff8a3eef00 x19: ffff0831525c1e40 x18: 0000000000000004
x17: 0000000000000002 x16: 0000000000000001 x15: 0000000000000000
x14: 0000000000000000 x13: 2928747570206465 x12: ffff087d3edbffa8
x11: ffff087d3eb00000 x10: ffff087d3edc0000 x9 : ffffbea2e35e7c78
x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
x5 : ffff087d3f0eee88 x4 : 0000000000000000 x3 : ffff49da5a45f000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff081eaa143f00
Call trace:
rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
dst_release (include/linux/rcuref.h:94 include/linux/rcuref.h:150 net/core/dst.c:166)
rt_cache_route (net/ipv4/route.c:1497)
rt_set_nexthop.isra.0 (net/ipv4/route.c:1604 (discriminator 1))
ip_route_input_slow (include/net/lwtunnel.h:140 net/ipv4/route.c:1873 net/ipv4/route.c:2152 net/ipv4/route.c:2338)
ip_route_input_noref (net/ipv4/route.c:2485 net/ipv4/route.c:2496)
ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
ip_sublist_rcv (net/ipv4/ip_input.c:613 (discriminator 1) net/ipv4/ip_input.c:639 (discriminator 1))
ip_list_rcv (net/ipv4/ip_input.c:675)
__netif_receive_skb_list_core (net/core/dev.c:5598 net/core/dev.c:5646)
netif_receive_skb_list_internal (net/core/dev.c:5700 net/core/dev.c:5789)
napi_complete_done (include/linux/list.h:37 (discriminator 2) include/net/gro.h:449 (discriminator 2) include/net/gro.h:444 (discriminator 2) net/core/dev.c:6129 (discriminator 2))
veth_poll (drivers/net/veth.c:1008 (discriminator 1)) veth
__napi_poll (net/core/dev.c:6559)
bpf_trampoline_6442466812+0xbc/0x1000
__napi_poll (net/core/dev.c:6546)
napi_threaded_poll (include/linux/netpoll.h:89 net/core/dev.c:6703)
kthread (kernel/kthread.c:388)
ret_from_fork (arch/arm64/kernel/entry.S:862)
---[ end trace 0000000000000000 ]---


* x86_64

------------[ cut here ]------------
rcuref - imbalanced put()
WARNING: CPU: 18 PID: 164489 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Modules linked in: macvlan overlay nft_compat esp4 xt_hashlimit ip_set_hash_netport xt_length nf_conntrack_netlink nft_fwd_netdev nf_dup_netdev xfrm_interface xfrm6_tunnel nft_numgen nft_log nft_limit dummy ip_gre gre cls_bpf xfrm_user xfrm_algo fou6 ip6_tunnel tunnel6 ipip nft_ct nf_tables mpls_gso mpls_iptunnel mpls_router sit tunnel4 fou ip_tunnel ip6_udp_tunnel udp_tunnel zstd zram zsmalloc sch_ingress tcp_diag udp_diag inet_diag veth tun tcp_bbr sch_fq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6table_security ip6table_nat ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_tcpmss iptable_filter xt_TCPMSS xt_bpf xt_limit xt_multiport xt_NFLOG nfnetlink_log xt_connbytes xt_connlabel xt_statistic xt_mark xt_connmark xt_conntrack iptable_mangle xt_nat iptable_nat nf_nat xt_owner xt_set xt_comment xt_tcpudp xt_CT nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw ip_set_hash_ip ip_set_hash_net ip_set raid0
md_mod essiv dm_crypt trusted asn1_encoder tee 8021q garp mrp stp llc nvme_fabrics amd64_edac ipmi_ssif kvm_amd kvm irqbypass crc32_pclmul crc32c_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel xhci_pci acpi_ipmi mlx5_core rapl mlxfw nvme ipmi_si tls ipmi_devintf tiny_power_button xhci_hcd nvme_core ccp psample i2c_piix4 ipmi_msghandler button fuse dm_mod dax nfnetlink efivarfs ip_tables x_tables bcmcrypt(O) crypto_simd cryptd [last unloaded: kheaders]
CPU: 18 PID: 164489 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.6.32-cloudflare-2024.5.16 #1
Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M06-sig 12/28/2022
RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Code: 31 c0 eb da 80 3d 23 a5 38 02 00 74 0a c7 03 00 00 00 e0 31 c0 eb c7 48 c7 c7 ef cd 0a 90 c6 05 09 a5 38 02 01 e8 69 cc 9c ff <0f> 0b eb df cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90
All code
========
   0:   31 c0                   xor    %eax,%eax
   2:   eb da                   jmp    0xffffffffffffffde
   4:   80 3d 23 a5 38 02 00    cmpb   $0x0,0x238a523(%rip)        # 0x238a52e
   b:   74 0a                   je     0x17
   d:   c7 03 00 00 00 e0       movl   $0xe0000000,(%rbx)
  13:   31 c0                   xor    %eax,%eax
  15:   eb c7                   jmp    0xffffffffffffffde
  17:   48 c7 c7 ef cd 0a 90    mov    $0xffffffff900acdef,%rdi
  1e:   c6 05 09 a5 38 02 01    movb   $0x1,0x238a509(%rip)        # 0x238a52e
  25:   e8 69 cc 9c ff          call   0xffffffffff9ccc93
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   eb df                   jmp    0xd
  2e:   cc                      int3
  2f:   cc                      int3
  30:   cc                      int3
  31:   cc                      int3
  32:   cc                      int3
  33:   90                      nop
  34:   90                      nop
  35:   90                      nop
  36:   90                      nop
  37:   90                      nop
  38:   90                      nop
  39:   90                      nop
  3a:   90                      nop
  3b:   90                      nop
  3c:   90                      nop
  3d:   90                      nop
  3e:   90                      nop
  3f:   90                      nop

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb df                   jmp    0xffffffffffffffe3
   4:   cc                      int3
   5:   cc                      int3
   6:   cc                      int3
   7:   cc                      int3
   8:   cc                      int3
   9:   90                      nop
   a:   90                      nop
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   90                      nop
  11:   90                      nop
  12:   90                      nop
  13:   90                      nop
  14:   90                      nop
  15:   90                      nop
RSP: 0018:ffffc90047a23908 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff888cedda4e80 RCX: 0000000000000027
RDX: ffff88a43f720748 RSI: 0000000000000001 RDI: ffff88a43f720740
RBP: ffffc90047a23988 R08: 0000000000000000 R09: ffffc90047a23798
R10: ffff88e06f2cc1a8 R11: 0000000000000003 R12: ffff888cedda4e40
R13: ffffc90047a23a98 R14: 0000000000000000 R15: 000000000a8cf4d5
FS:  0000000000000000(0000) GS:ffff88a43f700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6e59776000 CR3: 000000417ef0c005 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
? __warn (kernel/panic.c:681)
? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
? report_bug (lib/bug.c:180 lib/bug.c:219)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
? prb_read_valid (kernel/printk/printk_ringbuffer.c:1941)
? handle_bug (arch/x86/kernel/traps.c:237)
? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
dst_release (arch/x86/include/asm/preempt.h:95 include/linux/rcuref.h:151 net/core/dst.c:166)
rt_cache_route (net/ipv4/route.c:1497)
rt_set_nexthop.isra.0 (net/ipv4/route.c:1604 (discriminator 1))
ip_route_input_slow (include/net/lwtunnel.h:140 net/ipv4/route.c:1873 net/ipv4/route.c:2152 net/ipv4/route.c:2338)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
ip_route_input_noref (net/ipv4/route.c:2485 net/ipv4/route.c:2496)
ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
ip_sublist_rcv (net/ipv4/ip_input.c:613 (discriminator 1) net/ipv4/ip_input.c:639 (discriminator 1))
? __pfx_ip_rcv_finish (net/ipv4/ip_input.c:436)
ip_list_rcv (net/ipv4/ip_input.c:675)
__netif_receive_skb_list_core (net/core/dev.c:5598 (discriminator 3) net/core/dev.c:5646 (discriminator 3))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
netif_receive_skb_list_internal (net/core/dev.c:5700 net/core/dev.c:5789)
napi_complete_done (include/linux/list.h:37 (discriminator 2) include/net/gro.h:449 (discriminator 2) include/net/gro.h:444 (discriminator 2) net/core/dev.c:6129 (discriminator 2))
veth_poll (drivers/net/veth.c:1008 (discriminator 1)) veth
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
? psi_group_change (kernel/sched/psi.c:873)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
? __perf_event_task_sched_in (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:444 include/linux/atomic/atomic-instrumented.h:33 kernel/events/core.c:4014)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:175)
? finish_task_switch.isra.0 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 kernel/sched/sched.h:1386 kernel/sched/core.c:5138 kernel/sched/core.c:5256)
__napi_poll (net/core/dev.c:6559)
bpf_trampoline_6442482065+0x79/0x1000
? schedule (arch/x86/include/asm/preempt.h:85 (discriminator 13) kernel/sched/core.c:6773 (discriminator 13))
__napi_poll (net/core/dev.c:6546)
napi_threaded_poll (include/linux/netpoll.h:89 net/core/dev.c:6703)
? __pfx_napi_threaded_poll (net/core/dev.c:6686)
kthread (kernel/kthread.c:388)
? __pfx_kthread (kernel/kthread.c:341)
ret_from_fork (arch/x86/kernel/process.c:153)
? __pfx_kthread (kernel/kthread.c:341)
ret_from_fork_asm (arch/x86/entry/entry_64.S:314)
</TASK>
---[ end trace 0000000000000000 ]---
.

X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: Robert McMahon <rjmcmahon@rjmcmahon.com>
Date: Fri, 28 Jun 2024 10:33:59 -0700
Message-ID: <CAEBrVk5BNZmJWAK0q+eX0zxEq+VJtf0C7pK-xvCMGTo6Ct6jMQ@mail.gmail.com>
Subject: Little's law byte wait time and iperf 2
To: netdev@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355296
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi All,

I have two questions with respect to adding support for TCP byte wait
times based on Little's law into iperf 2.

1) Is this avg byte wait time field useful?
2) Should the queue depth be based on the sampled CWND or the bytes in flight?

Thanks in advance for comments on this (including a better column header name)

Example output:

rjmcmahon@fedora:~/Code/pyflows/iperf2-code$ src/iperf -c 192.168.1.77 -i 1 -e --fq-rate 100m -w 4M --tcp-write-prefetch 256K
------------------------------------------------------------
Client connecting to 192.168.1.77, TCP port 5001 with pid 60030 (1/0 flows/load)
Write buffer size: 131072 Byte
fair-queue socket pacing set to  100 Mbit/s
TCP congestion control using cubic
TOS defaults to 0x0 (dscp=0,ecn=0) (Nagle on)
TCP window size: 8.00 MByte (WARNING: requested 4.00 MByte)
Event based writes (pending queue watermark at 262144 bytes)
------------------------------------------------------------
[  1] local 192.168.1.103%enp4s0 port 55468 connected with 192.168.1.77 port 5001 (prefetch=262144) (icwnd/mss/irtt=14/1448/627) (ct=0.69 ms) on 2024-06-28 10:21:14.724 (PDT)
[ ID] Interval        Transfer    Bandwidth      Wait        Write/Err Rtry     InF(pkts)/Cwnd(pkts)/RTT(var)  fq-rate  NetPwr
[  1] 0.00-1.00 sec  12.3 MBytes   103 Mbits/sec  15.147 ms    99/0   0      190K(135)/207K(147)/403(25) us  100 Mbit/sec 31874
[  1] 1.00-2.00 sec  11.9 MBytes  99.6 Mbits/sec  10.444 ms    95/0   0      127K(90)/207K(147)/384(21) us  100 Mbit/sec 32427
[  1] 2.00-3.00 sec  12.0 MBytes   101 Mbits/sec  15.462 ms    96/0   0      190K(135)/207K(147)/366(17) us  100 Mbit/sec 34380
[  1] 3.00-4.00 sec  11.9 MBytes  99.6 Mbits/sec  10.444 ms    95/0   0      127K(90)/207K(147)/385(13) us  100 Mbit/sec 32342
[  1] 4.00-5.00 sec  12.0 MBytes   101 Mbits/sec  15.462 ms    96/0   0      190K(135)/207K(147)/378(26) us  100 Mbit/sec 33288
[  1] 5.00-6.00 sec  11.9 MBytes  99.6 Mbits/sec  10.444 ms    95/0   0      127K(90)/207K(147)/394(30) us  100 Mbit/sec 31604
[  1] 6.00-7.00 sec  11.9 MBytes  99.6 Mbits/sec  10.444 ms    95/0   0      127K(90)/207K(147)/385(19) us  100 Mbit/sec 32342
[  1] 7.00-8.00 sec  12.0 MBytes   101 Mbits/sec  15.462 ms    96/0   0      190K(135)/207K(147)/384(19) us  100 Mbit/sec 32768
[  1] 8.00-9.00 sec  11.9 MBytes  99.6 Mbits/sec  10.444 ms    95/0   0      127K(90)/207K(147)/349(17) us  100 Mbit/sec 35679
[  1] 9.00-10.00 sec  12.0 MBytes   101 Mbits/sec  15.462 ms    96/0    0      190K(135)/207K(20) us 32113

Bob

https://en.wikipedia.org/wiki/Little%27s_law
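
For readers following the two questions above, here is a minimal sketch of
the Little's law arithmetic, W = L / lambda, where L is a queue depth in
bytes and lambda is the throughput in bytes per second. The function name
and its arguments are hypothetical and are not taken from the iperf 2
sources. Whether L should be the sampled CWND or the bytes in flight is
exactly the open question, so the caller picks the value.

```c
/*
 * Little's law: L = lambda * W, hence W = L / lambda.
 *
 * queue_bytes   - assumed queue depth L (bytes in flight or CWND, in bytes)
 * bytes_per_sec - assumed arrival/departure rate lambda, in bytes per second
 *
 * Returns the average byte wait time in milliseconds, or -1.0 when the
 * throughput is not positive and the wait time is therefore undefined.
 */
static double littles_law_wait_ms(double queue_bytes, double bytes_per_sec)
{
	if (bytes_per_sec <= 0.0)
		return -1.0;

	return queue_bytes / bytes_per_sec * 1000.0;
}
```

Taking the 1.00-2.00 sec row as an illustration, and assuming 127K means
127 * 1024 bytes and that the 95 writes of 131072 bytes all landed in that
one-second interval, littles_law_wait_ms(130048.0, 12451840.0) gives about
10.44 ms, close to the Wait column. Feeding the CWND (207 * 1024 bytes)
instead gives about 17.0 ms for the same interval. This is only a
plausibility check, not a statement of how iperf 2 computes the field.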
.

Date: Fri, 28 Jun 2024 18:54:46 +0000
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240628185446.262191-1-rushilg@google.com>
Subject: [PATCH] gve: Add retry logic for recoverable adminq errors
From: Rushil Gupta <rushilg@google.com>
To: netdev@vger.kernel.org
Cc: jeroendb@google.com, pkaligineedi@google.com, davem@davemloft.net, 
	kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, willemb@google.com, 
	hramamurthy@google.com, Rushil Gupta <rushilg@google.com>, 
	Shailend Chand <shailend@google.com>, Ziwei Xiao <ziweixiao@google.com>
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355297
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: Jeroen de Borst <jeroendb@google.com>

An adminq command is retried if it fails with an ETIME error code
which translates to the deadline exceeded error for the device.
The create and destroy adminq commands are now managed via a common
method. This method keeps track of return codes for each queue and retries
the commands for the queues that failed with ETIME.
Other adminq commands that do not require queue level granularity are
simply retried in gve_adminq_execute_cmd.

Signed-off-by: Rushil Gupta <rushilg@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Ziwei Xiao <ziweixiao@google.com>
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 197 ++++++++++++-------
 drivers/net/ethernet/google/gve/gve_adminq.h |   5 +
 2 files changed, 129 insertions(+), 73 deletions(-)
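
Note for readers: the bounded retry described in the commit message can be
pictured with the standalone sketch below. It is not driver code. Every
sketch_* name is hypothetical, and SKETCH_RETRY_COUNT is an assumed value
standing in for GVE_ADMINQ_RETRY_COUNT, whose definition is not visible in
this part of the patch. The loop reissues a command only while it fails
with -ETIME and the retry budget is not exhausted; any other result is
returned to the caller as is.

```c
#include <errno.h>

#define SKETCH_RETRY_COUNT 3	/* assumed value, for illustration only */

/* Hypothetical command callback: returns 0 or a negative errno. */
typedef int (*sketch_cmd_fn)(void *ctx);

/* Reissue cmd while it reports -ETIME, up to SKETCH_RETRY_COUNT attempts. */
static int sketch_execute_with_retry(sketch_cmd_fn cmd, void *ctx)
{
	int retry_cnt = 0;
	int ret;

	do {
		ret = cmd(ctx);
	} while (ret == -ETIME && ++retry_cnt < SKETCH_RETRY_COUNT);

	return ret;
}

/* Hypothetical stand-in for a device that times out a few times. */
struct sketch_flaky_dev {
	int failures_left;	/* remaining calls that fail with -ETIME */
	int calls;		/* total number of attempts seen */
};

static int sketch_flaky_cmd(void *ctx)
{
	struct sketch_flaky_dev *dev = ctx;

	dev->calls++;
	if (dev->failures_left > 0) {
		dev->failures_left--;
		return -ETIME;
	}
	return 0;
}
```

With failures_left set to 2, the third attempt succeeds and the helper
returns 0. With more failures than the budget allows, the helper gives up
after SKETCH_RETRY_COUNT attempts and returns -ETIME.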

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index c5bbc1b7524e..74c61b90ea45 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -12,7 +12,7 @@
 
 #define GVE_MAX_ADMINQ_RELEASE_CHECK	500
 #define GVE_ADMINQ_SLEEP_LEN		20
-#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	1000
 
 #define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
 "Expected: length=%d, feature_mask=%x.\n" \
@@ -415,14 +415,17 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 /* Flushes all AQ commands currently queued and waits for them to complete.
  * If there are failures, it will return the first error.
  */
-static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+static int gve_adminq_kick_and_wait(struct gve_priv *priv, int ret_cnt, int *ret_codes)
 {
 	int tail, head;
-	int i;
+	int i, j;
 
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
 	head = priv->adminq_prod_cnt;
 
+	if ((head - tail) > ret_cnt)
+		return -EINVAL;
+
 	gve_adminq_kick_cmd(priv, head);
 	if (!gve_adminq_wait_for_cmd(priv, head)) {
 		dev_err(&priv->pdev->dev, "AQ commands timed out, need to reset AQ\n");
@@ -430,16 +433,13 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
 		return -ENOTRECOVERABLE;
 	}
 
-	for (i = tail; i < head; i++) {
+	for (i = tail, j = 0; i < head; i++, j++) {
 		union gve_adminq_command *cmd;
-		u32 status, err;
+		u32 status;
 
 		cmd = &priv->adminq[i & priv->adminq_mask];
 		status = be32_to_cpu(READ_ONCE(cmd->status));
-		err = gve_adminq_parse_err(priv, status);
-		if (err)
-			// Return the first error if we failed.
-			return err;
+		ret_codes[j] = gve_adminq_parse_err(priv, status);
 	}
 
 	return 0;
@@ -458,24 +458,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
 
 	// Check if next command will overflow the buffer.
-	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
-	    (tail & priv->adminq_mask)) {
-		int err;
-
-		// Flush existing commands to make room.
-		err = gve_adminq_kick_and_wait(priv);
-		if (err)
-			return err;
-
-		// Retry.
-		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
-		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
-		    (tail & priv->adminq_mask)) {
-			// This should never happen. We just flushed the
-			// command queue so there should be enough space.
-			return -ENOMEM;
-		}
-	}
+	if ((priv->adminq_prod_cnt - tail) > priv->adminq_mask)
+		return -ENOMEM;
 
 	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
 	priv->adminq_prod_cnt++;
@@ -544,8 +528,9 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
 static int gve_adminq_execute_cmd(struct gve_priv *priv,
 				  union gve_adminq_command *cmd_orig)
 {
+	int retry_cnt = 0;
 	u32 tail, head;
-	int err;
+	int err, ret;
 
 	mutex_lock(&priv->adminq_lock);
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
@@ -555,15 +540,21 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
 		goto out;
 	}
 
-	err = gve_adminq_issue_cmd(priv, cmd_orig);
-	if (err)
-		goto out;
+	do {
+		err = gve_adminq_issue_cmd(priv, cmd_orig);
+		if (err)
+			goto out;
 
-	err = gve_adminq_kick_and_wait(priv);
+		err = gve_adminq_kick_and_wait(priv, 1, &ret);
+		if (err)
+			goto out;
+	} while (ret == -ETIME && ++retry_cnt < GVE_ADMINQ_RETRY_COUNT);
 
 out:
 	mutex_unlock(&priv->adminq_lock);
-	return err;
+	if (err)
+		return err;
+	return ret;
 }
 
 static int gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
@@ -638,6 +629,98 @@ int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
 	return gve_adminq_execute_cmd(priv, &cmd);
 }
 
+typedef int (gve_adminq_queue_cmd) (struct gve_priv *priv, u32 queue_index);
+
+static int gve_adminq_manage_queues(struct gve_priv *priv,
+				    gve_adminq_queue_cmd *cmd,
+				    u32 start_id, u32 num_queues)
+{
+	u32 cmd_idx, queue_idx, ret_code_idx;
+	int queue_done = -1;
+	int *queues_waiting;
+	int retry_cnt = 0;
+	int retry_needed;
+	int *ret_codes;
+	int *commands;
+	int err;
+	int ret;
+
+	queues_waiting = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!queues_waiting)
+		return -ENOMEM;
+	ret_codes = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!ret_codes) {
+		err = -ENOMEM;
+		goto free_with_queues_waiting;
+	}
+	commands = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!commands) {
+		err = -ENOMEM;
+		goto free_with_ret_codes;
+	}
+
+	for (queue_idx = 0; queue_idx < num_queues; queue_idx++)
+		queues_waiting[queue_idx] = start_id + queue_idx;
+	do {
+		retry_needed = 0;
+		queue_idx = 0;
+		while (queue_idx < num_queues) {
+			cmd_idx = 0;
+			while (queue_idx < num_queues) {
+				if (queues_waiting[queue_idx] != queue_done) {
+					err = cmd(priv, queues_waiting[queue_idx]);
+					if (err == -ENOMEM)
+						break;
+					if (err)
+						goto free_with_commands;
+					commands[cmd_idx++] = queue_idx;
+				}
+				queue_idx++;
+			}
+
+			if (queue_idx < num_queues)
+				dev_dbg(&priv->pdev->dev,
+					"Issued %d of %d batched commands\n",
+					queue_idx, num_queues);
+
+			err = gve_adminq_kick_and_wait(priv, cmd_idx, ret_codes);
+			if (err)
+				goto free_with_commands;
+
+			for (ret_code_idx = 0; ret_code_idx < cmd_idx; ret_code_idx++) {
+				if (ret_codes[ret_code_idx] == 0) {
+					queues_waiting[commands[ret_code_idx]] = queue_done;
+				} else if (ret_codes[ret_code_idx] != -ETIME) {
+					ret = ret_codes[ret_code_idx];
+					goto free_with_commands;
+				} else {
+					retry_needed++;
+				}
+			}
+
+			if (retry_needed)
+				dev_dbg(&priv->pdev->dev,
+					"Issued %d batched commands, %d needed a retry\n",
+					cmd_idx, retry_needed);
+		}
+	} while (retry_needed && ++retry_cnt < GVE_ADMINQ_RETRY_COUNT);
+
+	ret = retry_needed ? -ETIME : 0;
+
+free_with_commands:
+	kvfree(commands);
+	commands = NULL;
+free_with_ret_codes:
+	kvfree(ret_codes);
+	ret_codes = NULL;
+free_with_queues_waiting:
+	kvfree(queues_waiting);
+	queues_waiting = NULL;
+	if (err)
+		return err;
+	return ret;
+}
+
 static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_tx_ring *tx = &priv->tx[queue_index];
@@ -678,16 +761,8 @@ static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = start_id; i < start_id + num_queues; i++) {
-		err = gve_adminq_create_tx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_create_tx_queue,
+					start_id, num_queues);
 }
 
 static void gve_adminq_get_create_rx_queue_cmd(struct gve_priv *priv,
@@ -759,16 +834,8 @@ int gve_adminq_create_single_rx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = 0; i < num_queues; i++) {
-		err = gve_adminq_create_rx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_create_rx_queue,
+					0, num_queues);
 }
 
 static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
@@ -791,16 +858,8 @@ static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = start_id; i < start_id + num_queues; i++) {
-		err = gve_adminq_destroy_tx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_destroy_tx_queue,
+					start_id, num_queues);
 }
 
 static void gve_adminq_make_destroy_rx_queue_cmd(union gve_adminq_command *cmd,
@@ -832,16 +891,8 @@ int gve_adminq_destroy_single_rx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = 0; i < num_queues; i++) {
-		err = gve_adminq_destroy_rx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_destroy_rx_queue,
+					0, num_queues);
 }
 
 static void gve_set_default_desc_cnt(struct gve_priv *priv,
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index ed1370c9b197..96e98f65273c 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -62,6 +62,11 @@ enum gve_adminq_statuses {
 	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
 };
 
+/* AdminQ commands (that aren't batched) will be retried if they encounter
+ * a recoverable error.
+ */
+#define GVE_ADMINQ_RETRY_COUNT 3
+
 #define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
 
 /* All AdminQ command structs should be naturally packed. The static_assert
-- 
2.45.2.803.g4e1b14247a-goog

.

From: Michael Chan <michael.chan@broadcom.com>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org,
	edumazet@google.com,
	kuba@kernel.org,
	pabeni@redhat.com,
	pavan.chebbi@broadcom.com,
	andrew.gospodarek@broadcom.com,
	richardcochran@gmail.com,
	horms@kernel.org,
	przemyslaw.kitszel@intel.com
Subject: [PATCH net-next v2 00/10] bnxt_en: PTP updates for net-next
Date: Fri, 28 Jun 2024 12:29:55 -0700
Message-ID: <20240628193006.225906-1-michael.chan@broadcom.com>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256;
	boundary="0000000000007c521d061bf8479e"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355298
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

--0000000000007c521d061bf8479e
Content-Transfer-Encoding: 8bit

The first 5 patches implement the PTP feature on the new BCM5760X
chips.  The main new hardware feature is the new TX timestamp
completion which enables the driver to retrieve the TX timestamp
in NAPI without deferring to the PTP worker.

The last 5 patches increase the number of TX PTP packets in-flight
from 1 to 4 on the older BCM5750X chips.  On these older chips, we
need to call firmware in the PTP worker to retrieve the timestamp.
We use an array to keep track of the in-flight TX PTP packets.

v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts.
    Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets.
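
For readers unfamiliar with the approach, tracking a bounded number of
in-flight timestamp requests in an array can be sketched as below. This
is a minimal illustration only, not the driver code: struct tx_ts_ring,
tx_ts_push(), tx_ts_pop() and tx_ts_avail() are hypothetical names, and
the producer/consumer layout is an assumption. Only the limit of 4 comes
from the description above.

```c
#include <assert.h>

#define MAX_TX_TS 4u	/* limit of 4 in-flight packets, per the cover letter */

/* Hypothetical tracker: a fixed array indexed by free-running counters. */
struct tx_ts_ring {
	int pkt_id[MAX_TX_TS];	/* stand-in for a pending packet reference */
	unsigned int prod;	/* incremented when a packet is queued */
	unsigned int cons;	/* incremented when its timestamp is retrieved */
};

/* Number of additional packets that may still be queued. */
unsigned int tx_ts_avail(const struct tx_ts_ring *r)
{
	return MAX_TX_TS - (r->prod - r->cons);
}

/* Record a packet awaiting its TX timestamp; -1 if 4 are already pending. */
int tx_ts_push(struct tx_ts_ring *r, int pkt_id)
{
	if (r->prod - r->cons >= MAX_TX_TS)
		return -1;
	r->pkt_id[r->prod++ % MAX_TX_TS] = pkt_id;
	return 0;
}

/* Retire the oldest pending packet; -1 if nothing is pending. */
int tx_ts_pop(struct tx_ts_ring *r, int *pkt_id)
{
	if (r->prod == r->cons)
		return -1;
	*pkt_id = r->pkt_id[r->cons++ % MAX_TX_TS];
	return 0;
}

/* Usage example: the fifth request is refused until one completes. */
void tx_ts_ring_demo(void)
{
	struct tx_ts_ring r = { {0}, 0, 0 };
	int id, i;

	for (i = 0; i < 4; i++)
		assert(tx_ts_push(&r, 100 + i) == 0);
	assert(tx_ts_push(&r, 104) == -1);
	assert(tx_ts_pop(&r, &id) == 0 && id == 100);
	assert(tx_ts_push(&r, 104) == 0);
}
```

Free-running counters keep the full/empty tests unambiguous without
reserving a spare slot, which is one possible reason to prefer them in a
sketch like this.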

Michael Chan (4):
  bnxt_en: Add new TX timestamp completion definitions
  bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd
  bnxt_en: Allow some TX packets to be unprocessed in NAPI
  bnxt_en: Add TX timestamp completion logic

Pavan Chebbi (6):
  bnxt_en: Add BCM5760X specific PHC registers mapping
  bnxt_en: Refactor all PTP TX timestamp fields into a struct
  bnxt_en: Remove an impossible condition check for PTP TX pending SKB
  bnxt_en: Let bnxt_stamp_tx_skb() return error code
  bnxt_en: Increase the max total outstanding PTP TX packets to 4
  bnxt_en: Remove atomic operations on ptp->tx_avail

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 106 +++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  42 ++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c | 151 +++++++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h |  36 ++++-
 4 files changed, 252 insertions(+), 83 deletions(-)

-- 
2.30.1


--0000000000007c521d061bf8479e--
.

From: Tony Nguyen <anthony.l.nguyen@intel.com>
To: davem@davemloft.net,
	kuba@kernel.org,
	pabeni@redhat.com,
	edumazet@google.com,
	netdev@vger.kernel.org
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Subject: [PATCH net-next 0/6][pull request] Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice)
Date: Fri, 28 Jun 2024 13:13:18 -0700
Message-ID: <20240628201328.2738672-1-anthony.l.nguyen@intel.com>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.netdev:355310
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

This series contains updates to the MAINTAINERS file and the ice driver.

Jesse replaces himself with Przemek in the maintainers file.

Karthik Sundaravel adds support for VF get/set MAC address via devlink.

Eric checks for errors from ice_vsi_rebuild() during queue
reconfiguration.

Paul adjusts FW API version check for E830 devices.

Piotr adds differentiation of unload type when shutting down AdminQ.

Przemek changes ice_adapter initialization to occur once per physical
card.

The following are changes since commit 748e3bbf47212d5e2e22d731328b0c15ee3b85ae:
  Merge branch 'net-selftests-mirroring-cleanup' into main
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE

Eric Joyner (1):
  ice: Check all ice_vsi_rebuild() errors in function

Jesse Brandeburg (1):
  MAINTAINERS: update Intel Ethernet maintainers

Karthik Sundaravel (1):
  ice: Add get/set hw address for VFs using devlink commands

Paul Greenwalt (1):
  ice: Allow different FW API versions based on MAC type

Piotr Gardocki (1):
  ice: Distinguish driver reset and removal for AQ shutdown

Przemek Kitszel (1):
  ice: do not init struct ice_adapter more times than needed

 MAINTAINERS                                   |  2 +-
 .../ethernet/intel/ice/devlink/devlink_port.c | 59 +++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_adapter.c  | 60 +++++++++----------
 drivers/net/ethernet/intel/ice/ice_common.h   |  2 +-
 drivers/net/ethernet/intel/ice/ice_controlq.c | 30 ++++++----
 drivers/net/ethernet/intel/ice/ice_controlq.h | 15 ++++-
 drivers/net/ethernet/intel/ice/ice_main.c     | 19 ++++--
 drivers/net/ethernet/intel/ice/ice_sriov.c    | 34 ++++++++---
 drivers/net/ethernet/intel/ice/ice_sriov.h    |  8 +++
 9 files changed, 165 insertions(+), 64 deletions(-)

-- 
2.41.0

.

From: Daniel Borkmann <daniel@iogearbox.net>
To: kuba@kernel.org
Cc: netdev@vger.kernel.org,
	bpf@vger.kernel.org,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lex Siegel <usiegl00@gmail.com>,
	Neil Brown <neilb@suse.de>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>
Subject: [PATCH net v3] net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
Date: Fri, 28 Jun 2024 22:35:25 +0200
Message-ID: <20240628203525.XyTsNaBIb4l-V1xlGXBeMUd7eP6S45oVNFYE81_k2p0@z>
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.netdev:355317
Newsgroups: org.kernel.vger.netdev,org.kernel.vger.bpf
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.

Neil suggested:

  This will propagate -EPERM up into other layers which might not be ready
  to handle it. It might be safer to map EPERM to an error we would be more
  likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.

ECONNREFUSED seems a reasonable error to use. For BPF programs, setting a
different error can be out of reach (see handling in 4fbac77d2d09), in
particular on kernels which do not have f10d05966196 ("bpf: Make
BPF_PROG_RUN_ARRAY return -err instead of allow boolean"). Given that, it is
better to simply remap for consistent behavior. UDP does handle EPERM in
xs_udp_send_request().

Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@kernel.org>
Cc: Anna Schumaker <anna@kernel.org>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name
---
 [ Fixes tags are set to the orig connect commit so that stable team
   can pick this up. ]

 v1 -> v2 -> v3:
   - Plain resend, adding correct sunrpc folks to Cc
     https://lore.kernel.org/bpf/Zn7wtStV+iafWRXj@tissot.1015granger.net/
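
 The remap relies on a switch fallthrough: -EPERM is rewritten and then
 shares the handling of the errors that are reported to the caller. A
 minimal userspace sketch of that pattern follows. It is an illustration
 only: enum conn_action, struct conn_outcome and
 classify_connect_status() are hypothetical names, and the set of cases
 is a simplified assumption rather than the full list in
 xs_tcp_setup_socket().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model of the status handling; none of these
 * names exist in net/sunrpc/xprtsock.c.
 */
enum conn_action {
	CONN_OK,	/* connect succeeded */
	CONN_RETRY,	/* try the connect again later */
	CONN_FAIL,	/* report the status to the caller */
};

struct conn_outcome {
	enum conn_action action;
	int status;
};

struct conn_outcome classify_connect_status(int status)
{
	struct conn_outcome out = { CONN_RETRY, status };

	switch (status) {
	case 0:
		out.action = CONN_OK;
		break;
	case -EPERM:
		/* e.g. a BPF program denied the connect: remap it */
		status = -ECONNREFUSED;
		/* fall through */
	case -EINVAL:
	case -ECONNREFUSED:
		out.action = CONN_FAIL;
		out.status = status;
		break;
	default:
		/* anything else is treated as retryable in this sketch */
		break;
	}
	return out;
}

/* Usage example: -EPERM is reported upward as -ECONNREFUSED. */
void classify_connect_status_demo(void)
{
	struct conn_outcome out = classify_connect_status(-EPERM);

	assert(out.action == CONN_FAIL);
	assert(out.status == -ECONNREFUSED);
}
```

 Placing the new case directly above the existing group keeps the remap
 to a single assignment and leaves the shared handling untouched.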

 net/sunrpc/xprtsock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index dfc353eea8ed..0e1691316f42 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2441,6 +2441,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		transport->srcport = 0;
 		status = -EAGAIN;
 		break;
+	case -EPERM:
+		/* Happens, for instance, if a BPF program is preventing
+		 * the connect. Remap the error so upper layers can better
+		 * deal with it.
+		 */
+		status = -ECONNREFUSED;
+		fallthrough;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
 		 * local IPv6 address without a scope-id.
-- 
2.21.0

.

Date: Fri, 28 Jun 2024 20:41:39 +0000
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240628204139.458075-1-rushilg@google.com>
Subject: [PATCH net-next] gve: Add retry logic for recoverable adminq errors
From: Rushil Gupta <rushilg@google.com>
To: netdev@vger.kernel.org
Cc: jeroendb@google.com, pkaligineedi@google.com, davem@davemloft.net, 
	kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, willemb@google.com, 
	hramamurthy@google.com, Rushil Gupta <rushilg@google.com>, 
	Shailend Chand <shailend@google.com>, Ziwei Xiao <ziweixiao@google.com>
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.netdev:355318
Newsgroups: org.kernel.vger.netdev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: Jeroen de Borst <jeroendb@google.com>

An adminq command is retried if it fails with ETIME, the error code that
the device's deadline exceeded status translates to.
The queue create and destroy adminq commands are now managed via a common
method. This method keeps track of the return code for each queue and
retries the commands for the queues that failed with ETIME.
Other adminq commands that do not require queue level granularity are
simply retried in gve_adminq_execute_cmd.

Signed-off-by: Rushil Gupta <rushilg@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Ziwei Xiao <ziweixiao@google.com>
---
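The bounded retry used for single commands can be sketched in isolation
as below. This is a minimal userspace illustration, not the driver code:
run_with_retry(), struct flaky_cmd and flaky_cmd_run() are hypothetical
names, and RETRY_COUNT stands in for GVE_ADMINQ_RETRY_COUNT. Only the
loop condition mirrors the patch.

```c
#include <assert.h>
#include <errno.h>

#define RETRY_COUNT 3	/* stands in for GVE_ADMINQ_RETRY_COUNT */

typedef int (*cmd_fn)(void *ctx);

/* Run fn until it returns something other than -ETIME, making at most
 * RETRY_COUNT attempts in total. Any other error is returned at once.
 */
int run_with_retry(cmd_fn fn, void *ctx)
{
	int retry_cnt = 0;
	int ret;

	do {
		ret = fn(ctx);
	} while (ret == -ETIME && ++retry_cnt < RETRY_COUNT);

	return ret;
}

/* Hypothetical command that times out for its first fail_first calls. */
struct flaky_cmd {
	int calls;
	int fail_first;
};

int flaky_cmd_run(void *ctx)
{
	struct flaky_cmd *c = ctx;

	return c->calls++ < c->fail_first ? -ETIME : 0;
}

/* Usage example: two timeouts are absorbed, five are not. */
void run_with_retry_demo(void)
{
	struct flaky_cmd ok = { 0, 2 };
	struct flaky_cmd bad = { 0, 5 };

	assert(run_with_retry(flaky_cmd_run, &ok) == 0);
	assert(ok.calls == 3);
	assert(run_with_retry(flaky_cmd_run, &bad) == -ETIME);
	assert(bad.calls == 3);
}
```

Because the counter is incremented before the comparison, the constant
bounds the total number of attempts rather than the number of retries
after the first attempt.
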
 drivers/net/ethernet/google/gve/gve_adminq.c | 197 ++++++++++++-------
 drivers/net/ethernet/google/gve/gve_adminq.h |   5 +
 2 files changed, 129 insertions(+), 73 deletions(-)
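
The simplified overflow check in gve_adminq_issue_cmd compares the
number of outstanding commands with the ring mask. A minimal sketch of
that arithmetic follows, assuming both counters are free-running
unsigned 32-bit values and the ring holds mask + 1 entries;
ring_is_full() is a hypothetical helper, not a function in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a ring of (mask + 1) slots has no free slot left.
 * prod_cnt counts commands issued and tail counts commands the device
 * has consumed. Unsigned subtraction stays correct across wraparound.
 */
bool ring_is_full(uint32_t prod_cnt, uint32_t tail, uint32_t mask)
{
	return (uint32_t)(prod_cnt - tail) > mask;
}

/* Usage example with an 8-entry ring (mask == 7). */
void ring_is_full_demo(void)
{
	assert(!ring_is_full(7, 0, 7));			/* 7 outstanding */
	assert(ring_is_full(8, 0, 7));			/* 8 outstanding */
	assert(!ring_is_full(3, 0xFFFFFFFEu, 7));	/* 5, across wrap */
}
```

Unlike the masked equality test it replaces, this form lets every slot
be used and does not depend on where the counters wrap.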

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index c5bbc1b7524e..74c61b90ea45 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -12,7 +12,7 @@
 
 #define GVE_MAX_ADMINQ_RELEASE_CHECK	500
 #define GVE_ADMINQ_SLEEP_LEN		20
-#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	1000
 
 #define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
 "Expected: length=%d, feature_mask=%x.\n" \
@@ -415,14 +415,17 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 /* Flushes all AQ commands currently queued and waits for them to complete.
  * If there are failures, it will return the first error.
  */
-static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+static int gve_adminq_kick_and_wait(struct gve_priv *priv, int ret_cnt, int *ret_codes)
 {
 	int tail, head;
-	int i;
+	int i, j;
 
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
 	head = priv->adminq_prod_cnt;
 
+	if ((head - tail) > ret_cnt)
+		return -EINVAL;
+
 	gve_adminq_kick_cmd(priv, head);
 	if (!gve_adminq_wait_for_cmd(priv, head)) {
 		dev_err(&priv->pdev->dev, "AQ commands timed out, need to reset AQ\n");
@@ -430,16 +433,13 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
 		return -ENOTRECOVERABLE;
 	}
 
-	for (i = tail; i < head; i++) {
+	for (i = tail, j = 0; i < head; i++, j++) {
 		union gve_adminq_command *cmd;
-		u32 status, err;
+		u32 status;
 
 		cmd = &priv->adminq[i & priv->adminq_mask];
 		status = be32_to_cpu(READ_ONCE(cmd->status));
-		err = gve_adminq_parse_err(priv, status);
-		if (err)
-			// Return the first error if we failed.
-			return err;
+		ret_codes[j] = gve_adminq_parse_err(priv, status);
 	}
 
 	return 0;
@@ -458,24 +458,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
 
 	// Check if next command will overflow the buffer.
-	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
-	    (tail & priv->adminq_mask)) {
-		int err;
-
-		// Flush existing commands to make room.
-		err = gve_adminq_kick_and_wait(priv);
-		if (err)
-			return err;
-
-		// Retry.
-		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
-		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
-		    (tail & priv->adminq_mask)) {
-			// This should never happen. We just flushed the
-			// command queue so there should be enough space.
-			return -ENOMEM;
-		}
-	}
+	if ((priv->adminq_prod_cnt - tail) > priv->adminq_mask)
+		return -ENOMEM;
 
 	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
 	priv->adminq_prod_cnt++;
@@ -544,8 +528,9 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
 static int gve_adminq_execute_cmd(struct gve_priv *priv,
 				  union gve_adminq_command *cmd_orig)
 {
+	int retry_cnt = 0;
 	u32 tail, head;
-	int err;
+	int err, ret;
 
 	mutex_lock(&priv->adminq_lock);
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
@@ -555,15 +540,21 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
 		goto out;
 	}
 
-	err = gve_adminq_issue_cmd(priv, cmd_orig);
-	if (err)
-		goto out;
+	do {
+		err = gve_adminq_issue_cmd(priv, cmd_orig);
+		if (err)
+			goto out;
 
-	err = gve_adminq_kick_and_wait(priv);
+		err = gve_adminq_kick_and_wait(priv, 1, &ret);
+		if (err)
+			goto out;
+	} while (ret == -ETIME && ++retry_cnt < GVE_ADMINQ_RETRY_COUNT);
 
 out:
 	mutex_unlock(&priv->adminq_lock);
-	return err;
+	if (err)
+		return err;
+	return ret;
 }
 
 static int gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
@@ -638,6 +629,98 @@ int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
 	return gve_adminq_execute_cmd(priv, &cmd);
 }
 
+typedef int (gve_adminq_queue_cmd) (struct gve_priv *priv, u32 queue_index);
+
+static int gve_adminq_manage_queues(struct gve_priv *priv,
+				    gve_adminq_queue_cmd *cmd,
+				    u32 start_id, u32 num_queues)
+{
+	u32 cmd_idx, queue_idx, ret_code_idx;
+	int queue_done = -1;
+	int *queues_waiting;
+	int retry_cnt = 0;
+	int retry_needed;
+	int *ret_codes;
+	int *commands;
+	int err;
+	int ret;
+
+	queues_waiting = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!queues_waiting)
+		return -ENOMEM;
+	ret_codes = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!ret_codes) {
+		err = -ENOMEM;
+		goto free_with_queues_waiting;
+	}
+	commands = kvcalloc(num_queues, sizeof(int), GFP_KERNEL);
+	if (!commands) {
+		err = -ENOMEM;
+		goto free_with_ret_codes;
+	}
+
+	for (queue_idx = 0; queue_idx < num_queues; queue_idx++)
+		queues_waiting[queue_idx] = start_id + queue_idx;
+	do {
+		retry_needed = 0;
+		queue_idx = 0;
+		while (queue_idx < num_queues) {
+			cmd_idx = 0;
+			while (queue_idx < num_queues) {
+				if (queues_waiting[queue_idx] != queue_done) {
+					err = cmd(priv, queues_waiting[queue_idx]);
+					if (err == -ENOMEM)
+						break;
+					if (err)
+						goto free_with_commands;
+					commands[cmd_idx++] = queue_idx;
+				}
+				queue_idx++;
+			}
+
+			if (queue_idx < num_queues)
+				dev_dbg(&priv->pdev->dev,
+					"Issued %d of %d batched commands\n",
+					queue_idx, num_queues);
+
+			err = gve_adminq_kick_and_wait(priv, cmd_idx, ret_codes);
+			if (err)
+				goto free_with_commands;
+
+			for (ret_code_idx = 0; ret_code_idx < cmd_idx; ret_code_idx++) {
+				if (ret_codes[ret_code_idx] == 0) {
+					queues_waiting[commands[ret_code_idx]] = queue_done;
+				} else if (ret_codes[ret_code_idx] != -ETIME) {
+					ret = ret_codes[ret_code_idx];
+					goto free_with_commands;
+				} else {
+					retry_needed++;
+				}
+			}
+
+			if (retry_needed)
+				dev_dbg(&priv->pdev->dev,
+					"Issued %d batched commands, %d needed a retry\n",
+					cmd_idx, retry_needed);
+		}
+	} while (retry_needed && ++retry_cnt < GVE_ADMINQ_RETRY_COUNT);
+
+	ret = retry_needed ? -ETIME : 0;
+
+free_with_commands:
+	kvfree(commands);
+	commands = NULL;
+free_with_ret_codes:
+	kvfree(ret_codes);
+	ret_codes = NULL;
+free_with_queues_waiting:
+	kvfree(queues_waiting);
+	queues_waiting = NULL;
+	if (err)
+		return err;
+	return ret;
+}
+
 static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_tx_ring *tx = &priv->tx[queue_index];
@@ -678,16 +761,8 @@ static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = start_id; i < start_id + num_queues; i++) {
-		err = gve_adminq_create_tx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_create_tx_queue,
+					start_id, num_queues);
 }
 
 static void gve_adminq_get_create_rx_queue_cmd(struct gve_priv *priv,
@@ -759,16 +834,8 @@ int gve_adminq_create_single_rx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = 0; i < num_queues; i++) {
-		err = gve_adminq_create_rx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_create_rx_queue,
+					0, num_queues);
 }
 
 static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
@@ -791,16 +858,8 @@ static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = start_id; i < start_id + num_queues; i++) {
-		err = gve_adminq_destroy_tx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_destroy_tx_queue,
+					start_id, num_queues);
 }
 
 static void gve_adminq_make_destroy_rx_queue_cmd(union gve_adminq_command *cmd,
@@ -832,16 +891,8 @@ int gve_adminq_destroy_single_rx_queue(struct gve_priv *priv, u32 queue_index)
 
 int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
 {
-	int err;
-	int i;
-
-	for (i = 0; i < num_queues; i++) {
-		err = gve_adminq_destroy_rx_queue(priv, i);
-		if (err)
-			return err;
-	}
-
-	return gve_adminq_kick_and_wait(priv);
+	return gve_adminq_manage_queues(priv, &gve_adminq_destroy_rx_queue,
+					0, num_queues);
 }
 
 static void gve_set_default_desc_cnt(struct gve_priv *priv,
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index ed1370c9b197..96e98f65273c 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -62,6 +62,11 @@ enum gve_adminq_statuses {
 	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
 };
 
+/* AdminQ commands (that aren't batched) will be retried if they encounter
+ * a recoverable error.
+ */
+#define GVE_ADMINQ_RETRY_COUNT 3
+
 #define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
 
 /* All AdminQ command structs should be naturally packed. The static_assert
-- 
2.45.2.803.g4e1b14247a-goog

.

