From: Sagi Grimberg <sagi@grimberg.me>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH rfc] nfsd: offer write delegation for O_WRONLY opens
Date: Sun,  7 Jul 2024 01:42:07 +0300
Message-ID: <20240706224207.927978-1-sagi@grimberg.me>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86614
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Many applications open files with O_WRONLY, fully intending to write
into the opened file. There is no reason why these applications should
not enjoy a write delegation handed to them.

Cc: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
Note: I couldn't find any reason to why the initial implementation chose
to offer write delegations only to NFS4_SHARE_ACCESS_BOTH, but it seemed
like an oversight to me. So I figured why not just send it out and see who
objects...

 fs/nfsd/nfs4state.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a20c2c9d7d45..69d576b19eb6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5784,15 +5784,15 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
 	 *   on its own, all opens."
 	 *
-	 * Furthermore the client can use a write delegation for most READ
-	 * operations as well, so we require a O_RDWR file here.
-	 *
-	 * Offer a write delegation in the case of a BOTH open, and ensure
-	 * we get the O_RDWR descriptor.
+	 * Offer a write delegation in the case of a BOTH open (ensure
+	 * a O_RDWR descriptor) Or WRONLY open (with a O_WRONLY descriptor).
 	 */
 	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
 		nf = find_rw_file(fp);
 		dl_type = NFS4_OPEN_DELEGATE_WRITE;
+	} else if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
+		nf = find_writeable_file(fp);
+		dl_type = NFS4_OPEN_DELEGATE_WRITE;
 	}
 
 	/*
-- 
2.43.0

.

X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Date: Sat, 06 Jul 2024 21:37:17 -0700
Message-ID: <000000000000f8ed54061ca0d9a5@google.com>
Subject: [syzbot] [nfs?] INFO: task hung in nfsd_umount
From: syzbot <syzbot+b568ba42c85a332a88ee@syzkaller.appspotmail.com>
To: Dai.Ngo@oracle.com, chuck.lever@oracle.com, jlayton@kernel.org, 
	kolga@netapp.com, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, 
	neilb@suse.de, syzkaller-bugs@googlegroups.com, tom@talpey.com
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1270015 org.kernel.vger.linux-nfs:86617
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hello,

syzbot found the following issue on:

HEAD commit:    1dd28064d416 Merge tag 'integrity-v6.10-fix' of ssh://ra.k..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16814d76980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1ace69f521989b1f
dashboard link: https://syzkaller.appspot.com/bug?extid=b568ba42c85a332a88ee
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/20c723869b92/disk-1dd28064.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c2a9a7382516/vmlinux-1dd28064.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9872fe6f2853/bzImage-1dd28064.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b568ba42c85a332a88ee@syzkaller.appspotmail.com

INFO: task syz.0.2871:13167 blocked for more than 143 seconds.
      Not tainted 6.10.0-rc6-syzkaller-00212-g1dd28064d416 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.0.2871      state:D stack:26704 pid:13167 tgid:13165 ppid:10304  flags:0x00000004
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5408 [inline]
 __schedule+0x1796/0x49d0 kernel/sched/core.c:6745
 __schedule_loop kernel/sched/core.c:6822 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6837
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
 nfsd_shutdown_threads+0x4e/0xd0 fs/nfsd/nfssvc.c:632
 nfsd_umount+0x43/0xd0 fs/nfsd/nfsctl.c:1412
 deactivate_locked_super+0xc4/0x130 fs/super.c:473
 put_fs_context+0x94/0x780 fs/fs_context.c:516
 fscontext_release+0x65/0x80 fs/fsopen.c:73
 __fput+0x24a/0x8a0 fs/file_table.c:422
 task_work_run+0x24f/0x310 kernel/task_work.c:180
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x168/0x360 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f11fe775bd9
RSP: 002b:00007f11ff494048 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00007f11fe903f60 RCX: 00007f11fe775bd9
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 00007f11fe7e4aa1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f11fe903f60 R15: 00007ffc04bba328
 </TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/30:
 #0: ffffffff8e333f20 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 #0: ffffffff8e333f20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
 #0: ffffffff8e333f20 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
3 locks held by kworker/u8:4/66:
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3223 [inline]
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3329
 #1: ffffc900020afd00 ((work_completion)(&map->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3224 [inline]
 #1: ffffc900020afd00 ((work_completion)(&map->work)){+.+.}-{0:0}, at: process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3329
 #2: ffffffff8e3391c0 (rcu_state.barrier_mutex){+.+.}-{3:3}, at: rcu_barrier+0x4c/0x530 kernel/rcu/tree.c:4448
4 locks held by udevd/4534:
 #0: ffff88805a0fdd58 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb7/0xd60 fs/seq_file.c:182
 #1: ffff88806542ec88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_seq_start+0x53/0x3b0 fs/kernfs/file.c:154
 #2: ffff88805af532d8 (kn->active#5){++++}-{0:0}, at: kernfs_seq_start+0x72/0x3b0 fs/kernfs/file.c:155
 #3: ffff8880686c90e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1009 [inline]
 #3: ffff8880686c90e8 (&dev->mutex){....}-{3:3}, at: uevent_show+0x17d/0x340 drivers/base/core.c:2743
2 locks held by getty/4842:
 #0: ffff88802f9d10a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc9000312b2f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b5/0x1e10 drivers/tty/n_tty.c:2211
3 locks held by kworker/u9:10/5101:
 #0: ffff88802e1fa148 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3223 [inline]
 #0: ffff88802e1fa148 ((wq_completion)hci2){+.+.}-{0:0}, at: process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3329
 #1: ffffc90003827d00 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3224 [inline]
 #1: ffffc90003827d00 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3329
 #2: ffff888020eb4d88 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x1ec/0x400 net/bluetooth/hci_sync.c:322
2 locks held by syz.4.2691/12623:
 #0: ffffffff8f63ad70 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
 #1: ffffffff8e5fff48 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1940
2 locks held by syz.0.2871/13167:
 #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: __super_lock fs/super.c:56 [inline]
 #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: __super_lock_excl fs/super.c:71 [inline]
 #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: deactivate_super+0xb5/0xf0 fs/super.c:505
 #1: ffffffff8e5fff48 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_shutdown_threads+0x4e/0xd0 fs/nfsd/nfssvc.c:632
1 lock held by udevd/14823:
2 locks held by syz-executor/15993:
 #0: ffffffff8f5d4908 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f5d4908 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x1180 net/core/rtnetlink.c:6632
 #1: ffffffff8e3392f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:323 [inline]
 #1: ffffffff8e3392f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:939
1 lock held by syz.4.3895/16063:
 #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: __super_lock fs/super.c:58 [inline]
 #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: super_lock+0x27c/0x400 fs/super.c:120
4 locks held by kvm-nx-lpage-re/16077:
 #0: ffffffff8e361f28 (cgroup_mutex){+.+.}-{3:3}, at: cgroup_lock include/linux/cgroup.h:368 [inline]
 #0: ffffffff8e361f28 (cgroup_mutex){+.+.}-{3:3}, at: cgroup_attach_task_all+0x27/0xe0 kernel/cgroup/cgroup-v1.c:61
 #1: ffffffff8e1ce5b0 (cpu_hotplug_lock){++++}-{0:0}, at: cgroup_attach_lock+0x11/0x40 kernel/cgroup/cgroup.c:2413
 #2: ffffffff8e362110 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: cgroup_attach_task_all+0x31/0xe0 kernel/cgroup/cgroup-v1.c:62
 #3: ffffffff8e3392f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:291 [inline]
 #3: ffffffff8e3392f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x381/0x830 kernel/rcu/tree_exp.h:939

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 30 Comm: khungtaskd Not tainted 6.10.0-rc6-syzkaller-00212-g1dd28064d416 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
 watchdog+0xfde/0x1020 kernel/hung_task.c:379
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 1109 Comm: kworker/u8:6 Not tainted 6.10.0-rc6-syzkaller-00212-g1dd28064d416 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Workqueue: writeback wb_workfn (flush-8:0)
RIP: 0010:bio_add_page+0xb7/0x840 block/bio.c:1118
Code: 05 00 00 8b 5d 00 45 89 f4 41 f7 d4 89 df 44 89 e6 e8 dd 6f 11 fd 44 39 e3 76 0a e8 13 6e 11 fd e9 20 04 00 00 48 89 6c 24 28 <49> 8d 5f 70 48 89 d8 48 c1 e8 03 48 89 44 24 30 42 0f b6 04 28 84
RSP: 0018:ffffc900045ce610 EFLAGS: 00000213
RAX: 0000000000000000 RBX: 000000000000a000 RCX: ffff888022311e00
RDX: ffff888022311e00 RSI: 00000000ffffefff RDI: 000000000000a000
RBP: ffff88802eb37a28 R08: ffffffff8484b8a3 R09: 1ffffd4000090b30
R10: dffffc0000000000 R11: fffff94000090b31 R12: 00000000ffffefff
R13: dffffc0000000000 R14: 0000000000001000 R15: ffff88802eb37a00
FS:  0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30e11ff8 CR3: 000000000e132000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 bio_add_folio+0x52/0x80 block/bio.c:1160
 io_submit_add_bh fs/ext4/page-io.c:422 [inline]
 ext4_bio_write_folio+0x1691/0x1da0 fs/ext4/page-io.c:560
 mpage_submit_folio+0x1af/0x230 fs/ext4/inode.c:1869
 mpage_process_page_bufs+0x6c9/0x8d0 fs/ext4/inode.c:1982
 mpage_prepare_extent_to_map+0xec7/0x1c80 fs/ext4/inode.c:2490
 ext4_do_writepages+0xc52/0x3d40 fs/ext4/inode.c:2632
 ext4_writepages+0x213/0x3c0 fs/ext4/inode.c:2768
 do_writepages+0x359/0x870 mm/page-writeback.c:2656
 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1651
 writeback_sb_inodes+0x99c/0x1380 fs/fs-writeback.c:1947
 __writeback_inodes_wb+0x11b/0x260 fs/fs-writeback.c:2018
 wb_writeback+0x495/0xd40 fs/fs-writeback.c:2129
 wb_check_old_data_flush fs/fs-writeback.c:2233 [inline]
 wb_do_writeback fs/fs-writeback.c:2286 [inline]
 wb_workfn+0xba1/0x1090 fs/fs-writeback.c:2314
 process_one_work kernel/workqueue.c:3248 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3329
 worker_thread+0x86d/0xd50 kernel/workqueue.c:3409
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
.

Message-ID: <8f4b1a41-0e35-43dc-a62b-1e9239df9b45@afaics.de>
Date: Sun, 7 Jul 2024 08:25:30 +0200
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
To: linux-nfs@vger.kernel.org
From: Harald Dunkel <harri@afaics.de>
Content-Language: en-US
Subject: is nfs-utils aware of cgroup memory limits?
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86618
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi folks,

I wonder if nfs-utils should be aware of the limits set using legacy or unified
cgroup?

Reason for asking is, I am running nfs exports in an LXC container (using
legacy cgroup). Sometimes nfs gets stuck on the host, and one of the symptoms
are messages about out of memory inside the container.

"cgroup" cannot be found in the code or git log of nfs-utils 2.6.4.


Regards
Harri
.