Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever" <chuck.lever@oracle.com>
Cc: "Mike Snitzer" <snitzer@kernel.org>, linux-nfs@vger.kernel.org,
 "Jeff Layton" <jlayton@kernel.org>, "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>, snitzer@hammerspace.com
Subject: Re: [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to
 properly support v4
In-reply-to: <ZoAtT4M3jCIF8pIC@tissot.1015granger.net>
References: <>, <ZoAtT4M3jCIF8pIC@tissot.1015granger.net>
Date: Mon, 01 Jul 2024 08:01:30 +1000
Message-id: <171978489020.16071.9497041442174299803@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86428
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Sun, 30 Jun 2024, Chuck Lever wrote:
> On Fri, Jun 28, 2024 at 05:10:53PM -0400, Mike Snitzer wrote:
> > This is nfs-localio code which blurs the boundary between server and
> > client...
> >=20
> > The change_attr is used by NFS to detect if a file might have changed.
> > This code is used to get the attributes after a write request.  NFS
> > uses a GETATTR request to the server at other times.  The change_attr
> > should be consistent between the two else comparisons will be
> > meaningless.
> >=20
> > So nfs_localio_vfs_getattr() should use the same change_attr as the
> > one that would be used if the NFS GETATTR request were made.  For
> > NFSv3, that is nfs_timespec_to_change_attr() as was already
> > implemented.  For NFSv4 it is something different (as implemented in
> > this commit).
> >=20
> > [above header derived from linux-nfs message Neil sent on this topic]
>=20
> Instead of this note, I recommend:
>=20
> Message-Id: <171918165963.14261.959545364150864599@noble.neil.brown.name>

Linus would not be impressed.  He likes links that you can click on and
follow.
So
   Link: https://lore.kernel.org/171918165963.14261.959545364150864599@noble.=
neil.brown.name

is preferred (at least I think that is the current state of the
conversation

see https://lore.kernel.org/all/CAHk-=3DwiD9du3fBHuLYzwUSdNgY+hxMZEWNZpqJXy-=
=3DwD2wafdg@mail.gmail.com/

NeilBrown

>=20
>=20
> > Suggested-by: NeilBrown <neil@brown.name>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 39 insertions(+), 9 deletions(-)
> >=20
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index 0f7d6d55087b..fe96f05ba8ca 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
> >  	verf->committed =3D how;
> >  }
> > =20
> > +/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
> > +static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
> > +{
> > +	u32 request_mask =3D STATX_BASIC_STATS;
> > +
> > +	if (version =3D=3D 4)
> > +		request_mask |=3D (STATX_BTIME | STATX_CHANGE_COOKIE);
> > +	return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
> > +}
> > +
> > +/*
> > + * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
> > + * FIXME: factor out to common code.
> > + */
> > +static u64 __nfsd4_change_attribute(const struct kstat *stat,
> > +				    const struct inode *inode)
> > +{
> > +	u64 chattr;
> > +
> > +	if (stat->result_mask & STATX_CHANGE_COOKIE) {
> > +		chattr =3D stat->change_cookie;
> > +		if (S_ISREG(inode->i_mode) &&
> > +		    !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
> > +			chattr +=3D (u64)stat->ctime.tv_sec << 30;
> > +			chattr +=3D stat->ctime.tv_nsec;
> > +		}
> > +	} else {
> > +		chattr =3D time_to_chattr(&stat->ctime);
> > +	}
> > +	return chattr;
> > +}
> > +
> >  static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> >  {
> >  	struct kstat stat;
> >  	struct file *filp =3D iocb->kiocb.ki_filp;
> >  	struct nfs_pgio_header *hdr =3D iocb->hdr;
> >  	struct nfs_fattr *fattr =3D hdr->res.fattr;
> > +	int version =3D NFS_PROTO(hdr->inode)->version;
> > =20
> > -	if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
> > -					    STATX_INO |
> > -					    STATX_ATIME |
> > -					    STATX_MTIME |
> > -					    STATX_CTIME |
> > -					    STATX_SIZE |
> > -					    STATX_BLOCKS,
> > -					    AT_STATX_SYNC_AS_STAT))
> > +	if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
> >  		return;
> > =20
> >  	fattr->valid =3D (NFS_ATTR_FATTR_FILEID |
> > @@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_k=
iocb *iocb)
> >  	fattr->atime =3D stat.atime;
> >  	fattr->mtime =3D stat.mtime;
> >  	fattr->ctime =3D stat.ctime;
> > -	fattr->change_attr =3D nfs_timespec_to_change_attr(&fattr->ctime);
> > +	if (version =3D=3D 4) {
> > +		fattr->change_attr =3D
> > +			__nfsd4_change_attribute(&stat, file_inode(filp));
> > +	} else
> > +		fattr->change_attr =3D nfs_timespec_to_change_attr(&fattr->ctime);
> >  	fattr->du.nfs3.used =3D stat.blocks << 9;
> >  }
> > =20
> > --=20
> > 2.44.0
> >=20
>=20
> --=20
> Chuck Lever
>=20

.

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever" <chuck.lever@oracle.com>
Cc: "Mike Snitzer" <snitzer@kernel.org>, linux-nfs@vger.kernel.org,
 "Jeff Layton" <jlayton@kernel.org>, "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>, snitzer@hammerspace.com
Subject: Re: [PATCH v9 13/19] nfsd: add "localio" support
In-reply-to: <ZoCIQjxougYwplsp@tissot.1015granger.net>
References: <>, <ZoCIQjxougYwplsp@tissot.1015granger.net>
Date: Mon, 01 Jul 2024 08:22:56 +1000
Message-id: <171978617639.16071.17212237728640634496@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86429
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Sun, 30 Jun 2024, Chuck Lever wrote:
> Sorry, I guess I expected to have more time to learn about these
> patches before writing review comments. But if you want them to go
> in soon, I had better look more closely at them now.
>=20
>=20
> On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> > Pass the stored cl_nfssvc_net from the client to the server as
>=20
> This is the only mention of cl_nfssvc_net I can find in this
> patch. I'm not sure what it is. Patch description should maybe
> provide some context.
>=20
>=20
> > first argument to nfsd_open_local_fh() to ensure the proper network
> > namespace is used for localio.
>=20
> Can the patch description say something about the distinct mount=20
> namespaces -- if the local application is running in a different
> container than the NFS server, are we using only the network
> namespaces for authorizing the file access? And is that OK to do?
> If yes, patch description should explain that NFS local I/O ignores
> the boundaries of mount namespaces and why that is OK to do.
>=20
>=20
> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfsd/Makefile    |   1 +
> >  fs/nfsd/filecache.c |   2 +-
> >  fs/nfsd/localio.c   | 239 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/nfssvc.c    |   1 +
> >  fs/nfsd/trace.h     |   3 +-
> >  fs/nfsd/vfs.h       |   9 ++
> >  6 files changed, 253 insertions(+), 2 deletions(-)
> >  create mode 100644 fs/nfsd/localio.c
> >=20
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) +=3D nfs4layouts.o
> >  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) +=3D blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_SCSILAYOUT) +=3D blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) +=3D flexfilelayout.o flexfilelayoutx=
dr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) +=3D localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index ad9083ca144b..99631fa56662 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -52,7 +52,7 @@
> >  #define NFSD_FILE_CACHE_UP		     (0)
> > =20
> >  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALI=
O)
> > =20
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..759a5cb79652
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,239 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth_gss.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +#define NFSDDBG_FACILITY		NFSDDBG_FH
>=20
> With no more dprintk() call sites in this patch, you no longer need
> this macro definition.
>=20
>=20
> > +/*
> > + * We need to translate between nfs status return values and
> > + * the local errno values which may not be the same.
> > + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> > + *   all compiled nfs objects if it were in include/linux/nfs.h
> > + */
> > +static const struct {
> > +	int stat;
> > +	int errno;
> > +} nfs_common_errtbl[] =3D {
> > +	{ NFS_OK,		0		},
> > +	{ NFSERR_PERM,		-EPERM		},
> > +	{ NFSERR_NOENT,		-ENOENT		},
> > +	{ NFSERR_IO,		-EIO		},
> > +	{ NFSERR_NXIO,		-ENXIO		},
> > +/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
> > +	{ NFSERR_ACCES,		-EACCES		},
> > +	{ NFSERR_EXIST,		-EEXIST		},
> > +	{ NFSERR_XDEV,		-EXDEV		},
> > +	{ NFSERR_NODEV,		-ENODEV		},
> > +	{ NFSERR_NOTDIR,	-ENOTDIR	},
> > +	{ NFSERR_ISDIR,		-EISDIR		},
> > +	{ NFSERR_INVAL,		-EINVAL		},
> > +	{ NFSERR_FBIG,		-EFBIG		},
> > +	{ NFSERR_NOSPC,		-ENOSPC		},
> > +	{ NFSERR_ROFS,		-EROFS		},
> > +	{ NFSERR_MLINK,		-EMLINK		},
> > +	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
> > +	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
> > +	{ NFSERR_DQUOT,		-EDQUOT		},
> > +	{ NFSERR_STALE,		-ESTALE		},
> > +	{ NFSERR_REMOTE,	-EREMOTE	},
> > +#ifdef EWFLUSH
> > +	{ NFSERR_WFLUSH,	-EWFLUSH	},
> > +#endif
> > +	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
> > +	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
> > +	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
> > +	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
> > +	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
> > +	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
> > +	{ NFSERR_BADTYPE,	-EBADTYPE	},
> > +	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
> > +	{ -1,			-EIO		}
> > +};
> > +
> > +/**
> > + * nfs_stat_to_errno - convert an NFS status code to a local errno
> > + * @status: NFS status code to convert
> > + *
> > + * Returns a local errno value, or -EIO if the NFS status code is
> > + * not recognized.  nfsd_file_acquire() returns an nfsstat that
> > + * needs to be translated to an errno before being returned to a
> > + * local client application.
> > + */
> > +static int nfs_stat_to_errno(enum nfs_stat status)
> > +{
> > +	int i;
> > +
> > +	for (i =3D 0; nfs_common_errtbl[i].stat !=3D -1; i++) {
> > +		if (nfs_common_errtbl[i].stat =3D=3D (int)status)
> > +			return nfs_common_errtbl[i].errno;
> > +	}
> > +	return nfs_common_errtbl[i].errno;
> > +}
> > +
> > +static void
> > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > +{
> > +	if (rqstp->rq_client)
> > +		auth_domain_put(rqstp->rq_client);
> > +	if (rqstp->rq_cred.cr_group_info)
> > +		put_group_info(rqstp->rq_cred.cr_group_info);
> > +	/* rpcauth_map_to_svc_cred_local() clears cr_principal */
> > +	WARN_ON_ONCE(rqstp->rq_cred.cr_principal !=3D NULL);
> > +	kfree(rqstp->rq_xprt);
> > +	kfree(rqstp);
> > +}
> > +
> > +static struct svc_rqst *
> > +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> > +			const struct cred *cred)
> > +{
> > +	struct svc_rqst *rqstp;
> > +	struct nfsd_net *nn =3D net_generic(net, nfsd_net_id);
> > +	int status;
> > +
> > +	/* FIXME: not running in nfsd context, must get reference on nfsd_serv =
*/
> > +	if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> > +		return ERR_PTR(-ENXIO);
> > +
> > +	rqstp =3D kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > +	if (!rqstp)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	rqstp->rq_xprt =3D kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > +	if (!rqstp->rq_xprt) {
> > +		status =3D -ENOMEM;
> > +		goto out_err;
> > +	}
>=20
> struct svc_rqst is pretty big (like, bigger than a couple of pages).
> What happens if this allocation fails?
>=20
> And how often does it occur -- does that add significant overhead?
>=20
>=20
> > +
> > +	rqstp->rq_xprt->xpt_net =3D net;
> > +	__set_bit(RQ_SECURE, &rqstp->rq_flags);
> > +	rqstp->rq_proc =3D 1;
> > +	rqstp->rq_vers =3D 3;
>=20
> IMO these need to be symbolic constants, not integers. Or, at least
> there needs to be some documenting comments that explain these are
> fake and why that's OK to do. Or, are there better choices?
>=20
>=20
> > +	rqstp->rq_prot =3D IPPROTO_TCP;
> > +	rqstp->rq_server =3D nn->nfsd_serv;
> > +
> > +	/* Note: we're connecting to ourself, so source addr =3D=3D peer addr */
> > +	rqstp->rq_addrlen =3D rpc_peeraddr(rpc_clnt,
> > +			(struct sockaddr *)&rqstp->rq_addr,
> > +			sizeof(rqstp->rq_addr));
> > +
> > +	rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> > +
> > +	/*
> > +	 * set up enough for svcauth_unix_set_client to be able to wait
> > +	 * for the cache downcall. Note that we do _not_ want to allow the
> > +	 * request to be deferred for later revisit since this rqst and xprt
> > +	 * are not set up to run inside of the normal svc_rqst engine.
> > +	 */
> > +	INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> > +	kref_init(&rqstp->rq_xprt->xpt_ref);
> > +	spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> > +	rqstp->rq_chandle.thread_wait =3D 5 * HZ;
> > +
> > +	status =3D svcauth_unix_set_client(rqstp);
> > +	switch (status) {
> > +	case SVC_OK:
> > +		break;
> > +	case SVC_DENIED:
> > +		status =3D -ENXIO;
> > +		goto out_err;
> > +	default:
> > +		status =3D -ETIMEDOUT;
> > +		goto out_err;
> > +	}
>=20
> Interesting. Why would svcauth_unix_set_client fail for a local I/O
> request? Wouldn't it only be because the local application is trying
> to open a file it doesn't have permission to?
>=20

I'm beginning to think this section of code is the of the sort where you
need to be twice as clever when debugging as you where when writing.  It
is trying to get the client to use interfaces written for server-side
actions, and it isn't a good fit.

I think that instead we should modify fh_verify() so that it takes
explicit net, rq_vers, rq_cred, rq_client as well as the rqstp, and
the localio client passes in a NULL rqstp.

Getting the rq_client is an interesting challenge.
The above code (if I'm reading it correctly) gets the server-side
address of the IP connection, and passes that through to the sunrpc code
as though it is the client address.  So as long as the server is
exporting to itself, and as long as no address translation is happening
on the path, this works.  It feels messy though - and fragile.

I would rather we had some rq_client (struct auth_domain) that was
dedicated to localio.  The client should be able to access it based on
the fact that it could rather the server UUID using the LOCALIO RPC
protocol.

I'm not sure what exactly this would look like, but the=20
'struct auth_domain *' should be something that can be accessed
directly, not looked up in a cache.

I can try to knock up a patch to allow fh_verify (and nfsd_file_acquire)
without an rqstp.  I won't try the auth_domain change until I hear what
others think.

NeilBrown
.

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Jeff Layton" <jlayton@kernel.org>
Cc: "Chuck Lever" <chuck.lever@oracle.com>,
 "Mike Snitzer" <snitzer@kernel.org>, linux-nfs@vger.kernel.org,
 "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>, snitzer@hammerspace.com
Subject: Re: [PATCH v9 13/19] nfsd: add "localio" support
In-reply-to: <171978445670.16071.1689758767313847463@noble.neil.brown.name>
References: <>, <de0cd43fe008c32bfe6e3c983256862fb5ffb9c6.camel@kernel.org>,
 <171978445670.16071.1689758767313847463@noble.neil.brown.name>
Date: Mon, 01 Jul 2024 11:29:20 +1000
Message-id: <171979736048.16071.3809088390447928718@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86433
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Mon, 01 Jul 2024, NeilBrown wrote:
> On Mon, 01 Jul 2024, Jeff Layton wrote:
> > On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > > +
> > > > > > > > +	/* nfs_fh -> svc_fh */
> > > > > > > > +	if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > > +		status =3D -EINVAL;
> > > > > > > > +		goto out;
> > > > > > > > +	}
> > > > > > > > +	fh_init(&fh, NFS4_FHSIZE);
> > > > > > > > +	fh.fh_handle.fh_size =3D nfs_fh->size;
> > > > > > > > +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > > +
> > > > > > > > +	if (fmode & FMODE_READ)
> > > > > > > > +		mayflags |=3D NFSD_MAY_READ;
> > > > > > > > +	if (fmode & FMODE_WRITE)
> > > > > > > > +		mayflags |=3D NFSD_MAY_WRITE;
> > > > > > > > +
> > > > > > > > +	beres =3D nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > > +	if (beres) {
> > > > > > > > +		status =3D nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > > +		goto out_fh_put;
> > > > > > > > +	}
> > > > > > >=20
> > > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > > nfsd_open_verified() would be simpler and/or good enough here. =
Is
> > > > > > > there a strong reason to use the file cache for locally opened
> > > > > > > files? Jeff, any thoughts?
> > > > > >=20
> > > > > > > Will there be writeback ramifications for
> > > > > > > doing this? Maybe we need a comment here explaining how these f=
iles
> > > > > > > are garbage collected (just an fput by the local I/O client, I =
would
> > > > > > > guess).
> > > > > >=20
> > > > > > OK, going back to this: Since right here we immediately call=20
> > > > > >=20
> > > > > > 	nfsd_file_put(nf);
> > > > > >=20
> > > > > > There are no writeback ramifications nor any need to comment about
> > > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > > unnecessary) overhead for simply obtaining a struct file.
> > > > >=20
> > > > > Easy enough change, probably best to avoid the filecache but would =
like
> > > > > to verify with Jeff before switching:
> > > > >=20
> > > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > > --- a/fs/nfsd/localio.c
> > > > > +++ b/fs/nfsd/localio.c
> > > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_ne=
t,
> > > > >         const struct cred *save_cred;
> > > > >         struct svc_rqst *rqstp;
> > > > >         struct svc_fh fh;
> > > > > -       struct nfsd_file *nf;
> > > > >         __be32 beres;
> > > > >=20
> > > > >         if (nfs_fh->size > NFS4_FHSIZE)
> > > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_=
net,
> > > > >         if (fmode & FMODE_WRITE)
> > > > >                 mayflags |=3D NFSD_MAY_WRITE;
> > > > >=20
> > > > > -       beres =3D nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > +       beres =3D fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > >         if (beres) {
> > > > >                 status =3D nfs_stat_to_errno(be32_to_cpu(beres));
> > > > >                 goto out_fh_put;
> > > > >         }
> > > > > -       *pfilp =3D get_file(nf->nf_file);
> > > > > -       nfsd_file_put(nf);
> > > > > +       status =3D nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > >  out_fh_put:
> > > > >         fh_put(&fh);
> > > > >         nfsd_local_fakerqst_destroy(rqstp);
> > > > >=20
> > > >=20
> > > > My suggestion would be to _not_ do this. I think you do want to use t=
he
> > > > filecache (mostly for performance reasons).
> > >=20
> > > But look carefully:
> > >=20
> > >  -- we're not calling nfsd_file_acquire_gc() here
> > >=20
> > >  -- we're immediately calling nfsd_file_put() on the returned nf
> > >=20
> > > There's nothing left in the file cache when nfsd_open_local_fh()
> > > returns. Each call here will do a full file open and a full close.
> > >=20
> > >=20
> >=20
> > Good point. This should be calling nfsd_file_acquire_gc(), IMO.=20
>=20
> Or the client could do a v4 style acquire, and not call nfsd_file_put()
> until it was done with the file.  I don't see a specific problem with
> _gc, but avoiding the heuristic it implies seems best where possible.
>=20

I'm now wondering if this matters at all.
For NFSv4 the client still calls OPEN and CLOSE over the wire, so the
file will be in the cache whenever it is open so the current code is
fine.

For NFSv3 the client will only do the lookup once on the first IO
request.  The struct file is stored in a client data structure and used
subsequently without any interaction with nfsd.
So if the client opens the same file multiple times we might get extra
lookups on the server, but I'm not at all sure that justifies any
complexity.

So my current inclination is to leave this code as is.

NeilBrown
.

From: NeilBrown <neilb@suse.de>
To: Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org,
	Olga Kornievskaia <kolga@netapp.com>,
	Dai Ngo <Dai.Ngo@oracle.com>,
	Tom Talpey <tom@talpey.com>,
	Mike Snitzer <snitzer@kernel.org>
Subject: [PATCH 0/6 RFC] nfsd: provide simpler interface for LOCALIO access
Date: Mon,  1 Jul 2024 12:53:15 +1000
Message-ID: <20240701025802.22985-1-neilb@suse.de>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86434
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

The purpose of this series is to provide an iterface to the filecache
which can be used without a struct svc_rqst, and without any xprt.

This interface is realised in the final patch as nfsd_file_acquire_local().

That function currently takes and nfs_vers option.  That was a natural
fallout of the refactorisation that lead here, but it probably isn't
wanted.  I need to review exactly what effect it has - mostly it is
about error code choice I think.

The function also takess a 'struct auth_domain*'.  It is not clear how a
caller would get hold of such a thing.  I suspect that the server should
register a "local-io" auth_domain in the nfsd_net, and it should be used
here.

Comments on design and implementation most welcome.

Thanks,
NeilBrown

 [PATCH 1/6] nfsd: introduce __fh_verify which takes explicit nfsd_net
 [PATCH 2/6] nfsd: add cred parameter to __fh_verify()
 [PATCH 3/6] nfsd: pass nfs_vers explicitly to __fh_verify()
 [PATCH 4/6] nfsd: pass client explicitly to __fh_verify()
 [PATCH 5/6] nfsd: __fh_verify now treats NULL rqstp as a trusted
 [PATCH 6/6] nfsd: add nfsd_file_acquire_local().
.

From: Christoph Hellwig <hch@lst.de>
To: Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Subject: NFS buffered write cleanup
Date: Mon,  1 Jul 2024 07:26:47 +0200
Message-ID: <20240701052707.1246254-1-hch@lst.de>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86442
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi all,

this series cleans up the nfs_page handling in the buffer write path.

The first patch was already sent independently but hasn't been picked up
and this included here again.

The last patch fixes a bug where a request could get incorrectly reused.
It would require the flexfiles layout and odd I/O timings, and without
a flexfiles server I can't actually hit it.  I'd appreciate a careful
review of that one.

The series is against Trond's testing branch.

Diffstat:
 fs/nfs/file.c                  |    6 
 fs/nfs/filelayout/filelayout.c |    1 
 fs/nfs/fscache.c               |    2 
 fs/nfs/internal.h              |    8 -
 fs/nfs/pagelist.c              |  117 ---------------
 fs/nfs/pnfs.h                  |   22 --
 fs/nfs/pnfs_nfs.c              |   47 ------
 fs/nfs/read.c                  |    2 
 fs/nfs/write.c                 |  316 ++++++++++++++++++-----------------------
 include/linux/nfs_page.h       |    7 
 10 files changed, 157 insertions(+), 371 deletions(-)
.

From: Jan Kara <jack@suse.cz>
To: Anna Schumaker <anna@kernel.org>
Cc: Trond Myklebust <trondmy@kernel.org>,
	linux-nfs@vger.kernel.org,
	Sagi Grimberg <sagi@grimberg.me>,
	Jeff Layton <jlayton@kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
Date: Mon,  1 Jul 2024 12:50:45 +0200
Message-Id: <20240617073525.10666-1-jack@suse.cz>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86451
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

[Resending because of messed up mailing list address]

Hello,

this is second revision of my patch series improving NFS throughput for
buffered writes.

Changes since v1:
* Added Reviewed-by tags
* Made sleep waiting for congestion to resolve killable

Original cover letter below:

I was thinking how to best address the performance regression coming from
NFS write congestion. After considering various options and concerns raised
in the previous discussion, I've got an idea for a simple option that could
help to keep the server more busy - just mimick what block devices do and
block the flush worker waiting for congestion to resolve instead of aborting
the writeback. And it actually helps enough that I don't think more complex
solutions are warranted at this point.

This patch series has two preparatory cleanups and then a patch implementing
this idea.

								Honza

Previous versions:
Link: http://lore.kernel.org/r/20240612153022.25454-1-jack@suse.cz # v1
.

Date: Mon, 1 Jul 2024 11:35:06 -0400
From: Chuck Lever <chuck.lever@oracle.com>
To: linux-nfs@vger.kernel.org
Subject: long-term stable backports of NFSD patches
Message-ID: <ZoLMqqq88vcdhbGJ@tissot.1015granger.net>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86461
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

It's apparent that a number of distributions and their customers
remain on long-term stable kernels. We are aware of the scalability
problems and other bugs in NFSD in kernels between v5.4 and v6.1.

To address the filecache and other scalability problems in those
kernels, I'm preparing backported patches of NFSD fixes for several
popular LTS kernels. These backports are destined for the official
LTS kernel branches so that distributions can easily integrate them
into their products.

Once this effort is complete, Greg and Sasha will continue to be
responsible for backporting NFSD-related fixes from upstream into
the LTS kernels.

---

I've pushed the NFSD backports to branches in this repo:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

If you are able, I encourage you to pull these, review them or try
them out, and report any issues or successes. I'm currently using
the NFS workflows in kdevops as the testing platform, but am
planning to include other tests.


LTS v5.10.y

The bulk of this work is now in v5.10.220. There were a handful of
missing forward fixes discovered that I've now submitted to stable@
for inclusion in an upcoming release of 5.10.y.

You can find this series in the "nfsd-5.10.y" branch in the above
repo.


Out-of-kernel follow-up work

Amir Goldstein made these review requests:

- Adjust the LTP test fanotify09 to update the comment with the
appropriate 5.15.y version
- Update fanotify_init(2) "FAN_REPORT_TARGET_FID (since Linux 5.17)"
- Update fanotify_mark(2) "FAN_MARK_EVICTABLE (since Linux 5.19)"
- Update fanotify_mark(2) "FAN_RENAME (since Linux 5.17)"
- Update fanotify_mark(2) "FAN_FS_ERROR (since Linux 5.16 and 5.15.???)"
- Update fanotify_mark(2) "FAN_MARK_IGNORE (since Linux 6.0)"

I plan to provide these updates once the NFSD filecache backports
have been completed.


-- 
Chuck Lever
.

