From: Christoph Hellwig <hch@lst.de>
To: trondmy@kernel.org,
	anna@kernel.org
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] nfs: remove nfs_page_length
Date: Fri,  5 Jul 2024 07:38:38 +0200
Message-ID: <20240705053838.2043203-1-hch@lst.de>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86573
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

The nfs_page_length is not used anywhere, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/internal.h | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 87ebc4608c316a..5902a9beca1f3f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -797,25 +797,6 @@ static inline void nfs_folio_mark_unstable(struct folio *folio,
 	}
 }
 
-/*
- * Determine the number of bytes of data the page contains
- */
-static inline
-unsigned int nfs_page_length(struct page *page)
-{
-	loff_t i_size = i_size_read(page->mapping->host);
-
-	if (i_size > 0) {
-		pgoff_t index = page_index(page);
-		pgoff_t end_index = (i_size - 1) >> PAGE_SHIFT;
-		if (index < end_index)
-			return PAGE_SIZE;
-		if (index == end_index)
-			return ((i_size - 1) & ~PAGE_MASK) + 1;
-	}
-	return 0;
-}
-
 /*
  * Determine the number of bytes of data the page contains
  */
-- 
2.43.0

.

From: Christoph Hellwig <hch@lst.de>
To: trondmy@kernel.org,
	anna@kernel.org
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] nfs: do not extent writes to the entire folio
Date: Fri,  5 Jul 2024 07:42:51 +0200
Message-ID: <20240705054251.2043864-1-hch@lst.de>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86574
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

nfs_update_folio has code to extend a write to the entire page under
certain conditions.  With the support for large folios this now
suddenly extents to the variable sized and potentially much larger folio.
Add code to limit the extension to the page boundaries of the start and
end of the write, which matches the historic expecation and the code
comments.

Fixes: b73fe2dd6cd5 ("nfs: add support for large folios")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/write.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f3b556ff35f9d2..4e81325f95d740 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1294,8 +1294,12 @@ int nfs_update_folio(struct file *file, struct folio *folio,
 		goto out;
 
 	if (nfs_can_extend_write(file, folio, pagelen)) {
-		count = max(count + offset, pagelen);
-		offset = 0;
+		unsigned int end = count + offset;
+
+		offset = round_down(offset, PAGE_SIZE);
+		if (end < pagelen)
+			end = min(round_up(end, PAGE_SIZE), pagelen);
+		count = end - offset;
 	}
 
 	status = nfs_writepage_setup(ctx, folio, offset, count);
-- 
2.43.0

.

From: Christoph Hellwig <hch@lst.de>
To: trondmy@kernel.org,
	anna@kernel.org
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type
Date: Fri,  5 Jul 2024 15:32:59 +0200
Message-ID: <20240705133259.2181984-1-hch@lst.de>
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86575
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

max_deviceinfo_size is not set anywhere, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/pnfs.h     | 1 -
 fs/nfs/pnfs_dev.c | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 1fc40afcbf1f50..30d2613e912b88 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -133,7 +133,6 @@ struct pnfs_layoutdriver_type {
 	const char *name;
 	struct module *owner;
 	unsigned flags;
-	unsigned max_deviceinfo_size;
 	unsigned max_layoutget_response;
 
 	int (*set_layoutdriver) (struct nfs_server *, const struct nfs_fh *);
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 178001c90156fd..bf0f2d67e96c15 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -110,9 +110,6 @@ nfs4_get_device_info(struct nfs_server *server,
 	 * GETDEVICEINFO's maxcount
 	 */
 	max_resp_sz = server->nfs_client->cl_session->fc_attrs.max_resp_sz;
-	if (server->pnfs_curr_ld->max_deviceinfo_size &&
-	    server->pnfs_curr_ld->max_deviceinfo_size < max_resp_sz)
-		max_resp_sz = server->pnfs_curr_ld->max_deviceinfo_size;
 	max_pages = nfs_page_array_len(0, max_resp_sz);
 	dprintk("%s: server %p max_resp_sz %u max_pages %d\n",
 		__func__, server, max_resp_sz, max_pages);
-- 
2.43.0

.

Return-Path: <linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org>
From: Christoph Hellwig <hch@lst.de>
To: kbusch@kernel.org,
	sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org,
	linux-nfs@vger.kernel.org
Subject: [PATCH] nvme: implement ->get_unique_id
Date: Fri,  5 Jul 2024 18:46:26 +0200
Message-ID: <20240705164640.2247869-1-hch@lst.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: linux-nvme@lists.infradead.org
X-Mailman-Version: 2.1.34
List-Id: <linux-nvme.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-nvme>,
 <mailto:linux-nvme-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-nvme/>
List-Post: <mailto:linux-nvme@lists.infradead.org>
List-Help: <mailto:linux-nvme-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-nvme>,
 <mailto:linux-nvme-request@lists.infradead.org?subject=subscribe>
Sender: "Linux-nvme" <linux-nvme-bounces@lists.infradead.org>
Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org
Xref: photonic.trudheim.com org.infradead.lists.linux-nvme:61697 org.kernel.vger.linux-nfs:86587
Newsgroups: org.infradead.lists.linux-nvme,org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Implement the get_unique_id method to allow pNFS SCSI layout access to
NVMe namespaces.

This is the server side implementation of RFC 9561 "Using the Parallel
NFS (pNFS) SCSI Layout to Access Non-Volatile Memory Express (NVMe)
Storage Devices".

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c      | 27 +++++++++++++++++++++++++++
 drivers/nvme/host/multipath.c | 16 ++++++++++++++++
 drivers/nvme/host/nvme.h      |  3 +++
 3 files changed, 46 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 782090ce0bc10d..96e0879013b79d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2230,6 +2230,32 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 	return ret;
 }
 
+int nvme_ns_get_unique_id(struct nvme_ns *ns, u8 id[16],
+		enum blk_unique_id type)
+{
+	struct nvme_ns_ids *ids = &ns->head->ids;
+
+	if (type != BLK_UID_EUI64)
+		return -EINVAL;
+
+	if (memchr_inv(ids->nguid, 0, sizeof(ids->nguid))) {
+		memcpy(id, &ids->nguid, sizeof(ids->nguid));
+		return sizeof(ids->nguid);
+	}
+	if (memchr_inv(ids->eui64, 0, sizeof(ids->eui64))) {
+		memcpy(id, &ids->eui64, sizeof(ids->eui64));
+		return sizeof(ids->eui64);
+	}
+
+	return -EINVAL;
+}
+
+static int nvme_get_unique_id(struct gendisk *disk, u8 id[16],
+		enum blk_unique_id type)
+{
+	return nvme_ns_get_unique_id(disk->private_data, id, type);
+}
+
 #ifdef CONFIG_BLK_SED_OPAL
 static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 		bool send)
@@ -2285,6 +2311,7 @@ const struct block_device_operations nvme_bdev_ops = {
 	.open		= nvme_open,
 	.release	= nvme_release,
 	.getgeo		= nvme_getgeo,
+	.get_unique_id	= nvme_get_unique_id,
 	.report_zones	= nvme_report_zones,
 	.pr_ops		= &nvme_pr_ops,
 };
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d8b6b4648eaff9..1aed93d792b610 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -427,6 +427,21 @@ static void nvme_ns_head_release(struct gendisk *disk)
 	nvme_put_ns_head(disk->private_data);
 }
 
+static int nvme_ns_head_get_unique_id(struct gendisk *disk, u8 id[16],
+		enum blk_unique_id type)
+{
+	struct nvme_ns_head *head = disk->private_data;
+	struct nvme_ns *ns;
+	int srcu_idx, ret = -EWOULDBLOCK;
+
+	srcu_idx = srcu_read_lock(&head->srcu);
+	ns = nvme_find_path(head);
+	if (ns)
+		ret = nvme_ns_get_unique_id(ns, id, type);
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
+
 #ifdef CONFIG_BLK_DEV_ZONED
 static int nvme_ns_head_report_zones(struct gendisk *disk, sector_t sector,
 		unsigned int nr_zones, report_zones_cb cb, void *data)
@@ -454,6 +469,7 @@ const struct block_device_operations nvme_ns_head_ops = {
 	.ioctl		= nvme_ns_head_ioctl,
 	.compat_ioctl	= blkdev_compat_ptr_ioctl,
 	.getgeo		= nvme_getgeo,
+	.get_unique_id	= nvme_ns_head_get_unique_id,
 	.report_zones	= nvme_ns_head_report_zones,
 	.pr_ops		= &nvme_pr_ops,
 };
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f3a41133ac3f97..1907fbc3f5dbb3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1062,6 +1062,9 @@ static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
 }
 #endif /* CONFIG_NVME_MULTIPATH */
 
+int nvme_ns_get_unique_id(struct nvme_ns *ns, u8 id[16],
+		enum blk_unique_id type);
+
 struct nvme_zone_info {
 	u64 zone_size;
 	unsigned int max_open_zones;
-- 
2.43.0


.

Return-Path: <linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org>
From: Christoph Hellwig <hch@lst.de>
To: trondmy@kernel.org,
	anna@kernel.org
Cc: linux-nfs@vger.kernel.org,
	linux-nvme@lists.infradead.org
Subject: [PATCH] nfs/blocklayout: add support for NVMe
Date: Fri,  5 Jul 2024 18:51:41 +0200
Message-ID: <20240705165141.2248305-1-hch@lst.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: linux-nvme@lists.infradead.org
X-Mailman-Version: 2.1.34
List-Id: <linux-nvme.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-nvme>,
 <mailto:linux-nvme-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-nvme/>
List-Post: <mailto:linux-nvme@lists.infradead.org>
List-Help: <mailto:linux-nvme-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-nvme>,
 <mailto:linux-nvme-request@lists.infradead.org?subject=subscribe>
Sender: "Linux-nvme" <linux-nvme-bounces@lists.infradead.org>
Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org
Xref: photonic.trudheim.com org.infradead.lists.linux-nvme:61698 org.kernel.vger.linux-nfs:86588
Newsgroups: org.infradead.lists.linux-nvme,org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Look for the udev generated persistent device name for NVMe devices
in addition to the SCSI ones and the Redhat-specific device mapper
name.

This is the client side implementation of RFC 9561 "Using the Parallel
NFS (pNFS) SCSI Layout to Access Non-Volatile Memory Express (NVMe)
Storage Devices".

Note that the udev rules for nvme are a bit of a mess and udev will only
create a link for the uuid if the NVMe namespace has one, and not the
NGUID.  As the current RFCs don't support UUID based identifications this
means the layout can't be used on such namespaces out of the box.  A
small tweak to the udev rules can work around it, and as the real fix I
will submit a draft to the IETF NFSv4 working group to support UUID-based
identifiers for SCSI and NVMe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/blocklayout/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 221781fb9cf14b..6252f44479457b 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -404,6 +404,8 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
 	bdev_file = bl_open_path(v, "dm-uuid-mpath-0x");
 	if (IS_ERR(bdev_file))
 		bdev_file = bl_open_path(v, "wwn-0x");
+	if (IS_ERR(bdev_file))
+		bdev_file = bl_open_path(v, "nvme-eui.");
 	if (IS_ERR(bdev_file)) {
 		pr_warn("pNFS: no device found for volume %*phN\n",
 			v->scsi.designator_len, v->scsi.designator);
-- 
2.43.0


.

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Mike Snitzer" <snitzer@kernel.org>
Cc: "Jeff Layton" <jlayton@kernel.org>, "Chuck Lever" <chuck.lever@oracle.com>,
 "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>,
 "Christoph Hellwig" <hch@infradead.org>, linux-nfs@vger.kernel.org
Subject: Re: Security issue in NFS localio
In-reply-to: <ZoYlvk7LnhyUCYU1@kernel.org>
References: <172004548435.16071.5145237815071160040@noble.neil.brown.name>,
 <ZoYlvk7LnhyUCYU1@kernel.org>
Date: Sat, 06 Jul 2024 07:39:49 +1000
Message-id: <172021558910.11489.5865399031739295244@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86592
Newsgroups: org.kernel.vger.linux-nfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Thu, 04 Jul 2024, Mike Snitzer wrote:
> On Thu, Jul 04, 2024 at 08:24:44AM +1000, NeilBrown wrote:
> 
> > 2/ The localio access should use exactly the same auth_domain as the
> >    network access uses.  This ensure the credentials implied by
> >    rootsquash and allsquash are used correctly.  I think the current
> >    code has the client guessing what IP address the server will see and
> >    finding an auth_domain based on that.  I'm not comfortable with that.
> 
> nfsd_local_fakerqst_create() isn't guessing.  rpc_peeraddr() returns the
> IP address of the server because the rpc_clnt is the same as
> established for traditional network access.

I think it is guessing in exactly they same way that your previous
internal code tried to use IP addresses to guess whether the server was
on the same host or not.

Whatever reasons you had for thinking that was fragile and needed to be
replaced - apply those reasons to the mechanism for choosing an
'auth_domain' (which is what the IP address is used for).  I am
confident that we need to choose the auth_domain when the client is
making a LOCALIO RPC request to the server, and to use that auth_domain
for subsequent interactions with that client (and possibly a different
auth_domain for a different client).

> 
> It's now 4th of July for me, so I'm with Jeff: I need a drink! ;)
> 

Hope you enjoyed your drink!

NeilBrown
.

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Dave Chinner" <david@fromorbit.com>
Cc: "Chuck Lever III" <chuck.lever@oracle.com>,
 "Mike Snitzer" <snitzer@kernel.org>, "Jeff Layton" <jlayton@kernel.org>,
 "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>,
 "Christoph Hellwig" <hch@infradead.org>,
 "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>,
 "Linux FS Devel" <linux-fsdevel@vger.kernel.org>
Subject: Re: Security issue in NFS localio
In-reply-to: <ZocvhIoQfzzhp+mh@dread.disaster.area>
References: <>, <ZocvhIoQfzzhp+mh@dread.disaster.area>
Date: Sat, 06 Jul 2024 07:51:03 +1000
Message-id: <172021626347.11489.9592570650036340361@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86593
Newsgroups: org.kernel.vger.linux-nfs,org.kernel.vger.linux-fsdevel
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Fri, 05 Jul 2024, Dave Chinner wrote:
> On Thu, Jul 04, 2024 at 07:00:23PM +0000, Chuck Lever III wrote:
> > 
> > 
> > > On Jul 3, 2024, at 6:24 PM, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > 
> > > I've been pondering security questions with localio - particularly
> > > wondering what questions I need to ask.  I've found three focal points
> > > which overlap but help me organise my thoughts:
> > > 1- the LOCALIO RPC protocol
> > > 2- the 'auth_domain' that nfsd uses to authorise access
> > > 3- the credential that is used to access the file
> > > 
> > > 1/ It occurs to me that I could find out the UUID reported by a given
> > > local server (just ask it over the RPC connection), find out the
> > > filehandle for some file that I don't have write access to (not too
> > > hard), and create a private NFS server (hacking nfs-ganasha?) which
> > > reports the same uuid and reports that I have access to a file with
> > > that filehandle.  If I then mount from that server inside a private
> > > container on the same host that is running the local server, I would get
> > > localio access to the target file.
> 
> This seems amazingly complex for something that is actually really
> simple.  Keep in mind that I am speaking from having direct
> experience with developing and maintaining NFS client IO bypass
> infrastructure from when I worked at SGI as an NFS engineer.
> 
> So, let's look at the Irix NFS client/server and the "Bulk Data
> Service" protocol extensions that SGI wrote for NFSv3 back in the
> mid 1990s.  Here's an overview from the 1996 product documentation
> "Getting Started with BDSpro":
> 
> https://irix7.com/techpubs/007-3274-001.pdf

Interesting work.  Thanks for the pointer.

It appear to me that BDS uses a separate network protocol - possibly
over a separate TCP connection or even a separate fabric - to connect
client to server, and this protocol is tuned for high throughput data
transfer and nothing else.  Makes perfect sense.

It would seem to still use the IP address (or similar NFS-style
mechanism) to authenticate each party to the other and to establish a
path for the data to flow over.  This is the question facing localio in
the text of mine that you quote above.  We don't want a network data
flow.  We want to hand over a file descriptor (or 'struct file').  There
is no standard way to achieve this over an IP-connected channel.  So we
are creating one.

The proposed protocol is to send a unique number over the IP-connected
channel, and use that to achieve rendezvous between the in-kernel client
and the in-kernel server.  The interesting questions are around how
unique this number should be, which direction it should travel, and
whether anything else other than the file descriptor should be passed
through when the kernel sides rendezvous.

I don't think the documentation on BDS sheds any particular light on
this question.

Thanks,
NeilBrown
.

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: <linux-nfs.vger.kernel.org>
List-Subscribe: <mailto:linux-nfs+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-nfs+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever III" <chuck.lever@oracle.com>
Cc: "Christoph Hellwig" <hch@infradead.org>,
 "Mike Snitzer" <snitzer@kernel.org>, "Jeff Layton" <jlayton@kernel.org>,
 "Anna Schumaker" <anna@kernel.org>,
 "Trond Myklebust" <trondmy@hammerspace.com>,
 "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>,
 "Linux FS Devel" <linux-fsdevel@vger.kernel.org>
Subject: Re: Security issue in NFS localio
In-reply-to: <297447D0-AEFF-44EB-A17B-1B66931C5AFE@oracle.com>
References: <>, <297447D0-AEFF-44EB-A17B-1B66931C5AFE@oracle.com>
Date: Sat, 06 Jul 2024 07:56:57 +1000
Message-id: <172021661746.11489.14244273828509105229@noble.neil.brown.name>
Xref: photonic.trudheim.com org.kernel.vger.linux-nfs:86594
Newsgroups: org.kernel.vger.linux-nfs,org.kernel.vger.linux-fsdevel
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

On Fri, 05 Jul 2024, Chuck Lever III wrote:
> 
> 
> > On Jul 5, 2024, at 9:45 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > On Thu, Jul 04, 2024 at 07:00:23PM +0000, Chuck Lever III wrote:
> >>> 3/ The current code uses the 'struct cred' of the application to look up
> >>>  the file in the server code.  When a request goes over the wire the
> >>>  credential is translated to uid/gid (or krb identity) and this is
> >>>  mapped back to a credential on the server which might be in a
> >>>  different uid name space (might it?  Does that even work for nfsd?)
> >>> 
> >>>  I think that if rootsquash or allsquash is in effect the correct
> >>>  server-side credential is used but otherwise the client-side
> >>>  credential is used.  That is likely correct in many cases but I'd
> >>>  like to be convinced that it is correct in all case.  Maybe it is
> >>>  time to get a deeper understanding of uid name spaces.
> >> 
> >> I've wondered about the idmapping issues, actually. Thanks
> >> for bringing that up. I think Christian and linux-fsdevel
> >> need to be involved in this conversation; added.
> > 
> > There is a lot more issues than just idmapping.  That's why I don't
> > think the current approach where the open is executed in the client
> > can work.  The right way is to ensure the open always happens in and
> > nfsd thread context which just pases the open file to client for doing
> > I/O.
> 
> I have considered that approach, but I don't yet have a clear
> enough understanding of the idmapping issues to say whether
> it is going to be necessary.

I can't see that nfsd pays any attention to uid namespaces.  I think it
always uses the init_ns.  So we would need to ensure the client
credential was interpreted against the init namespace.  I still have
made time to understand what this means in practice.

> 
> Just in terms of primitives, the server has svc_wake_up() which
> can enqueue non-transport work on an nfsd thread. Question then
> is what is the form of the response -- how does it get back
> to the "client" side.

I think it would be quite easy to establish a new work-queue (lwq?) that
clients can attach a request structure too before calling svc_wake_up(),
and which the nfsd thread would examine after svc_recv() returns.

This request structure would include the filehandle, credential,
possible other context, place to store the result, and a completion.
The server thread would complete the completion and the client, after
queuing the request, would wait for the completion.

So: easy enough to do if it turns out to be necessary.

NeilBrown


> 
> 
> --
> Chuck Lever
> 
> 
> 

.

