From: Jian-Hong Pan <jhp@endlessos.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Johan Hovold <johan@kernel.org>,
	David Box <david.e.box@linux.intel.com>,
	=?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com>,
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Damien Le Moal <dlemoal@kernel.org>,
	Nirmal Patel <nirmal.patel@linux.intel.com>,
	Jonathan Derrick <jonathan.derrick@linux.dev>,
	linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux@endlessos.org,
	Jian-Hong Pan <jhp@endlessos.org>
Subject: [PATCH v6 0/3] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe
Date: Mon, 24 Jun 2024 16:11:09 +0800
Message-ID: <20240624081108.10143-2-jhp@endlessos.org>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256529 org.kernel.vger.linux-pci:144390
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Notice the VMD remapped PCIe Root Port and NVMe have PCI PM L1 substates
capability, but they are disabled originally.

Here is a failed example on ASUS B1400CEAE with enabled VMD:

10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
    ...
    Capabilities: [200 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
        	  PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
        	   T_CommonMode=45us LTR1.2_Threshold=101376ns
        L1SubCtl2: T_PwrOn=50us

10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
    ...
    Capabilities: [900 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
                  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
                   T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us

According to "PCIe r6.0, sec 5.5.4", to config the link between the PCIe
Root Port and the child device correctly:
* Ensure both devices are in D0 before enabling PCI-PM L1 PM Substates.
* Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
  LTR_L1.2_THRESHOLD are programmed properly on both devices before enable
  bits for L1.2.

Prepare this series to fix that.

Jian-Hong Pan (3):
  PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates
  PCI/ASPM: Add notes about enabling PCI-PM L1SS to
    pci_enable_link_state(_locked)
  PCI: vmd: Drop resetting PCI bus action after scan mapped PCI child
    bus

 drivers/pci/controller/vmd.c | 33 +++++++++------------------------
 drivers/pci/pcie/aspm.c      |  6 ++++++
 2 files changed, 15 insertions(+), 24 deletions(-)

-- 
2.45.2

.

From: Jiwei Sun <sunjw10@outlook.com>
To: nirmal.patel@linux.intel.com,
	jonathan.derrick@linux.dev
Cc: paul.m.stillwell.jr@intel.com,
	lpieralisi@kernel.org,
	kw@linux.com,
	robh@kernel.org,
	bhelgaas@google.com,
	linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	sunjw10@lenovo.com,
	ahuang12@lenovo.com,
	sunjw10@outlook.com
Subject: [PATCH v2] PCI: vmd: Use raw spinlock for cfg_lock
Date: Mon, 24 Jun 2024 17:37:00 +0800
Message-ID: <SEZPR01MB4527790242E852C3B6EECEC5A8D42@SEZPR01MB4527.apcprd01.prod.exchangelabs.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256650 org.kernel.vger.linux-pci:144397
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: Jiwei Sun <sunjw10@lenovo.com>

If the kernel is built with the following configurations and booting
  CONFIG_VMD=y
  CONFIG_DEBUG_LOCKDEP=y
  CONFIG_DEBUG_SPINLOCK=y
  CONFIG_PROVE_LOCKING=y
  CONFIG_PROVE_RAW_LOCK_NESTING=y

The following log appears,

=============================
[ BUG: Invalid wait context ]
6.10.0-rc4 #80 Not tainted
-----------------------------
kworker/18:2/633 is trying to lock:
ffff888c474e5648 (&vmd->cfg_lock){....}-{3:3}, at: vmd_pci_write+0x185/0x2a0
other info that might help us debug this:
context-{5:5}
4 locks held by kworker/18:2/633:
 #0: ffff888100108958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xf78/0x1920
 #1: ffffc9000ae1fd90 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x7fe/0x1920
 #2: ffff888c483508a8 (&md->mutex){+.+.}-{4:4}, at: __pci_enable_msi_range+0x208/0x800
 #3: ffff888c48329bd8 (&dev->msi_lock){....}-{2:2}, at: pci_msi_update_mask+0x91/0x170
stack backtrace:
CPU: 18 PID: 633 Comm: kworker/18:2 Not tainted 6.10.0-rc4 #80 7c0f2526417bfbb7579e3c3442683c5961773c75
Hardware name: Lenovo ThinkSystem SR630/-[7X01RCZ000]-, BIOS IVEL60O-2.71 09/28/2020
Workqueue: events work_for_cpu_fn
Call Trace:
 <TASK>
 dump_stack_lvl+0x7c/0xc0
 __lock_acquire+0x9e5/0x1ed0
 lock_acquire+0x194/0x490
 _raw_spin_lock_irqsave+0x42/0x90
 vmd_pci_write+0x185/0x2a0
 pci_msi_update_mask+0x10c/0x170
 __pci_enable_msi_range+0x291/0x800
 pci_alloc_irq_vectors_affinity+0x13e/0x1d0
 pcie_portdrv_probe+0x570/0xe60
 local_pci_probe+0xdc/0x190
 work_for_cpu_fn+0x4e/0xa0
 process_one_work+0x86d/0x1920
 process_scheduled_works+0xd7/0x140
 worker_thread+0x3e9/0xb90
 kthread+0x2e9/0x3d0
 ret_from_fork+0x2d/0x60
 ret_from_fork_asm+0x1a/0x30
 </TASK>

If CONFIG_PREEMPT_RT is not set, the spinlock_t is based on
raw_spinlock, there is no any question in the above call trace. But if
CONFIG_PREEMPT_RT is set, the spinlock_t is based on rt_mutex, a task
will be scheduled when waiting for rt_mutex. For example, there are two
threads are trying to hold a rt_mutex lock, if A hold the lock firstly,
and B will be scheduled in rtlock_slowlock_locked() waiting for A to
release the lock. The raw_spinlock is a real spinning lock, which is
not allowed the task of the raw_spinlock owner is scheduled in its
critical region. In other words, we should not try to acquire rt_mutex
lock in the critical region of the raw_spinlock when CONFIG_PREEMPT_RT
is set.

CONFIG_PROVE_LOCKING and CONFIG_PROVE_RAW_LOCK_NESTING options are
used to detect the invalid lock nesting (the raw_spinlock vs. spinlock
nesting checks). Here is the call path:

  pci_msi_update_mask  ---> hold raw_spinlock dev->msi_lock
    pci_write_config_dword
     pci_bus_write_config_dword
       vmd_pci_write   ---> hold spinlock_t vmd->cfg_lock

The above call path is the invalid lock nesting becuase the vmd driver
tries to acquire the vmd->cfg_lock spinlock within the raw_spinlock
region (dev->msi_lock). That's why the message "BUG: Invalid wait contex"
is shown.

Signed-off-by: Jiwei Sun<sunjw10@lenovo.com>
Suggested-by: Adrian Huang <ahuang12@lenovo.com>
---
v2 changes:
 - Add more explanations regarding the root cause in the commit message

 drivers/pci/controller/vmd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..45d0ebf96adc 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -125,7 +125,7 @@ struct vmd_irq_list {
 struct vmd_dev {
 	struct pci_dev		*dev;
 
-	spinlock_t		cfg_lock;
+	raw_spinlock_t		cfg_lock;
 	void __iomem		*cfgbar;
 
 	int msix_count;
@@ -402,7 +402,7 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
 	if (!addr)
 		return -EFAULT;
 
-	spin_lock_irqsave(&vmd->cfg_lock, flags);
+	raw_spin_lock_irqsave(&vmd->cfg_lock, flags);
 	switch (len) {
 	case 1:
 		*value = readb(addr);
@@ -417,7 +417,7 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
 		ret = -EINVAL;
 		break;
 	}
-	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
+	raw_spin_unlock_irqrestore(&vmd->cfg_lock, flags);
 	return ret;
 }
 
@@ -437,7 +437,7 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
 	if (!addr)
 		return -EFAULT;
 
-	spin_lock_irqsave(&vmd->cfg_lock, flags);
+	raw_spin_lock_irqsave(&vmd->cfg_lock, flags);
 	switch (len) {
 	case 1:
 		writeb(value, addr);
@@ -455,7 +455,7 @@ static int vmd_pci_write(struct pci_bus *bus, unsigned int devfn, int reg,
 		ret = -EINVAL;
 		break;
 	}
-	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
+	raw_spin_unlock_irqrestore(&vmd->cfg_lock, flags);
 	return ret;
 }
 
@@ -1015,7 +1015,7 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
 		vmd->first_vec = 1;
 
-	spin_lock_init(&vmd->cfg_lock);
+	raw_spin_lock_init(&vmd->cfg_lock);
 	pci_set_drvdata(dev, vmd);
 	err = vmd_enable_domain(vmd, features);
 	if (err)
-- 
2.27.0

.

From: Thippeswamy Havalige <thippesw@amd.com>
To: <bhelgaas@google.com>, <kw@linux.com>, <robh@kernel.org>,
	<krzk+dt@kernel.org>, <conor+dt@kernel.org>, <lpieralisi@kernel.org>
CC: <linux-pci@vger.kernel.org>, <devicetree@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <michal.simek@amd.com>,
	<linux-arm-kernel@lists.infradead.org>, <bharat.kumar.gogada@amd.com>,
	Thippeswamy Havalige <thippesw@amd.com>
Subject: [PATCH 0/2] Add support for Xilinx XDMA Soft IP as Root Port
Date: Mon, 24 Jun 2024 16:12:37 +0530
Message-ID: <20240624104239.132159-1-thippesw@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256709 org.kernel.vger.linux-pci:144398
Newsgroups: org.kernel.vger.linux-kernel,org.infradead.lists.linux-arm-kernel,org.kernel.vger.linux-devicetree,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

This series of patch add support for Xilinx QDMA Soft IP as Root Port.

The Xilinx QDMA Soft IP support's 32 bit and 64bit BAR's.
As Root Port it supports MSI and legacy interrupts.

Thippeswamy Havalige (2):
  dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx QDMA PCIe Root
    Port Bridge
  PCI: xilinx-xdma: Add Xilinx QDMA Root Port driver

 .../devicetree/bindings/pci/xlnx,xdma-host.yaml    | 41 +++++++++++++++-
 drivers/pci/controller/pcie-xilinx-dma-pl.c        | 56 ++++++++++++++++++++--
 2 files changed, 92 insertions(+), 5 deletions(-)

-- 
1.8.3.1

.

From: Thippeswamy Havalige <thippesw@amd.com>
To: <bhelgaas@google.com>, <kw@linux.com>, <robh@kernel.org>,
	<krzk+dt@kernel.org>, <conor+dt@kernel.org>, <lpieralisi@kernel.org>
CC: <linux-pci@vger.kernel.org>, <devicetree@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bharat.kumar.gogada@amd.com>, "Thippeswamy
 Havalige" <thippesw@amd.com>
Subject: [PATCH 0/2] Add support for Xilinx XDMA Soft IP as Root Port
Date: Mon, 24 Jun 2024 16:37:53 +0530
Message-ID: <20240624110755.133625-1-thippesw@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256730 org.kernel.vger.linux-pci:144401
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-devicetree,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

This series of patch add support for Xilinx QDMA Soft IP as Root Port.

The Xilinx QDMA Soft IP support's 32 bit and 64bit BAR's.
As Root Port it supports MSI and legacy interrupts.

Thippeswamy Havalige (2):
  dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx QDMA PCIe Root
    Port Bridge
  PCI: xilinx-xdma: Add Xilinx QDMA Root Port driver

 .../devicetree/bindings/pci/xlnx,xdma-host.yaml    | 41 +++++++++++++++-
 drivers/pci/controller/pcie-xilinx-dma-pl.c        | 56 ++++++++++++++++++++--
 2 files changed, 92 insertions(+), 5 deletions(-)

-- 
1.8.3.1

.

From: Thippeswamy Havalige <thippesw@amd.com>
To: <bhelgaas@google.com>, <kw@linux.com>, <robh@kernel.org>,
	<krzk+dt@kernel.org>, <conor+dt@kernel.org>, <lpieralisi@kernel.org>
CC: <linux-pci@vger.kernel.org>, <devicetree@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bharat.kumar.gogada@amd.com>, "Thippeswamy
 Havalige" <thippesw@amd.com>
Subject: [PATCH] dt-bindings: PCI: xilinx-cpm: Fix ranges property to avoid overlapping of bridge register and 32-bit BAR addresses
Date: Mon, 24 Jun 2024 16:40:22 +0530
Message-ID: <20240624111022.133780-1-thippesw@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256736 org.kernel.vger.linux-pci:144403
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-devicetree,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

The current configuration had non-prefetchable memory overlapping with
bridge registers by 64KB from base address. This patch fixes the 'ranges'
property in the device tree by adjusting the non-prefetchable memory
addresses beyond the 64KB mark to prevent conflicts. 

Signed-off-by: Thippeswamy Havalige <thippesw@amd.com>
---
 Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml b/Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
index 4770ce02fcc3..989fb0fa2577 100644
--- a/Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
+++ b/Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
@@ -92,7 +92,7 @@ examples:
                                        <0 0 0 3 &pcie_intc_0 2>,
                                        <0 0 0 4 &pcie_intc_0 3>;
                        bus-range = <0x00 0xff>;
-                       ranges = <0x02000000 0x0 0xe0000000 0x0 0xe0000000 0x0 0x10000000>,
+                       ranges = <0x02000000 0x0 0xe0010000 0x0 0xe0010000 0x0 0x10000000>,
                                 <0x43000000 0x80 0x00000000 0x80 0x00000000 0x0 0x80000000>;
                        msi-map = <0x0 &its_gic 0x0 0x10000>;
                        reg = <0x0 0xfca10000 0x0 0x1000>,
-- 
2.25.1

.

From: Krishna Kumar <krishnak@linux.ibm.com>
To: mpe@ellerman.id.au, npiggin@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
        linux-pci@vger.kernel.org, brking@linux.vnet.ibm.com,
        gbatra@linux.ibm.com, aneesh.kumar@kernel.org,
        christophe.leroy@csgroup.eu, nathanl@linux.ibm.com,
        bhelgaas@google.com, oohall@gmail.com, tpearson@raptorengineering.com,
        mahesh.salgaonkar@in.ibm.com, Krishna Kumar <krishnak@linux.ibm.com>
Subject: [PATCH v3 0/2] PCI hotplug driver fixes
Date: Mon, 24 Jun 2024 17:39:26 +0530
Message-ID: <20240624121052.233232-1-krishnak@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=yes
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1256813 org.kernel.vger.linux-pci:144409
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-pci,org.ozlabs.lists.linuxppc-dev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

The fix of Powerpc hotplug driver (drivers/pci/hotplug/pnv_php.c)
addresses below two issues.

1. Kernel Crash during hot unplug of bridge/switch slot.

2. Bridge Support Enablement - Previously, when we do a hot-unplug
operation on a bridge slot, all the ports and devices behind the
bridge-ports would be hot-unplugged/offline, but when we do a hot-plug
operation on the same bridge slot, all the ports and devices behind the
bridge would not get hot-plugged/online. In this case, Only the first
port of the bridge gets enabled and the remaining port/devices remain
unplugged/offline.  After the fix, The hot-unplug and hot-plug
operations on the slot associated with the bridge started behaving
correctly and became in sync. Now, after the hot plug operation on the
same slot, all the bridge ports and devices behind the bridge become
hot-plugged/online/restored in the same manner as it was before the
hot-unplug operation.


Krishna Kumar (2):
  pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
  powerpc: hotplug driver bridge support

 arch/powerpc/include/asm/ppc-pci.h |  4 ++++
 arch/powerpc/kernel/pci-hotplug.c  |  5 ++---
 arch/powerpc/kernel/pci_dn.c       | 32 ++++++++++++++++++++++++++++++
 drivers/pci/hotplug/pnv_php.c      |  3 +--
 4 files changed, 39 insertions(+), 5 deletions(-)

Changelog:
==========

v3: 24 June 2024
  - Removed the DPC keyword from description in cover letter and
    patch2

v2: 14 May 2024
  - Used of_property_read_u32() in place of of_get_property() and
    of_read_number(). [patch2]
  - Removed some unnecessary variable and changed the function return
    type from void* to void. [patch2]
  - Removed the export declaration of
    pci_traverse_sibling_nodes_and_scan_slot() as its not needed.
    [patch2]

-- 
2.45.0

.

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
To: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	=?UTF-8?q?Krzysztof=20Wilczy=C5=84ski?= <kw@linux.com>,
	Rob Herring <robh@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Karthikeyan Mitran <m.karthikeyan@mobiveil.co.in>
Cc: linux-kernel@vger.kernel.org,
	kernel-janitors@vger.kernel.org,
	Christophe JAILLET <christophe.jaillet@wanadoo.fr>,
	linux-pci@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: [PATCH] PCI: ls-gen4: Constify struct mobiveil_rp_ops
Date: Mon, 24 Jun 2024 22:18:20 +0200
Message-ID: <189fd881cc8fd80220e74e91820e12cf3a5be114.1719260294.git.christophe.jaillet@wanadoo.fr>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1257377 org.kernel.vger.linux-pci:144430
Newsgroups: org.kernel.vger.linux-kernel,org.infradead.lists.linux-arm-kernel,org.kernel.vger.kernel-janitors,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

'struct mobiveil_rp_ops' is not modified in this driver.

Constifying this structure moves some data to a read-only section, so
increase overall security.

On a x86_64, with allmodconfig, as an example:
Before:
======
   text	   data	    bss	    dec	    hex	filename
   4446	    336	     32	   4814	   12ce	drivers/pci/controller/mobiveil/pcie-layerscape-gen4.o

After:
=====
   text	   data	    bss	    dec	    hex	filename
   4454	    328	     32	   4814	   12ce	drivers/pci/controller/mobiveil/pcie-layerscape-gen4.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Compile tested-only
---
 drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c | 2 +-
 drivers/pci/controller/mobiveil/pcie-mobiveil.h        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c b/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c
index d7b7350f02dd..5af22bee913b 100644
--- a/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c
+++ b/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c
@@ -190,7 +190,7 @@ static void ls_g4_pcie_reset(struct work_struct *work)
 	ls_g4_pcie_enable_interrupt(pcie);
 }
 
-static struct mobiveil_rp_ops ls_g4_pcie_rp_ops = {
+static const struct mobiveil_rp_ops ls_g4_pcie_rp_ops = {
 	.interrupt_init = ls_g4_pcie_interrupt_init,
 };
 
diff --git a/drivers/pci/controller/mobiveil/pcie-mobiveil.h b/drivers/pci/controller/mobiveil/pcie-mobiveil.h
index 6082b8afbc31..e63abb887ee3 100644
--- a/drivers/pci/controller/mobiveil/pcie-mobiveil.h
+++ b/drivers/pci/controller/mobiveil/pcie-mobiveil.h
@@ -151,7 +151,7 @@ struct mobiveil_rp_ops {
 struct mobiveil_root_port {
 	void __iomem *config_axi_slave_base;	/* endpoint config base */
 	struct resource *ob_io_res;
-	struct mobiveil_rp_ops *ops;
+	const struct mobiveil_rp_ops *ops;
 	int irq;
 	raw_spinlock_t intx_mask_lock;
 	struct irq_domain *intx_domain;
-- 
2.45.2

.

Date: Mon, 24 Jun 2024 20:37:28 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240624203729.1094506-1-smostafa@google.com>
Subject: [PATCH v2] PCI/MSI: Fix UAF in msi_capability_init
From: Mostafa Saleh <smostafa@google.com>
To: bhelgaas@google.com, linux-pci@vger.kernel.org, 
	linux-kernel@vger.kernel.org
Cc: ilpo.jarvinen@linux.intel.com, tglx@linutronix.de, kernel-team@android.com, 
	Mostafa Saleh <smostafa@google.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1257399 org.kernel.vger.linux-pci:144431
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-pci
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

While running Android15-6.6 and hacking on latest Qemu, I observed
the following with KFENCE:

[   10.761992][   T81] pcieport 0000:00:03.0: Adding to iommu group 0
[   10.782536][   T81] pcieport 0000:00:03.0: enabling device (0000 -> 0003=
)
[   10.814259][   T81] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
[   10.816534][   T81] BUG: KFENCE: use-after-free read in __pci_enable_msi=
_range+0x2c0/0x488
[   10.816534][   T81]
[   10.820189][   T81] Use-after-free read at 0x0000000024629571 (in kfence=
-#12):
[   10.822652][   T81]  __pci_enable_msi_range+0x2c0/0x488
[   10.824304][   T81]  pci_alloc_irq_vectors_affinity+0xec/0x14c
[   10.826129][   T81]  pci_alloc_irq_vectors+0x18/0x28
[   10.827606][   T81]  pcie_portdrv_probe+0x1e0/0x608
[   10.828979][   T81]  pci_device_probe+0xa8/0x174
[   10.830316][   T81]  really_probe+0x150/0x2b8
[   10.831717][   T81]  __driver_probe_device+0x7c/0x130
[   10.833368][   T81]  driver_probe_device+0x40/0x118
[   10.834899][   T81]  __device_attach_driver+0xc4/0x108
[   10.836464][   T81]  bus_for_each_drv+0x8c/0xf0
[   10.837915][   T81]  __device_attach+0xa4/0x198
[   10.839260][   T81]  device_initial_probe+0x18/0x28
[   10.840684][   T81]  bus_probe_device+0xb0/0xb4
[   10.842174][   T81]  deferred_probe_work_func+0xac/0xf8
[   10.843760][   T81]  process_one_work+0x18c/0x414
[   10.845129][   T81]  worker_thread+0x41c/0x53c
[   10.846611][   T81]  kthread+0x114/0x118
[   10.847841][   T81]  ret_from_fork+0x10/0x20
[   10.849298][   T81]
[   10.850435][   T81] kfence-#12: 0x0000000008614900-0x00000000e06c228d, s=
ize=3D104, cache=3Dkmalloc-128
[   10.850435][   T81]
[   10.853455][   T81] allocated by task 81 on cpu 7 at 10.808142s:
[   10.856329][   T81]  __kmem_cache_alloc_node+0x1f0/0x2bc
[   10.857988][   T81]  kmalloc_trace+0x44/0x138
[   10.859299][   T81]  msi_alloc_desc+0x3c/0x9c
[   10.860582][   T81]  msi_domain_insert_msi_desc+0x30/0x78
[   10.862058][   T81]  msi_setup_msi_desc+0x13c/0x184
[   10.862942][   T81]  __pci_enable_msi_range+0x258/0x488
[   10.863847][   T81]  pci_alloc_irq_vectors_affinity+0xec/0x14c
[   10.864821][   T81]  pci_alloc_irq_vectors+0x18/0x28
[   10.866011][   T81]  pcie_portdrv_probe+0x1e0/0x608
[   10.867047][   T81]  pci_device_probe+0xa8/0x174
[   10.867937][   T81]  really_probe+0x150/0x2b8
[   10.868774][   T81]  __driver_probe_device+0x7c/0x130
[   10.869755][   T81]  driver_probe_device+0x40/0x118
[   10.870665][   T81]  __device_attach_driver+0xc4/0x108
[   10.871654][   T81]  bus_for_each_drv+0x8c/0xf0
[   10.872585][   T81]  __device_attach+0xa4/0x198
[   10.873581][   T81]  device_initial_probe+0x18/0x28
[   10.874645][   T81]  bus_probe_device+0xb0/0xb4
[   10.875610][   T81]  deferred_probe_work_func+0xac/0xf8
[   10.876696][   T81]  process_one_work+0x18c/0x414
[   10.877573][   T81]  worker_thread+0x41c/0x53c
[   10.878468][   T81]  kthread+0x114/0x118
[   10.879331][   T81]  ret_from_fork+0x10/0x20
[   10.880238][   T81]
[   10.880767][   T81] freed by task 81 on cpu 7 at 10.811436s:
[   10.882311][   T81]  msi_domain_free_descs+0xd4/0x10c
[   10.883318][   T81]  msi_domain_free_locked.part.0+0xc0/0x1d8
[   10.884401][   T81]  msi_domain_alloc_irqs_all_locked+0xb4/0xbc
[   10.885519][   T81]  pci_msi_setup_msi_irqs+0x30/0x4c
[   10.886549][   T81]  __pci_enable_msi_range+0x2a8/0x488
[   10.887537][   T81]  pci_alloc_irq_vectors_affinity+0xec/0x14c
[   10.888576][   T81]  pci_alloc_irq_vectors+0x18/0x28
[   10.889693][   T81]  pcie_portdrv_probe+0x1e0/0x608
[   10.890719][   T81]  pci_device_probe+0xa8/0x174
[   10.891623][   T81]  really_probe+0x150/0x2b8
[   10.892534][   T81]  __driver_probe_device+0x7c/0x130
[   10.893541][   T81]  driver_probe_device+0x40/0x118
[   10.894491][   T81]  __device_attach_driver+0xc4/0x108
[   10.895433][   T81]  bus_for_each_drv+0x8c/0xf0
[   10.896381][   T81]  __device_attach+0xa4/0x198
[   10.897359][   T81]  device_initial_probe+0x18/0x28
[   10.898401][   T81]  bus_probe_device+0xb0/0xb4
[   10.899268][   T81]  deferred_probe_work_func+0xac/0xf8
[   10.900314][   T81]  process_one_work+0x18c/0x414
[   10.901279][   T81]  worker_thread+0x41c/0x53c
[   10.902138][   T81]  kthread+0x114/0x118
[   10.902916][   T81]  ret_from_fork+0x10/0x20
[   10.903816][   T81]
[   10.904429][   T81] CPU: 7 PID: 81 Comm: kworker/u16:2 Not tainted 6.6.2=
3mostafa+ #224
[   10.906362][   T81] Hardware name: linux,dummy-virt (DT)
[   10.907934][   T81] Workqueue: events_unbound deferred_probe_work_func
[   10.909959][   T81] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

Looking at the upstream code, it seems to have the same issue where:

Descriptor allocation done in:
__pci_enable_msi_range
    msi_capability_init
        msi_setup_msi_desc
            msi_insert_msi_desc
                msi_domain_insert_msi_desc
                    msi_alloc_desc
                        ...

Freed in case of failure in __msi_domain_alloc_locked()
__pci_enable_msi_range
    msi_capability_init
        pci_msi_setup_msi_irqs
            msi_domain_alloc_irqs_all_locked
                msi_domain_alloc_locked
                    __msi_domain_alloc_locked =3D> fails
                    msi_domain_free_locked
                        ...

That failure would propagate back till pci_msi_setup_msi_irqs() call
in msi_capability_init() which have "goto err" which would access the
descriptor to clear the mask.

However, we can=E2=80=99t make assumptions if the descriptor is freed or no=
t
as it depends on the failure location and on MSI_FLAG_FREE_MSI_DESCS,
so we copy it and use the copy to mask it in case of error.

Also, calling pci_free_msi_irqs() after will trigger
msi_domain_free_locked() again, however it re-iterates through the xa
array, which should be safe.

I hit this only once, but I can confirm the bug(and verify the fix)
for the upstream kernel through stubbing return of
__msi_domain_alloc_locked() to fail and with KASAN enabled.

bf6e054e0e3f ("genirq/msi: Provide msi_device_populate/destroy_sysfs()")
is the first commit that introduced the free logic from the context
of pci_msi_setup_msi_irqs()

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mostafa Saleh <smostafa@google.com>

---
v2:
- Copy descriptor as suggested by tglx to unmaks freed msis
---
 drivers/pci/msi/msi.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index c5625dd9bf49..3a45879d85db 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -352,7 +352,7 @@ static int msi_capability_init(struct pci_dev *dev, int=
 nvec,
 			       struct irq_affinity *affd)
 {
 	struct irq_affinity_desc *masks =3D NULL;
-	struct msi_desc *entry;
+	struct msi_desc *entry, desc;
 	int ret;
=20
 	/* Reject multi-MSI early on irq domain enabled architectures */
@@ -377,6 +377,12 @@ static int msi_capability_init(struct pci_dev *dev, in=
t nvec,
 	/* All MSIs are unmasked by default; mask them all */
 	entry =3D msi_first_desc(&dev->dev, MSI_DESC_ALL);
 	pci_msi_mask(entry, msi_multi_mask(entry));
+	/*
+	 * Copy the MSI descriptor for the error path because
+	 * pci_msi_setup_msi_irqs() will free it for the hierarchical
+	 * interrupt domain case.
+	 */
+	memcpy(&desc, entry, sizeof(desc));
=20
 	/* Configure MSI capability structure */
 	ret =3D pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
@@ -396,7 +402,7 @@ static int msi_capability_init(struct pci_dev *dev, int=
 nvec,
 	goto unlock;
=20
 err:
-	pci_msi_unmask(entry, msi_multi_mask(entry));
+	pci_msi_unmask(&desc, msi_multi_mask(&desc));
 	pci_free_msi_irqs(dev);
 fail:
 	dev->msi_enabled =3D 0;
--=20
2.45.2.741.gdbec12cfda-goog

.

