From: Wei Yang <richard.weiyang@gmail.com>
To: akpm@linux-foundation.org,
	masahiroy@kernel.org,
	nathan@kernel.org,
	nicolas@fjasle.eu
Cc: linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	linux-kbuild@vger.kernel.org,
	Wei Yang <richard.weiyang@gmail.com>,
	Mike Rapoport <rppt@kernel.org>
Subject: [PATCH 1/3] mm: use zonelist_zone() to get zone
Date: Tue,  2 Jul 2024 23:40:06 +0000
Message-Id: <20240702234008.19101-1-richard.weiyang@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1265855 org.kvack.linux-mm:202265
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-kbuild,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Instead of accessing zoneref->zone directly, use zonelist_zone(), as the
other call sites already do, for consistency.

No functional change.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
---
 include/linux/mmzone.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cb7f265c2b96..a34a74f5b113 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1690,7 +1690,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 			zone = zonelist_zone(z))
 
 #define for_next_zone_zonelist_nodemask(zone, z, highidx, nodemask) \
-	for (zone = z->zone;	\
+	for (zone = zonelist_zone(z);	\
 		zone;							\
 		z = next_zones_zonelist(++z, highidx, nodemask),	\
 			zone = zonelist_zone(z))
-- 
2.34.1
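
For readers outside mm, the accessor pattern this patch applies can be
sketched in user space as below. This is only an illustration: the
sketch_* types and functions are hypothetical stand-ins, not the kernel
definitions, and the loop is a simplified walk over a NULL-terminated
array rather than the real for_next_zone_zonelist_nodemask() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct zone / struct zoneref. */
struct sketch_zone {
	int id;
};

struct sketch_zoneref {
	struct sketch_zone *zone;	/* NULL marks the end of the list */
	int zone_idx;
};

/* Accessor: the only place that touches the ->zone field directly. */
static inline struct sketch_zone *sketch_zonelist_zone(struct sketch_zoneref *zoneref)
{
	return zoneref->zone;
}

/* Count the zones in a NULL-terminated zoneref array via the accessor. */
int sketch_count_zones(struct sketch_zoneref *z)
{
	struct sketch_zone *zone;
	int n = 0;

	assert(z != NULL);
	for (zone = sketch_zonelist_zone(z); zone;
	     zone = sketch_zonelist_zone(++z))
		n++;
	return n;
}
```

Keeping every read behind one helper means a later change to how the zone
is stored would only need to touch the helper, which is presumably the
motivation for the consistency cleanup.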


From: Wei Yang <richard.weiyang@gmail.com>
To: rppt@kernel.org,
	akpm@linux-foundation.org,
	brauner@kernel.org,
	oleg@redhat.com,
	mjguzik@gmail.com,
	tandersen@netflix.com
Cc: linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	david@redhat.com,
	Wei Yang <richard.weiyang@gmail.com>
Subject: [PATCH v3 1/3] mm/memblock: introduce a new helper memblock_estimated_nr_pages()
Date: Wed,  3 Jul 2024 00:51:49 +0000
Message-Id: <20240703005151.28712-1-richard.weiyang@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1265895 org.kvack.linux-mm:202274
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Instead of having callers use the raw memblock API, wrap it in a new helper.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 40c62aca36ec..7d1c32b3dc12 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -486,6 +486,7 @@ static inline __init_memblock bool memblock_bottom_up(void)
 
 phys_addr_t memblock_phys_mem_size(void);
 phys_addr_t memblock_reserved_size(void);
+unsigned long memblock_estimated_nr_pages(void);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
diff --git a/mm/memblock.c b/mm/memblock.c
index e81fb68f7f88..c1f1aac0459f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1729,6 +1729,25 @@ phys_addr_t __init_memblock memblock_reserved_size(void)
 	return memblock.reserved.total_size;
 }
 
+/**
+ * memblock_estimated_nr_pages - return number of pages from memblock point of
+ * view
+ *
+ * During bootup, the system may need the number of pages in the whole system
+ * to do some calculations before all pages are freed to the buddy system,
+ * especially when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
+ *
+ * At this point, we can get this information from memblock. Since the system
+ * state is not settled and regions may be unaligned, the value is an estimate.
+ *
+ * Return:
+ * An estimated number of pages from memblock point of view.
+ */
+unsigned long __init memblock_estimated_nr_pages(void)
+{
+	return PHYS_PFN(memblock_phys_mem_size() - memblock_reserved_size());
+}
+
 /* lowest address */
 phys_addr_t __init_memblock memblock_start_of_DRAM(void)
 {
-- 
2.34.1
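
The arithmetic behind the helper can be tried out in user space with the
sketch below. It assumes 4 KiB pages (a shift of 12) and takes the two
sizes as plain parameters; the sketch_* names are hypothetical and stand
in for PHYS_PFN(), memblock_phys_mem_size() and memblock_reserved_size().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

typedef uint64_t sketch_phys_addr_t;

/* Convert a physical byte count to a page frame count, rounding down. */
static unsigned long sketch_phys_pfn(sketch_phys_addr_t bytes)
{
	return (unsigned long)(bytes >> SKETCH_PAGE_SHIFT);
}

/* Estimated usable pages: total memory minus reserved memory, in pages. */
unsigned long sketch_estimated_nr_pages(sketch_phys_addr_t mem_size,
					sketch_phys_addr_t reserved_size)
{
	assert(reserved_size <= mem_size);
	return sketch_phys_pfn(mem_size - reserved_size);
}
```

Because the subtraction happens in bytes and the shift rounds down, a
reservation that is not a multiple of the page size costs a whole page in
the result, which is one reason the value is only an estimate.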


Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 08:53:24 +0800
From: kernel test robot <lkp@intel.com>
To: Thomas Weißschuh <linux@weissschuh.net>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>,
	Tzung-Bi Shih <tzungbi@kernel.org>
Subject: [linux-next:master 9580/10049]
 drivers/power/supply/cros_charge-control.c:210 charge_behaviour_store()
 warn: unsigned 'behaviour' is never less than zero.
Message-ID: <202407030856.WDIxlKzW-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202278
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   82e4255305c554b0bb18b7ccf2db86041b4c8b6e
commit: c6ed48ef52599098498a8442fd60bea5bd8cd309 [9580/10049] power: supply: add ChromeOS EC based charge control driver
config: x86_64-randconfig-161-20240703 (https://download.01.org/0day-ci/archive/20240703/202407030856.WDIxlKzW-lkp@intel.com/config)
compiler: gcc-9 (Ubuntu 9.5.0-4ubuntu2) 9.5.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407030856.WDIxlKzW-lkp@intel.com/

smatch warnings:
drivers/power/supply/cros_charge-control.c:210 charge_behaviour_store() warn: unsigned 'behaviour' is never less than zero.
drivers/power/supply/cros_charge-control.c:297 cros_chctl_probe() error: buffer overflow 'priv->attributes' 3 <= 3

vim +/behaviour +210 drivers/power/supply/cros_charge-control.c

   200	
   201	static ssize_t charge_behaviour_store(struct device *dev, struct device_attribute *attr,
   202					      const char *buf, size_t count)
   203	{
   204		struct cros_chctl_priv *priv = cros_chctl_attr_to_priv(&attr->attr,
   205								       CROS_CHCTL_ATTR_CHARGE_BEHAVIOUR);
   206		enum power_supply_charge_behaviour behaviour;
   207		int ret;
   208	
   209		behaviour = power_supply_charge_behaviour_parse(EC_CHARGE_CONTROL_BEHAVIOURS, buf);
 > 210		if (behaviour < 0)
   211			return behaviour;
   212	
   213		priv->current_behaviour = behaviour;
   214	
   215		ret = cros_chctl_configure_ec(priv);
   216		if (ret < 0)
   217			return ret;
   218	
   219		return count;
   220	}
   221	
   222	static umode_t cros_chtl_attr_is_visible(struct kobject *kobj, struct attribute *attr, int n)
   223	{
   224		struct cros_chctl_priv *priv = cros_chctl_attr_to_priv(attr, n);
   225	
   226		if (priv->cmd_version < 2) {
   227			if (n == CROS_CHCTL_ATTR_START_THRESHOLD)
   228				return 0;
   229			if (n == CROS_CHCTL_ATTR_END_THRESHOLD)
   230				return 0;
   231		}
   232	
   233		return attr->mode;
   234	}
   235	
   236	static int cros_chctl_add_battery(struct power_supply *battery, struct acpi_battery_hook *hook)
   237	{
   238		struct cros_chctl_priv *priv = container_of(hook, struct cros_chctl_priv, battery_hook);
   239	
   240		if (priv->hooked_battery)
   241			return 0;
   242	
   243		priv->hooked_battery = battery;
   244		return device_add_group(&battery->dev, &priv->group);
   245	}
   246	
   247	static int cros_chctl_remove_battery(struct power_supply *battery, struct acpi_battery_hook *hook)
   248	{
   249		struct cros_chctl_priv *priv = container_of(hook, struct cros_chctl_priv, battery_hook);
   250	
   251		if (priv->hooked_battery == battery) {
   252			device_remove_group(&battery->dev, &priv->group);
   253			priv->hooked_battery = NULL;
   254		}
   255	
   256		return 0;
   257	}
   258	
   259	static int cros_chctl_probe(struct platform_device *pdev)
   260	{
   261		struct device *dev = &pdev->dev;
   262		struct cros_ec_dev *ec_dev = dev_get_drvdata(dev->parent);
   263		struct cros_ec_device *cros_ec = ec_dev->ec_dev;
   264		struct cros_chctl_priv *priv;
   265		size_t i;
   266		int ret;
   267	
   268		priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
   269		if (!priv)
   270			return -ENOMEM;
   271	
   272		ret = cros_ec_get_cmd_versions(cros_ec, EC_CMD_CHARGE_CONTROL);
   273		if (ret < 0)
   274			return ret;
   275		else if (ret & EC_VER_MASK(3))
   276			priv->cmd_version = 3;
   277		else if (ret & EC_VER_MASK(2))
   278			priv->cmd_version = 2;
   279		else if (ret & EC_VER_MASK(1))
   280			priv->cmd_version = 1;
   281		else
   282			return -ENODEV;
   283	
   284		dev_dbg(dev, "Command version: %u\n", (unsigned int)priv->cmd_version);
   285	
   286		priv->cros_ec = cros_ec;
   287		priv->device_attrs[CROS_CHCTL_ATTR_START_THRESHOLD] =
   288			(struct device_attribute)__ATTR_RW(charge_control_start_threshold);
   289		priv->device_attrs[CROS_CHCTL_ATTR_END_THRESHOLD] =
   290			(struct device_attribute)__ATTR_RW(charge_control_end_threshold);
   291		priv->device_attrs[CROS_CHCTL_ATTR_CHARGE_BEHAVIOUR] =
   292			(struct device_attribute)__ATTR_RW(charge_behaviour);
   293		for (i = 0; i < _CROS_CHCTL_ATTR_COUNT; i++) {
   294			sysfs_attr_init(&priv->device_attrs[i].attr);
   295			priv->attributes[i] = &priv->device_attrs[i].attr;
   296		}
 > 297		priv->attributes[_CROS_CHCTL_ATTR_COUNT] = NULL;
   298		priv->group.is_visible = cros_chtl_attr_is_visible;
   299		priv->group.attrs = priv->attributes;
   300	
   301		priv->battery_hook.name = dev_name(dev);
   302		priv->battery_hook.add_battery = cros_chctl_add_battery;
   303		priv->battery_hook.remove_battery = cros_chctl_remove_battery;
   304	
   305		priv->current_behaviour = POWER_SUPPLY_CHARGE_BEHAVIOUR_AUTO;
   306		priv->current_start_threshold = 0;
   307		priv->current_end_threshold = 100;
   308	
   309		/* Bring EC into well-known state */
   310		ret = cros_chctl_configure_ec(priv);
   311		if (ret < 0)
   312			return ret;
   313	
   314		return devm_battery_hook_register(dev, &priv->battery_hook);
   315	}
   316	
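
Both smatch warnings above follow well-known patterns, which the
user-space sketch below illustrates. It is not the driver code and not
necessarily the fix that was or will be applied: every sketch_* name is
hypothetical, the parser is a stand-in that returns a negative errno on
failure, and the array size is inferred only from the "3 <= 3" in the
report. The first part keeps the parse result in an int so the error
check can actually fire before the value is converted to the enum; the
second part gives the pointer array one extra slot for the NULL
terminator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum sketch_behaviour {
	SKETCH_BEHAVIOUR_AUTO,
	SKETCH_BEHAVIOUR_INHIBIT_CHARGE,
	SKETCH_BEHAVIOUR_FORCE_DISCHARGE,
};

/* Hypothetical parser: enum value on success, -EINVAL on failure. */
static int sketch_behaviour_parse(const char *buf)
{
	if (strcmp(buf, "auto") == 0)
		return SKETCH_BEHAVIOUR_AUTO;
	if (strcmp(buf, "inhibit-charge") == 0)
		return SKETCH_BEHAVIOUR_INHIBIT_CHARGE;
	if (strcmp(buf, "force-discharge") == 0)
		return SKETCH_BEHAVIOUR_FORCE_DISCHARGE;
	return -EINVAL;
}

/* Check the signed return value first, then convert it to the enum. */
int sketch_behaviour_store(enum sketch_behaviour *current, const char *buf)
{
	int ret;

	assert(current != NULL && buf != NULL);
	ret = sketch_behaviour_parse(buf);
	if (ret < 0)
		return ret;	/* *current is left untouched on error */
	*current = (enum sketch_behaviour)ret;
	return 0;
}

enum {
	SKETCH_ATTR_START_THRESHOLD,
	SKETCH_ATTR_END_THRESHOLD,
	SKETCH_ATTR_CHARGE_BEHAVIOUR,
	SKETCH_ATTR_COUNT,
};

struct sketch_priv {
	const char *names[SKETCH_ATTR_COUNT];
	/* One extra slot so the NULL terminator stays inside the array. */
	const char *attributes[SKETCH_ATTR_COUNT + 1];
};

void sketch_init_attributes(struct sketch_priv *priv)
{
	size_t i;

	for (i = 0; i < SKETCH_ATTR_COUNT; i++)
		priv->attributes[i] = priv->names[i];
	priv->attributes[SKETCH_ATTR_COUNT] = NULL;
}
```

Whether an enum is signed or unsigned is up to the compiler, so comparing
an enum variable against zero is unreliable even when a given build
happens to accept it.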

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 08:53:25 +0800
From: kernel test robot <lkp@intel.com>
To: Frank Li <Frank.Li@nxp.com>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>,
	Shawn Guo <shawnguo@kernel.org>
Subject: [linux-next:master 8735/9748]
 arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: aux-bus:
 '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges',
 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any
 of the regexes: 'pinctrl-[0-9]+'
Message-ID: <202407031134.kIIviSVq-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202279
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   82e4255305c554b0bb18b7ccf2db86041b4c8b6e
commit: 1bc8ad8138db3e15b27ea0725edb2b574fde9eec [8735/9748] arm64: dts: layerscape: rename aux_bus to aux-bus
config: arm64-randconfig-051-20240702 (https://download.01.org/0day-ci/archive/20240703/202407031134.kIIviSVq-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 326ba38a991250a8587a399a260b0f7af2c9166a)
dtschema version: 2024.6.dev3+g650bf2d
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031134.kIIviSVq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031134.kIIviSVq-lkp@intel.com/

dtcheck warnings: (new ones prefixed by >>)
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian', 'clock-names' were unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: pcie@3400000: fsl,pcie-scfg:0: [23, 0] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: pcie@3500000: fsl,pcie-scfg:0: [23, 1] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: pcie@3600000: fsl,pcie-scfg:0: [23, 2] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/ucc@2000: failed to match any schema with compatible: ['fsl,ucc-hdlc']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian', 'clock-names' were unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: pcie@3400000: fsl,pcie-scfg:0: [22, 0] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: pcie@3500000: fsl,pcie-scfg:0: [22, 1] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: pcie@3600000: fsl,pcie-scfg:0: [22, 2] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian', 'clock-names' were unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: pcie@3400000: fsl,pcie-scfg:0: [25, 0] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: pcie@3500000: fsl,pcie-scfg:0: [25, 1] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: pcie@3600000: fsl,pcie-scfg:0: [25, 2] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1046a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/msi-controller@1580000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/msi-controller@1590000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/msi-controller@15a0000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: pcie_ep@3400000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/pcie_ep@3400000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: pcie_ep@3500000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/pcie_ep@3500000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: pcie_ep@3600000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1046a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/msi-controller@1580000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/msi-controller@1590000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/msi-controller@15a0000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: pcie_ep@3400000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/pcie_ep@3400000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: pcie_ep@3500000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: /soc/pcie_ep@3500000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb: pcie_ep@3600000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1046a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/msi-controller@1580000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/msi-controller@1590000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/msi-controller@15a0000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: pcie_ep@3400000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/pcie_ep@3400000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: pcie_ep@3500000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: /soc/pcie_ep@3500000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: pcie_ep@3600000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2310000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2320000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/gpio@2330000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
>> arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1046a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/msi-controller@1580000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/msi-controller@1590000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/msi-controller@15a0000: failed to match any schema with compatible: ['fsl,ls1046a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: pcie_ep@3400000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/pcie_ep@3400000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: pcie_ep@3500000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long
   	from schema $id: http://devicetree.org/schemas/pci/fsl,layerscape-pcie-ep.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: /soc/pcie_ep@3500000: failed to match any schema with compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep']
   arch/arm64/boot/dts/freescale/fsl-ls1046a-tqmls1046a-mbls10xxa.dtb: pcie_ep@3600000: compatible: ['fsl,ls1046a-pcie-ep', 'fsl,ls-pcie-ep'] is too long

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Date: Tue,  2 Jul 2024 18:53:54 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240703015354.3370503-1-surenb@google.com>
Subject: [PATCH 1/1] mm, slab: move allocation tagging code in the alloc path
 into a hook
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: vbabka@suse.cz, kent.overstreet@linux.dev, cl@linux.com, 
	penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, 
	roman.gushchin@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, 
	surenb@google.com
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1265918 org.kvack.linux-mm:202283
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Move allocation tagging specific code in the allocation path into
alloc_tagging_slab_alloc_hook, similar to how freeing path uses
alloc_tagging_slab_free_hook. No functional changes, just code
cleanup.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/slub.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..99d53190cfcf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2033,11 +2033,18 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
 	return slab_obj_exts(slab) + obj_to_index(s, slab, p);
 }
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline void
+alloc_tagging_slab_alloc_hook(struct slabobj_ext *obj_exts, unsigned int size)
+{
+	alloc_tag_add(&obj_exts->ref, current->alloc_tag, size);
+}
+
 static inline void
 alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			     int objects)
 {
-#ifdef CONFIG_MEM_ALLOC_PROFILING
 	struct slabobj_ext *obj_exts;
 	int i;
 
@@ -2053,9 +2060,23 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 
 		alloc_tag_sub(&obj_exts[off].ref, s->size);
 	}
-#endif
 }
 
+#else /* CONFIG_MEM_ALLOC_PROFILING */
+
+static inline void
+alloc_tagging_slab_alloc_hook(struct slabobj_ext *obj_exts, unsigned int size)
+{
+}
+
+static inline void
+alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
+			     int objects)
+{
+}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING*/
+
 #else /* CONFIG_SLAB_OBJ_EXT */
 
 static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -2079,6 +2100,11 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
 	return NULL;
 }
 
+static inline void
+alloc_tagging_slab_alloc_hook(struct slabobj_ext *obj_exts, unsigned int size)
+{
+}
+
 static inline void
 alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			     int objects)
@@ -3944,7 +3970,6 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, init_flags);
 		kmsan_slab_alloc(s, p[i], init_flags);
-#ifdef CONFIG_MEM_ALLOC_PROFILING
 		if (need_slab_obj_ext()) {
 			struct slabobj_ext *obj_exts;
 
@@ -3955,9 +3980,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 			 * check should be added before alloc_tag_add().
 			 */
 			if (likely(obj_exts))
-				alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+				alloc_tagging_slab_alloc_hook(obj_exts, s->size);
 		}
-#endif
 	}
 
 	return memcg_slab_post_alloc_hook(s, lru, flags, size, p);

base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
-- 
2.45.2.803.g4e1b14247a-goog

.

From: alexs@kernel.org
To: Vitaly Wool <vitaly.wool@konsulko.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	minchan@kernel.org,
	willy@infradead.org,
	senozhatsky@chromium.org,
	david@redhat.com,
	42.hyeyoo@gmail.com,
	Yosry Ahmed <yosryahmed@google.com>,
	nphamcs@gmail.com
Cc: Alex Shi <alexs@kernel.org>
Subject: [PATCH v2 00/20] mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
Date: Wed,  3 Jul 2024 12:05:50 +0800
Message-ID: <20240703040613.681396-1-alexs@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266008 org.kvack.linux-mm:202295
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: Alex Shi <alexs@kernel.org>

According to Matthew's plan, the page descriptor will eventually be
replaced by an 8-byte mem_desc:
https://lore.kernel.org/lkml/YvV1KTyzZ+Jrtj9x@casper.infradead.org/

Here is an implementation for zsmalloc that replaces the page descriptor
with 'zpdesc', which for now still overlays struct page, but it is a
step toward that goal.

The struct is named zpdesc rather than zsdesc because there are still 3
zpools under zswap for now: zbud, z3fold and zsmalloc (z3fold may be
removed soon), and it can easily be extended to other zswap.zpools if
needed.

All zswap.zpools use single pages, since they are often used under
memory pressure. So converting via the folio helpers is better than via
the page helpers, as it saves the compound_head check.

For now, all zpools use some struct page members, such as page.flags
for PG_private/PG_locked, and list_head lru and page.mapping for page
migration.

This patchset does not increase the descriptor size or introduce any
functional changes, and it could shrink zsmalloc.o by about 123Kbytes.

Thanks
Alex

---
v1->v2: 
- Take Yosry and Yoo's suggestion to add more members in zpdesc,
- Rebase on latest mm-unstable commit 31334cf98dbd
---

Alex Shi (8):
  mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
  mm/zsmalloc: use zpdesc in trylock_zspage/lock_zspage
  mm/zsmalloc: convert create_page_chain() and its users to use zpdesc
  mm/zsmalloc: rename reset_page to reset_zpdesc and use zpdesc in it
  mm/zsmalloc: convert SetZsPageMovable and remove unused funcs
  mm/zsmalloc: introduce __zpdesc_clear_movable
  mm/zsmalloc: introduce __zpdesc_clear_zsmalloc
  mm/zsmalloc: introduce __zpdesc_set_zsmalloc()

Hyeonggon Yoo (12):
  mm/zsmalloc: convert __zs_map_object/__zs_unmap_object to use zpdesc
  mm/zsmalloc: add and use pfn/zpdesc seeking funcs
  mm/zsmalloc: convert obj_malloc() to use zpdesc
  mm/zsmalloc: convert obj_allocated() and related helpers to use zpdesc
  mm/zsmalloc: convert init_zspage() to use zpdesc
  mm/zsmalloc: convert obj_to_page() and zs_free() to use zpdesc
  mm/zsmalloc: add zpdesc_is_isolated/zpdesc_zone helper for
    zs_page_migrate
  mm/zsmalloc: convert __free_zspage() to use zdsesc
  mm/zsmalloc: convert location_to_obj() to take zpdesc
  mm/zsmalloc: convert migrate_zspage() to use zpdesc
  mm/zsmalloc: convert get_zspage() to take zpdesc
  mm/zsmalloc: convert get/set_first_obj_offset() to take zpdesc

 mm/zpdesc.h   | 143 ++++++++++++++++
 mm/zsmalloc.c | 456 +++++++++++++++++++++++++++-----------------------
 2 files changed, 393 insertions(+), 206 deletions(-)
 create mode 100644 mm/zpdesc.h

-- 
2.43.0

.

From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Kees Cook <keescook@chromium.org>,
	Alice Ryhl <aliceryhl@google.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH v2] slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
Date: Wed,  3 Jul 2024 09:25:21 +0200
Message-ID: <20240703072520.45837-2-vbabka@suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266132 org.kvack.linux-mm:202321
Newsgroups: org.kernel.vger.linux-kernel,dev.linux.lists.patches,org.kernel.vger.rust-for-linux,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Slab allocators have been guaranteeing natural alignment for
power-of-two sizes since commit 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"), while any other sizes are
guaranteed to be aligned only to ARCH_KMALLOC_MINALIGN bytes (although
in practice are aligned more than that in non-debug scenarios).

Rust's allocator API specifies size and alignment per allocation, which
have to satisfy the following rules, per Alice Ryhl [1]:

  1. The alignment is a power of two.
  2. The size is non-zero.
  3. When you round up the size to the next multiple of the alignment,
     then it must not overflow the signed type isize / ssize_t.

In order to map this to kmalloc()'s guarantees, some requested
allocation sizes have to be padded to the next power-of-two size [2].
For example, an allocation of size 96 and alignment of 32 will be padded
to an allocation of size 128, because the existing kmalloc-96 bucket
doesn't guarantee alignment above ARCH_KMALLOC_MINALIGN. Without slab
debugging active, however, the layout of the kmalloc-96 slabs naturally
aligns the objects to 32 bytes, so extending the size to 128 bytes is
wasteful.
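
As a rough userspace illustration of the padding described above (not
the actual Rust code): old_padded_size() and next_pow2() are made-up
helper names, and min_align is an assumed stand-in for the slab minimum
alignment. Requests whose alignment exceeds the minimum get their size
bumped to the next power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= n. Overflow is not handled in this sketch. */
static size_t next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the old behaviour: if the requested alignment
 * exceeds the guaranteed minimum, fall back to a power-of-two size so
 * that the natural-alignment guarantee applies.
 */
static size_t old_padded_size(size_t size, size_t align, size_t min_align)
{
	/* Rule 1: the alignment must be a power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	return align > min_align ? next_pow2(size) : size;
}
```

With an assumed min_align of 8, old_padded_size(96, 32, 8) yields 128,
i.e. 32 bytes more than requested.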

To improve the situation we can extend the kmalloc() alignment
guarantees in a way that

1) doesn't change the current slab layout (and thus does not increase
   internal fragmentation) when slab debugging is not active
2) reduces waste in the Rust allocator use case
3) is a superset of the current guarantee for power-of-two sizes.

The extended guarantee is that alignment is at least the largest
power-of-two divisor of the requested size. For power-of-two sizes the
largest divisor is the size itself, but let's keep this case documented
separately for clarity.

For current kmalloc size buckets, it means kmalloc-96 will guarantee
alignment of 32 bytes and kmalloc-192 will guarantee 64 bytes.
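
To make the rule concrete, here is a minimal userspace sketch. The
helper name largest_pow2_divisor() is made up for illustration. It uses
the `size & -size` idiom, which for nonzero sizes gives the same value
as the `1U << (ffs(size) - 1)` expression in the mm/slab_common.c hunk
of this patch.

```c
#include <assert.h>

/*
 * Largest power of two that divides size, i.e. the lowest set bit.
 * Unsigned negation is well defined in C, so size & -size isolates
 * that bit.
 */
static unsigned int largest_pow2_divisor(unsigned int size)
{
	assert(size != 0);	/* zero has no lowest set bit */

	return size & -size;
}
```

For the non-power-of-two buckets this gives 32 for a size of 96 and 64
for a size of 192. A power-of-two size such as 128 maps to itself.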

This covers the rules 1 and 2 above of Rust's API as long as the size is
a multiple of the alignment. The Rust layer should now only need to
round up the size to the next multiple if it isn't, while enforcing the
rule 3.
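
The remaining work on the caller side could look roughly like the
sketch below, written in C for consistency with the rest of the patch.
padded_size() is a hypothetical helper, not an existing kernel or Rust
API, and PTRDIFF_MAX is assumed here as the C counterpart of the
isize / ssize_t limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round size up to the next multiple of align, rejecting requests that
 * break the three rules. Returns true and stores the result in *out on
 * success.
 */
static bool padded_size(size_t size, size_t align, size_t *out)
{
	assert(out != NULL);

	/* Rules 1 and 2: power-of-two alignment, non-zero size. */
	if (size == 0 || align == 0 || (align & (align - 1)) != 0)
		return false;

	/* Rule 3: the rounded-up size must not exceed the signed limit. */
	if (size > (size_t)PTRDIFF_MAX - (align - 1))
		return false;

	*out = (size + align - 1) & ~(align - 1);
	return true;
}
```

Because the result is a multiple of align, its largest power-of-two
divisor is at least align, which is what the extended kmalloc()
guarantee relies on. For example, a request of size 100 with alignment
32 becomes 128, while size 96 with alignment 32 stays at 96.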

Implementation-wise, this changes the alignment calculation in
create_boot_cache(). While at it, also do the calculation only for caches
with the SLAB_KMALLOC flag, because the function is also used to create
the initial kmem_cache and kmem_cache_node caches, where no alignment
guarantee is necessary.

In the Rust allocator's krealloc_aligned(), remove the code that padded
sizes to the next power of two (suggested by Alice Ryhl) as it's no
longer necessary with the new guarantees.

Reported-by: Alice Ryhl <aliceryhl@google.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/all/CAH5fLggjrbdUuT-H-5vbQfMazjRDpp2%2Bk3%3DYhPyS17ezEqxwcw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAH5fLghsZRemYUwVvhk77o6y1foqnCeDzW4WZv6ScEWna2+_jw@mail.gmail.com/ [2]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
---
v2: - add Rust side change as suggested by Alice, also thanks Boqun for fixups
- clarify that the alignment already existed (unless debugging) but was
  not guaranteed, so there's no extra fragmentation in slab
- add r-b, a-b thanks to Boqun and Roman

If it's fine with Rust folks, I can put this in the slab.git tree.

 Documentation/core-api/memory-allocation.rst |  6 ++++--
 include/linux/slab.h                         |  3 ++-
 mm/slab_common.c                             |  9 +++++----
 rust/kernel/alloc/allocator.rs               | 19 ++++++-------------
 4 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 1c58d883b273..8b84eb4bdae7 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -144,8 +144,10 @@ configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
 The address of a chunk allocated with `kmalloc` is aligned to at least
-ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a power of two, the
-alignment is also guaranteed to be at least the respective size.
+ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
+alignment is also guaranteed to be at least the respective size. For other
+sizes, the alignment is guaranteed to be at least the largest power-of-two
+divisor of the size.
 
 Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
 to kmalloc_array(): a helper for resizing arrays is provided in the form of
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ed6bee5ec2b6..640cea6e6323 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -604,7 +604,8 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
  *
  * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
  * bytes. For @size of power of two bytes, the alignment is also guaranteed
- * to be at least to the size.
+ * to be at least to the size. For other sizes, the alignment is guaranteed to
+ * be at least the largest power-of-two divisor of @size.
  *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp_types.h and described at
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1560a1546bb1..7272ef7bc55f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -617,11 +617,12 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->size = s->object_size = size;
 
 	/*
-	 * For power of two sizes, guarantee natural alignment for kmalloc
-	 * caches, regardless of SL*B debugging options.
+	 * kmalloc caches guarantee alignment of at least the largest
+	 * power-of-two divisor of the size. For power-of-two sizes,
+	 * it is the size itself.
 	 */
-	if (is_power_of_2(size))
-		align = max(align, size);
+	if (flags & SLAB_KMALLOC)
+		align = max(align, 1U << (ffs(size) - 1));
 	s->align = calculate_alignment(flags, align, size);
 
 #ifdef CONFIG_HARDENED_USERCOPY
diff --git a/rust/kernel/alloc/allocator.rs b/rust/kernel/alloc/allocator.rs
index 229642960cd1..e6ea601f38c6 100644
--- a/rust/kernel/alloc/allocator.rs
+++ b/rust/kernel/alloc/allocator.rs
@@ -18,23 +18,16 @@ pub(crate) unsafe fn krealloc_aligned(ptr: *mut u8, new_layout: Layout, flags: F
     // Customized layouts from `Layout::from_size_align()` can have size < align, so pad first.
     let layout = new_layout.pad_to_align();
 
-    let mut size = layout.size();
-
-    if layout.align() > bindings::ARCH_SLAB_MINALIGN {
-        // The alignment requirement exceeds the slab guarantee, thus try to enlarge the size
-        // to use the "power-of-two" size/alignment guarantee (see comments in `kmalloc()` for
-        // more information).
-        //
-        // Note that `layout.size()` (after padding) is guaranteed to be a multiple of
-        // `layout.align()`, so `next_power_of_two` gives enough alignment guarantee.
-        size = size.next_power_of_two();
-    }
+    // Note that `layout.size()` (after padding) is guaranteed to be a multiple of `layout.align()`
+    // which together with the slab guarantees means the `krealloc` will return a properly aligned
+    // object (see comments in `kmalloc()` for more information).
+    let size = layout.size();
 
     // SAFETY:
     // - `ptr` is either null or a pointer returned from a previous `k{re}alloc()` by the
     //   function safety requirement.
-    // - `size` is greater than 0 since it's either a `layout.size()` (which cannot be zero
-    //   according to the function safety requirement) or a result from `next_power_of_two()`.
+    // - `size` is greater than 0 since it's from `layout.size()` (which cannot be zero according
+    //   to the function safety requirement)
     unsafe { bindings::krealloc(ptr as *const core::ffi::c_void, size, flags.0) as *mut u8 }
 }
 
-- 
2.45.2

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 16:10:34 +0800
From: kernel test robot <lkp@intel.com>
To: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>,
	Mark Brown <broonie@kernel.org>
Subject: [linux-next:master 8777/10451] sound/soc/fsl/lpc3xxx-i2s.h:42:30:
 error: implicit declaration of function 'FIELD_PREP'
Message-ID: <202407031601.Hy5RHjFB-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202328
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: 0959de657a10cc40b2cc41cff9169ab0e0fd4456 [8777/10451] ASoC: fsl: Add i2s and pcm drivers for LPC32xx CPUs
config: loongarch-randconfig-r062-20240703 (https://download.01.org/0day-ci/archive/20240703/202407031601.Hy5RHjFB-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031601.Hy5RHjFB-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031601.Hy5RHjFB-lkp@intel.com/

Note: the linux-next/master HEAD 0b58e108042b0ed28a71cd7edf5175999955b233 builds fine.
      It may have been fixed somewhere.

All errors (new ones prefixed by >>):

   sound/soc/fsl/lpc3xxx-i2s.c: In function '__lpc3xxx_find_clkdiv':
   sound/soc/fsl/lpc3xxx-i2s.c:42:13: warning: variable 'savedbitclkrate' set but not used [-Wunused-but-set-variable]
      42 |         u32 savedbitclkrate, diff, trate, baseclk;
         |             ^~~~~~~~~~~~~~~
   In file included from sound/soc/fsl/lpc3xxx-i2s.c:23:
   sound/soc/fsl/lpc3xxx-i2s.c: In function 'lpc3xxx_i2s_hw_params':
>> sound/soc/fsl/lpc3xxx-i2s.h:42:30: error: implicit declaration of function 'FIELD_PREP' [-Werror=implicit-function-declaration]
      42 | #define LPC3XXX_I2S_WW8      FIELD_PREP(0x3, 0) /* Word width is 8bit */
         |                              ^~~~~~~~~~
   sound/soc/fsl/lpc3xxx-i2s.c:169:24: note: in expansion of macro 'LPC3XXX_I2S_WW8'
     169 |                 tmp |= LPC3XXX_I2S_WW8 | LPC3XXX_I2S_WS_HP(LPC3XXX_I2S_WW8_HP);
         |                        ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/FIELD_PREP +42 sound/soc/fsl/lpc3xxx-i2s.h

    40	
    41	/* i2s_daO i2s_dai register definitions */
  > 42	#define LPC3XXX_I2S_WW8      FIELD_PREP(0x3, 0) /* Word width is 8bit */
    43	#define LPC3XXX_I2S_WW16     FIELD_PREP(0x3, 1) /* Word width is 16bit */
    44	#define LPC3XXX_I2S_WW32     FIELD_PREP(0x3, 3) /* Word width is 32bit */
    45	#define LPC3XXX_I2S_MONO     BIT(2)   /* Mono */
    46	#define LPC3XXX_I2S_STOP     BIT(3)   /* Stop, diables the access to FIFO, mutes the channel */
    47	#define LPC3XXX_I2S_RESET    BIT(4)   /* Reset the channel */
    48	#define LPC3XXX_I2S_WS_SEL   BIT(5)   /* Channel Master(0) or slave(1) mode select */
    49	#define LPC3XXX_I2S_WS_HP(s) FIELD_PREP(0x7FC0, s) /* Word select half period - 1 */
    50	#define LPC3XXX_I2S_MUTE     BIT(15)  /* Mute the channel, Transmit channel only */
    51	
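
For context: in the kernel, FIELD_PREP() is provided by
<linux/bitfield.h>, so an implicit declaration at this line suggests
that header was not visible in this configuration. What the macro
computes for the masks above can be sketched in userspace as follows.
field_prep() is a simplified, hypothetical stand-in that relies on the
GCC/Clang builtin __builtin_ctz(), not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift val up to the position of the lowest set bit of mask and
 * discard anything that does not fit inside the mask.
 */
static inline uint32_t field_prep(uint32_t mask, uint32_t val)
{
	assert(mask != 0);	/* __builtin_ctz(0) is undefined */

	return (val << __builtin_ctz(mask)) & mask;
}
```

With the word-select mask 0x7FC0 the value is shifted left by 6 bits,
so field_prep(0x7FC0, 1) gives 0x40, and the 32-bit word width encoding
field_prep(0x3, 3) gives 3.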

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 17:07:11 +0800
From: kernel test robot <lkp@intel.com>
To: Marek =?iso-8859-1?Q?Beh=FAn?= <kabel@kernel.org>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>,
	Arnd Bergmann <arnd@arndb.de>
Subject: [linux-next:master 9442/10451]
 drivers/platform/cznic/turris-omnia-mcu-gpio.c:1027:10: error: no member
 named 'of_gpio_n_cells' in 'struct gpio_chip'
Message-ID: <202407031646.trNSwajF-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202335
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: dfa556e45ae9ecc199e598222debc8f1883a7cce [9442/10451] platform: cznic: turris-omnia-mcu: Add support for MCU connected GPIOs
config: s390-randconfig-r054-20240703 (https://download.01.org/0day-ci/archive/20240703/202407031646.trNSwajF-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 326ba38a991250a8587a399a260b0f7af2c9166a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031646.trNSwajF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031646.trNSwajF-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/platform/cznic/turris-omnia-mcu-gpio.c:13:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:173:
   In file included from arch/s390/include/asm/mmu_context.h:11:
   In file included from arch/s390/include/asm/pgalloc.h:18:
   In file included from include/linux/mm.h:2253:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/platform/cznic/turris-omnia-mcu-gpio.c:16:
   In file included from include/linux/gpio/driver.h:8:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/s390/include/asm/io.h:93:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
     102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
         |                                                      ^
   In file included from drivers/platform/cznic/turris-omnia-mcu-gpio.c:16:
   In file included from include/linux/gpio/driver.h:8:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/s390/include/asm/io.h:93:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
         |                                                           ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
     115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
         |                                                      ^
   In file included from drivers/platform/cznic/turris-omnia-mcu-gpio.c:16:
   In file included from include/linux/gpio/driver.h:8:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/s390/include/asm/io.h:93:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     693 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     701 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     709 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     718 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     727 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     736 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> drivers/platform/cznic/turris-omnia-mcu-gpio.c:1027:10: error: no member named 'of_gpio_n_cells' in 'struct gpio_chip'
    1027 |         mcu->gc.of_gpio_n_cells = 3;
         |         ~~~~~~~ ^
>> drivers/platform/cznic/turris-omnia-mcu-gpio.c:1028:10: error: no member named 'of_xlate' in 'struct gpio_chip'
    1028 |         mcu->gc.of_xlate = omnia_gpio_of_xlate;
         |         ~~~~~~~ ^
   13 warnings and 2 errors generated.


vim +1027 drivers/platform/cznic/turris-omnia-mcu-gpio.c

   999	
  1000	int omnia_mcu_register_gpiochip(struct omnia_mcu *mcu)
  1001	{
  1002		bool new_api = mcu->features & OMNIA_FEAT_NEW_INT_API;
  1003		struct device *dev = &mcu->client->dev;
  1004		unsigned long irqflags;
  1005		int err;
  1006	
  1007		err = devm_mutex_init(dev, &mcu->lock);
  1008		if (err)
  1009			return err;
  1010	
  1011		mcu->gc.request = omnia_gpio_request;
  1012		mcu->gc.get_direction = omnia_gpio_get_direction;
  1013		mcu->gc.direction_input = omnia_gpio_direction_input;
  1014		mcu->gc.direction_output = omnia_gpio_direction_output;
  1015		mcu->gc.get = omnia_gpio_get;
  1016		mcu->gc.get_multiple = omnia_gpio_get_multiple;
  1017		mcu->gc.set = omnia_gpio_set;
  1018		mcu->gc.set_multiple = omnia_gpio_set_multiple;
  1019		mcu->gc.init_valid_mask = omnia_gpio_init_valid_mask;
  1020		mcu->gc.can_sleep = true;
  1021		mcu->gc.names = omnia_mcu_gpio_templates;
  1022		mcu->gc.base = -1;
  1023		mcu->gc.ngpio = ARRAY_SIZE(omnia_gpios);
  1024		mcu->gc.label = "Turris Omnia MCU GPIOs";
  1025		mcu->gc.parent = dev;
  1026		mcu->gc.owner = THIS_MODULE;
> 1027		mcu->gc.of_gpio_n_cells = 3;
> 1028		mcu->gc.of_xlate = omnia_gpio_of_xlate;

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 17:17:49 +0800
From: kernel test robot <lkp@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: [linux-next:master 9885/10451] kernel/rcu/tree_stall.h:797:undefined
 reference to `csd_lock_is_stuck'
Message-ID: <202407031722.nBIh2u7x-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202337
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: 3be88389f46263f166973e80e528dcc9268e24cb [9885/10451] rcu: Summarize RCU CPU stall warnings during CSD-lock stalls
config: x86_64-randconfig-a014-20211016 (https://download.01.org/0day-ci/archive/20240703/202407031722.nBIh2u7x-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031722.nBIh2u7x-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031722.nBIh2u7x-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `check_cpu_stall':
>> kernel/rcu/tree_stall.h:797:(.text+0x273b89): undefined reference to `csd_lock_is_stuck'
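
A link error of this shape commonly appears when a function is declared unconditionally but defined only in an object file that is built under some config option. Whether that is the cause here cannot be determined from this report alone. The block below is a minimal userspace sketch of the usual stub pattern, not the actual fix: HAVE_CSD_LOCK_DEBUG is a hypothetical stand-in for whatever option gates the real definition, and should_suppress_stall_report() is an illustrative helper that only mirrors the shape of the check at tree_stall.h:797.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * HAVE_CSD_LOCK_DEBUG is a hypothetical macro used only for illustration;
 * it stands in for the config option that builds the real definition.
 */
#ifdef HAVE_CSD_LOCK_DEBUG
bool csd_lock_is_stuck(void);	/* defined in another object file */
#else
/* Stub used when the real implementation is not built in. */
static inline bool csd_lock_is_stuck(void)
{
	return false;
}
#endif

/* Hypothetical caller with the same short-circuit shape as the failing line. */
bool should_suppress_stall_report(bool suppress_requested)
{
	return suppress_requested && csd_lock_is_stuck();
}

/* With the stub active, the suppression branch can never be taken. */
void stall_report_selfcheck(void)
{
	assert(!should_suppress_stall_report(true));
	assert(!should_suppress_stall_report(false));
}
```

With the stub in place every caller links regardless of configuration, and the compiler can drop the suppression branch entirely when the feature is disabled.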


vim +797 kernel/rcu/tree_stall.h

   724	
   725	static void check_cpu_stall(struct rcu_data *rdp)
   726	{
   727		bool self_detected;
   728		unsigned long gs1;
   729		unsigned long gs2;
   730		unsigned long gps;
   731		unsigned long j;
   732		unsigned long jn;
   733		unsigned long js;
   734		struct rcu_node *rnp;
   735	
   736		lockdep_assert_irqs_disabled();
   737		if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) ||
   738		    !rcu_gp_in_progress())
   739			return;
   740		rcu_stall_kick_kthreads();
   741	
   742		/*
   743		 * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS
   744		 * loop has to set jiffies to ensure a non-stale jiffies value. This
   745		 * is required to have good jiffies value after coming out of long
   746		 * breaks of jiffies updates. Not doing so can cause false positives.
   747		 */
   748		if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0)
   749			return;
   750	
   751		j = jiffies;
   752	
   753		/*
   754		 * Lots of memory barriers to reject false positives.
   755		 *
   756		 * The idea is to pick up rcu_state.gp_seq, then
   757		 * rcu_state.jiffies_stall, then rcu_state.gp_start, and finally
   758		 * another copy of rcu_state.gp_seq.  These values are updated in
   759		 * the opposite order with memory barriers (or equivalent) during
   760		 * grace-period initialization and cleanup.  Now, a false positive
   761		 * can occur if we get an new value of rcu_state.gp_start and a old
   762		 * value of rcu_state.jiffies_stall.  But given the memory barriers,
   763		 * the only way that this can happen is if one grace period ends
   764		 * and another starts between these two fetches.  This is detected
   765		 * by comparing the second fetch of rcu_state.gp_seq with the
   766		 * previous fetch from rcu_state.gp_seq.
   767		 *
   768		 * Given this check, comparisons of jiffies, rcu_state.jiffies_stall,
   769		 * and rcu_state.gp_start suffice to forestall false positives.
   770		 */
   771		gs1 = READ_ONCE(rcu_state.gp_seq);
   772		smp_rmb(); /* Pick up ->gp_seq first... */
   773		js = READ_ONCE(rcu_state.jiffies_stall);
   774		smp_rmb(); /* ...then ->jiffies_stall before the rest... */
   775		gps = READ_ONCE(rcu_state.gp_start);
   776		smp_rmb(); /* ...and finally ->gp_start before ->gp_seq again. */
   777		gs2 = READ_ONCE(rcu_state.gp_seq);
   778		if (gs1 != gs2 ||
   779		    ULONG_CMP_LT(j, js) ||
   780		    ULONG_CMP_GE(gps, js))
   781			return; /* No stall or GP completed since entering function. */
   782		rnp = rdp->mynode;
   783		jn = jiffies + ULONG_MAX / 2;
   784		self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask;
   785		if (rcu_gp_in_progress() &&
   786		    (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) &&
   787		    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
   788			/*
   789			 * If a virtual machine is stopped by the host it can look to
   790			 * the watchdog like an RCU stall. Check to see if the host
   791			 * stopped the vm.
   792			 */
   793			if (kvm_check_and_clear_guest_paused())
   794				return;
   795	
   796			rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_NORM, (void *)j - gps);
 > 797			if (READ_ONCE(csd_lock_suppress_rcu_stall) && csd_lock_is_stuck()) {
   798				pr_err("INFO: %s detected stall, but suppressed full report due to a stuck CSD-lock.\n", rcu_state.name);
   799			} else if (self_detected) {
   800				/* We haven't checked in, so go dump stack. */
   801				print_cpu_stall(gps);
   802			} else {
   803				/* They had a few time units to dump stack, so complain. */
   804				print_other_cpu_stall(gs2, gps);
   805			}
   806	
   807			if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
   808				rcu_ftrace_dump(DUMP_ALL);
   809	
   810			if (READ_ONCE(rcu_state.jiffies_stall) == jn) {
   811				jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
   812				WRITE_ONCE(rcu_state.jiffies_stall, jn);
   813			}
   814		}
   815	}
   816	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Return-Path: <owner-linux-mm@kvack.org>
Message-ID: <5a083f16-cdbc-4d60-8890-58de8d80eaa1@suse.com>
Date: Wed, 3 Jul 2024 20:03:04 +0930
MIME-Version: 1.0
Content-Language: en-US
To: Linux Memory Management List <linux-mm@kvack.org>,
 "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
From: Qu Wenruo <wqu@suse.com>
Subject: Soft lockup for cgroup charge?
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202348
Newsgroups: org.kvack.linux-mm,org.kernel.vger.linux-btrfs
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi,

Recently I've been hitting the following soft lockup related to cgroup
charging on aarch64:

  watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [btrfs:698546]
  Modules linked in: dm_log_writes dm_flakey nls_ascii nls_cp437 vfat crct10dif_ce polyval_ce polyval_generic ghash_ce rtc_efi fat processor btrfs xor xor_neon raid6_pq zstd_compress fuse loop nfnetlink qemu_fw_cfg ext4 mbcache jbd2 dm_mod xhci_pci virtio_net xhci_pci_renesas net_failover xhci_hcd virtio_balloon virtio_scsi failover dimlib virtio_blk virtio_console virtio_mmio
  irq event stamp: 47291484
  hardirqs last  enabled at (47291483): [<ffffabe6d1a5d294>] try_charge_memcg+0x3ac/0x780
  hardirqs last disabled at (47291484): [<ffffabe6d2401244>] el1_interrupt+0x24/0x80
  softirqs last  enabled at (47282714): [<ffffabe6d168e7a4>] handle_softirqs+0x2bc/0x310
  softirqs last disabled at (47282709): [<ffffabe6d16301e4>] __do_softirq+0x1c/0x28
  CPU: 3 PID: 698546 Comm: btrfs Not tainted 6.10.0-rc6-custom+ #34
  Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : try_charge_memcg+0x154/0x780
  lr : try_charge_memcg+0x3ac/0x780
  sp : ffff800089b83430
  x29: ffff800089b834a0 x28: 0000000000000002 x27: ffffabe6d2b515e8
  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000008c40
  x23: ffffabe6d2b515e8 x22: 0000000000000000 x21: 0000000000000040
  x20: ffff4854c6b32000 x19: 0000000000000004 x18: 0000000000000000
  x17: 0000000000000000 x16: ffffabe6d19474a8 x15: ffff4854d24b6f88
  x14: 0000000000000000 x13: 0000000000000000 x12: ffff4854ff1cdfd0
  x11: ffffabe6d4330370 x10: ffffabe6d46442ec x9 : ffffabe6d2b3f6e4
  x8 : ffff800089b83340 x7 : ffff800089b84000 x6 : ffff800089b80000
  x5 : 0000000000000000 x4 : 0000000000000006 x3 : 000000ffffffffff
  x2 : 0000000000000001 x1 : ffffabe6d2b3f6e0 x0 : 0000000002d19c5b
  Call trace:
   try_charge_memcg+0x154/0x780
   __mem_cgroup_charge+0x5c/0xc0
   filemap_add_folio+0x5c/0x118
   attach_eb_folio_to_filemap+0x84/0x4e0 [btrfs]
   alloc_extent_buffer+0x1d4/0x730 [btrfs]
   btrfs_find_create_tree_block+0x20/0x48 [btrfs]
   btrfs_readahead_tree_block+0x4c/0xd8 [btrfs]
   relocate_tree_blocks+0x1d8/0x3a0 [btrfs]
   relocate_block_group+0x37c/0x508 [btrfs]
   btrfs_relocate_block_group+0x274/0x458 [btrfs]
   btrfs_relocate_chunk+0x54/0x1b8 [btrfs]
   __btrfs_balance+0x2dc/0x4e0 [btrfs]
   btrfs_balance+0x3b4/0x730 [btrfs]
   btrfs_ioctl_balance+0x12c/0x300 [btrfs]
   btrfs_ioctl+0xf90/0x1380 [btrfs]
   __arm64_sys_ioctl+0xb4/0x100
   invoke_syscall+0x74/0x100
   el0_svc_common.constprop.0+0x48/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x54/0x1c0
   el0t_64_sync_handler+0x120/0x130
   el0t_64_sync+0x194/0x198

I can hit it fairly reliably (about two times out of three).

This is with modified btrfs code
(https://github.com/adam900710/linux/tree/larger_meta_folio), which does
roughly the following:

- Allocate an order 2 folio using GFP_NOFS | __GFP_NOFAIL
- Attach that order 2 folio to filemap using GFP_NOFS | __GFP_NOFAIL
   With extra handling for EEXIST.

The original btrfs code does the same thing; the only difference is the
folio order (order 0 instead of order 2).

Since the gfp flags are identical and only the order differs, I'm
wondering whether the memory cgroup is doing something weird, or whether
this is not the correct way to add a higher-order folio to the page cache?

Thanks,
Qu

.

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: linux-mm@kvack.org,
	cgroups@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>
Cc: linux-kernel@vger.kernel.org,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Subject: [PATCH] mm/page_counter: Move calculating protection values to page_counter
Date: Wed,  3 Jul 2024 13:25:10 +0200
Message-ID: <20240703112510.36424-1-maarten.lankhorst@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266466 org.kvack.linux-mm:202352
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.cgroups,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

The effective protection calculation is a lot of math, and there is
nothing memcontrol-specific about it. Moving it into page_counter makes
it easier to reuse inside the drm cgroup controller.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
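To make the math being moved easier to follow, here is a minimal userspace sketch of the same arithmetic as effective_protection() in the diff below. It is an illustration only: effective_protection_model(), min_ul() and the sample numbers are not part of the patch, and all values are treated as plain page counts with no atomics or READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Userspace model of the calculation; names are illustrative only. */
unsigned long effective_protection_model(unsigned long usage,
					 unsigned long parent_usage,
					 unsigned long setting,
					 unsigned long parent_effective,
					 unsigned long siblings_protected,
					 bool recursive_protection)
{
	unsigned long prot = min_ul(usage, setting);
	unsigned long ep;

	/* Overcommit: scale down in proportion to utilized protection. */
	if (siblings_protected > parent_effective)
		return prot * parent_effective / siblings_protected;

	/* Undercommit: whatever is claimed and used is protected. */
	ep = prot;
	if (!recursive_protection)
		return ep;

	/* Hand out unclaimed parental protection by unprotected usage. */
	if (parent_effective > siblings_protected &&
	    parent_usage > siblings_protected &&
	    usage > prot) {
		unsigned long unclaimed = parent_effective - siblings_protected;

		unclaimed *= usage - prot;
		unclaimed /= parent_usage - siblings_protected;
		ep += unclaimed;
	}
	return ep;
}

/* Three worked cases with made-up numbers. */
void effective_protection_selfcheck(void)
{
	/* Overcommitted level: 80 * 100 / 160 */
	assert(effective_protection_model(100, 200, 80, 100, 160, false) == 50);
	/* Undercommitted, no recursive protection: min(100, 30) */
	assert(effective_protection_model(100, 200, 30, 100, 50, false) == 30);
	/* Undercommitted, recursive: 30 + (50 * 70) / 150 */
	assert(effective_protection_model(100, 200, 30, 100, 50, true) == 53);
}
```

The third case shows the effect of the recursive_protection flag: the 50 pages of parental protection that no sibling claimed are shared out by unprotected usage, which raises the result from 30 to 53.
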
 include/linux/page_counter.h |   4 +
 mm/memcontrol.c              | 154 +------------------------------
 mm/page_counter.c            | 173 +++++++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+), 151 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 8cd858d912c4..904c52f97284 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -81,4 +81,8 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 	counter->watermark = page_counter_read(counter);
 }
 
+void page_counter_calculate_protection(struct page_counter *root,
+				       struct page_counter *counter,
+				       bool recursive_protection);
+
 #endif /* _LINUX_PAGE_COUNTER_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 71fe2a95b8bd..9454e1a3120e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7316,122 +7316,6 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.early_init = 0,
 };
 
-/*
- * This function calculates an individual cgroup's effective
- * protection which is derived from its own memory.min/low, its
- * parent's and siblings' settings, as well as the actual memory
- * distribution in the tree.
- *
- * The following rules apply to the effective protection values:
- *
- * 1. At the first level of reclaim, effective protection is equal to
- *    the declared protection in memory.min and memory.low.
- *
- * 2. To enable safe delegation of the protection configuration, at
- *    subsequent levels the effective protection is capped to the
- *    parent's effective protection.
- *
- * 3. To make complex and dynamic subtrees easier to configure, the
- *    user is allowed to overcommit the declared protection at a given
- *    level. If that is the case, the parent's effective protection is
- *    distributed to the children in proportion to how much protection
- *    they have declared and how much of it they are utilizing.
- *
- *    This makes distribution proportional, but also work-conserving:
- *    if one cgroup claims much more protection than it uses memory,
- *    the unused remainder is available to its siblings.
- *
- * 4. Conversely, when the declared protection is undercommitted at a
- *    given level, the distribution of the larger parental protection
- *    budget is NOT proportional. A cgroup's protection from a sibling
- *    is capped to its own memory.min/low setting.
- *
- * 5. However, to allow protecting recursive subtrees from each other
- *    without having to declare each individual cgroup's fixed share
- *    of the ancestor's claim to protection, any unutilized -
- *    "floating" - protection from up the tree is distributed in
- *    proportion to each cgroup's *usage*. This makes the protection
- *    neutral wrt sibling cgroups and lets them compete freely over
- *    the shared parental protection budget, but it protects the
- *    subtree as a whole from neighboring subtrees.
- *
- * Note that 4. and 5. are not in conflict: 4. is about protecting
- * against immediate siblings whereas 5. is about protecting against
- * neighboring subtrees.
- */
-static unsigned long effective_protection(unsigned long usage,
-					  unsigned long parent_usage,
-					  unsigned long setting,
-					  unsigned long parent_effective,
-					  unsigned long siblings_protected)
-{
-	unsigned long protected;
-	unsigned long ep;
-
-	protected = min(usage, setting);
-	/*
-	 * If all cgroups at this level combined claim and use more
-	 * protection than what the parent affords them, distribute
-	 * shares in proportion to utilization.
-	 *
-	 * We are using actual utilization rather than the statically
-	 * claimed protection in order to be work-conserving: claimed
-	 * but unused protection is available to siblings that would
-	 * otherwise get a smaller chunk than what they claimed.
-	 */
-	if (siblings_protected > parent_effective)
-		return protected * parent_effective / siblings_protected;
-
-	/*
-	 * Ok, utilized protection of all children is within what the
-	 * parent affords them, so we know whatever this child claims
-	 * and utilizes is effectively protected.
-	 *
-	 * If there is unprotected usage beyond this value, reclaim
-	 * will apply pressure in proportion to that amount.
-	 *
-	 * If there is unutilized protection, the cgroup will be fully
-	 * shielded from reclaim, but we do return a smaller value for
-	 * protection than what the group could enjoy in theory. This
-	 * is okay. With the overcommit distribution above, effective
-	 * protection is always dependent on how memory is actually
-	 * consumed among the siblings anyway.
-	 */
-	ep = protected;
-
-	/*
-	 * If the children aren't claiming (all of) the protection
-	 * afforded to them by the parent, distribute the remainder in
-	 * proportion to the (unprotected) memory of each cgroup. That
-	 * way, cgroups that aren't explicitly prioritized wrt each
-	 * other compete freely over the allowance, but they are
-	 * collectively protected from neighboring trees.
-	 *
-	 * We're using unprotected memory for the weight so that if
-	 * some cgroups DO claim explicit protection, we don't protect
-	 * the same bytes twice.
-	 *
-	 * Check both usage and parent_usage against the respective
-	 * protected values. One should imply the other, but they
-	 * aren't read atomically - make sure the division is sane.
-	 */
-	if (!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT))
-		return ep;
-	if (parent_effective > siblings_protected &&
-	    parent_usage > siblings_protected &&
-	    usage > protected) {
-		unsigned long unclaimed;
-
-		unclaimed = parent_effective - siblings_protected;
-		unclaimed *= usage - protected;
-		unclaimed /= parent_usage - siblings_protected;
-
-		ep += unclaimed;
-	}
-
-	return ep;
-}
-
 /**
  * mem_cgroup_calculate_protection - check if memory consumption is in the normal range
  * @root: the top ancestor of the sub-tree being checked
@@ -7443,8 +7327,8 @@ static unsigned long effective_protection(unsigned long usage,
 void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 				     struct mem_cgroup *memcg)
 {
-	unsigned long usage, parent_usage;
-	struct mem_cgroup *parent;
+	bool recursive_protection =
+		cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -7452,39 +7336,7 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 	if (!root)
 		root = root_mem_cgroup;
 
-	/*
-	 * Effective values of the reclaim targets are ignored so they
-	 * can be stale. Have a look at mem_cgroup_protection for more
-	 * details.
-	 * TODO: calculation should be more robust so that we do not need
-	 * that special casing.
-	 */
-	if (memcg == root)
-		return;
-
-	usage = page_counter_read(&memcg->memory);
-	if (!usage)
-		return;
-
-	parent = parent_mem_cgroup(memcg);
-
-	if (parent == root) {
-		memcg->memory.emin = READ_ONCE(memcg->memory.min);
-		memcg->memory.elow = READ_ONCE(memcg->memory.low);
-		return;
-	}
-
-	parent_usage = page_counter_read(&parent->memory);
-
-	WRITE_ONCE(memcg->memory.emin, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.min),
-			READ_ONCE(parent->memory.emin),
-			atomic_long_read(&parent->memory.children_min_usage)));
-
-	WRITE_ONCE(memcg->memory.elow, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.low),
-			READ_ONCE(parent->memory.elow),
-			atomic_long_read(&parent->memory.children_low_usage)));
+	page_counter_calculate_protection(&root->memory, &memcg->memory, recursive_protection);
 }
 
 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index db20d6452b71..8ee49cbf71be 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -262,3 +262,176 @@ int page_counter_memparse(const char *buf, const char *max,
 
 	return 0;
 }
+
+
+/*
+ * This function calculates an individual page counter's effective
+ * protection which is derived from its own memory.min/low, its
+ * parent's and siblings' settings, as well as the actual memory
+ * distribution in the tree.
+ *
+ * The following rules apply to the effective protection values:
+ *
+ * 1. At the first level of reclaim, effective protection is equal to
+ *    the declared protection in memory.min and memory.low.
+ *
+ * 2. To enable safe delegation of the protection configuration, at
+ *    subsequent levels the effective protection is capped to the
+ *    parent's effective protection.
+ *
+ * 3. To make complex and dynamic subtrees easier to configure, the
+ *    user is allowed to overcommit the declared protection at a given
+ *    level. If that is the case, the parent's effective protection is
+ *    distributed to the children in proportion to how much protection
+ *    they have declared and how much of it they are utilizing.
+ *
+ *    This makes distribution proportional, but also work-conserving:
+ *    if one counter claims much more protection than it uses memory,
+ *    the unused remainder is available to its siblings.
+ *
+ * 4. Conversely, when the declared protection is undercommitted at a
+ *    given level, the distribution of the larger parental protection
+ *    budget is NOT proportional. A counter's protection from a sibling
+ *    is capped to its own memory.min/low setting.
+ *
+ * 5. However, to allow protecting recursive subtrees from each other
+ *    without having to declare each individual counter's fixed share
+ *    of the ancestor's claim to protection, any unutilized -
+ *    "floating" - protection from up the tree is distributed in
+ *    proportion to each counter's *usage*. This makes the protection
+ *    neutral wrt sibling cgroups and lets them compete freely over
+ *    the shared parental protection budget, but it protects the
+ *    subtree as a whole from neighboring subtrees.
+ *
+ * Note that 4. and 5. are not in conflict: 4. is about protecting
+ * against immediate siblings whereas 5. is about protecting against
+ * neighboring subtrees.
+ */
+static unsigned long effective_protection(unsigned long usage,
+					  unsigned long parent_usage,
+					  unsigned long setting,
+					  unsigned long parent_effective,
+					  unsigned long siblings_protected,
+					  bool recursive_protection)
+{
+	unsigned long protected;
+	unsigned long ep;
+
+	protected = min(usage, setting);
+	/*
+	 * If all cgroups at this level combined claim and use more
+	 * protection than what the parent affords them, distribute
+	 * shares in proportion to utilization.
+	 *
+	 * We are using actual utilization rather than the statically
+	 * claimed protection in order to be work-conserving: claimed
+	 * but unused protection is available to siblings that would
+	 * otherwise get a smaller chunk than what they claimed.
+	 */
+	if (siblings_protected > parent_effective)
+		return protected * parent_effective / siblings_protected;
+
+	/*
+	 * Ok, utilized protection of all children is within what the
+	 * parent affords them, so we know whatever this child claims
+	 * and utilizes is effectively protected.
+	 *
+	 * If there is unprotected usage beyond this value, reclaim
+	 * will apply pressure in proportion to that amount.
+	 *
+	 * If there is unutilized protection, the cgroup will be fully
+	 * shielded from reclaim, but we do return a smaller value for
+	 * protection than what the group could enjoy in theory. This
+	 * is okay. With the overcommit distribution above, effective
+	 * protection is always dependent on how memory is actually
+	 * consumed among the siblings anyway.
+	 */
+	ep = protected;
+
+	/*
+	 * If the children aren't claiming (all of) the protection
+	 * afforded to them by the parent, distribute the remainder in
+	 * proportion to the (unprotected) memory of each cgroup. That
+	 * way, cgroups that aren't explicitly prioritized wrt each
+	 * other compete freely over the allowance, but they are
+	 * collectively protected from neighboring trees.
+	 *
+	 * We're using unprotected memory for the weight so that if
+	 * some cgroups DO claim explicit protection, we don't protect
+	 * the same bytes twice.
+	 *
+	 * Check both usage and parent_usage against the respective
+	 * protected values. One should imply the other, but they
+	 * aren't read atomically - make sure the division is sane.
+	 */
+	if (!recursive_protection)
+		return ep;
+
+	if (parent_effective > siblings_protected &&
+	    parent_usage > siblings_protected &&
+	    usage > protected) {
+		unsigned long unclaimed;
+
+		unclaimed = parent_effective - siblings_protected;
+		unclaimed *= usage - protected;
+		unclaimed /= parent_usage - siblings_protected;
+
+		ep += unclaimed;
+	}
+
+	return ep;
+}
+
+
+/**
+ * page_counter_calculate_protection - check if memory consumption is in the normal range
+ * @root: the top ancestor of the sub-tree being checked
+ * @counter: the page counter to check
+ * @recursive_protection: Whether to use memory_recursiveprot behavior.
+ *
+ * Calculates elow/emin thresholds for given page_counter.
+ *
+ * WARNING: This function is not stateless! It can only be used as part
+ *          of a top-down tree iteration, not for isolated queries.
+ */
+void page_counter_calculate_protection(struct page_counter *root,
+				       struct page_counter *counter,
+				       bool recursive_protection)
+{
+	unsigned long usage, parent_usage;
+	struct page_counter *parent = counter->parent;
+
+	/*
+	 * Effective values of the reclaim targets are ignored so they
+	 * can be stale. Have a look at mem_cgroup_protection for more
+	 * details.
+	 * TODO: calculation should be more robust so that we do not need
+	 * that special casing.
+	 */
+	if (root == counter)
+		return;
+
+	usage = page_counter_read(counter);
+	if (!usage)
+		return;
+
+	if (parent == root) {
+		counter->emin = READ_ONCE(counter->min);
+		counter->elow = READ_ONCE(counter->low);
+		return;
+	}
+
+	parent_usage = page_counter_read(parent);
+
+	WRITE_ONCE(counter->emin, effective_protection(usage, parent_usage,
+			READ_ONCE(counter->min),
+			READ_ONCE(parent->emin),
+			atomic_long_read(&parent->children_min_usage),
+			recursive_protection));
+
+	WRITE_ONCE(counter->elow, effective_protection(usage, parent_usage,
+			READ_ONCE(counter->low),
+			READ_ONCE(parent->elow),
+			atomic_long_read(&parent->children_low_usage),
+			recursive_protection));
+}
-- 
2.45.2

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 19:17:26 +0800
From: kernel test robot <lkp@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: [linux-next:master 9887/10451] kernel/rcu/tree_exp.h:556:undefined
 reference to `csd_lock_is_stuck'
Message-ID: <202407031959.r1UufFHc-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202353
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: 3d2660b7a83bf2d9ff0abb7485478d533b269b6d [9887/10451] rcu: Summarize expedited RCU CPU stall warnings during CSD-lock stalls
config: x86_64-randconfig-a014-20211016 (https://download.01.org/0day-ci/archive/20240703/202407031959.r1UufFHc-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031959.r1UufFHc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031959.r1UufFHc-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `synchronize_rcu_expedited_stall':
>> kernel/rcu/tree_exp.h:556:(.text+0x26ed6c): undefined reference to `csd_lock_is_stuck'
   ld: vmlinux.o: in function `check_cpu_stall':
   kernel/rcu/tree_stall.h:797:(.text+0x273c24): undefined reference to `csd_lock_is_stuck'


vim +556 kernel/rcu/tree_exp.h

   544	
   545	/*
   546	 * Print out an expedited RCU CPU stall warning message.
   547	 */
   548	static void synchronize_rcu_expedited_stall(unsigned long jiffies_start, unsigned long j)
   549	{
   550		int cpu;
   551		unsigned long mask;
   552		int ndetected;
   553		struct rcu_node *rnp;
   554		struct rcu_node *rnp_root = rcu_get_root();
   555	
 > 556		if (READ_ONCE(csd_lock_suppress_rcu_stall) && csd_lock_is_stuck()) {
   557			pr_err("INFO: %s detected expedited stalls, but suppressed full report due to a stuck CSD-lock.\n", rcu_state.name);
   558			return;
   559		}
   560		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", rcu_state.name);
   561		ndetected = 0;
   562		rcu_for_each_leaf_node(rnp) {
   563			ndetected += rcu_print_task_exp_stall(rnp);
   564			for_each_leaf_node_possible_cpu(rnp, cpu) {
   565				struct rcu_data *rdp;
   566	
   567				mask = leaf_node_cpu_bit(rnp, cpu);
   568				if (!(READ_ONCE(rnp->expmask) & mask))
   569					continue;
   570				ndetected++;
   571				rdp = per_cpu_ptr(&rcu_data, cpu);
   572				pr_cont(" %d-%c%c%c%c", cpu,
   573					"O."[!!cpu_online(cpu)],
   574					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
   575					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
   576					"D."[!!data_race(rdp->cpu_no_qs.b.exp)]);
   577			}
   578		}
   579		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
   580			j - jiffies_start, rcu_state.expedited_sequence, data_race(rnp_root->expmask),
   581			".T"[!!data_race(rnp_root->exp_tasks)]);
   582		if (ndetected) {
   583			pr_err("blocking rcu_node structures (internal RCU debug):");
   584			rcu_for_each_node_breadth_first(rnp) {
   585				if (rnp == rnp_root)
   586					continue; /* printed unconditionally */
   587				if (sync_rcu_exp_done_unlocked(rnp))
   588					continue;
   589				pr_cont(" l=%u:%d-%d:%#lx/%c",
   590					rnp->level, rnp->grplo, rnp->grphi, data_race(rnp->expmask),
   591					".T"[!!data_race(rnp->exp_tasks)]);
   592			}
   593			pr_cont("\n");
   594		}
   595		rcu_for_each_leaf_node(rnp) {
   596			for_each_leaf_node_possible_cpu(rnp, cpu) {
   597				mask = leaf_node_cpu_bit(rnp, cpu);
   598				if (!(READ_ONCE(rnp->expmask) & mask))
   599					continue;
   600				preempt_disable(); // For smp_processor_id() in dump_cpu_task().
   601				dump_cpu_task(cpu);
   602				preempt_enable();
   603			}
   604			rcu_exp_print_detail_task_stall_rnp(rnp);
   605		}
   606	}
   607	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 19:17:26 +0800
From: kernel test robot <lkp@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: [linux-next:master 10381/10451] kernel/rcu/rcu.h:138:undefined
 reference to `csd_lock_is_stuck'
Message-ID: <202407031919.QUBn1G5D-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202354
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: eb52b064da252ef2ecc1fd112ebd3f687f9affd5 [10381/10451] Merge branch 'non-rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
config: riscv-randconfig-r133-20240211 (https://download.01.org/0day-ci/archive/20240703/202407031919.QUBn1G5D-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407031919.QUBn1G5D-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407031919.QUBn1G5D-lkp@intel.com/

All errors (new ones prefixed by >>):

   riscv64-linux-ld: kernel/rcu/tree.o: in function `sync_rcu_exp_done':
   kernel/rcu/tree_exp.h:155:(.text+0xd0c): undefined reference to `csd_lock_is_stuck'
   riscv64-linux-ld: kernel/rcu/tree.o: in function `rcu_seq_current':
>> kernel/rcu/rcu.h:138:(.text+0x3222): undefined reference to `csd_lock_is_stuck'


vim +138 kernel/rcu/rcu.h

2e8c28c2dd96c6 Paul E. McKenney 2017-02-20  134  
8660b7d8a54522 Paul E. McKenney 2017-03-13  135  /* Return the current value the update side's sequence number, no ordering. */
8660b7d8a54522 Paul E. McKenney 2017-03-13  136  static inline unsigned long rcu_seq_current(unsigned long *sp)
8660b7d8a54522 Paul E. McKenney 2017-03-13  137  {
8660b7d8a54522 Paul E. McKenney 2017-03-13 @138  	return READ_ONCE(*sp);
8660b7d8a54522 Paul E. McKenney 2017-03-13  139  }
8660b7d8a54522 Paul E. McKenney 2017-03-13  140  

:::::: The code at line 138 was first introduced by commit
:::::: 8660b7d8a545227fd9ee80508aa82528ea9947d7 srcu: Use rcu_segcblist to track SRCU callbacks

:::::: TO: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
:::::: CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, "Liam R . Howlett" <Liam.Howlett@oracle.com>,
        Vlastimil Babka <vbabka@suse.cz>, Matthew Wilcox <willy@infradead.org>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
        Eric Biederman <ebiederm@xmission.com>, Kees Cook <kees@kernel.org>,
        Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH 0/7] Make core VMA operations internal and testable
Date: Wed,  3 Jul 2024 12:57:31 +0100
Message-ID: <cover.1720006125.git.lorenzo.stoakes@oracle.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266494 org.kvack.linux-mm:202355
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.linux-fsdevel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

There are a number of "core" VMA manipulation functions implemented in
mm/mmap.c, notably those concerning VMA merging, splitting, modifying,
expanding and shrinking, which logically don't belong there.

More importantly this functionality represents an internal implementation
detail of memory management and should not be exposed outside of mm/
itself.

This patch series isolates core VMA manipulation functionality into its own
file, mm/vma.c, and provides an API to the rest of the mm code in mm/vma.h.

Importantly, it also carefully implements mm/vma_internal.h, which
specifies which headers need to be imported by vma.c, leading to the very
useful property that vma.c depends only on mm/vma.h and mm/vma_internal.h.

This means we can then re-implement vma_internal.h in userland, adding
shims for kernel mechanisms as required, allowing us to unit test internal
VMA functionality.

This approach is preferable to, e.g., a kunit implementation because it
lets us avoid all external kernel side-effects while testing, run tests
VERY quickly, and iterate on and debug problems quickly.

Excitingly this opens the door to, in the future, recreating precise
problems observed in production in userland and very quickly debugging
problems that might otherwise be very difficult to reproduce.

This patch series takes advantage of existing shim logic and full userland
maple tree support contained in tools/testing/radix-tree/ and
tools/include/linux/, separating out shared components of the radix tree
implementation to provide this testing.

Kernel functionality is stubbed and shimmed as needed in tools/testing/vma/
which contains a fully functional userland vma_internal.h file and which
imports mm/vma.c and mm/vma.h to be directly tested from userland.

A simple, skeleton testing implementation is provided in
tools/testing/vma/vma.c as a proof-of-concept, asserting that simple VMA
merge, modify (testing split), expand and shrink functionality work
correctly.
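
The shimming idea described above can be sketched in a single file, as an
illustration only. The first part stands in for a userland "internal"
header that replaces kernel facilities, and the second part stands in for
core logic that depends on nothing else. `DEMO_WARN_ON`, `struct demo_vma`
and `demo_vma_try_merge()` are hypothetical names invented for this sketch.
They are not the series' actual code, and the merge rule is deliberately
simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* --- stand-in for a userland vma_internal.h: shims only --- */

/* In the kernel this would warn; in the test harness it aborts. */
#define DEMO_WARN_ON(cond) assert(!(cond))

struct demo_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	unsigned long flags;
};

/* --- stand-in for the logic under test; uses only the shims above --- */

/*
 * Merge @next into @prev when the two ranges are adjacent and carry
 * identical flags. Returns true if @prev was extended.
 */
bool demo_vma_try_merge(struct demo_vma *prev, const struct demo_vma *next)
{
	DEMO_WARN_ON(prev->start >= prev->end);
	DEMO_WARN_ON(next->start >= next->end);

	if (prev->end != next->start || prev->flags != next->flags)
		return false;

	prev->end = next->end;
	return true;
}
```

A test would then build two adjacent ranges with equal flags, call
demo_vma_try_merge(), and assert that the first range now covers both.
Because the logic sees only the shim layer, no kernel state is involved.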

v1:
* Fix test_simple_modify() to specify correct prev.
* Improve vma test Makefile so it picks up dependency changes correctly.
* Rename relocate_vma() to relocate_vma_down().
* Remove shift_arg_pages() and invoke relocate_vma_down() directly from
  setup_arg_pages().
* MAINTAINERS fixups.

RFC v2:
* Reword commit messages.
* Replace vma_expand() / vma_shrink() wrappers with relocate_vma().
* Make move_page_tables() internal too.
* Have internal.h import vma.h.
* Use header guards to more cleanly implement userland testing code.
* Rename main.c to vma.c.
* Update mm/vma_internal.h to have fewer superfluous comments.
* Rework testing logic so we count test failures, and output test results.
* Correct some SPDX license prefixes.
* Make VM_xxx_ON() debug asserts forward to xxx_ON() macros.
* Update VMA tests to correctly free memory, and re-enable ASAN leak
  detection.
https://lore.kernel.org/all/cover.1719584707.git.lstoakes@gmail.com/

RFC v1:
https://lore.kernel.org/all/cover.1719481836.git.lstoakes@gmail.com/


Lorenzo Stoakes (7):
  userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c
  mm: move vma_modify() and helpers to internal header
  mm: move vma_shrink(), vma_expand() to internal header
  mm: move internal core VMA manipulation functions to own file
  MAINTAINERS: Add entry for new VMA files
  tools: separate out shared radix-tree components
  tools: add skeleton code for userland testing of VMA logic

 MAINTAINERS                                   |   14 +
 fs/exec.c                                     |   81 +-
 fs/userfaultfd.c                              |  160 +-
 include/linux/atomic.h                        |    2 +-
 include/linux/mm.h                            |  112 +-
 include/linux/mmzone.h                        |    3 +-
 include/linux/userfaultfd_k.h                 |   19 +
 mm/Makefile                                   |    2 +-
 mm/internal.h                                 |  167 +-
 mm/mmap.c                                     | 2069 ++---------------
 mm/mmu_notifier.c                             |    2 +
 mm/userfaultfd.c                              |  168 ++
 mm/vma.c                                      | 1766 ++++++++++++++
 mm/vma.h                                      |  362 +++
 mm/vma_internal.h                             |   52 +
 tools/testing/radix-tree/Makefile             |   68 +-
 tools/testing/radix-tree/maple.c              |   14 +-
 tools/testing/radix-tree/xarray.c             |    9 +-
 tools/testing/shared/autoconf.h               |    2 +
 tools/testing/{radix-tree => shared}/bitmap.c |    0
 tools/testing/{radix-tree => shared}/linux.c  |    0
 .../{radix-tree => shared}/linux/bug.h        |    0
 .../{radix-tree => shared}/linux/cpu.h        |    0
 .../{radix-tree => shared}/linux/idr.h        |    0
 .../{radix-tree => shared}/linux/init.h       |    0
 .../{radix-tree => shared}/linux/kconfig.h    |    0
 .../{radix-tree => shared}/linux/kernel.h     |    0
 .../{radix-tree => shared}/linux/kmemleak.h   |    0
 .../{radix-tree => shared}/linux/local_lock.h |    0
 .../{radix-tree => shared}/linux/lockdep.h    |    0
 .../{radix-tree => shared}/linux/maple_tree.h |    0
 .../{radix-tree => shared}/linux/percpu.h     |    0
 .../{radix-tree => shared}/linux/preempt.h    |    0
 .../{radix-tree => shared}/linux/radix-tree.h |    0
 .../{radix-tree => shared}/linux/rcupdate.h   |    0
 .../{radix-tree => shared}/linux/xarray.h     |    0
 tools/testing/shared/maple-shared.h           |    9 +
 tools/testing/shared/maple-shim.c             |    7 +
 tools/testing/shared/shared.h                 |   34 +
 tools/testing/shared/shared.mk                |   68 +
 .../testing/shared/trace/events/maple_tree.h  |    5 +
 tools/testing/shared/xarray-shared.c          |    5 +
 tools/testing/shared/xarray-shared.h          |    4 +
 tools/testing/vma/.gitignore                  |    6 +
 tools/testing/vma/Makefile                    |   16 +
 tools/testing/vma/errors.txt                  |    0
 tools/testing/vma/generated/autoconf.h        |    2 +
 tools/testing/vma/linux/atomic.h              |   12 +
 tools/testing/vma/linux/mmzone.h              |   38 +
 tools/testing/vma/vma.c                       |  207 ++
 tools/testing/vma/vma_internal.h              |  882 +++++++
 51 files changed, 3914 insertions(+), 2453 deletions(-)
 create mode 100644 mm/vma.c
 create mode 100644 mm/vma.h
 create mode 100644 mm/vma_internal.h
 create mode 100644 tools/testing/shared/autoconf.h
 rename tools/testing/{radix-tree => shared}/bitmap.c (100%)
 rename tools/testing/{radix-tree => shared}/linux.c (100%)
 rename tools/testing/{radix-tree => shared}/linux/bug.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/cpu.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/idr.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/init.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kconfig.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kernel.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kmemleak.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/local_lock.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/lockdep.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/maple_tree.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/percpu.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/preempt.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/radix-tree.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/rcupdate.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/xarray.h (100%)
 create mode 100644 tools/testing/shared/maple-shared.h
 create mode 100644 tools/testing/shared/maple-shim.c
 create mode 100644 tools/testing/shared/shared.h
 create mode 100644 tools/testing/shared/shared.mk
 create mode 100644 tools/testing/shared/trace/events/maple_tree.h
 create mode 100644 tools/testing/shared/xarray-shared.c
 create mode 100644 tools/testing/shared/xarray-shared.h
 create mode 100644 tools/testing/vma/.gitignore
 create mode 100644 tools/testing/vma/Makefile
 create mode 100644 tools/testing/vma/errors.txt
 create mode 100644 tools/testing/vma/generated/autoconf.h
 create mode 100644 tools/testing/vma/linux/atomic.h
 create mode 100644 tools/testing/vma/linux/mmzone.h
 create mode 100644 tools/testing/vma/vma.c
 create mode 100644 tools/testing/vma/vma_internal.h

--
2.45.2
.

From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	stable@vger.kernel.org,
	21cnbao@gmail.com,
	david@redhat.com,
	baolin.wang@linux.alibaba.com,
	aneesh.kumar@linux.ibm.com,
	liuzixing@hygon.cn,
	yangge <yangge1116@126.com>
Subject: [PATCH V3] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Date: Wed,  3 Jul 2024 20:02:33 +0800
Message-Id: <1720008153-16035-1-git-send-email-yangge1116@126.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266509 org.kvack.linux-mm:202364
Newsgroups: org.kernel.vger.linux-kernel,org.kernel.vger.stable,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

From: yangge <yangge1116@126.com>

If a large amount of CMA memory is configured in the system (for example,
CMA memory accounts for 50% of system memory), starting a virtual machine
with device passthrough will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory.
Normally, if a page is present and in the CMA area, pin_user_pages_remote()
will migrate the page from the CMA area to a non-CMA area because of the
FOLL_LONGTERM flag. But the current code causes the migration to fail
due to unexpected page refcounts, and eventually causes the virtual machine
to fail to start.

Adding a page to an LRU batch increases its refcount by one, and removing
the page from the LRU batch decreases it by one. Page migration requires
that the page is not referenced by anything other than its page mapping.
Before migrating a page, we should try to drain it from the LRU batch in
case it is there. However, folio_test_lru() is not sufficient to tell
whether the page is in an LRU batch, and if it is, the migration will fail.

To solve the problem above, we modify the logic of adding to an LRU batch.
Before adding a page to an LRU batch, we clear the LRU flag of the page so
that we can check whether the page is in an LRU batch with
folio_test_lru(page). Keeping the LRU flag of the page invisible for a long
time seems to be no problem, because when a new page is allocated from
buddy and added to the LRU batch, its LRU flag is also not visible for a
long time.
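
The scheme described above can be modeled in userland with C11 atomics, as
a rough sketch. `struct demo_folio` and the `demo_` functions are
hypothetical and only mirror the take-reference, test-and-clear, and
back-off sequence visible in the diff below. That the drain step sets the
flag again is an assumption of this model, not something the diff shows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_folio {
	atomic_bool lru;	/* models the LRU flag */
	atomic_int refcount;	/* models the folio reference count */
};

/* Models test-and-clear: returns the old flag value, leaves it clear. */
bool demo_test_clear_lru(struct demo_folio *f)
{
	return atomic_exchange(&f->lru, false);
}

/*
 * Take a reference, then claim the folio by clearing its LRU flag.
 * If the flag was already clear, someone else owns it: back off.
 */
bool demo_batch_add(struct demo_folio *f)
{
	atomic_fetch_add(&f->refcount, 1);
	if (!demo_test_clear_lru(f)) {
		atomic_fetch_sub(&f->refcount, 1);
		return false;
	}
	return true;
}

/* Assumed drain step: flag becomes visible again, batch ref dropped. */
void demo_batch_drain(struct demo_folio *f)
{
	atomic_store(&f->lru, true);
	atomic_fetch_sub(&f->refcount, 1);
}

/* A clear flag now means the folio may be sitting in a batch. */
bool demo_maybe_in_batch(struct demo_folio *f)
{
	return !atomic_load(&f->lru);
}
```

In this model a migration path can check demo_maybe_in_batch() and drain
before it inspects the refcount, which is the property the commit message
relies on.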

Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Cc: <stable@vger.kernel.org>
Signed-off-by: yangge <yangge1116@126.com>
---
 mm/swap.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

V3:
   Add fixes tag
V2:
   Adjust code and commit message according to David's comments

diff --git a/mm/swap.c b/mm/swap.c
index dc205bd..9caf6b0 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -211,10 +211,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	for (i = 0; i < folio_batch_count(fbatch); i++) {
 		struct folio *folio = fbatch->folios[i];
 
-		/* block memcg migration while the folio moves between lru */
-		if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
-			continue;
-
 		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
 		move_fn(lruvec, folio);
 
@@ -255,11 +251,16 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 void folio_rotate_reclaimable(struct folio *folio)
 {
 	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
-	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
+	    !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 		unsigned long flags;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		fbatch = this_cpu_ptr(&lru_rotate.fbatch);
 		folio_batch_add_and_move(fbatch, folio, lru_move_tail_fn);
@@ -352,11 +353,15 @@ static void folio_activate_drain(int cpu)
 
 void folio_activate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_active(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (!folio_test_active(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.activate);
 		folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
@@ -700,6 +705,11 @@ void deactivate_file_folio(struct folio *folio)
 		return;
 
 	folio_get(folio);
+	if (!folio_test_clear_lru(folio)) {
+		folio_put(folio);
+		return;
+	}
+
 	local_lock(&cpu_fbatches.lock);
 	fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate_file);
 	folio_batch_add_and_move(fbatch, folio, lru_deactivate_file_fn);
@@ -716,11 +726,16 @@ void deactivate_file_folio(struct folio *folio)
  */
 void folio_deactivate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
-	    (folio_test_active(folio) || lru_gen_enabled())) {
+	if (!folio_test_unevictable(folio) && (folio_test_active(folio) ||
+	    lru_gen_enabled())) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate);
 		folio_batch_add_and_move(fbatch, folio, lru_deactivate_fn);
@@ -737,12 +752,16 @@ void folio_deactivate(struct folio *folio)
  */
 void folio_mark_lazyfree(struct folio *folio)
 {
-	if (folio_test_lru(folio) && folio_test_anon(folio) &&
-	    folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+	    !folio_test_swapcache(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_lazyfree);
 		folio_batch_add_and_move(fbatch, folio, lru_lazyfree_fn);
-- 
2.7.4

.

Return-Path: <owner-linux-mm@kvack.org>
Date: Wed, 3 Jul 2024 22:27:20 +0800
From: kernel test robot <lkp@intel.com>
To: Frank Li <Frank.Li@nxp.com>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>,
	Shawn Guo <shawnguo@kernel.org>
Subject: [linux-next:master 8736/9748]
 arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: watchdog@2ad0000:
 Unevaluated properties are not allowed ('big-endian' was unexpected)
Message-ID: <202407040022.ChwakxXg-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202376
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: 38397efe228326ab4f7928ee9e0c1f1f752d56a5 [8736/9748] arm64: dts: fsl-ls1043a: remove unused clk-name at watchdog node
config: arm64-randconfig-051-20240702 (https://download.01.org/0day-ci/archive/20240704/202407040022.ChwakxXg-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 326ba38a991250a8587a399a260b0f7af2c9166a)
dtschema version: 2024.6.dev3+g650bf2d
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240704/202407040022.ChwakxXg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407040022.ChwakxXg-lkp@intel.com/

dtcheck warnings: (new ones prefixed by >>)
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000: failed to match any schema with compatible: ['fsl,qe', 'simple-bus']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/qeic@80: failed to match any schema with compatible: ['fsl,qe-ic']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: pcie@3400000: fsl,pcie-scfg:0: [23, 0] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/qeic@80: failed to match any schema with compatible: ['fsl,qe-ic']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/ucc@2000: failed to match any schema with compatible: ['fsl,ucc-hdlc']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: pcie@3400000: fsl,pcie-scfg:0: [22, 0] is too long
--
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000: failed to match any schema with compatible: ['fsl,qe', 'simple-bus']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/qeic@80: failed to match any schema with compatible: ['fsl,qe-ic']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/si@700: failed to match any schema with compatible: ['fsl,ls1043-qe-si', 'fsl,t1040-qe-si']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/siram@1000: failed to match any schema with compatible: ['fsl,ls1043-qe-siram', 'fsl,t1040-qe-siram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
>> arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: watchdog@2ad0000: Unevaluated properties are not allowed ('big-endian' was unexpected)
   	from schema $id: http://devicetree.org/schemas/watchdog/fsl-imx-wdt.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: aux-bus: '#address-cells', '#size-cells', 'compatible', 'dma-ranges', 'ranges', 'sata@3200000', 'usb@2f00000', 'usb@3000000', 'usb@3100000' do not match any of the regexes: 'pinctrl-[0-9]+'
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: aux-bus: 'panel' is a required property
   	from schema $id: http://devicetree.org/schemas/display/dp-aux-bus.yaml#
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/aux-bus/sata@3200000: failed to match any schema with compatible: ['fsl,ls1043a-ahci']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller1@1571000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller2@1572000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: /soc/msi-controller3@1573000: failed to match any schema with compatible: ['fsl,ls1043a-msi']
   arch/arm64/boot/dts/freescale/fsl-ls1043a-tqmls1043a-mbls10xxa.dtb: pcie@3400000: fsl,pcie-scfg:0: [25, 0] is too long

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Message-ID: <d2841226-e27b-4d3d-a578-63587a3aa4f3@amd.com>
Date: Wed, 3 Jul 2024 20:41:03 +0530
Content-Language: en-US
From: Bharata B Rao <bharata@amd.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nikunj@amd.com,
 "Upadhyay, Neeraj" <Neeraj.Upadhyay@amd.com>,
 Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@redhat.com>, willy@infradead.org, vbabka@suse.cz,
 yuzhao@google.com, kinseyho@google.com, Mel Gorman <mgorman@suse.de>
Subject: Hard and soft lockups with FIO and LTP runs on a large system
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266754 org.kvack.linux-mm:202382
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Many soft and hard lockups are seen with the upstream kernel when running
a set of tests that includes FIO and LTP filesystem tests on 10 NVME
disks. The lockups can appear anywhere between 2 and 48 hours into the
run. Originally this was reported on a large customer VM instance with
passthrough NVME disks on older kernels (v5.4 based). However, similar
problems were reproduced when running the tests on bare metal with the
latest upstream kernel (v6.10-rc3). Other lockups with different
signatures are seen, but this report discusses only those related to the
MM area. Also note that the description below applies to the lockups on
bare metal with the upstream kernel (and not the VM).

The general observation is that the problem usually surfaces when the
system free memory goes very low and page cache/buffer consumption hits
the ceiling. Most of the time, the two contended locks are the lruvec
and inode->i_lock spinlocks.

- Could this be a scalability issue in LRU list handling and/or page
cache invalidation typical of a large system configuration?
- Are there any MM/FS tunables that could help here?

Hardware configuration
======================
Dual socket AMD EPYC 128 Core processor (256 cores, 512 threads)
Memory: 1.5 TB
10 NVME - 3.5TB each
available: 2 nodes (0-1)
node 0 cpus: 0-127,256-383
node 0 size: 773727 MB
node 1 cpus: 128-255,384-511
node 1 size: 773966 MB

Workload details
================
Workload includes concurrent runs of FIO and a few FS tests from LTP.

FIO is run with a size of 1TB on each NVME partition with different 
combinations of ioengine/blocksize/mode parameters and buffered-IO. 
Selected FS tests from LTP are run on 256GB partitions of all NVME 
disks. This is the typical NVME partition layout.

nvme2n1      259:4   0   3.5T  0 disk
├─nvme2n1p1  259:6   0   256G  0 part /data_nvme2n1p1
└─nvme2n1p2  259:7   0   3.2T  0 part

Though many different runs exist in the workload, the combination that 
results in the problem is buffered-IO run with sync engine.

fio -filename=/dev/nvme1n1p2 -direct=0 -thread -size=1024G \
-rwmixwrite=30  --norandommap --randrepeat=0 -ioengine=sync -bs=4k \
-numjobs=400 -runtime=25000 --time_based -group_reporting -name=mytest

The watchdog threshold was reduced to 5s to reproduce the problem early,
and all-CPU backtraces were enabled.

Problem details and analysis
============================
One of the hard lockups which was observed and analyzed in detail is this:

kernel: watchdog: Watchdog detected hard LOCKUP on cpu 284
kernel: CPU: 284 PID: 924096 Comm: cat Not tainted 6.10.0-rc3-lruvec #9
kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x2b4/0x300
kernel: Call Trace:
kernel:  <NMI>
kernel:  ? show_regs+0x69/0x80
kernel:  ? watchdog_hardlockup_check+0x19e/0x360
<SNIP>
kernel:  ? native_queued_spin_lock_slowpath+0x2b4/0x300
kernel:  </NMI>
kernel:  <TASK>
kernel:  ? __pfx_lru_add_fn+0x10/0x10
kernel: _raw_spin_lock_irqsave+0x42/0x50
kernel: folio_lruvec_lock_irqsave+0x62/0xb0
kernel: folio_batch_move_lru+0x79/0x2a0
kernel: folio_add_lru+0x6d/0xf0
kernel: filemap_add_folio+0xba/0xe0
kernel: __filemap_get_folio+0x137/0x2e0
kernel: ext4_da_write_begin+0x12c/0x270
kernel: generic_perform_write+0xbf/0x200
kernel: ext4_buffered_write_iter+0x67/0xf0
kernel: ext4_file_write_iter+0x70/0x780
kernel: vfs_write+0x301/0x420
kernel: ksys_write+0x67/0xf0
kernel: __x64_sys_write+0x19/0x20
kernel: x64_sys_call+0x1689/0x20d0
kernel: do_syscall_64+0x6b/0x110
kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7fe21c314887

With all-CPU backtraces enabled, many CPUs are seen waiting to acquire
the lruvec lock. We measured the lruvec spinlock start time, end time and
hold time (htime) using sched_clock(), and added a BUG() that triggers if
the hold time exceeds 10s. The case below shows that the lruvec spinlock
was held for ~25s.

kernel: vmscan: unlock_page_lruvec_irq: stime 27963327514341, etime 27963324369895, htime 25889317166
kernel: ------------[ cut here ]------------
kernel: kernel BUG at include/linux/memcontrol.h:1677!
kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 21 PID: 3211 Comm: kswapd0 Tainted: G        W 6.10.0-rc3-qspindbg #10
kernel: RIP: 0010:shrink_active_list+0x40a/0x520

And the corresponding trace point for the above:
kswapd0-3211    [021] dN.1. 27963.324332: mm_vmscan_lru_isolate: classzone=0 order=0 nr_requested=1 nr_scanned=156946361 nr_skipped=156946360 nr_taken=1 lru=active_file

This shows that isolate_lru_folios() is scanning through a huge number
(~150 million) of folios (order=0) with the lruvec spinlock held. This
happens because a large number of folios are skipped in order to isolate
a few ZONE_DMA folios. Though the number of folios to be scanned is
normally bounded (32), there is a genuine case where the scan becomes
unbounded, namely when folios are skipped.
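
The skipping behaviour described above can be illustrated with a small
userspace model. This is only a sketch, not the kernel's
isolate_lru_folios(): the struct, the function name isolate_model() and
the integer zone encoding are made up for illustration. It assumes, as the
trace suggests, that skipped folios are counted in the scanned total but do
not consume the nr_to_scan budget, so the walk continues until an eligible
folio is found or the list ends.

```c
#include <stddef.h>

/* Hypothetical counters, loosely mirroring the tracepoint fields. */
struct scan_result {
	unsigned long nr_scanned;	/* every folio visited, skipped or not */
	unsigned long nr_skipped;	/* folios from zones above classzone */
	unsigned long nr_taken;		/* folios actually isolated */
};

/*
 * lru[i] is the zone index of the i-th folio on the list (illustrative
 * encoding). Only folios with zone <= classzone are eligible. Skipped
 * folios do not count against nr_to_scan, so the walk is bounded only by
 * the list length when eligible folios are rare.
 */
static struct scan_result isolate_model(const int *lru, size_t len,
					int classzone,
					unsigned long nr_to_scan)
{
	struct scan_result r = { 0, 0, 0 };
	unsigned long budget_used = 0;
	size_t i = 0;

	while (budget_used < nr_to_scan && i < len) {
		int zone = lru[i++];

		r.nr_scanned++;
		if (zone > classzone) {
			r.nr_skipped++;	/* budget_used is not advanced */
			continue;
		}
		budget_used++;
		r.nr_taken++;
	}
	return r;
}
```

With a list of N ineligible folios followed by one eligible folio and
nr_to_scan = 1, the model visits N + 1 entries, which has the same shape as
the trace above (nr_scanned = nr_skipped + 1, nr_taken = 1).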

Meminfo output shows that free memory is around 2% and the page/buffer
cache has grown very large when the lockup happens.

MemTotal:       1584835956 kB
MemFree:        27805664 kB
MemAvailable:   1568099004 kB
Buffers:        1386120792 kB
Cached:         151894528 kB
SwapCached:        30620 kB
Active:         1043678892 kB
Inactive:       494456452 kB

Often, the perf output at the time of the problem shows heavy
contention on the lruvec spinlock. Similar contention is also observed on
the inode i_lock (in the clear_shadow_entry() path).

98.98%  fio    [kernel.kallsyms]   [k] native_queued_spin_lock_slowpath
    |
     --98.96%--native_queued_spin_lock_slowpath
        |
         --98.96%--_raw_spin_lock_irqsave
                   folio_lruvec_lock_irqsave
                   |
                    --98.78%--folio_batch_move_lru
                        |
                         --98.63%--deactivate_file_folio
                                   mapping_try_invalidate
                                   invalidate_mapping_pages
                                   invalidate_bdev
                                   blkdev_common_ioctl
                                   blkdev_ioctl
                                   __x64_sys_ioctl
                                   x64_sys_call
                                   do_syscall_64
                                   entry_SYSCALL_64_after_hwframe

Some experiments tried
======================
1) When MGLRU was enabled, many soft lockups were observed, but no hard
lockups were seen during a 48-hour run. Below is one such soft lockup.

kernel: watchdog: BUG: soft lockup - CPU#29 stuck for 11s! [fio:2701649]
kernel: CPU: 29 PID: 2701649 Comm: fio Tainted: G             L 6.10.0-rc3-mglru-irqstrc #24
kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x2b4/0x300
kernel: Call Trace:
kernel:  <IRQ>
kernel:  ? show_regs+0x69/0x80
kernel:  ? watchdog_timer_fn+0x223/0x2b0
kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
<SNIP>
kernel:  </IRQ>
kernel:  <TASK>
kernel:  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
kernel:  ? native_queued_spin_lock_slowpath+0x2b4/0x300
kernel:  _raw_spin_lock+0x38/0x50
kernel:  clear_shadow_entry+0x3d/0x100
kernel:  ? __pfx_workingset_update_node+0x10/0x10
kernel:  mapping_try_invalidate+0x117/0x1d0
kernel:  invalidate_mapping_pages+0x10/0x20
kernel:  invalidate_bdev+0x3c/0x50
kernel:  blkdev_common_ioctl+0x5f7/0xa90
kernel:  blkdev_ioctl+0x109/0x270
kernel:  x64_sys_call+0x1215/0x20d0
kernel:  do_syscall_64+0x7e/0x130

This one is contending on the inode i_lock spinlock.

The preemptirqsoff trace below shows preemption being disabled for more
than 10s; the lock involved is the lruvec spinlock.

     # tracer: preemptirqsoff
     #
     # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
     # --------------------------------------------------------------------
     # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:512)
     #    -----------------
     #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
     #    -----------------
     #  => started at: deactivate_file_folio
     #  => ended at:   deactivate_file_folio
     #
     #
     #                    _------=> CPU#
     #                   / _-----=> irqs-off/BH-disabled
     #                  | / _----=> need-resched
     #                  || / _---=> hardirq/softirq
     #                  ||| / _--=> preempt-depth
     #                  |||| / _-=> migrate-disable
     #                  ||||| /     delay
     #  cmd     pid     |||||| time  |   caller
     #     \   /        ||||||  \    |    /
          fio-2701523 128...1.    0us$: deactivate_file_folio <-deactivate_file_folio
          fio-2701523 128.N.1. 10382681us : deactivate_file_folio <-deactivate_file_folio
          fio-2701523 128.N.1. 10382683us : tracer_preempt_on <-deactivate_file_folio
          fio-2701523 128.N.1. 10382691us : <stack trace>
      => deactivate_file_folio
      => mapping_try_invalidate
      => invalidate_mapping_pages
      => invalidate_bdev
      => blkdev_common_ioctl
      => blkdev_ioctl
      => __x64_sys_ioctl
      => x64_sys_call
      => do_syscall_64
      => entry_SYSCALL_64_after_hwframe

2) Increased low_watermark_threshold to 10% to prevent the system from
entering an extremely low memory situation. Although hard lockups
weren't seen, soft lockups (in clear_shadow_entry()) still were.

3) AMD has a BIOS setting called NPS (Nodes per socket), with which a
socket can be further partitioned into smaller NUMA nodes. With NPS=4,
there will be four NUMA nodes in one socket, and hence 8 NUMA nodes in
the system. This was done to check if having more kswapd threads, each
working on fewer folios per node, would make a difference. However, here
too, multiple soft lockups were seen (in clear_shadow_entry(), as in the
MGLRU case). No hard lockups were observed.

Any insights into or suggestions about these lockups are welcome!

Regards,
Bharata.
.

Date: Wed, 3 Jul 2024 23:42:05 +0800
From: kernel test robot <lkp@intel.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: oe-kbuild-all@lists.linux.dev, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Kees Cook <keescook@chromium.org>
Subject: WARNING: modpost: vmlinux: section mismatch in reference:
 alloc_tag_restore+0x3c (section: .text.unlikely) -> initcall_level_names
 (section: .init.data)
Message-ID: <202407032306.gi9nZsBi-lkp@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266798 org.kvack.linux-mm:202385
Newsgroups: org.kernel.vger.linux-kernel,dev.linux.lists.oe-kbuild-all,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   e9d22f7a6655941fc8b2b942ed354ec780936b3e
commit: b951aaff503502a7fe066eeed2744ba8a6413c89 mm: enable page allocation tagging
date:   10 weeks ago
config: xtensa-randconfig-r051-20240703 (https://download.01.org/0day-ci/archive/20240703/202407032306.gi9nZsBi-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240703/202407032306.gi9nZsBi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407032306.gi9nZsBi-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_DESCRIPTION() in vmlinux.o
>> WARNING: modpost: vmlinux: section mismatch in reference: alloc_tag_restore+0x3c (section: .text.unlikely) -> initcall_level_names (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: bitmap_copy_clear_tail+0x44 (section: .text.unlikely) -> __setup_str_initcall_blacklist (section: .init.rodata)

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
.

Return-Path: <owner-linux-mm@kvack.org>
Date: Thu, 4 Jul 2024 00:45:47 +0800
From: kernel test robot <lkp@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: oe-kbuild-all@lists.linux.dev,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: [linux-next:master 9885/10451] sparc64-linux-ld:
 kernel/rcu/tree_stall.h:797:undefined reference to `csd_lock_is_stuck'
Message-ID: <202407040055.af1kwNa1-lkp@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202393
Newsgroups: org.kvack.linux-mm,dev.linux.lists.oe-kbuild-all
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   0b58e108042b0ed28a71cd7edf5175999955b233
commit: 3be88389f46263f166973e80e528dcc9268e24cb [9885/10451] rcu: Summarize RCU CPU stall warnings during CSD-lock stalls
config: sparc-randconfig-r034-20220519 (https://download.01.org/0day-ci/archive/20240704/202407040055.af1kwNa1-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240704/202407040055.af1kwNa1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407040055.af1kwNa1-lkp@intel.com/

All errors (new ones prefixed by >>):

   sparc64-linux-ld: kernel/rcu/tree.o: in function `check_cpu_stall':
   kernel/rcu/tree_stall.h:797:(.text+0x734c): undefined reference to `csd_lock_is_stuck'
>> sparc64-linux-ld: kernel/rcu/tree_stall.h:797:(.text+0x7384): undefined reference to `csd_lock_is_stuck'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

.

Date: Wed,  3 Jul 2024 10:42:25 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-ID: <20240703174225.3891393-1-surenb@google.com>
Subject: [PATCH 1/1] mm: add comments for allocation helpers explaining why
 they are macros
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: kent.overstreet@linux.dev, jack@suse.cz, thorsten.blum@toblux.com, 
	christian.koenig@amd.com, hch@lst.de, linux-mm@kvack.org, 
	linux-kernel@vger.kernel.org, surenb@google.com
Content-Type: text/plain; charset="UTF-8"
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1266964 org.kvack.linux-mm:202396
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

A number of allocation helper functions were converted into macros to
account them at the call sites. Add a comment for each converted
allocation helper explaining why it has to be a macro and why we
typecast the return value wherever required.
The patch also moves acpi_os_acquire_object() closer to other allocation
helpers to group them together under the same comment.
The patch has no functional changes.

Fixes: 2c321f3f70bc ("mm: change inlined allocation helpers to account at the call site")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 fs/nfs/iostat.h                   |  4 ++++
 include/acpi/platform/aclinuxex.h |  9 ++++++---
 include/linux/bpf.h               |  4 ++++
 include/linux/dma-fence-chain.h   |  4 ++++
 include/linux/hid_bpf.h           |  5 +++++
 include/linux/jbd2.h              | 10 ++++++++++
 include/linux/skbuff.h            |  8 ++++++++
 include/linux/skmsg.h             |  5 +++++
 8 files changed, 46 insertions(+), 3 deletions(-)
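
The reason these helpers have to be macros can be illustrated outside the
kernel. The following is only a userspace sketch under stated assumptions:
struct site_tag, tagged_malloc(), my_alloc(), helper_fn() and
helper_macro() are hypothetical names standing in for the alloc_tag
machinery, and the code relies on GNU C statement expressions (gcc/clang).
It shows that a tag defined at the macro expansion site is shared by all
callers when the helper is a function, but is per-caller when the helper
is itself a macro.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call-site record, standing in for an alloc_tag. */
struct site_tag {
	const char *file;
	int line;
	unsigned long calls;
	size_t bytes;
};

/* Last tag charged; exists only so callers can inspect the result. */
static struct site_tag *last_tag;

static void *tagged_malloc(struct site_tag *tag, size_t size)
{
	void *p = malloc(size);

	if (p) {
		tag->calls++;
		tag->bytes += size;
	}
	last_tag = tag;
	return p;
}

/*
 * Each expansion defines its own static tag (GNU C statement expression),
 * so the tag identifies the place where the macro is expanded.
 */
#define my_alloc(size) ({						\
	static struct site_tag _tag = { __FILE__, __LINE__, 0, 0 };	\
	tagged_malloc(&_tag, (size));					\
})

/* Helper as a function: my_alloc() expands here, all callers share a tag. */
static inline void *helper_fn(size_t size)
{
	return my_alloc(size);
}

/* Helper as a macro: my_alloc() expands at each caller, one tag per caller. */
#define helper_macro(size)	my_alloc(size)
```

Two calls to helper_fn() are charged to the same tag, while two
helper_macro() calls on different lines are charged to two distinct tags.
A macro has no declared return type, which is why the converted helpers
cast the returned pointer where the original function returned a specific
type.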

diff --git a/fs/nfs/iostat.h b/fs/nfs/iostat.h
index b17a9eb9b148..49862c95b224 100644
--- a/fs/nfs/iostat.h
+++ b/fs/nfs/iostat.h
@@ -46,6 +46,10 @@ static inline void nfs_add_stats(const struct inode *inode,
 	nfs_add_server_stats(NFS_SERVER(inode), stat, addend);
 }
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag).
+ */
 #define nfs_alloc_iostats()	alloc_percpu(struct nfs_iostats)
 
 static inline void nfs_free_iostats(struct nfs_iostats __percpu *stats)
diff --git a/include/acpi/platform/aclinuxex.h b/include/acpi/platform/aclinuxex.h
index 62cac266a1c8..eeff40295b4b 100644
--- a/include/acpi/platform/aclinuxex.h
+++ b/include/acpi/platform/aclinuxex.h
@@ -46,6 +46,9 @@ acpi_status acpi_os_terminate(void);
  * Interrupts are off during resume, just like they are for boot.
  * However, boot has  (system_state != SYSTEM_RUNNING)
  * to quiet __might_sleep() in kmalloc() and resume does not.
+ *
+ * These specialized allocators have to be macros for their allocations to be
+ * accounted separately (to have separate alloc_tag).
  */
 #define acpi_os_allocate(_size)	\
 		kmalloc(_size, irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL)
@@ -53,14 +56,14 @@ acpi_status acpi_os_terminate(void);
 #define acpi_os_allocate_zeroed(_size)	\
 		kzalloc(_size, irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL)
 
+#define acpi_os_acquire_object(_cache)	\
+		kmem_cache_zalloc(_cache, irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL)
+
 static inline void acpi_os_free(void *memory)
 {
 	kfree(memory);
 }
 
-#define acpi_os_acquire_object(_cache)	\
-		kmem_cache_zalloc(_cache, irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL)
-
 static inline acpi_thread_id acpi_os_get_thread_id(void)
 {
 	return (acpi_thread_id) (unsigned long)current;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5e694a308081..4cef340737c4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2261,6 +2261,10 @@ void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags);
 #else
+/*
+ * These specialized allocators have to be macros for their allocations to be
+ * accounted separately (to have separate alloc_tag).
+ */
 #define bpf_map_kmalloc_node(_map, _size, _flags, _node)	\
 		kmalloc_node(_size, _flags, _node)
 #define bpf_map_kzalloc(_map, _size, _flags)			\
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index ad9e2506c2f4..68c3c1e41014 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -85,6 +85,10 @@ dma_fence_chain_contained(struct dma_fence *fence)
  * dma_fence_chain_alloc
  *
  * Returns a new struct dma_fence_chain object or NULL on failure.
+ *
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag). The typecast is
+ * intentional to enforce typesafety.
  */
 #define dma_fence_chain_alloc()	\
 		((struct dma_fence_chain *)kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL))
diff --git a/include/linux/hid_bpf.h b/include/linux/hid_bpf.h
index eec2592dec12..99a3edb6cf07 100644
--- a/include/linux/hid_bpf.h
+++ b/include/linux/hid_bpf.h
@@ -152,6 +152,11 @@ static inline int hid_bpf_connect_device(struct hid_device *hdev) { return 0; }
 static inline void hid_bpf_disconnect_device(struct hid_device *hdev) {}
 static inline void hid_bpf_destroy_device(struct hid_device *hid) {}
 static inline void hid_bpf_device_init(struct hid_device *hid) {}
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag). The typecast is
+ * intentional to enforce typesafety.
+ */
 #define call_hid_bpf_rdesc_fixup(_hdev, _rdesc, _size)	\
 		((u8 *)kmemdup(_rdesc, *(_size), GFP_KERNEL))
 
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index ab04c1c27fae..a280f42c8c76 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1588,6 +1588,11 @@ void jbd2_journal_put_journal_head(struct journal_head *jh);
  */
 extern struct kmem_cache *jbd2_handle_cache;
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag). The typecast is
+ * intentional to enforce typesafety.
+ */
 #define jbd2_alloc_handle(_gfp_flags)	\
 		((handle_t *)kmem_cache_zalloc(jbd2_handle_cache, _gfp_flags))
 
@@ -1602,6 +1607,11 @@ static inline void jbd2_free_handle(handle_t *handle)
  */
 extern struct kmem_cache *jbd2_inode_cache;
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag). The typecast is
+ * intentional to enforce typesafety.
+ */
 #define jbd2_alloc_inode(_gfp_flags)	\
 		((struct jbd2_inode *)kmem_cache_alloc(jbd2_inode_cache, _gfp_flags))
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1c2902eaebd3..fd51f2de51da 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3400,6 +3400,10 @@ static inline struct page *__dev_alloc_pages_noprof(gfp_t gfp_mask,
 }
 #define __dev_alloc_pages(...)	alloc_hooks(__dev_alloc_pages_noprof(__VA_ARGS__))
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag).
+ */
 #define dev_alloc_pages(_order) __dev_alloc_pages(GFP_ATOMIC | __GFP_NOWARN, _order)
 
 /**
@@ -3416,6 +3420,10 @@ static inline struct page *__dev_alloc_page_noprof(gfp_t gfp_mask)
 }
 #define __dev_alloc_page(...)	alloc_hooks(__dev_alloc_page_noprof(__VA_ARGS__))
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag).
+ */
 #define dev_alloc_page()	dev_alloc_pages(0)
 
 /**
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..d9b03e0746e7 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -414,6 +414,11 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock);
 int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 			 struct sk_msg *msg);
 
+/*
+ * This specialized allocator has to be a macro for its allocations to be
+ * accounted separately (to have a separate alloc_tag). The typecast is
+ * intentional to enforce typesafety.
+ */
 #define sk_psock_init_link()	\
 		((struct sk_psock_link *)kzalloc(sizeof(struct sk_psock_link),	\
 						 GFP_ATOMIC | __GFP_NOWARN))

base-commit: 8a9c6c40432e265600232b864f97d7c675e8be52
-- 
2.45.2.803.g4e1b14247a-goog

.

Return-Path: <owner-linux-mm@kvack.org>
Message-ID: <1cfae0c0-96a2-4308-9c62-f7a640520242@arm.com>
Date: Wed, 3 Jul 2024 18:37:48 +0100
MIME-Version: 1.0
Content-Language: en-GB
From: Ryan Roberts <ryan.roberts@arm.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
 Hugh Dickins <hughd@google.com>, Mel Gorman <mgorman@techsingularity.net>
Cc: Linux-MM <linux-mm@kvack.org>, Catalin Marinas <Catalin.Marinas@arm.com>,
 David Hildenbrand <david@redhat.com>, Matthew Wilcox <willy@infradead.org>
Subject: huge zero page confusion
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>
Xref: photonic.trudheim.com org.kvack.linux-mm:202397
Newsgroups: org.kvack.linux-mm
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

Hi Kirill, Hugh, Mel,

We recently had a problem reported at [1]: because the aarch64 architecture
requires that atomic RMW instructions raise a read fault followed by a write
fault, a huge zero page is faulted in during the read fault, and then the write
fault shatters the huge zero page, installing small zero pages for every PTE in
the PMD region except the faulting address, which gets a writable private page.
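
The access pattern in question can be driven from userspace with a sketch
like the one below. It is an illustration under assumptions, not the
reproducer from [1]: it uses a plain read followed by a plain write to
mimic the two-fault sequence rather than an atomic RMW instruction, it
assumes a 2 MiB PMD size, and read_then_write_fault() is a made-up name.
Whether a huge zero page is actually installed depends on the THP settings
of the running system, and the program cannot observe the page tables
itself; it only checks the values read back.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD mappings, as with 4K base pages. */
#define PMD_SIZE	(2UL << 20)
#define SMALL_PAGE	4096UL

/*
 * Returns 0 if the observed values match expectations, 1 if not, and -1
 * if the mapping could not be created.
 */
static int read_then_write_fault(void)
{
	size_t len = 2 * PMD_SIZE;	/* room to find an aligned PMD range */
	unsigned char *base, *p;
	volatile unsigned char *vp;
	unsigned char before, after, neighbour;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Round up to the next PMD_SIZE boundary inside the mapping. */
	p = (unsigned char *)(((uintptr_t)base + PMD_SIZE - 1) &
			      ~(uintptr_t)(PMD_SIZE - 1));
	/* Best effort: THP may be disabled or unavailable on this system. */
	(void)madvise(p, PMD_SIZE, MADV_HUGEPAGE);

	vp = p;
	before = vp[0];			/* read fault: may map a (huge) zero page */
	vp[0] = 1;			/* write fault on the same address */
	after = vp[0];
	neighbour = vp[SMALL_PAGE];	/* untouched page, must read as zero */

	munmap(base, len);
	return (before == 0 && after == 1 && neighbour == 0) ? 0 : 1;
}
```

To see what the kernel actually installed, one could pause before the
munmap() and inspect the range in /proc/self/smaps (for example the
AnonHugePages field).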

A number of ways were discussed to solve that problem. But it got me wondering
why we have this behaviour in general for huge zero page? This seems like odd
behaviour to me. Surely it would be less effort and more aligned with the app's
expectations to notice the huge zero page in the PMD, remove it, and install a
THP, as would have been done if pmd_none() was true? Or if there is a reason to
shatter on write, why not do away with the huge zero page and save some memory,
and just install a PMD's worth of small zero pages on fault?

Perhaps replacing the huge zero page with a THP on write fault would have
been a better behaviour at the time, but perhaps changing that behaviour now
risks a memory bloat regression in some workloads?

I had some brief discussion with David H starting at [2].

Would appreciate your thoughts!

[1]
https://lore.kernel.org/all/20240626191830.3819324-1-yang@os.amperecomputing.com/
[2] https://lore.kernel.org/all/3743d7e1-0b79-4eaf-82d5-d1ca29fe347d@arm.com/

Thanks,
Ryan

.

From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Dave Jiang <dave.jiang@intel.com>,
	linuxppc-dev@lists.ozlabs.org,
	Michael Ellerman <mpe@ellerman.id.au>,
	Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>,
	Oscar Salvador <osalvador@suse.de>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Dan Williams <dan.j.williams@intel.com>,
	Borislav Petkov <bp@alien8.de>,
	peterx@redhat.com,
	Hugh Dickins <hughd@google.com>,
	Rick P Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 0/8] mm/mprotect: Fix dax puds
Date: Wed,  3 Jul 2024 17:29:10 -0400
Message-ID: <20240703212918.2417843-1-peterx@redhat.com>
Content-Type: text/plain; charset="utf-8"
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Xref: photonic.trudheim.com org.kernel.vger.linux-kernel:1267169 org.kvack.linux-mm:202403
Newsgroups: org.kernel.vger.linux-kernel,org.kvack.linux-mm,org.ozlabs.lists.linuxppc-dev
Path: photonic.trudheim.com!nntp.lore.kernel.org!not-for-mail

[Based on mm-unstable, commit 31334cf98dbd, July 2nd]

v2:
- Added tags
- Fix wrong pmd helper used in powerpc
- Added patch "mm/x86: arch_check_zapped_pud()" [Rick]
- Do proper dirty bit shifts for shadow stack on puds [Dave]
- Add missing page_table_check hooks in pudp_establish() [Dave]

v1: https://lore.kernel.org/r/20240621142504.1940209-1-peterx@redhat.com

Dax has supported pud pages for a while, but mprotect on puds has been
missing since the start.  This series tries to fix that by providing pud
handling in mprotect().  The goal is to add more types of pud mappings like
hugetlb or pfnmaps.  This series paves the way for that by fixing known pud
entries.

Considering nobody reported this until I looked at those other types of
pud mappings, I am thinking maybe it doesn't need to be a fix for stable
and this may not need to be backported.  I would guess whoever cares about
mprotect() won't care about 1G dax puds yet, and vice versa.  I hope fixing
that in new kernels would be fine, but I'm open to suggestions.

There are a few small changes needed to teach mprotect to work on PUDs.
E.g. it will need to start with dropping NUMA_HUGE_PTE_UPDATES, which may
stop making sense when there can be more than one type of huge pte.  OTOH,
we'll also need to push the mmu notifiers from the pmd to the pud layer,
which might need some attention but so far I think it's safe.  For such
details, please refer to each patch's commit message.

The mprotect() pud process should be straightforward, as I kept it as
simple as possible.  There's no NUMA handling as dax simply doesn't support
that.  There's also no userfault involvement, as file memory (even when
working with userfault-wp async mode) will need to split a pud, so a pud
entry doesn't yet need to know about userfault's existence (but hugetlb
entries will; that's also for later).

Tests
=====

What I did test:

- cross-build tests that I normally cover [1]

- smoke tested on x86_64 the simplest program [2] on dev_dax 1G PUD
  mprotect() using QEMU's nvdimm emulations [3] and ndctl to create
  namespaces with proper alignments, which used to throw "bad pud" but now
  runs through fine.  I checked that SIGBUS happens on illegal access to
  protected puds.

What I didn't test:

- fsdax: I wanted to also give it a shot, but only then did I notice it
  doesn't seem to be supported (according to dax_iomap_fault(), which will
  always fall back on PUD_ORDER).  I do remember it was supported before, so
  I could be missing something important there.. please shoot if so.

- userfault wp-async: I also wanted to test that userfault-wp async is able
  to split huge puds (here it's simply a clear_pud.. though), but it won't
  work for devdax anyway because faults smaller than 1G are not allowed in
  this case. So skipped too.

- Power, as no hardware on hand.

Thanks,

[1] https://gitlab.com/peterx/lkb-harness/-/blob/main/config.json
[2] https://github.com/xzpeter/clibs/blob/master/misc/dax.c
[3] https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt

Peter Xu (8):
  mm/dax: Dump start address in fault handler
  mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
  mm/mprotect: Push mmu notifier to PUDs
  mm/powerpc: Add missing pud helpers
  mm/x86: Make pud_leaf() only cares about PSE bit
  mm/x86: arch_check_zapped_pud()
  mm/x86: Add missing pud helpers
  mm/mprotect: fix dax pud handlings

 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +
 arch/powerpc/mm/book3s64/pgtable.c           | 20 ++++++
 arch/x86/include/asm/pgtable.h               | 68 +++++++++++++++---
 arch/x86/mm/pgtable.c                        | 18 +++++
 drivers/dax/device.c                         |  6 +-
 include/linux/huge_mm.h                      | 24 +++++++
 include/linux/pgtable.h                      |  7 ++
 include/linux/vm_event_item.h                |  1 -
 mm/huge_memory.c                             | 56 ++++++++++++++-
 mm/mprotect.c                                | 74 ++++++++++++--------
 mm/vmstat.c                                  |  1 -
 11 files changed, 233 insertions(+), 45 deletions(-)

-- 
2.45.0

.

