linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-05-12	net: dsa: remove port argument from ->change_tag_protocol()	Vladimir Oltean	1	-1/+5
	DSA has not supported (and probably will not support in the future either) independent tagging protocols per CPU port. Different switch drivers have different requirements, some may need to replicate some settings for each CPU port, some may need to apply some settings on a single CPU port, while some may have to configure some global settings and then some per-CPU-port settings. In any case, the current model where DSA calls ->change_tag_protocol for each CPU port turns out to be impractical for drivers where there are global things to be done. For example, felix calls dsa_tag_8021q_register(), which makes no sense per CPU port, so it suppresses the second call. Let drivers deal with replication towards all CPU ports, and remove the CPU port argument from the function prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: manage host flooding using a specific driver callback	Vladimir Oltean	1	-0/+2
	At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: introduce the dsa_cpu_ports() helper	Vladimir Oltean	1	-0/+11
	Similar to dsa_user_ports() which retrieves a port mask of all user ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU ports of a switch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	skbuff: replace a BUG_ON() with the new DEBUG_NET_WARN_ON_ONCE()	Jakub Kicinski	1	-3/+1
	Very few drivers actually have Kconfig knobs for adding -DDEBUG. 8 according to a quick grep, while there are 93 users of skb_checksum_none_assert(). Switch to the new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220511172305.1382810-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	9	-13/+20
	No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters	Jaegeuk Kim	1	-9/+9
	No functional change. - remove checkpoint=disable check for f2fs_write_checkpoint - get sec_freed all the time Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12	Merge tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	5	-4/+8
	Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, and bluetooth. No outstanding fires. Current release - regressions: - eth: atlantic: always deep reset on pm op, fix null-deref Current release - new code bugs: - rds: use maybe_get_net() when acquiring refcount on TCP sockets [refinement of a previous fix] - eth: ocelot: mark traps with a bool instead of guessing type based on list membership Previous releases - regressions: - net: fix skipping features in for_each_netdev_feature() - phy: micrel: fix null-derefs on suspend/resume and probe - bcmgenet: check for Wake-on-LAN interrupt probe deferral Previous releases - always broken: - ipv4: drop dst in multicast routing path, prevent leaks - ping: fix address binding wrt vrf - net: fix wrong network header length when BPF protocol translation is used on skbs with a fraglist - bluetooth: fix the creation of hdev->name - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing - wifi: ath11k: reduce the wait time of 11d scan and hw scan while adding an interface - mac80211: fix rx reordering with non explicit / psmp ack policy - mac80211: reset MBSSID parameters upon connection - nl80211: fix races in nl80211_set_tx_bitrate_mask() - tls: fix context leak on tls_device_down - sched: act_pedit: really ensure the skb is writable - batman-adv: don't skb_split skbuffs with frag_list - eth: ocelot: fix various issues with TC actions (null-deref; bad stats; ineffective drops; ineffective filter removal)" * tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) tls: Fix context leak on tls_device_down net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() mlxsw: Avoid warning during ip6gre device removal net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral net: ethernet: mediatek: ppe: fix wrong size passed to memset() Bluetooth: Fix the creation of hdev->name i40e: i40e_main: fix a missing check on list iterator net/sched: act_pedit: really ensure the skb is writable s390/lcs: fix variable dereferenced before check s390/ctcm: fix potential memory leak s390/ctcm: fix variable dereferenced before check net: atlantic: verify hw_head_ lies within TX buffer ring net: atlantic: add check for MAX_SKB_FRAGS net: atlantic: reduce scope of is_rsc_complete net: atlantic: fix "frag[0] not initialized" net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() net: phy: micrel: Fix incorrect variable type in micrel decnet: Use container_of() for struct dn_neigh casts ...
2022-05-12	module.h: simplify MODULE_IMPORT_NS	Greg Kroah-Hartman	1	-2/+1
	In commit ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") I fixed up the MODULE_IMPORT_NS() macro to allow defined strings to work with it. Unfortunatly I did it in a two-stage process, when it could just be done with the __stringify() macro as pointed out by Masahiro Yamada. Clean this up to only be one macro instead of two steps to achieve the same end result. Fixes: ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") Reported-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Matthias Maennich <maennich@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12	kunit: Rework kunit_resource allocation policy	David Gow	1	-27/+115
	KUnit's test-managed resources can be created in two ways: - Using the kunit_add_resource() family of functions, which accept a struct kunit_resource pointer, typically allocated statically or on the stack during the test. - Using the kunit_alloc_resource() family of functions, which allocate a struct kunit_resource using kzalloc() behind the scenes. Both of these families of functions accept a 'free' function to be called when the resource is finally disposed of. At present, KUnit will kfree() the resource if this 'free' function is specified, and will not if it is NULL. However, this can lead kunit_alloc_resource() to leak memory (if no 'free' function is passed in), or kunit_add_resource() to incorrectly kfree() memory which was allocated by some other means (on the stack, as part of a larger allocation, etc), if a 'free' function is provided. Instead, always kfree() if the resource was allocated with kunit_alloc_resource(), and never kfree() if it was passed into kunit_add_resource() by the user. (If the user of kunit_add_resource() wishes the resource be kfree()ed, they can call kfree() on the resource from within the 'free' function. This is implemented by adding a 'should_free' member to struct kunit_resource and setting it appropriately. To facilitate this, the various resource add/alloc functions have been refactored somewhat, making them all call a __kunit_add_resource() helper after setting the 'should_free' member appropriately. In the process, all other functions have been made static inline functions. Signed-off-by: David Gow <davidgow@google.com> Tested-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-05-12	f2fs: change the current atomic write way	Daeho Jeong	1	-22/+0
	Current atomic write has three major issues like below. - keeps the updates in non-reclaimable memory space and they are even hard to be migrated, which is not good for contiguous memory allocation. - disk spaces used for atomic files cannot be garbage collected, so this makes it difficult for the filesystem to be defragmented. - If atomic write operations hit the threshold of either memory usage or garbage collection failure count, All the atomic write operations will fail immediately. To resolve the issues, I will keep a COW inode internally for all the updates to be flushed from memory, when we need to flush them out in a situation like high memory pressure. These COW inodes will be tagged as orphan inodes to be reclaimed in case of sudden power-cut or system failure during atomic writes. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12	ipmi: Add an intializer for ipmi_recv_msg struct	Corey Minyard	1	-0/+5
	Don't hand-initialize the struct here, create a macro to initialize it so new fields added don't get forgotten in places. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12	ipmi: Add an intializer for ipmi_smi_msg struct	Corey Minyard	1	-0/+6
	There was a "type" element added to this structure, but some static values were missed. The default value will be zero, which is correct, but create an initializer for the type and initialize the type properly in the initializer to avoid future issues. Reported-by: Joe Wiese <jwiese@rackspace.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12	spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller	Siddh Raman Pant	1	-0/+2
	This fixes the corresponding warnings during building the docs. Signed-off-by: Siddh Raman Pant <siddhpant.gh@gmail.com> Link: https://lore.kernel.org/r/4e6187a4-d0f8-4750-e407-e09cc1c91789@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12	trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations	Tony Luck	1	-0/+41
	Add tracing support which may be useful for debugging systems that fail to complete In Field Scan tests. Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12	stop_machine: Add stop_core_cpuslocked() for per-core operations	Peter Zijlstra	1	-0/+16
	Hardware core level testing features require near simultaneous execution of WRMSR instructions on all threads of a core to initiate a test. Provide a customized cut down version of stop_machine_cpuslocked() that just operates on the threads of a single core. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12	block: Fix the bio.bi_opf comment	Bart Van Assche	1	-3/+2
	Commit ef295ecf090d modified the Linux kernel such that the bottom bits of the bi_opf member contain the operation instead of the topmost bits. That commit did not update the comment next to bi_opf. Hence this patch. From commit ef295ecf090d: -#define bio_op(bio) ((bio)->bi_opf >> BIO_OP_SHIFT) +#define bio_op(bio) ((bio)->bi_opf & REQ_OP_MASK) Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: ef295ecf090d ("block: better op and flags encoding") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511235152.1082246-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	block: reorder the REQ_ flags	Christoph Hellwig	1	-7/+8
	Keep the op-specific flag last so that they are clearly separate from the generic flags. Various recent commits just kept adding new flags at the end. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220512061408.1826595-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	io_uring: fix ordering of args in io_uring_queue_async_work	Dylan Yudaken	1	-1/+1
	Fix arg ordering in TP_ARGS macro, which fixes the output. Fixes: 502c87d65564c ("io-uring: Make tracepoints consistent.") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12	usb: core: hcd: Add support for deferring roothub registration	Kishon Vijay Abraham I	1	-0/+2
	It has been observed with certain PCIe USB cards (like Inateck connected to AM64 EVM or J7200 EVM) that as soon as the primary roothub is registered, port status change is handled even before xHC is running leading to cold plug USB devices not detected. For such cases, registering both the root hubs along with the second HCD is required. Add support for deferring roothub registration in usb_add_hcd(), so that both primary and secondary roothubs are registered along with the second HCD. This patch has been added and reverted earier as it triggered a race in usb device enumeration. That race is now fixed in 5.16-rc3, and in stable back to 5.4 commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex") commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") CC: stable@vger.kernel.org # 5.4+ Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com> Tested-by: Chris Chiu <chris.chiu@canonical.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20220510091630.16564-2-kishon@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12	ASoC: SOF: Add header for IPC4 manifest	Ranjani Sridharan	1	-0/+119
	Add the header for the IPC4 manifest. Co-developed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Rander Wang <rander.wang@intel.com> Co-developed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20220511171648.1622993-4-ranjani.sridharan@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12	fortify: Provide a memcpy trap door for sharp corners	Kees Cook	2	-0/+20
	As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc, result in cases where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (i.e. fix compiler bugs, code refactoring, new API, etc). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-11	bpf: add bpf_map_lookup_percpu_elem for percpu map	Feng Zhou	2	-0/+11
	Add new ebpf helpers bpf_map_lookup_percpu_elem. The implementation method is relatively simple, refer to the implementation method of map_lookup_elem of percpu map, increase the parameters of cpu, and obtain it according to the specified cpu. Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11	Merge tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	Jakub Kicinski	1	-0/+3
	Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix the creation of hdev->name when index is greater than 9999 * tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix the creation of hdev->name ==================== Link: https://lore.kernel.org/r/20220512002901.823647-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	Merge tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless	Jakub Kicinski	1	-1/+1
	Kalle Valo says: ==================== wireless fixes for v5.18 Second set of fixes for v5.18 and hopefully the last one. We have a new iwlwifi maintainer, a fix to rfkill ioctl interface and important fixes to both stack and two drivers. * tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition nl80211: fix locking in nl80211_set_tx_bitrate_mask() mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection mac80211_hwsim: fix RCU protected chanctx access mailmap: update Kalle Valo's email mac80211: Reset MBSSID parameters upon connection cfg80211: retrieve S1G operating channel number nl80211: validate S1G channel width mac80211: fix rx reordering with non explicit / psmp ack policy ath11k: reduce the wait time of 11d scan and hw scan while add interface MAINTAINERS: update iwlwifi driver maintainer iwlwifi: iwl-dbg: Use del_timer_sync() before freeing ==================== Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	Bluetooth: Fix the creation of hdev->name	Itay Iellin	1	-0/+3
	Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-05-11	bpf: Fix sparse warning for bpf_kptr_xchg_proto	Kumar Kartikeya Dwivedi	1	-0/+1
	Kernel Test Robot complained about missing static storage class annotation for bpf_kptr_xchg_proto variable. sparse: symbol 'bpf_kptr_xchg_proto' was not declared. Should it be static? This caused by missing extern definition in the header. Add it to suppress the sparse warning. Fixes: c0a5a21c25f3 ("bpf: Allow storing referenced kptr in map") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220511194654.765705-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-12	sched/tracing: Append prev_state to tp args instead	Delyan Kratunov	1	-3/+3
	Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20) added a new prev_state argument to the sched_switch tracepoint, before the prev task_struct pointer. This reordering of arguments broke BPF programs that use the raw tracepoint (e.g. tp_btf programs). The type of the second argument has changed and existing programs that assume a task_struct* argument (e.g. for bpf_task_storage access) will now fail to verify. If we instead append the new argument to the end, all existing programs would continue to work and can conditionally extract the prev_state argument on supported kernel versions. Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20) Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
2022-05-11	net/sched: act_pedit: really ensure the skb is writable	Paolo Abeni	1	-0/+1
	Currently pedit tries to ensure that the accessed skb offset is writable via skb_unclone(). The action potentially allows touching any skb bytes, so it may end-up modifying shared data. The above causes some sporadic MPTCP self-test failures, due to this code: tc -n $ns2 filter add dev ns2eth$i egress \ protocol ip prio 1000 \ handle 42 fw \ action pedit munge offset 148 u8 invert \ pipe csum tcp \ index 100 The above modifies a data byte outside the skb head and the skb is a cloned one, carrying a TCP output packet. This change addresses the issue by keeping track of a rough over-estimate highest skb offset accessed by the action and ensuring such offset is really writable. Note that this may cause performance regressions in some scenarios, but hopefully pedit is not in the critical path. Fixes: db2c24175d14 ("act_pedit: access skb->data safely") Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state	Peter Zijlstra	3	-9/+24
	Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else. There's two spots of bother with this: - PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable. - An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour. As such, add additional state to task->jobctl to track this state outside of task->__state. NOTE: this doesn't actually fix anything yet, just adds extra state. --EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-05-11	ptrace: Don't change __state	Eric W. Biederman	3	-3/+6
	Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up. Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP	Eric W. Biederman	1	-6/+0
	xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in user_enable_single_step and user_disable_single_step without locking could potentiallly cause problems. So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP that xtensa already had defined but unused. Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users. Cc: stable@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP	Eric W. Biederman	1	-1/+0
	User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate single stepping is a little confusing and worse changing tsk->ptrace without locking could potentionally cause problems. So use a thread info flag with a better name instead of flag in tsk->ptrace. Remove the definition PT_DTRACE as uml is the last user. Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	signal: Replace __group_send_sig_info with send_signal_locked	Eric W. Biederman	1	-1/+0
	The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	signal: Rename send_signal send_signal_locked	Eric W. Biederman	1	-0/+2
	Rename send_signal and __send_signal to send_signal_locked and __send_signal_locked to make send_signal usable outside of signal.c. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11	vfio/pci: Remove vfio_device_get_from_dev()	Jason Gunthorpe	2	-3/+2
	The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio: Remove dead code	Jason Gunthorpe	1	-11/+0
	Now that callers have been updated to use the vfio_device APIs the driver facing group interface is no longer used, delete it: - vfio_group_get_external_user_from_dev() - vfio_group_pin_pages() - vfio_group_unpin_pages() - vfio_group_iommu_domain() -- Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw()	Jason Gunthorpe	1	-1/+1
	Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device and move the container users that would have been held by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like vfio_pin/unpin_pages(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages()	Jason Gunthorpe	1	-2/+2
	Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	vfio: Make vfio_(un)register_notifier accept a vfio_device	Jason Gunthorpe	1	-2/+2
	All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	Merge tag 'gvt-next-2022-04-29' into v5.19/vfio/next	Alex Williamson	1	-78/+4
	Merge GVT-g dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11	Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next	Alex Williamson	5	-242/+29
	Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11	asm-generic: qrwlock: Document the spinlock fairness requirements	Palmer Dabbelt	1	-0/+4
	I could only find the fairness requirements documented as the C code, this calls them out in a comment just to be a bit more explicit. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-11	asm-generic: qspinlock: Indicate the use of mixed-size atomics	Peter Zijlstra	1	-0/+29
	The qspinlock implementation depends on having well behaved mixed-size atomics. This is true on the more widely-used platforms, but these requirements are somewhat subtle and may not be satisfied by all the platforms that qspinlock is used on. Document these requirements, so ports that use qspinlock can more easily determine if they meet these requirements. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-11	asm-generic: ticket-lock: New generic ticket-based spinlock	Peter Zijlstra	2	-7/+104
	This is a simple, fair spinlock. Specifically it doesn't have all the subtle memory model dependencies that qspinlock has, which makes it more suitable for simple systems as it is more likely to be correct. It is implemented entirely in terms of standard atomics and thus works fine without any arch-specific code. This replaces the existing asm-generic/spinlock.h, which just errored out on SMP systems. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-11	Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7	Michael Kelley	1	-3/+7
	The VMbus driver has special case code for running on the first released versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now out of support (except for extended security updates) and lack the performance features needed for effective production usage of Linux guests. Simplify the code by removing the negotiation of the VMbus protocol versions required for these releases of Hyper-V, and by removing the special case code for handling these VMbus protocol versions. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/1651509391-2058-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-05-11	swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm	Christoph Hellwig	3	-33/+0
	swiotlb-xen uses very different ways to allocate coherent memory on x86 vs arm. On the former it allocates memory from the page allocator, while on the later it reuses the dma-direct allocator the handles the complexities of non-coherent DMA on arm platforms. Unfortunately the complexities of trying to deal with the two cases in the swiotlb-xen.c code lead to a bug in the handling of DMA_ATTR_NO_KERNEL_MAPPING on arm. With the DMA_ATTR_NO_KERNEL_MAPPING flag the coherent memory allocator does not actually allocate coherent memory, but just a DMA handle for some memory that is DMA addressable by the device, but which does not have to have a kernel mapping. Thus dereferencing the return value will lead to kernel crashed and memory corruption. Fix this by using the dma-direct allocator directly for arm, which works perfectly fine because on arm swiotlb-xen is only used when the domain is 1:1 mapped, and then simplifying the remaining code to only cater for the x86 case with DMA coherent device. Reported-by: Rahul Singh <Rahul.Singh@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rahul Singh <rahul.singh@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Tested-by: Rahul Singh <rahul.singh@arm.com>
2022-05-11	USB: gadget: Add ID numbers to gadget names	Alan Stern	1	-0/+2
	Putting USB gadgets on a new bus of their own encounters a problem when multiple gadgets are present: They all have the same name! The driver core fails with a "sys: cannot create duplicate filename" error when creating any of the /sys/bus/gadget/devices/<gadget-name> symbolic links after the first. This patch fixes the problem by adding a ".N" suffix to each gadget's name when the gadget is registered (where N is a unique ID number), thus making the names distinct. Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: fc274c1e9973 ("USB: gadget: Add a new bus for gadgets") Link: https://lore.kernel.org/r/YnqKAXKyp9Vq/pbn@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-11	platform_data/mlxreg: Add field for notification callback	Michael Shych	1	-0/+4
	Add notification callback to inform caller that platform driver probing has been completed. It allows to caller to perform some initialization flow steps depending on specific driver probing completion. Signed-off-by: Michael Shych <michaelsh@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20220430115809.54565-2-michaelsh@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-11	Merge branch 'v5.18-rc5'	Peter Zijlstra	64	-377/+399
	Obtain the new INTEL_FAM6 stuff required. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-05-11	locking/qrwlock: Change "queue rwlock" to "queued rwlock"	Waiman Long	2	-15/+15
	Queued rwlock was originally named "queue rwlock" which wasn't quite grammatically correct. However there are still some "queue rwlock" references in the code. Change those to "queued rwlock" for consistency. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220510192134.434753-1-longman@redhat.com