linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-04-26	cgroup: Factor out and expose cgroup_rstat_*() interface functions	Tejun Heo	3	-15/+39
	cgroup_rstat is being generalized so that controllers can use it too. This patch factors out and exposes the following interface functions. * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for consistency. * cgroup_rstat_flush_hold/release(): Factored out from base stat implementation. * cgroup_rstat_flush(): Verbatim expose. While at it, drop assert on cgroup_rstat_mutex in cgroup_base_stat_flush() as it crosses layers and make a minor comment update. v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Reorganize kernel/cgroup/rstat.c	Tejun Heo	2	-89/+95
	Currently, rstat.c has rstat and base stat implementations intermixed. Collect base stat implementation at the end of the file. Also, reorder the prototypes. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Distinguish base resource stat implementation from rstat	Tejun Heo	4	-50/+52
	Base resource stat accounts universial (not specific to any controller) resource consumptions on top of rstat. Currently, its implementation is intermixed with rstat implementation making the code confusing to follow. This patch clarifies the distintion by doing the followings. * Encapsulate base resource stat counters, currently only cputime, in struct cgroup_base_stat. * Move prev_cputime into struct cgroup and initialize it with cgroup. * Rename the related functions so that they start with cgroup_base_stat. * Prefix the related variables and field names with b. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Rename stat to rstat	Tejun Heo	4	-108/+112
	stat is too generic a name and ends up causing subtle confusions. It'll be made generic so that controllers can plug into it, which will make the problem worse. Let's rename it to something more specific - cgroup_rstat for cgroup recursive stat. This patch does the following renames. No other changes. * cpu_stat -> rstat_cpu * stat -> rstat * ?cstat -> ?rstatc Note that the renames are selective. The unrenamed are the ones which implement basic resource statistics on top of rstat. This will be further cleaned up in the following patches. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c	Tejun Heo	2	-1/+1
	stat is too generic a name and ends up causing subtle confusions. It'll be made generic so that controllers can plug into it, which will make the problem worse. Let's rename it to something more specific - cgroup_rstat for cgroup recursive stat. First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c. No content changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Limit event generation frequency	Tejun Heo	2	-2/+25
	".events" files generate file modified event to notify userland of possible new events. Some of the events can be quite bursty (e.g. memory high event) and generating notification each time is costly and pointless. This patch implements a event rate limit mechanism. If a new notification is requested before 10ms has passed since the previous notification, the new notification is delayed till then. As this only delays from the second notification on in a given close cluster of notifications, userland reactions to notifications shouldn't be delayed at all in most cases while avoiding notification storms. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-26	cgroup: Explicitly remove core interface files	Tejun Heo	1	-13/+22
	The "cgroup." core interface files bypass the usual interface removal path and get removed recursively along with the cgroup itself. While this works now, the subtle discrepancy gets in the way of implementing common mechanisms. This patch updates cgroup core interface file handling so that it's consistent with controller interface files. When added, the css is marked CSS_VISIBLE and they're explicitly removed before the cgroup is destroyed. This doesn't cause user-visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-04-25	Revert "blk-mq: remove code for dealing with remapping queue"	Ming Lei	1	-3/+31
	This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c. Turns out some drivers(most are FC drivers) may not use managed IRQ affinity, and has their customized .map_queues meantime, so still keep this code for avoiding regression. Reported-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25	block: mq: Add some minor doc for core structs	Linus Walleij	2	-0/+6
	As it came up in discussion on the mailing list that the semantic meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely obvious to everyone, let's add some minimal kerneldoc for a starter. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25	bcache: mark Coly Li as bcache maintainer	Jens Axboe	1	-0/+1
	Since Michael had to step back, Coly has agreed to be the new maintainer. Mark him as such. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25	MAINTAINERS: Remove me as maintainer of bcache	Michael Lyle	1	-1/+0
	Too much to do with other projects. I've enjoyed working with everyone here, and hope to occasionally contribute on bcache. Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25	random: rate limit unseeded randomness warnings	Theodore Ts'o	1	-5/+34
	On systems without sufficient boot randomness, no point spamming dmesg. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-04-24	ACPI / video: Only default only_lcd to true on Win8-ready _desktops_	Hans de Goede	1	-2/+25
	Commit 5928c281524f (ACPI / video: Default lcd_only to true on Win8-ready and newer machines) made only_lcd default to true on all machines where acpi_osi_is_win8() returns true, including laptops. The purpose of this is to avoid the bogus / non-working acpi backlight interface which many newer BIOS-es define on desktop machines. But this is causing a regression on some laptops, specifically on the Dell XPS 13 2013 model, which does not have the LCD flag set for its fully functional ACPI backlight interface. Rather then DMI quirking our way out of this, this commits changes the logic for setting only_lcd to true, to only do this on machines with a desktop (or server) dmi chassis-type. Note that we cannot simply only check the chassis-type and not register the backlight interface based on that as there are some laptops and tablets which have their chassis-type set to "3" aka desktop. Hopefully the combination of checking the LCD flag, but only on devices with a desktop(ish) chassis-type will avoid the needs for DMI quirks for this, or at least limit the amount of DMI quirks which we need to a minimum. Fixes: 5928c281524f (ACPI / video: Default lcd_only to true on Win8-ready and newer machines) Reported-and-tested-by: James Hogan <jhogan@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: 4.15+ <stable@vger.kernel.org> # 4.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-24	ice: Fix insufficient memory issue in ice_aq_manage_mac_read	Md Fahad Iqbal Polash	1	-5/+17
	For the MAC read operation, the device can return up to two (LAN and WoL) MAC addresses. Without access to adequate memory, the device will return an error. Fixed this by allocating the right amount of memory. Also, logic to detect and copy the LAN MAC address into the port_info structure has been added. Note that the WoL MAC address is ignored currently as the WoL feature isn't supported yet. Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	RISC-V: build vdso-dummy.o with -no-pie	Aurelien Jarno	1	-1/+1
	Debian toolcahin defaults to PIE, and I guess that will also be the case of most distributions. This causes the following build failure: AS arch/riscv/kernel/vdso/getcpu.o AS arch/riscv/kernel/vdso/flush_icache.o VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg OBJCOPY arch/riscv/kernel/vdso/vdso.so AS arch/riscv/kernel/vdso/vdso.o VDSOLD arch/riscv/kernel/vdso/vdso-dummy.o LD arch/riscv/kernel/vdso/vdso-syms.o riscv64-linux-gnu-ld: attempted static link of dynamic object `arch/riscv/kernel/vdso/vdso-dummy.o' make[2]: * [arch/riscv/kernel/vdso/Makefile:43: arch/riscv/kernel/vdso/vdso-syms.o] Error 1 make[1]: * [scripts/Makefile.build:575: arch/riscv/kernel/vdso] Error 2 make: *** [Makefile:1018: arch/riscv/kernel] Error 2 While the root Makefile correctly passes "-fno-PIE" to build individual object files, the RISC-V kernel also builds vdso-dummy.o as an executable, which is therefore linked as PIE. Fix that by updating this specific link rule to also include "-no-pie". Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24	riscv: there is no <asm/handle_irq.h>	Christoph Hellwig	1	-1/+0
	So don't list it as generic-y. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24	riscv: select DMA_DIRECT_OPS instead of redefining it	Christoph Hellwig	1	-3/+1
	DMA_DIRECT_OPS is defined in lib/Kconfig, so don't duplicate it in arch/riscv/Kconfig. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24	sfc: ARFS filter IDs	Edward Cree	6	-46/+337
	Associate an arbitrary ID with each ARFS filter, allowing to properly query for expiry. The association is maintained in a hash table, which is protected by a spinlock. v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot). v2: fixed uninitialised variable (thanks davem and lkp-robot). Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24	net: ethtool: Add missing kernel doc for FEC parameters	Florian Fainelli	1	-0/+2
	While adding support for ethtool::get_fecparam and set_fecparam, kernel doc for these functions was missed, add those. Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24	packet: fix bitfield update race	Willem de Bruijn	2	-21/+49
	Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles. Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation. All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec). Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24	ice: Do not check INTEVENT bit for OICR interrupts	Ben Shelton	2	-6/+0
	According to the hardware spec, checking the INTEVENT bit isn't a reliable way to detect if an OICR interrupt has occurred. This is because this bit can be cleared by the hardware/firmware before the interrupt service routine has run. So instead, just check for OICR events every time. Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	random: fix possible sleeping allocation from irq context	Theodore Ts'o	1	-1/+8
	We can do a sleeping allocation from an irq context when CONFIG_NUMA is enabled. Fix this by initializing the NUMA crng instances in a workqueue. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+9de458f6a5e713ee8c1a@syzkaller.appspotmail.com Fixes: 8ef35c866f8862df ("random: set up the NUMA crng instances...") Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-04-24	ice: Fix incorrect comment for action type	Anirudh Venkataramanan	1	-1/+1
	Action type 5 defines large action generic values. Fix comment to reflect that better. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	ice: Fix initialization for num_nodes_added	Anirudh Venkataramanan	1	-2/+2
	ice_sched_add_nodes_to_layer is used recursively, and so we start with num_nodes_added being 0. This way, in case of an error or if num_nodes is NULL, the function just returns 0 to indicate that no nodes were added. Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	igb: Fix the transmission mode of queue 0 for Qav mode	Vinicius Costa Gomes	1	-1/+16
	When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	ixgbevf: ensure xdp_ring resources are free'd on error exit	Colin Ian King	1	-1/+1
	The current error handling for failed resource setup for xdp_ring data is a break out of the loop and returning 0 indicated everything was OK, when in fact it is not. Fix this by exiting via the error exit label err_setup_tx that will clean up the resources correctly and return and error status. Detected by CoverityScan, CID#1466879 ("Logically dead code") Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24	team: fix netconsole setup over team	Xin Long	1	-7/+12
	The same fix in Commit dbe173079ab5 ("bridge: fix netconsole setup over bridge") is also needed for team driver. While at it, remove the unnecessary parameter *team from team_port_enable_netpoll(). v1->v2: - fix it in a better way, as does bridge. Fixes: 0fb52a27a04a ("team: cleanup netpoll clode") Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24	ACPI / button: make module loadable when booted in non-ACPI mode	Ard Biesheuvel	1	-1/+23
	Modules such as nouveau.ko and i915.ko have a link time dependency on acpi_lid_open(), and due to its use of acpi_bus_register_driver(), the button.ko module that provides it is only loadable when booted in ACPI mode. However, the ACPI button driver can be built into the core kernel as well, in which case the dependency can always be satisfied, and the dependent modules can be loaded regardless of whether the system was booted in ACPI mode or not. So let's fix this asymmetry by making the ACPI button driver loadable as a module even if not booted in ACPI mode, so it can provide the acpi_lid_open() symbol in the same way as when built into the kernel. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [ rjw: Minor adjustments of comments, whitespace and names. ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-24	cpufreq: brcmstb-avs-cpufreq: remove development debug support	Markus Mayer	2	-332/+1
	This debug code was helpful while developing the driver, but it isn't being used for anything anymore. Signed-off-by: Markus Mayer <mmayer@broadcom.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-24	ACPI / watchdog: Prefer iTCO_wdt on Lenovo Z50-70	Mika Westerberg	1	-10/+49
	WDAT table on Lenovo Z50-70 is using RTC SRAM (ports 0x70 and 0x71) to store state of the timer. This conflicts with Linux RTC driver (rtc-cmos.c) who fails to reserve those ports for itself preventing RTC from functioning. In addition the WDAT table seems not to be fully functional because it does not reset the system when the watchdog times out. On this system iTCO_wdt works just fine so we simply prefer to use it instead of WDAT. This makes RTC working again and also results working watchdog via iTCO_wdt. Reported-by: Peter Milley <pbmilley@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199033 Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-23	amd-xgbe: Only use the SFP supported transceiver signals	Tom Lendacky	1	-17/+54
	The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.) that it supports. Update the driver to include checking the eeprom data when deciding whether to use a transceiver signal. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	amd-xgbe: Improve KR auto-negotiation and training	Tom Lendacky	7	-4/+160
	Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks to improve the ability to successfully complete Clause 73 AN when running at 10gbps. Hardware can sometimes have issues with CDR lock when the AN DME page exchange is being performed. The AN and KR training hooks are used as follows: - The pre AN hook is used to disable CDR tracking in the PHY so that the DME page exchange can be successfully and consistently completed. - The post KR training hook is used to re-enable the CDR tracking so that KR training can successfully complete. - The post AN hook is used to check for an unsuccessful AN which will increase a CDR tracking enablement delay (up to a maximum value). Add two debugfs entries to allow control over use of the CDR tracking workaround. The debugfs entries allow the CDR tracking workaround to be disabled and determine whether to re-enable CDR tracking before or after link training has been initiated. Also, with these changes the receiver reset cycle that is performed during the link status check can be performed less often. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	amd-xgbe: Add pre/post auto-negotiation phy hooks	Tom Lendacky	2	-2/+19
	Add hooks to the driver auto-negotiation (AN) flow to allow the different phy implementations to perform any steps necessary to improve AN. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	pppoe: check sockaddr length in pppoe_connect()	Guillaume Nault	1	-0/+4
	We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data. Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	l2tp: check sockaddr length in pppol2tp_connect()	Guillaume Nault	1	-0/+7
	Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	net: phy: marvell: clear wol event before setting it	Jingju Hou	1	-0/+9
	If WOL event happened once, the LED[2] interrupt pin will not be cleared unless we read the CSISR register. If interrupts are in use, the normal interrupt handling will clear the WOL event. Let's clear the WOL event before enabling it if !phy_interrupt_is_valid(). Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy	Eric Dumazet	1	-0/+2
	KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute. I also believe RTA_PREFSRC lacks a similar check. Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave	Xin Long	1	-2/+1
	After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	tcp: don't read out-of-bounds opsize	Jann Horn	1	-5/+2
	The old code reads the "opsize" variable from out-of-bounds memory (first byte behind the segment) if a broken TCP segment ends directly after an opcode that is neither EOL nor NOP. The result of the read isn't used for anything, so the worst thing that could theoretically happen is a pagefault; and since the physmap is usually mostly contiguous, even that seems pretty unlikely. The following C reproducer triggers the uninitialized read - however, you can't actually see anything happen unless you put something like a pr_warn() in tcp_parse_md5sig_option() to print the opsize. ==================================== #define _GNU_SOURCE #include <arpa/inet.h> #include <stdlib.h> #include <errno.h> #include <stdarg.h> #include <net/if.h> #include <linux/if.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/in.h> #include <linux/if_tun.h> #include <err.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <assert.h> void systemf(const char command, ...) { char full_command; va_list ap; va_start(ap, command); if (vasprintf(&full_command, command, ap) == -1) err(1, "vasprintf"); va_end(ap); printf("systemf: <<<%s>>>\n", full_command); system(full_command); } char devname; int tun_alloc(char name) { int fd = open("/dev/net/tun", O_RDWR); if (fd == -1) err(1, "open tun dev"); static struct ifreq req = { .ifr_flags = IFF_TUN\|IFF_NO_PI }; strcpy(req.ifr_name, name); if (ioctl(fd, TUNSETIFF, &req)) err(1, "TUNSETIFF"); devname = req.ifr_name; printf("device name: %s\n", devname); return fd; } #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24)) void sum_accumulate(unsigned int sum, void data, int len) { assert((len&2)==0); for (int i=0; i<len/2; i++) { sum += ntohs(((unsigned short )data)[i]); } } unsigned short sum_final(unsigned int sum) { sum = (sum >> 16) + (sum & 0xffff); sum = (sum >> 16) + (sum & 0xffff); return htons(~sum); } void fix_ip_sum(struct iphdr ip) { unsigned int sum = 0; sum_accumulate(&sum, ip, sizeof(ip)); ip->check = sum_final(sum); } void fix_tcp_sum(struct iphdr ip, struct tcphdr tcp) { unsigned int sum = 0; struct { unsigned int saddr; unsigned int daddr; unsigned char pad; unsigned char proto_num; unsigned short tcp_len; } fakehdr = { .saddr = ip->saddr, .daddr = ip->daddr, .proto_num = ip->protocol, .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl4) }; sum_accumulate(&sum, &fakehdr, sizeof(fakehdr)); sum_accumulate(&sum, tcp, tcp->doff4); tcp->check = sum_final(sum); } int main(void) { int tun_fd = tun_alloc("inject_dev%d"); systemf("ip link set %s up", devname); systemf("ip addr add 192.168.42.1/24 dev %s", devname); struct { struct iphdr ip; struct tcphdr tcp; unsigned char tcp_opts[20]; } __attribute__((packed)) syn_packet = { .ip = { .ihl = sizeof(struct iphdr)/4, .version = 4, .tot_len = htons(sizeof(syn_packet)), .ttl = 30, .protocol = IPPROTO_TCP, /* FIXUP check / .saddr = IPADDR(192,168,42,2), .daddr = IPADDR(192,168,42,1) }, .tcp = { .source = htons(1), .dest = htons(1337), .seq = 0x12345678, .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4, .syn = 1, .window = htons(64), .check = 0 /FIXUP/ }, .tcp_opts = { / INVALID: trailing MD5SIG opcode after NOPs */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 19 } }; fix_ip_sum(&syn_packet.ip); fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp); while (1) { int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet)); if (write_res != sizeof(syn_packet)) err(1, "packet write failed"); } } ==================================== Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23	dma-mapping: postpone cpu addr translation on mmap	Jacopo Mondi	1	-4/+2
	Postpone calling virt_to_page() translation on memory locations not guaranteed to be backed by a struct page. Try first to map memory from the device coherent memory pool, then perform translation if that fails. On some architectures, specifically SH when configured with the SPARSEMEM memory model, assuming a struct page is always assigned to a memory address lead to unexpected hangs during the virtual to page address translation. This patch fixes that specific issue but applies in the general case too. Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-23	dma-coherent: clarify dma_mmap_from_dev_coherent documentation	Robin Murphy	1	-2/+3
	The use of "correctly mapped" here is misleading, since it can give the wrong expectation in the case that the memory should have been mapped from the per-device pool, but doing so failed for other reasons. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-23	dma-direct: don't retry allocation for no-op GFP_DMA	Takashi Iwai	1	-1/+2
	When an allocation with lower dma_coherent mask fails, dma_direct_alloc() retries the allocation with GFP_DMA. But, this is useless for architectures that hav no ZONE_DMA. Fix it by adding the check of CONFIG_ZONE_DMA before retrying the allocation. Fixes: 95f183916d4b ("dma-direct: retry allocations using GFP_DMA for small masks") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-23	PCI / PM: Do not clear state_saved in pci_pm_freeze() when smart suspend is set	Mika Westerberg	1	-2/+3
	If a driver uses DPM_FLAG_SMART_SUSPEND and the device is already runtime suspended when hibernate is started PCI core skips runtime resuming the device but still clears pci_dev->state_saved. After the hibernation image is written pci_pm_thaw_noirq() makes sure subsequent thaw phases for the device are also skipped leaving it runtime suspended with pci_dev->state_saved == false. When the device is eventually runtime resumed pci_pm_runtime_resume() restores config space by calling pci_restore_standard_config(), however because pci_dev->state_saved == false pci_restore_state() never actually restores the config space leaving the device in a state that is not what the driver might expect. For example here is what happens for intel-lpss I2C devices once the hibernation snapshot is taken: intel-lpss 0000:00:15.0: power state changed by ACPI to D0 intel-lpss 0000:00:1e.0: power state changed by ACPI to D3cold video LNXVIDEO:00: Restoring backlight state PM: hibernation exit i2c_designware i2c_designware.1: Unknown Synopsys component type: 0xffffffff i2c_designware i2c_designware.0: Unknown Synopsys component type: 0xffffffff i2c_designware i2c_designware.1: timeout in disabling adapter i2c_designware i2c_designware.0: timeout in disabling adapter Since PCI config space is not restored the device is still in D3hot making MMIO register reads return 0xffffffff. Fix this by clearing pci_dev->state_saved only if we actually end up runtime resuming the device. Fixes: c4b65157aeef (PCI / PM: Take SMART_SUSPEND driver flag into account) Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: 4.15+ <stable@vger.kernel.org> # 4.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-23	ACPI / scan: Initialize watchdog before PNP	Mika Westerberg	1	-1/+1
	At least on one Dell system the PNP motherboard resources device includes resources used by WDAT table. Since PNP gets initialized before WDAT it results following error and no watchdog: platform wdat_wdt: failed to claim resource 3: [io 0x046a-0x046c] ACPI: watchdog: Device creation failed: -16 Now, the PNP system driver is already accustomed with the situation that it cannot reserve all those motherboard resources because drivers using those might have reserved them already. In addition putting WDAT table resources under motherboard resources device makes sense in general. Fix this by initializing WDAT right before PNP. This allows WDAT to reserve all its resources and still keeps PNP system driver happy. Reported-by: Shubhrata.Priyadarsh@dell.com Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-23	ACPI / PM: Blacklist Low Power S0 Idle _DSM for ThinkPad X1 Tablet(2016)	Chen Yu	1	-0/+13
	ThinkPad X1 Tablet(2016) is reported to have issues with the Low Power S0 Idle _DSM interface and since this machine model generally can do ACPI S3 just fine, and user would like to use S3 as default sleep model, add a blacklist entry to disable that interface for ThinkPad X1 Tablet(2016). Link: https://bugzilla.kernel.org/show_bug.cgi?id=199057 Reported-and-tested-by: Robin Lee <robinlee.sysu@gmail.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-23	s390: correct module section names for expoline code revert	Martin Schwidefsky	1	-2/+2
	The main linker script vmlinux.lds.S for the kernel image merges the expoline code patch tables into two section ".nospec_call_table" and ".nospec_return_table". This is not done for the modules, there the sections retain their original names as generated by gcc: ".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg". The module_finalize code has to check for the compiler generated section names, otherwise no code patching is done. This slows down the module code in case of "spectre_v2=off". Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-23	vfio: ccw: process ssch with interrupts disabled	Cornelia Huck	1	-7/+12
	When we call ssch, an interrupt might already be pending once we return from the START SUBCHANNEL instruction. Therefore we need to make sure interrupts are disabled while holding the subchannel lock until after we're done with our processing. Cc: stable@vger.kernel.org #v4.12+ Reviewed-by: Dong Jia Shi <bjsdjshi@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-23	s390: update sampling tag after task pid change	Martin Schwidefsky	2	-0/+13
	In a multi-threaded program any thread can call execve(). If this is not done by the thread group leader, the de_thread() function replaces the pid of the task that calls execve() with the pid of thread group leader. If the task reaches user space again without going over __switch_to() the sampling tag is still set to the old pid. Define the arch_setup_new_exec function to verify the task pid and udpate the tag with LPP if it has changed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-23	s390/cpum_cf: rename IBM z13/z14 counter names	André Wild	1	-4/+4
	Change the IBM z13/z14 counter names to be in sync with all other models. Cc: stable@vger.kernel.org # v4.12+ Fixes: 3593eb944c ("s390/cpum_cf: add hardware counter support for IBM z14") Fixes: 3fc7acebae ("s390/cpum_cf: add IBM z13 counter event names") Signed-off-by: André Wild <wild@linux.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-23	s390/dasd: fix IO error for newly defined devices	Stefan Haberland	1	-2/+11
	When a new CKD storage volume is defined at the storage server, Linux may be relying on outdated information about that volume, which leads to the following errors: 1. Command Reject Errors for minidisk on z/VM: dasd-eckd.b3193d: 0.0.XXXX: An error occurred in the DASD device driver, reason=09 dasd(eckd): I/O status report for device 0.0.XXXX: dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:02 CS:00 RC:0 dasd(eckd): device 0.0.2046: Failing CCW: 00000000XXXXXXXX dasd(eckd): Sense(hex) 0- 7: 80 00 00 00 00 00 00 00 dasd(eckd): Sense(hex) 8-15: 00 00 00 00 00 00 00 00 dasd(eckd): Sense(hex) 16-23: 00 00 00 00 e1 00 0f 00 dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 00 00 00 dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP 2. Equipment Check errors on LPAR or for dedicated devices on z/VM: dasd(eckd): I/O status report for device 0.0.XXXX: dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:0E CS:40 fcxs:01 schxs:00 RC:0 dasd(eckd): device 0.0.9713: Failing TCW: 00000000XXXXXXXX dasd(eckd): Sense(hex) 0- 7: 10 00 00 00 13 58 4d 0f dasd(eckd): Sense(hex) 8-15: 67 00 00 00 00 00 00 04 dasd(eckd): Sense(hex) 16-23: e5 18 05 33 97 01 0f 0f dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 04 58 0d dasd(eckd): 24 Byte: 0 MSG f, no MSGb to SYSOP Fix this problem by using the up-to-date information provided during online processing via the device specific SNEQ to detect the case of outdated LCU data. If there is a difference, perform a re-read of that data. Cc: stable@vger.kernel.org Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>