aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-02-24net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.Martin Varghese1-0/+11
The Bareudp tunnel module provides a generic L3 encapsulation tunnelling module for tunnelling different protocols like MPLS, IP,NSH etc inside a UDP tunnel. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24media: atmel: atmel-isc-base: expose white balance as v4l2 controlsEugen Hristev1-0/+6
This exposes the white balance configuration of the ISC as v4l2 controls into userspace. There are 8 controls available: 4 gain controls, sliders, for each of the BAYER components: R, B, GR, GB. These gains are multipliers for each component, in format unsigned 0:4:9 with a default value of 512 (1.0 multiplier). 4 offset controls, sliders, for each of the BAYER components: R, B, GR, GB. These offsets are added/substracted from each component, in format signed 1:12:0 with a default value of 0 (+/- 0) To expose this to userspace, added 8 custom controls, in an auto cluster. To summarize the functionality: The auto cluster switch is the auto white balance control, and it works like this: AWB == 1: autowhitebalance is on, the do_white_balance button is inactive, the gains/offsets are inactive, but volatile and readable. Thus, the results of the whitebalance algorithm are available to userspace to read at any time. AWB == 0: autowhitebalance is off, cluster is in manual mode, user can configure the gain/offsets directly. More than that, if the do_white_balance button is pressed, the driver will perform one-time-adjustment, (preferably with color checker card) and the userspace can read again the new values. With this feature, the userspace can save the coefficients and reinstall them for example after reboot or reprobing the driver. [hverkuil: fix checkpatch warning] [hverkuil: minor spacing adjustments in the functionality description] Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-02-24nl80211: Add support to configure TID specific RTSCTS configurationTamizh chelvam1-0/+4
This patch adds support to configure per TID RTSCTS control configuration to enable/disable through the NL80211_TID_CONFIG_ATTR_RTSCTS_CTRL attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-5-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add support to configure TID specific AMPDU configurationTamizh chelvam1-0/+4
This patch adds support to configure per TID AMPDU control configuration to enable/disable aggregation through the NL80211_TID_CONFIG_ATTR_AMPDU_CTRL attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-4-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add support to configure TID specific retry configurationTamizh chelvam1-0/+12
This patch adds support to configure per TID retry configuration through the NL80211_TID_CONFIG_ATTR_RETRY_SHORT and NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes. This TID specific retry configuration will have more precedence than phy level configuration. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-3-git-send-email-tamizhr@codeaurora.org [rebase completely on top of my previous API changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: modify TID-config APIJohannes Berg1-22/+29
Make some changes to the TID-config API: * use u16 in nl80211 (only, and restrict to using 8 bits for now), to avoid issues in the future if we ever want to use higher TIDs. * reject empty TIDs mask (via netlink policy) * change feature advertising to not use extended feature flags but have own mechanism for this, which simplifies the code * fix all variable names from 'tid' to 'tids' since it's a mask * change to cfg80211_ name prefixes, not ieee80211_ * fix some minor docs/spelling things. Change-Id: Ia234d464b3f914cdeab82f540e018855be580dce Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add NL command to support TID speicific configurationsTamizh chelvam1-0/+71
Add the new NL80211_CMD_SET_TID_CONFIG command to support data TID specific configuration. Per TID configuration is passed in the nested NL80211_ATTR_TID_CONFIG attribute. This patch adds support to configure per TID noack policy through the NL80211_TID_CONFIG_ATTR_NOACK attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-2-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: Support key configuration for Beacon protection (BIGTK)Jouni Malinen1-0/+6
IEEE P802.11-REVmd/D3.0 adds support for protecting Beacon frames using a new set of keys (BIGTK; key index 6..7) similarly to the way group-addressed Robust Management frames are protected (IGTK; key index 4..5). Extend cfg80211 and nl80211 to allow the new BIGTK to be configured. Add an extended feature flag to indicate driver support for the new key index values to avoid array overflows in driver implementations and also to indicate to user space when this functionality is available. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Link: https://lore.kernel.org/r/20200222132548.20835-2-jouni@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24Revert "nl80211: add src and dst addr attributes for control port tx/rx"Johannes Berg1-15/+1
This reverts commit 8c3ed7aa2b9ef666195b789e9b02e28383243fa8. As Jouni points out, there's really no need for this, since the RSN pre-authentication frames are normal data frames, not port control frames (locally). We can still revert this now since it hasn't actually gone beyond -next. Fixes: 8c3ed7aa2b9e ("nl80211: add src and dst addr attributes for control port tx/rx") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224101910.b746e263287a.I9eb15d6895515179d50964dec3550c9dc784bb93@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-1/+24
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-02-21 The following pull-request contains BPF updates for your *net-next* tree. We've added 25 non-merge commits during the last 4 day(s) which contain a total of 33 files changed, 2433 insertions(+), 161 deletions(-). The main changes are: 1) Allow for adding TCP listen sockets into sock_map/hash so they can be used with reuseport BPF programs, from Jakub Sitnicki. 2) Add a new bpf_program__set_attach_target() helper for adding libbpf support to specify the tracepoint/function dynamically, from Eelco Chaudron. 3) Add bpf_read_branch_records() BPF helper which helps use cases like profile guided optimizations, from Daniel Xu. 4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu. 5) Relax BTF mandatory check if only used for libbpf itself e.g. to process BTF defined maps, from Andrii Nakryiko. 6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has been observed that former fails in envs with low memlock, from Yonghong Song. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-30/+40
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds1-8/+8
Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-10/+18
Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...
2020-02-21include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swapChristian Borntraeger1-2/+2
QEMU has a funny new build error message when I use the upstream kernel headers: CC block/file-posix.o In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4, from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29, from /home/cborntra/REPOS/qemu/include/block/accounting.h:28, from /home/cborntra/REPOS/qemu/include/block/block_int.h:27, from /home/cborntra/REPOS/qemu/block/file-posix.c:30: /usr/include/linux/swab.h: In function `__swab': /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef] 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^~~~~~ /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "(" 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^ cc1: all warnings being treated as errors make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1 rm tests/qemu-iotests/socket_scm_helper.o This was triggered by commit d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h"). That patch is doing #include <asm/bitsperlong.h> but it uses BITS_PER_LONG. The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG. Let us use the __ variant in swap.h Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com Fixes: d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Allison Randal <allison@lohutok.net> Cc: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: hide timeval/timespec/itimerval/itimerspec typesArnd Bergmann1-10/+12
There are no in-kernel users remaining, but there may still be users that include linux/time.h instead of sys/time.h from user space, so leave the types available to user space while hiding them from kernel space. Only the __kernel_old_* versions of these types remain now. Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-20PCI: pciehp: Disable in-band presence detect when possibleAlexandru Gagniuc1-0/+2
The presence detect state (PDS) is normally a logical OR of in-band and out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to disable in-band presence so that the PDS bit always reflects the state of the out-of-band presence. The recommendation of the PCIe spec is to disable in-band presence whenever supported (PCIe r5.0, appendix I implementation note): Due to architectural issues, the in-band (Physical-Layer-based) portion of the PD mechanism is deprecated for use with async hot-plug. One issue is that in-band PD as architected does not detect adapter removal during certain LTSSM states, notably the L1 and Disabled States. Another issue is that when both in-band and OOB PD are being used together, the Presence Detect State bit and its associated interrupt mechanism always reflect the logical OR of the inband and OOB PD states, and with some hot-plug hardware configurations, it is important for software to detect and respond to in-band and OOB PD events independently. If OOB PD is being used and the associated DSP supports In-Band PD Disable, it is recommended that the In-Band PD Disable bit be Set, and the Presence Detect State bit and its associated interrupt mechanism be used exclusively for OOB PD. As a substitute for in-band PD with async hot-plug, the reference model uses either the DPC or the DLL Link Active mechanism. Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com [bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD value (suggested by Lukas)] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
2020-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-9/+7
Alexei Starovoitov says: ==================== pull-request: bpf 2020-02-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 10 day(s) which contain a total of 10 files changed, 93 insertions(+), 31 deletions(-). The main changes are: 1) batched bpf hashtab fixes from Brian and Yonghong. 2) various selftests and libbpf fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19bpf: Add bpf_read_branch_records() helperDaniel Xu1-1/+24
Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
2020-02-18ethtool: Add support for low latency RS FECAya Levin1-1/+3
Add support for low latency Reed Solomon FEC as LLRS. The LL-FEC is defined by the 25G/50G ethernet consortium, in the document titled "Low Latency Reed Solomon Forward Error Correction" Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> CC: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
2020-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+11
Pablo Neira Ayuso says: ==================== Netfilter fixes for net This batch contains Netfilter fixes for net: 1) Restrict hashlimit size to 1048576, from Cong Wang. 2) Check for offload flags from nf_flow_table_offload_setup(), this fixes a crash in case the hardware offload is disabled. From Florian Westphal. 3) Three preparation patches to extend the conntrack clash resolution, from Florian. 4) Extend clash resolution to deal with DNS packets from the same flow racing to set up the NAT configuration. 5) Small documentation fix in pipapo, from Stefano Brivio. 6) Remove misleading unlikely() from pipapo_refill(), also from Stefano. 7) Reduce hashlimit mutex scope, from Cong Wang. This patch is actually triggering another problem, still under discussion, another patch to fix this will follow up. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18bpf, uapi: Remove text about bpf_redirect_map() giving higher performanceToke Høiland-Jørgensen1-9/+7
The performance of bpf_redirect() is now roughly the same as that of bpf_redirect_map(). However, David Ahern pointed out that the header file has not been updated to reflect this, and still says that a significant performance increase is possible when using bpf_redirect_map(). Remove this text from the bpf_redirect_map() description, and reword the description in bpf_redirect() slightly. Also fix the 'Return' section of the bpf_redirect_map() documentation. Fixes: 1d233886dd90 ("xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200218130334.29889-1-toke@redhat.com
2020-02-17netfilter: conntrack: allow insertion of clashing entriesFlorian Westphal1-1/+11
This patch further relaxes the need to drop an skb due to a clash with an existing conntrack entry. Current clash resolution handles the case where the clash occurs between two identical entries (distinct nf_conn objects with same tuples), i.e.: Original Reply existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 ... existing handling will discard the unconfirmed clashing entry and makes skb->_nfct point to the existing one. The skb can then be processed normally just as if the clash would not have existed in the first place. For other clashes, the skb needs to be dropped. This frequently happens with DNS resolvers that send A and AAAA queries back-to-back when NAT rules are present that cause packets to get different DNAT transformations applied, for example: -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353 -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353 In this case the A or AAAA query is dropped which incurs a costly delay during name resolution. This patch also allows this collision type: Original Reply existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.7:5353 In this case, clash is in original direction -- the reply direction is still unique. The change makes it so that when the 2nd colliding packet is received, the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed 1 second timeout and is inserted in the reply direction only. The entry is hidden from 'conntrack -L', it will time out quickly and it can be early dropped because it will never progress to the ASSURED state. To avoid special-casing the delete code path to special case the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is added so hlist_nulls_del() will work. Example: CPU A: CPU B: 1. 10.2.3.4:42 -> 10.8.8.8:53 (A) 2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA) 3. Apply DNAT, reply changed to 10.0.0.6 4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA) 5. Apply DNAT, reply changed to 10.0.0.7 6. confirm/commit to conntrack table, no collisions 7. commit clashing entry Reply comes in: 10.2.3.4:42 <- 10.0.0.6:5353 (A) -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42 10.2.3.4:42 <- 10.0.0.7:5353 (AAAA) -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42 The conntrack entry is deleted from table, as it has the NAT_CLASH bit set. In case of a retransmit from ORIGINAL dir, all further packets will get the DNAT transformation to 10.0.0.6. I tried to come up with other solutions but they all have worse problems. Alternatives considered were: 1. Confirm ct entries at allocation time, not in postrouting. a. will cause uneccesarry work when the skb that creates the conntrack is dropped by ruleset. b. in case nat is applied, ct entry would need to be moved in the table, which requires another spinlock pair to be taken. c. breaks the 'unconfirmed entry is private to cpu' assumption: we would need to guard all nfct->ext allocation requests with ct->lock spinlock. 2. Make the unconfirmed list a hash table instead of a pcpu list. Shares drawback c) of the first alternative. 3. Document this is expected and force users to rearrange their ruleset (e.g. by using "-m cluster" instead of "-m statistics"). nft has the 'jhash' expression which can be used instead of 'numgen'. Major drawback: doesn't fix what I consider a bug, not very realistic and I believe its reasonable to have the existing rulesets to 'just work'. 4. Document this is expected and force users to steer problematic packets to the same CPU -- this would serialize the "allocate new conntrack entry/nat table evaluation/perform nat/confirm entry", so no race can occur. Similar drawback to 3. Another advantage of this patch compared to 1) and 2) is that there are no changes to the hot path; things are handled in the udp tracker and the clash resolution path. Cc: rcu@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-16openvswitch: add TTL decrement actionMatteo Croce1-0/+7
New action to decrement TTL instead of setting it to a fixed value. This action will decrement the TTL and, in case of expired TTL, drop it or execute an action passed via a nested attribute. The default TTL expired action is to drop the packet. Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively. Tested with a corresponding change in the userspace: # ovs-dpctl dump-flows in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1 in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2 in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2 in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1 # ping -c1 192.168.0.2 -t 42 IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64 # ping -c1 192.168.0.2 -t 120 IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64 # ping -c1 192.168.0.2 -t 1 # Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.Arjun Roy1-0/+1
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using epoll, returning sk_err along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a spurious wakeup. Consider a multi-threaded application using epoll. A thread may awaken with EPOLLIN but another thread may already be reading. The spuriously-awoken thread does not necessarily know that another thread 'won'; rather, it may be possible that it was woken up due to the presence of an error if there is no data. A zerocopy read receiving 0 bytes thus would need to be followed up by recvmsg to be sure. Instead, we return sk_err directly with zerocopy, so the application can avoid this extra system call. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16tcp-zerocopy: Return inq along with tcp receive zerocopy.Arjun Roy1-0/+1
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using edge-triggered epoll, returning inq along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking, since normally we would need to perform a recvmsg() call for every successful small RPC read via TCP receive zerocopy, returning inq can reduce the number of system calls performed by approximately half. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16Merge tag 'mac80211-next-for-net-next-2020-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-nextDavid S. Miller2-3/+82
Johannes Berg says: ==================== A few big new things: * 802.11 frame encapsulation offload support * more HE (802.11ax) support, including some for 6 GHz band * powersave in hwsim, for better testing Of course as usual there are various cleanups and small fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12clone3: allow spawning processes into cgroupsChristian Brauner1-0/+5
This adds support for creating a process in a different cgroup than its parent. Callers can limit and account processes and threads right from the moment they are spawned: - A service manager can directly spawn new services into dedicated cgroups. - A process can be directly created in a frozen cgroup and will be frozen as well. - The initial accounting jitter experienced by process supervisors and daemons is eliminated with this. - Threaded applications or even thread implementations can choose to create a specific cgroup layout where each thread is spawned directly into a dedicated cgroup. This feature is limited to the unified hierarchy. Callers need to pass a directory file descriptor for the target cgroup. The caller can choose to pass an O_PATH file descriptor. All usual migration restrictions apply, i.e. there can be no processes in inner nodes. In general, creating a process directly in a target cgroup adheres to all migration restrictions. One of the biggest advantages of this feature is that CLONE_INTO_GROUP does not need to grab the write side of the cgroup cgroup_threadgroup_rwsem. This global lock makes moving tasks/threads around super expensive. With clone3() this lock is avoided. Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: cgroups@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12gpiolib: add new ioctl() for monitoring changes in line infoBartosz Golaszewski1-0/+30
Currently there is no way for user-space to be informed about changes in status of GPIO lines e.g. when someone else requests the line or its config changes. We can only periodically re-read the line-info. This is fine for simple one-off user-space tools, but any daemon that provides a centralized access to GPIO chips would benefit hugely from an event driven line info synchronization. This patch adds a new ioctl() that allows user-space processes to reuse the file descriptor associated with the character device for watching any changes in line properties. Every such event contains the updated line information. Currently the events are generated on three types of status changes: when a line is requested, when it's released and when its config is changed. The first two are self-explanatory. For the third one: this will only happen when another user-space process calls the new SET_CONFIG ioctl() as any changes that can happen from within the kernel (i.e. set_transitory() or set_debounce()) are of no interest to user-space. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2020-02-11perf/core: Add new branch sample type for HW index of raw branch recordsKan Liang1-1/+7
The low level index is the index in the underlying hardware buffer of the most recently captured taken branch which is always saved in branch_entries[0]. It is very useful for reconstructing the call stack. For example, in Intel LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers. With the low level index information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample type to retrieve low level index of raw branch records. The low level index is between -1 (unknown) and max depth which can be retrieved in /sys/devices/cpu/caps/branches. Only when the new branch sample type is set, the low level index information is dumped into the PERF_SAMPLE_BRANCH_STACK output. Perf tool should check the attr.branch_sample_type, and apply the corresponding format for PERF_SAMPLE_BRANCH_STACK samples. Otherwise, some user case may be broken. For example, users may parse a perf.data, which include the new branch sample type, with an old version perf tool (without the check). Users probably get incorrect information without any warning. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
2020-02-10usb: charger: assign specific number for enum valuePeter Chen1-8/+8
To work properly on every architectures and compilers, the enum value needs to be specific numbers. Suggested-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Peter Chen <peter.chen@nxp.com> Link: https://lore.kernel.org/r/1580537624-10179-1-git-send-email-peter.chen@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-09Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefsLinus Torvalds1-0/+1
Pull new zonefs file system from Damien Le Moal: "Zonefs is a very simple file system exposing each zone of a zoned block device as a file. Unlike a regular file system with native zoned block device support (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the sequential write constraint of zoned block devices to the user. As a result, zonefs is not a POSIX compliant file system. Its goal is to simplify the implementation of zoned block devices support in applications by replacing raw block device file accesses with a richer file based API, avoiding relying on direct block device file ioctls which may be more obscure to developers. One example of this approach is the implementation of LSM (log-structured merge) tree structures (such as used in RocksDB and LevelDB) on zoned block devices by allowing SSTables to be stored in a zone file similarly to a regular file system rather than as a range of sectors of a zoned device. The introduction of the higher level construct "one file is one zone" can help reducing the amount of changes needed in the application while at the same time allowing the use of zoned block devices with various programming languages other than C. Zonefs IO management implementation uses the new iomap generic code. Zonefs has been successfully tested using a functional test suite (available with zonefs userland format tool on github) and a prototype implementation of LevelDB on top of zonefs" * tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Add documentation fs: New zonefs file system
2020-02-07nl80211: add src and dst addr attributes for control port tx/rxMarkus Theil1-1/+15
When using control port over nl80211 in AP mode with pre-authentication, APs need to forward frames to other APs defined by their MAC address. Before this patch, pre-auth frames reaching user space over nl80211 control port have no longer any information about the dest attached, which can be used for forwarding to a controller or injecting the frame back to a ethernet interface over a AF_PACKET socket. Analog problems exist, when forwarding pre-auth frames from AP -> STA. This patch therefore adds the NL80211_ATTR_DST_MAC and NL80211_ATTR_SRC_MAC attributes to provide more context information when forwarding. The respective arguments are optional on tx and included on rx. Therefore unaware existing software is not affected. Software which wants to detect this feature, can do so by checking against: NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211_MAC_ADDRS Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200115125522.3755-1-markus.theil@tu-ilmenau.de [split into separate cfg80211/mac80211 patches] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07cfg80211: Enhance the AKM advertizement to support per interface.Veerendranath Jakkam1-0/+33
Commit ab4dfa20534e ("cfg80211: Allow drivers to advertise supported AKM suites") introduces the support to advertize supported AKMs to userspace. This needs an enhancement to advertize the AKM support per interface type, specifically for the cfg80211-based drivers that implement SME and use different mechanisms to support the AKM's for each interface type (e.g., the support for SAE, OWE AKM's take different paths for such drivers on STA/AP mode). This commit aims the same and enhances the earlier mechanism of advertizing the AKMs per wiphy. Add new nl80211 attributes and data structure to provide supported AKMs per interface type to userspace. the AKMs advertized in akm_suites are default capabilities if not advertized for a specific interface type in iftype_akm_suites. Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org> Link: https://lore.kernel.org/r/20200126203032.21934-1-vjakkam@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07cfg80211: add no HE indication to the channel flagHaim Dreyfuss1-0/+5
The regulatory domain might forbid HE operation. Certain regulatory domains may restrict it for specific channels whereas others may do it for the whole regulatory domain. Add an option to indicate it in the channel flag. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20200121081213.733757-1-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-07fs: New zonefs file systemDamien Le Moal1-0/+1
zonefs is a very simple file system exposing each zone of a zoned block device as a file. Unlike a regular file system with zoned block device support (e.g. f2fs), zonefs does not hide the sequential write constraint of zoned block devices to the user. Files representing sequential write zones of the device must be written sequentially starting from the end of the file (append only writes). As such, zonefs is in essence closer to a raw block device access interface than to a full featured POSIX file system. The goal of zonefs is to simplify the implementation of zoned block device support in applications by replacing raw block device file accesses with a richer file API, avoiding relying on direct block device file ioctls which may be more obscure to developers. One example of this approach is the implementation of LSM (log-structured merge) tree structures (such as used in RocksDB and LevelDB) on zoned block devices by allowing SSTables to be stored in a zone file similarly to a regular file system rather than as a range of sectors of a zoned device. The introduction of the higher level construct "one file is one zone" can help reducing the amount of changes needed in the application as well as introducing support for different application programming languages. Zonefs on-disk metadata is reduced to an immutable super block to persistently store a magic number and optional feature flags and values. On mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration and populates the mount point with a static file tree solely based on this information. E.g. file sizes come from the device zone type and write pointer offset managed by the device itself. The zone files created on mount have the following characteristics. 1) Files representing zones of the same type are grouped together under a common sub-directory: * For conventional zones, the sub-directory "cnv" is used. * For sequential write zones, the sub-directory "seq" is used. These two directories are the only directories that exist in zonefs. Users cannot create other directories and cannot rename nor delete the "cnv" and "seq" sub-directories. 2) The name of zone files is the number of the file within the zone type sub-directory, in order of increasing zone start sector. 3) The size of conventional zone files is fixed to the device zone size. Conventional zone files cannot be truncated. 4) The size of sequential zone files represent the file's zone write pointer position relative to the zone start sector. Truncating these files is allowed only down to 0, in which case, the zone is reset to rewind the zone write pointer position to the start of the zone, or up to the zone size, in which case the file's zone is transitioned to the FULL state (finish zone operation). 5) All read and write operations to files are not allowed beyond the file zone size. Any access exceeding the zone size is failed with the -EFBIG error. 6) Creating, deleting, renaming or modifying any attribute of files and sub-directories is not allowed. 7) There are no restrictions on the type of read and write operations that can be issued to conventional zone files. Buffered, direct and mmap read & write operations are accepted. For sequential zone files, there are no restrictions on read operations, but all write operations must be direct IO append writes. mmap write of sequential files is not allowed. Several optional features of zonefs can be enabled at format time. * Conventional zone aggregation: ranges of contiguous conventional zones can be aggregated into a single larger file instead of the default one file per zone. * File ownership: The owner UID and GID of zone files is by default 0 (root) but can be changed to any valid UID/GID. * File access permissions: the default 640 access permissions can be changed. The mkzonefs tool is used to format zoned block devices for use with zonefs. This tool is available on Github at: git@github.com:damien-lemoal/zonefs-tools.git. zonefs-tools also includes a test suite which can be run against any zoned block device, including null_blk block device created with zoned mode. Example: the following formats a 15TB host-managed SMR HDD with 256 MB zones with the conventional zones aggregation feature enabled. $ sudo mkzonefs -o aggr_cnv /dev/sdX $ sudo mount -t zonefs /dev/sdX /mnt $ ls -l /mnt/ total 0 dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq The size of the zone files sub-directories indicate the number of files existing for each type of zones. In this example, there is only one conventional zone file (all conventional zones are aggregated under a single file). $ ls -l /mnt/cnv total 137101312 -rw-r----- 1 root root 140391743488 Nov 25 13:23 0 This aggregated conventional zone file can be used as a regular file. $ sudo mkfs.ext4 /mnt/cnv/0 $ sudo mount -o loop /mnt/cnv/0 /data The "seq" sub-directory grouping files for sequential write zones has in this example 55356 zones. $ ls -lv /mnt/seq total 14511243264 -rw-r----- 1 root root 0 Nov 25 13:23 0 -rw-r----- 1 root root 0 Nov 25 13:23 1 -rw-r----- 1 root root 0 Nov 25 13:23 2 ... -rw-r----- 1 root root 0 Nov 25 13:23 55354 -rw-r----- 1 root root 0 Nov 25 13:23 55355 For sequential write zone files, the file size changes as data is appended at the end of the file, similarly to any regular file system. $ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s $ ls -l /mnt/seq/0 -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0 The written file can be truncated to the zone size, preventing any further write operation. $ truncate -s 268435456 /mnt/seq/0 $ ls -l /mnt/seq/0 -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0 Truncation to 0 size allows freeing the file zone storage space and restart append-writes to the file. $ truncate -s 0 /mnt/seq/0 $ ls -l /mnt/seq/0 -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0 Since files are statically mapped to zones on the disk, the number of blocks of a file as reported by stat() and fstat() indicates the size of the file zone. $ stat /mnt/seq/0 File: /mnt/seq/0 Size: 0 Blocks: 524288 IO Block: 4096 regular empty file Device: 870h/2160d Inode: 50431 Links: 1 Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-11-25 13:23:57.048971997 +0900 Modify: 2019-11-25 13:52:25.553805765 +0900 Change: 2019-11-25 13:52:25.553805765 +0900 Birth: - The number of blocks of the file ("Blocks") in units of 512B blocks gives the maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone size in this example. Of note is that the "IO block" field always indicates the minimum IO size for writes and corresponds to the device physical sector size. This code contains contributions from: * Johannes Thumshirn <jthumshirn@suse.de>, * Darrick J. Wong <darrick.wong@oracle.com>, * Christoph Hellwig <hch@lst.de>, * Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and * Ting Yao <tingyao@hust.edu.cn>. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-02-06Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+5
Pull more KVM updates from Paolo Bonzini: "s390: - fix register corruption - ENOTSUPP/EOPNOTSUPP mixed - reset cleanups/fixes - selftests x86: - Bug fixes and cleanups - AMD support for APIC virtualization even in combination with in-kernel PIT or IOAPIC. MIPS: - Compilation fix. Generic: - Fix refcount overflow for zero page" * tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit x86: vmxfeatures: rename features for consistency with KVM and manual KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses KVM: x86: Fix perfctr WRMSR for running counters x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() kvm: mmu: Separate generating and setting mmio ptes kvm: mmu: Replace unsigned with unsigned int for PTE access KVM: nVMX: Remove stale comment from nested_vmx_load_cr3() KVM: MIPS: Fold comparecount_func() into comparecount_wakeup() KVM: MIPS: Fix a build error due to referencing not-yet-defined function x86/kvm: do not setup pv tlb flush when not paravirtualized KVM: fix overflow of zero page refcount with ksm running KVM: x86: Take a u64 when checking for a valid dr7 value KVM: x86: use raw clock values consistently KVM: x86: reorganize pvclock_gtod_data members KVM: nVMX: delete meaningless nested_vmx_run() declaration KVM: SVM: allow AVIC without split irqchip kvm: ioapic: Lazy update IOAPIC EOI ...
2020-02-05Merge tag 'kvm-s390-next-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEADPaolo Bonzini1-0/+5
KVM: s390: Fixes and cleanups for 5.6 - fix register corruption - ENOTSUPP/EOPNOTSUPP mixed - reset cleanups/fixes - selftests
2020-02-04Merge tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linuxLinus Torvalds1-1/+6
Pull RTC updates from Alexandre Belloni: "The VL_READ and VL_CLR ioctls have been reworked to be more useful. This will not break userspace as there are very few users and they are using the integer value as a boolean. Apart from that, two drivers were reworked and a few fixes here and there for a net reduction of number of lines. Summary: Subsystem: - the VL_READ and VL_CLR ioctls are now documented and their behavior is unified across all the drivers. - RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both REGMAP_I2C and REGMAP_SPI unecessarily. Drivers: - at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and sama5d2 compatibles. - cmos: solve lost interrupts issue on MS Surface 3 - hym8563: return proper errno when time is invalid - rv3029: many fixes, nvram support" * tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (63 commits) dt-bindings: rtc: at91rm9200: document clocks property rtc: i2c/spi: Avoid inclusion of REGMAP support when not needed rtc: Kconfig: select REGMAP_I2C when necessary rtc: Kconfig: properly indent sd3078 entry rtc: cmos: Refactor code by using the new dmi_get_bios_year() helper rtc: cmos: Use predefined value for RTC IRQ on legacy x86 rtc: cmos: Stop using shared IRQ rtc: tps6586x: Use IRQ_NOAUTOEN flag rtc: at91rm9200: use FIELD_PREP/FIELD_GET rtc: at91rm9200: avoid time readout in at91_rtc_setalarm rtc: at91rm9200: move register definitions to C file rtc: at91rm9200: add sama5d4 and sama5d2 compatibles dt-bindings: rtc: at91rm9200: convert bindings to json-schema rtc: at91rm9200: remove procfs information dt-bindings: atmel, at91rm9200-rtc: add microchip, sam9x60-rtc rtc: pcf8563: Use BIT rtc: moxart: Convert to SPDX identifier rtc: ds1343: Remove unused struct spi_device in struct ds1343_priv rtc: rx8025: Remove struct i2c_client from struct rx8025_data rtc: hym8563: Read the valid flag directly instead of caching it ...
2020-02-01Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/randomLinus Torvalds1-1/+3
Pull random changes from Ted Ts'o: "Change /dev/random so that it uses the CRNG and only blocking if the CRNG hasn't initialized, instead of the old blocking pool. Also clean up archrandom.h, and some other miscellaneous cleanups" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits) s390x: Mark archrandom.h functions __must_check powerpc: Mark archrandom.h functions __must_check powerpc: Use bool in archrandom.h x86: Mark archrandom.h functions __must_check linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check linux/random.h: Use false with bool linux/random.h: Remove arch_has_random, arch_has_random_seed s390: Remove arch_has_random, arch_has_random_seed powerpc: Remove arch_has_random, arch_has_random_seed x86: Remove arch_has_random, arch_has_random_seed random: remove some dead code of poolinfo random: fix typo in add_timer_randomness() random: Add and use pr_fmt() random: convert to ENTROPY_BITS for better code readability random: remove unnecessary unlikely() random: remove kernel.random.read_wakeup_threshold random: delete code to pull data into pools random: remove the blocking pool random: make /dev/random be almost like /dev/urandom random: ignore GRND_RANDOM in getentropy(2) ...
2020-01-31Merge branch 'next' into for-linusDmitry Torokhov78-192/+1194
Prepare input updates for 5.6 merge window.
2020-01-31Merge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds2-2/+16
Pull PCI updates from Bjorn Helgaas: "Resource management: - Improve resource assignment for hot-added nested bridges, e.g., Thunderbolt (Nicholas Johnson) Power management: - Optionally print config space of devices before suspend (Chen Yu) - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake) Virtualization: - Generalize DMA alias quirks (James Sewart) - Add DMA alias quirk for PLX PEX NTB (James Sewart) - Fix IOV memory leak (Navid Emamdoost) AER: - Log which device prevents error recovery (Yicong Yang) Peer-to-peer DMA: - Whitelist Intel SkyLake-E (Armen Baloyan) Broadcom iProc host bridge driver: - Apply PAXC quirk whether driver is built-in or module (Wei Liu) Broadcom STB host bridge driver: - Add Broadcom STB PCIe host controller driver (Jim Quinlan) Intel Gateway SoC host bridge driver: - Add driver for Intel Gateway SoC (Dilip Kota) Intel VMD host bridge driver: - Add support for DMA aliases on other buses (Jon Derrick) - Remove dma_map_ops overrides (Jon Derrick) - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig) NVIDIA Tegra host bridge driver: - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler) Panasonic UniPhier host bridge driver: - Remove module code since driver can't be built as a module (Masahiro Yamada) Qualcomm host bridge driver: - Add support for SDM845 PCIe controller (Bjorn Andersson) TI Keystone host bridge driver: - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I) - Fix link training retries initiation (Yurii Monakov) - Fix outbound region mapping (Yurii Monakov) Misc: - Add Switchtec Gen4 support (Kelvin Cao) - Add Switchtec Intercomm Notify and Upstream Error Containment support (Logan Gunthorpe) - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit addressing (Wesley Sheng)" * tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits) PCI: Allow adjust_bridge_window() to shrink resource if necessary PCI: Set resource size directly in adjust_bridge_window() PCI: Rename extend_bridge_window() to adjust_bridge_window() PCI: Rename extend_bridge_window() parameter PCI: Consider alignment of hot-added bridges when assigning resources PCI: Remove local variable usage in pci_bus_distribute_available_resources() PCI: Pass size + alignment to pci_bus_distribute_available_resources() PCI: Rename variables PCI: vmd: Add two VMD Device IDs PCI: Remove unnecessary braces PCI: brcmstb: Add MSI support PCI: brcmstb: Add Broadcom STB PCIe host controller driver x86/PCI: Remove X86_DEV_DMA_OPS PCI: vmd: Remove dma_map_ops overrides iommu/vt-d: Remove VMD child device sanity check iommu/vt-d: Use pci_real_dma_dev() for mapping PCI: Introduce pci_real_dma_dev() x86/PCI: Expose VMD's pci_dev in struct pci_sysdata x86/PCI: Add to_pci_sysdata() helper PCI/AER: Initialize aer_fifo ...
2020-01-31Merge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds1-0/+29
Pull media updates from Mauro Carvalho Chehab: - New staging driver for Rockship ISPv1 unit - New staging driver for Rockchip MIPI Synopsys DPHY RX0 - y2038 fixes at V4L2 API (backward-compatible) - A dvb core fix when receiving invalid EIT sections - Some clang-specific warnings got fixed - Added support for touch V4L2 interface at vivid - Several drivers were converted to use the new i2c_new_scanned_device() kAPI - Added sm1 support at meson's vdec driver - Several other driver cleanups, fixes and improvements * tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits) media: staging/intel-ipu3: remove TODO item about acronyms media: v4l2-fwnode: Print the node name while parsing endpoints media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode" media: mt9v111: constify copied structure media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors media: hantro: fix post-processing NULL pointer dereference media: rcar-vin: Use correct pixel format when aligning format media: MAINTAINERS: add entry for Rockchip ISP1 driver media: staging: rkisp1: add TODO file for staging media: staging: rkisp1: add document for rkisp1 meta buffer format media: staging: rkisp1: add output device for parameters media: staging: rkisp1: add capture device for statistics media: staging: rkisp1: add user space ABI definitions media: staging: rkisp1: add streaming paths media: staging: rkisp1: add Rockchip ISP1 base driver media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings media: staging: dt-bindings: add Rockchip ISP1 yaml bindings ...
2020-01-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-1/+11
Pull updates from Andrew Morton: "Most of -mm and quite a number of other subsystems: hotfixes, scripts, ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov. MM is fairly quiet this time. Holidays, I assume" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) kcov: ignore fault-inject and stacktrace include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() execve: warn if process starts with executable stack reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() init/main.c: fix misleading "This architecture does not have kernel memory protection" message init/main.c: fix quoted value handling in unknown_bootoption init/main.c: remove unnecessary repair_env_string in do_initcall_level init/main.c: log arguments and environment passed to init fs/binfmt_elf.c: coredump: allow process with empty address space to coredump fs/binfmt_elf.c: coredump: delete duplicated overflow check fs/binfmt_elf.c: coredump: allocate core ELF header on stack fs/binfmt_elf.c: make BAD_ADDR() unlikely fs/binfmt_elf.c: better codegen around current->mm fs/binfmt_elf.c: don't copy ELF header around fs/binfmt_elf.c: fix ->start_code calculation fs/binfmt_elf.c: smaller code generation around auxv vector fill lib/find_bit.c: uninline helper _find_next_bit() lib/find_bit.c: join _find_next_bit{_le} uapi: rename ext2_swab() to swab() and share globally in swab.h lib/scatterlist.c: adjust indentation in __sg_alloc_table ...
2020-01-31uapi: rename ext2_swab() to swab() and share globally in swab.hYury Norov1-0/+10
ext2_swab() is defined locally in lib/find_bit.c However it is not specific to ext2, neither to bitmaps. There are many potential users of it, so rename it to just swab() and move to include/uapi/linux/swab.h ABI guarantees that size of unsigned long corresponds to BITS_PER_LONG, therefore drop unneeded cast. Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: Allison Randal <allison@lohutok.net> Cc: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31mm: fix comments related to node reclaimHao Lee1-1/+1
As zone reclaim has been replaced by node reclaim, this patch fixes related comments. Link: http://lkml.kernel.org/r/20191126141346.GA22665@haolee.github.io Signed-off-by: Hao Lee <haolee.swjtu@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31KVM: s390: Add new reset vcpu APIJanosch Frank1-0/+5
The architecture states that we need to reset local IRQs for all CPU resets. Because the old reset interface did not support the normal CPU reset we never did that on a normal reset. Let's implement an interface for the missing normal and clear resets and reset all local IRQs, registers and control structures as stated in the architecture. Userspace might already reset the registers via the vcpu run struct, but as we need the interface for the interrupt clearing part anyway, we implement the resets fully and don't rely on userspace to reset the rest. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20200131100205.74720-4-frankja@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-01-30Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+53
Pull drm updates from Davbe Airlie: "This is the main pull request for graphics for 5.6. Usual selection of changes all over. I've got one outstanding vmwgfx pull that touches mm so kept it separate until after all of this lands. I'll try and get it to you soon after this, but it might be early next week (nothing wrong with code, just my schedule is messy) This also hits a lot of fbdev drivers with some cleanups. Other notables: - vulkan timeline semaphore support added to syncobjs - nouveau turing secureboot/graphics support - Displayport MST display stream compression support Detailed summary: uapi: - dma-buf heaps added (and fixed) - command line add support for panel oreientation - command line allow overriding penguin count drm: - mipi dsi definition updates - lockdep annotations for dma_resv - remove dma-buf kmap/kunmap support - constify fb_ops in all fbdev drivers - MST fix for daisy chained hotplug- - CTA-861-G modes with VIC >= 193 added - fix drm_panel_of_backlight export - LVDS decoder support - more device based logging support - scanline alighment for dumb buffers - MST DSC helpers scheduler: - documentation fixes - job distribution improvements panel: - Logic PD type 28 panel support - Jimax8729d MIPI-DSI - igenic JZ4770 - generic DSI devicetree bindings - sony acx424AKP panel - Leadtek LTK500HD1829 - xinpeng XPP055C272 - AUO B116XAK01 - GiantPlus GPM940B0 - BOE NV140FHM-N49 - Satoz SAT050AT40H12R2 - Sharp LS020B1DD01D panels. ttm: - use blocking WW lock i915: - hw/uapi state separation - Lock annotation improvements - selftest improvements - ICL/TGL DSI VDSC support - VBT parsing improvments - Display refactoring - DSI updates + fixes - HDCP 2.2 for CFL - CML PCI ID fixes - GLK+ fbc fix - PSR fixes - GEN/GT refactor improvments - DP MST fixes - switch context id alloc to xarray - workaround updates - LMEM debugfs support - tiled monitor fixes - ICL+ clock gating programming removed - DP MST disable sequence fixed - LMEM discontiguous object maps - prefaulting for discontiguous objects - use LMEM for dumb buffers if possible - add LMEM mmap support amdgpu: - enable sync object timelines for vulkan - MST atomic routines - enable MST DSC support - add DMCUB display microengine support - DC OEM i2c support - Renoir DC fixes - Initial HDCP 2.x support - BACO support for Arcturus - Use BACO for runtime PM power save - gfxoff on navi10 - gfx10 golden updates and fixes - DCN support on POWER - GFXOFF for raven1 refresh - MM engine idle handlers cleanup - 10bpc EDP panel fixes - renoir watermark fixes - SR-IOV fixes - Arcturus VCN fixes - GDDR6 training fixes - freesync fixes - Pollock support amdkfd: - unify more codepath with amdgpu - use KIQ to setup HIQ rather than MMIO radeon: - fix vma fault handler race - PPC DMA fix - register check fixes for r100/r200 nouveau: - mmap_sem vs dma_resv fix - rewrite the ACR secure boot code for Turing - TU10x graphics engine support (TU11x pending) - Page kind mapping for turing - 10-bit LUT support - GP10B Tegra fixes - HD audio regression fix hisilicon/hibmc: - use generic fbdev code and helpers rockchip: - dsi/px30 support virtio: - fb damage support - static some functions vc4: - use dma_resv lock wrappers msm: - use dma_resv lock wrappers - sc7180 display + DSI support - a618 support - UBWC support improvements vmwgfx: - updates + new logging uapi exynos: - enable/disable callback cleanups etnaviv: - use dma_resv lock wrappers atmel-hlcdc: - clock fixes mediatek: - cmdq support - non-smooth cursor fixes - ctm property support sun4i: - suspend support - A64 mipi dsi support rcar-du: - Color management module support - LVDS encoder dual-link support - R8A77980 support analogic: - add support for an6345 ast: - atomic modeset support - primary plane garbage fix arcgpu: - fixes for fourcc handling tegra: - minor fixes and improvments mcde: - vblank support meson: - OSD1 plane AFBC commit gma500: - add pageflip support - reomve global drm_dev komeda: - tweak debugfs output - d32 support - runtime PM suppotr udl: - use generic shmem helpers - cleanup and fixes" * tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm: (1998 commits) drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing drm/nouveau/acr: return error when registering LSF if ACR not supported drm/nouveau/disp/gv100-: not all channel types support reporting error codes drm/nouveau/disp/nv50-: prevent oops when no channel method map provided drm/nouveau: support synchronous pushbuf submission drm/nouveau: signal pending fences when channel has been killed drm/nouveau: reject attempts to submit to dead channels drm/nouveau: zero vma pointer even if we only unreference it rather than free drm/nouveau: Add HD-audio component notifier support drm/nouveau: fix build error without CONFIG_IOMMU_API drm/nouveau/kms/nv04: remove set but not used variable 'width' drm/nouveau/kms/nv50: remove set but not unused variable 'nv_connector' drm/nouveau/mmu: fix comptag memory leak drm/nouveau/gr/gp10b: Use gp100_grctx and gp100_gr_zbc drm/nouveau/pmu/gm20b,gp10b: Fix Falcon bootstrapping drm/exynos: Rename Exynos to lowercase drm/exynos: change callback names drm/mst: Don't do atomic checks over disabled managers drm/amdgpu: add the lost mutex_init back drm/amd/display: skip opp blank or unblank if test pattern enabled ...
2020-01-29Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linuxLinus Torvalds2-0/+5
Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+67
Pull io_uring updates from Jens Axboe: - Support for various new opcodes (fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write, send/recv, and epoll_ctl) - Faster ring quiesce for fileset updates - Optimizations for overflow condition checking - Support for max-sized clamping - Support for probing what opcodes are supported - Support for io-wq backend sharing between "sibling" rings - Support for registering personalities - Lots of little fixes and improvements * tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits) io_uring: add support for epoll_ctl(2) eventpoll: support non-blocking do_epoll_ctl() calls eventpoll: abstract out epoll_ctl() handler io_uring: fix linked command file table usage io_uring: support using a registered personality for commands io_uring: allow registering credentials io_uring: add io-wq workqueue sharing io-wq: allow grabbing existing io-wq io_uring/io-wq: don't use static creds/mm assignments io-wq: make the io_wq ref counted io_uring: fix refcounting with batched allocations at OOM io_uring: add comment for drain_next io_uring: don't attempt to copy iovec for READ/WRITE io_uring: honor IOSQE_ASYNC for linked reqs io_uring: prep req when do IOSQE_ASYNC io_uring: use labeled array init in io_op_defs io_uring: optimise sqe-to-req flags translation io_uring: remove REQ_F_IO_DRAINED io_uring: file switch work needs to get flushed on exit io_uring: hide uring_fd in ctx ...
2020-01-29Merge branch 'remotes/lorenzo/pci/dwc'Bjorn Helgaas1-0/+1
- Add intel-gw driver for PCIe host controller on Intel Gateway SoC (Dilip Kota) - Use shared DesignWare helpers to configure Fast Training Sequence (FTS) in artpec6 (Dilip Kota) * remotes/lorenzo/pci/dwc: PCI: artpec6: Configure FTS with dwc helper function PCI: dwc: intel: PCIe RC controller driver dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller