aboutsummaryrefslogtreecommitdiffstats
path: root/tools/testing/selftests/locking (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-05-11bpf: Add strict alignment flag for BPF_PROG_LOAD.David S. Miller6-9/+40
Add a new field, "prog_flags", and an initial flag value BPF_F_STRICT_ALIGNMENT. When set, the verifier will enforce strict pointer alignment regardless of the setting of CONFIG_EFFICIENT_UNALIGNED_ACCESS. The verifier, in this mode, will also use a fixed value of "2" in place of NET_IP_ALIGN. This facilitates test cases that will exercise and validate this part of the verifier even when run on architectures where alignment doesn't matter. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-11bpf: Do per-instruction state dumping in verifier when log_level > 1.David S. Miller1-2/+6
If log_level > 1, do a state dump every instruction and emit it in a more compact way (without a leading newline). This will facilitate more sophisticated test cases which inspect the verifier log for register state. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-11bpf: Track alignment of register values in the verifier.David S. Miller2-19/+92
Currently if we add only constant values to pointers we can fully validate the alignment, and properly check if we need to reject the program on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures. However, once an unknown value is introduced we only allow byte sized memory accesses which is too restrictive. Add logic to track the known minimum alignment of register values, and propagate this state into registers containing pointers. The most common paradigm that makes use of this new logic is computing the transport header using the IP header length field. For example: struct ethhdr *ep = skb->data; struct iphdr *iph = (struct iphdr *) (ep + 1); struct tcphdr *th; ... n = iph->ihl; th = ((void *)iph + (n * 4)); port = th->dest; The existing code will reject the load of th->dest because it cannot validate that the alignment is at least 2 once "n * 4" is added the the packet pointer. In the new code, the register holding "n * 4" will have a reg->min_align value of 4, because any value multiplied by 4 will be at least 4 byte aligned. (actually, the eBPF code emitted by the compiler in this case is most likely to use a shift left by 2, but the end result is identical) At the critical addition: th = ((void *)iph + (n * 4)); The register holding 'th' will start with reg->off value of 14. The pointer addition will transform that reg into something that looks like: reg->aux_off = 14 reg->aux_off_align = 4 Next, the verifier will look at the th->dest load, and it will see a load offset of 2, and first check: if (reg->aux_off_align % size) which will pass because aux_off_align is 4. reg_off will be computed: reg_off = reg->off; ... reg_off += reg->aux_off; plus we have off==2, and it will thus check: if ((NET_IP_ALIGN + reg_off + off) % size != 0) which evaluates to: if ((NET_IP_ALIGN + 14 + 2) % size != 0) On strict alignment architectures, NET_IP_ALIGN is 2, thus: if ((2 + 14 + 2) % size != 0) which passes. These pointer transformations and checks work regardless of whether the constant offset or the variable with known alignment is added first to the pointer register. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-11bpf, arm64: fix faulty emission of map access in tail callsDaniel Borkmann1-2/+3
Shubham was recently asking on netdev why in arm64 JIT we don't multiply the index for accessing the tail call map by 8. That led me into testing out arm64 JIT wrt tail calls and it turned out I got a NULL pointer dereference on the tail call. The buggy access is at: prog = array->ptrs[index]; if (prog == NULL) goto out; [...] 00000060: d2800e0a mov x10, #0x70 // #112 00000064: f86a682a ldr x10, [x1,x10] 00000068: f862694b ldr x11, [x10,x2] 0000006c: b40000ab cbz x11, 0x00000080 [...] The code triggering the crash is f862694b. x1 at the time contains the address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning, above we load the pointer to the program at map slot 0 into x10. x10 can then be NULL if the slot is not occupied, which we later on try to access with a user given offset in x2 that is the map index. Fix this by emitting the following instead: [...] 00000060: d2800e0a mov x10, #0x70 // #112 00000064: 8b0a002a add x10, x1, x10 00000068: d37df04b lsl x11, x2, #3 0000006c: f86b694b ldr x11, [x10,x11] 00000070: b40000ab cbz x11, 0x00000084 [...] This basically adds the offset to ptrs to the base address of the bpf array we got and we later on access the map with an index * 8 offset relative to that. The tail call map itself is basically one large area with meta data at the head followed by the array of prog pointers. This makes tail calls working again, tested on Cavium ThunderX ARMv8. Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") Reported-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11net: ethernet: ti: netcp_core: return error while dma channel open issueIvan Khoronzhuk2-3/+5
Fix error path while dma open channel issue. Also, no need to check output on NULL if it's never returned. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11s390/qeth: add missing hash table initializationsUrsula Braun1-0/+2
commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") added new hash tables, but missed to initialize them. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11s390/qeth: avoid null pointer dereference on OSNJulian Wiedmann1-4/+6
Access card->dev only after checking whether's its valid. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11s390/qeth: unbreak OSM and OSN supportJulian Wiedmann7-20/+51
commit b4d72c08b358 ("qeth: bridgeport support - basic control") broke the support for OSM and OSN devices as follows: As OSM and OSN are L2 only, qeth_core_probe_device() does an early setup by loading the l2 discipline and calling qeth_l2_probe_device(). In this context, adding the l2-specific bridgeport sysfs attributes via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c, since the basic sysfs infrastructure for the device hasn't been established yet. Note that OSN actually has its own unique sysfs attributes (qeth_osn_devtype), so the additional attributes shouldn't be created at all. For OSM, add a new qeth_l2_devtype that contains all the common and l2-specific sysfs attributes. When qeth_core_probe_device() does early setup for OSM or OSN, assign the corresponding devtype so that the ccwgroup probe code creates the full set of sysfs attributes. This allows us to skip qeth_l2_create_device_attributes() in case of an early setup. Any device that can't do early setup will initially have only the generic sysfs attributes, and when it's probed later qeth_l2_probe_device() adds the l2-specific attributes. If an early-setup device is removed (by calling ccwgroup_ungroup()), device_unregister() will - using the devtype - delete the l2-specific attributes before qeth_l2_remove_device() is called. So make sure to not remove them twice. What complicates the issue is that qeth_l2_probe_device() and qeth_l2_remove_device() is also called on a device when its layer2 attribute changes (ie. its layer mode is switched). For early-setup devices this wouldn't work properly - we wouldn't remove the l2-specific attributes when switching to L3. But switching the layer mode doesn't actually make any sense; we already decided that the device can only operate in L2! So just refuse to switch the layer mode on such devices. Note that OSN doesn't have a layer2 attribute, so we only need to special-case OSM. Based on an initial patch by Ursula Braun. Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11s390/qeth: handle sysfs error during initializationUrsula Braun4-3/+13
When setting up the device from within the layer discipline's probe routine, creating the layer-specific sysfs attributes can fail. Report this error back to the caller, and handle it by releasing the layer discipline. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> [jwi: updated commit msg, moved an OSN change to a subsequent patch] Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11mdio: mux: Correct mdio_mux_init error path issuesJon Mason1-5/+7
There is a potential unnecessary refcount decrement on error path of put_device(&pb->mii_bus->dev), as it is possible to avoid the of_mdio_find_bus() call if mux_bus is specified by the calling function. The same put_device() is not called in the error path if the devm_kzalloc of pb fails. This caused the variable used in the put_device() to be changed, as the pb pointer was obviously not set up. There is an unnecessary of_node_get() on child_bus_node if the of_mdiobus_register() is successful, as the for_each_available_child_of_node() automatically increments this. Thus the refcount on this node will always be +1 more than it should be. There is no of_node_put() on child_bus_node if the of_mdiobus_register() call fails. Finally, it is lacking devm_kfree() of pb in the error path. While this might not be technically necessary, it was present in other parts of the function. So, I am adding it where necessary to make it uniform. Signed-off-by: Jon Mason <jon.mason@broadcom.com> Fixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers") Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11ipv6/dccp: do not inherit ipv6_mc_list from parentWANG Cong2-0/+8
Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent") we should clear ipv6_mc_list etc. for IPv6 sockets too. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11netxen_nic: set rcode to the return status from the call to netxen_issue_cmdColin Ian King1-1/+1
Currently rcode is being initialized to NX_RCODE_SUCCESS and later it is checked to see if it is not NX_RCODE_SUCCESS which is never true. It appears that there is an unintentional missing assignment of rcode from the return of the call to netxen_issue_cmd() that was dropped in an earlier fix, so add it in. Detected by CoverityScan, CID#401900 ("Logically dead code") Fixes: 2dcd5d95ad6b2 ("netxen_nic: fix cdrp race condition") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11net: qca_spi: Fix alignment issues in rx pathStefan Wahren1-4/+6
The qca_spi driver causes alignment issues on ARM devices. So fix this by using netdev_alloc_skb_ip_align(). Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11driver: vrf: Fix one possible use-after-free issueGao Feng1-1/+2
The current codes only deal with the case that the skb is dropped, it may meet one use-after-free issue when NF_HOOK returns 0 that means the skb is stolen by one netfilter rule or hook. When one netfilter rule or hook stoles the skb and return NF_STOLEN, it means the skb is taken by the rule, and other modules should not touch this skb ever. Maybe the skb is queued or freed directly by the rule. Now uses the nf_hook instead of NF_HOOK to get the result of netfilter, and check the return value of nf_hook. Only when its value equals 1, it means the skb could go ahead. Or reset the skb as NULL. BTW, because vrf_rcv_finish is empty function, so needn't invoke it even though nf_hook returns 1. But we need to modify vrf_rcv_finish to deal with the NF_STOLEN case. There are two cases when skb is stolen. 1. The skb is stolen and freed directly. There is nothing we need to do, and vrf_rcv_finish isn't invoked. 2. The skb is queued and reinjected again. The vrf_rcv_finish would be invoked as okfn, so need to free the skb in it. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09dccp/tcp: do not inherit mc_list from parentEric Dumazet1-0/+2
syzkaller found a way to trigger double frees from ip_mc_drop_socket() It turns out that leave a copy of parent mc_list at accept() time, which is very bad. Very similar to commit 8b485ce69876 ("tcp: do not inherit fastopen_req from parent") Initial report from Pray3r, completed by Andrey one. Thanks a lot to them ! Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Pray3r <pray3r.z@gmail.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09sparc64: fix fault handling in NGbzero.S and GENbzero.SDave Aldridge3-2/+8
When any of the functions contained in NGbzero.S and GENbzero.S vector through *bzero_from_clear_user, we may end up taking a fault when executing one of the store alternate address space instructions. If this happens, the exception handler does not restore the %asi register. This commit fixes the issue by introducing a new exception handler that ensures the %asi register is restored when a fault is handled. Orabug: 25577560 Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09sparc: use memdup_user_nul in sun4m LED driverGeliang Tang1-10/+3
Use memdup_user_nul() helper instead of open-coding to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09proc: try to remove use of FOLL_FORCE entirelyLinus Torvalds1-4/+1
We fixed the bugs in it, but it's still an ugly interface, so let's see if anybody actually depends on it. It's entirely possible that nothing actually requires the whole "punch through read-only mappings" semantics. For example, gdb definitely uses the /proc/<pid>/mem interface, but it looks like it mainly does it for regular reads of the target (that don't need FOLL_FORCE), and looking at the gdb source code seems to fall back on the traditional ptrace(PTRACE_POKEDATA) interface if it needs to. If this breaks something, I do have a (more complex) version that only enables FOLL_FORCE when somebody has PTRACE_ATTACH'ed to the target, like the comment here used to say ("Maybe we should limit FOLL_FORCE to actual ptrace users?"). Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09qede: Split PF/VF ndos.Mintz, Yuval2-6/+21
PFs and VFs share the same structure of NDOs today, and the VFs explicitly fails the ndo_xdp() callback stating it doesn't support XDP. This results in lots of: [qede_xdp:1032(enp131s2)]VFs don't support XDP ------------[ cut here ]------------ WARNING: CPU: 4 PID: 1426 at net/core/rtnetlink.c:1637 rtnl_dump_ifinfo+0x354/0x3c0 ... Call Trace: ? __alloc_skb+0x9b/0x1d0 netlink_dump+0x122/0x290 netlink_recvmsg+0x27d/0x430 sock_recvmsg+0x3d/0x50 ... As every dump request for the VF interface info would fail due to rtnl_xdp_fill() returning an error code. To resolve this, introduce a subset of the NDOs meant for the VF in a seperate structure and register that one instead for VFs, and omit the ndo_xdp initialization. Fixes: 40b8c45492ef ("qede: Prevent VFs from using XDP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09qed: Correct doorbell configuration for !4Kb pagesRam Amrani1-1/+1
When configuring the doorbell DPI address, driver aligns the start address to 4KB [HW-pages] instead of host PAGE_SIZE. As a result, RoCE applications might receive addresses which are unaligned to pages [when PAGE_SIZE > 4KB], which is a security risk. Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09qed: Tell QM the number of tasksMintz, Yuval1-0/+1
Driver doesn't pass the number of tasks to the QM init logic which would cause back-pressure in scenarios requiring many tasks [E.g., using max MRs] and thus reduced performance. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09qed: Fix VF removal sequenceMintz, Yuval1-2/+4
After previos changes in HW-stop scheme, VFs stopped sending CLOSE messages to their PFs when they unload. Fixes: 1226337ad98f ("qed: Correct HW stop flow") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09qede: Fix XDP memory leak on unloadSuddarsana Reddy Kalluru1-0/+3
When (re|un)loading, Tx-queues belonging to XDP would not get freed. Fixes: cb6aeb079294 ("qede: Add support for XDP_TX") Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09net/mlx4_core: Reduce harmless SRIOV error message to debug levelJack Morgenstein2-4/+12
Under SRIOV resource management, extra counters are allocated to VFs from a free pool. If that pool is empty, the ALLOC_RES command for a counter resource fails -- and this generates a misleading error message in the message log. Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters -- one counter per port. For ETH ports, the RoCE driver requests an additional counter (above the guaranteed counters). If that request fails, the VF RoCE driver simply uses the default (i.e., guaranteed) counter for that port. Thus, failing to allocate an additional counter does not constitute a problem, and the error message on the PF when this occurs should be reduced to debug level. Finally, to identify the situation that the reason for the failure is that no resources are available to grant to the VF, we modified the error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded), which more accurately describes the error. Fixes: c3abb51bdb0e ("IB/mlx4: Add RoCE/IB dedicated counters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09net/mlx4_en: Avoid adding steering rules with invalid ringTalat Batheesh1-0/+5
Inserting steering rules with illegal ring is an invalid operation, block it. Fixes: 820672812f82 ('net/mlx4_en: Manage flow steering rules with ethtool') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09net/mlx4_en: Change the error print to debug printKamal Heib1-1/+2
The error print within mlx4_en_calc_rx_buf() should be a debug print. Fixes: 51151a16a60f ('mlx4: allow order-0 memory allocations in RX path') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparisonKarim Eshapa1-1/+1
Use time_after() for time comparison with the new fix. Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09DECnet: Use container_of() for embedded structKees Cook1-6/+6
Instead of a direct cross-type cast, use conatiner_of() to locate the embedded structure, even in the face of future struct layout randomization. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08Revert "ipv4: restore rt->fi for reference counting"David S. Miller2-18/+1
This reverts commit 82486aa6f1b9bc8145e6d0fa2bc0b44307f3b875. As implemented, this causes dangling netdevice refs. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4Andrew Morton1-110/+126
drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash') drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>') drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>') ... gcc-4.4.4 has issues with anon union initializers. Work around this. Cc: Gilad Ben-Yossef <gilad@benyossef.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08selftests/vm: add a test for virtual address range mappingAnshuman Khandual3-0/+134
This verifies virtual address mapping below and above the 128TB range and makes sure that address returned are within the expected range depending upon the hint passed from the user space. Link: http://lkml.kernel.org/r/20170418095252.20533-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoint to dax_insert_mapping()Ross Zwisler2-0/+31
Add a tracepoint to dax_insert_mapping(), following the same logging conventions as the rest of DAX. This tracepoint, along with the one in dax_load_hole(), lets us know how a DAX PTE fault was serviced. Here is an example DAX fault that inserts a PTE mapping: small-1126 [007] .... 145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1126 [007] .... 145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006 small-1126 [007] .... 145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoint to dax_writeback_one()Ross Zwisler2-0/+25
Add a tracepoint to dax_writeback_one(), following the same logging conventions as the rest of DAX. Here is an example range writeback which ends up flushing one PMD and one PTE: test-1265 [003] .... 496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff test-1265 [003] .... 496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200 test-1265 [003] .... 496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1 test-1265 [003] .... 496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared] Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_writeback_mapping_range()Ross Zwisler2-5/+39
Add tracepoints to dax_writeback_mapping_range(), following the same logging conventions as the rest of DAX. Here is an example writeback call: msync-1085 [006] .... 200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff msync-1085 [006] .... 200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()] Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_load_hole()Ross Zwisler2-5/+12
Add tracepoints to dax_load_hole(), following the same logging conventions as the rest of DAX. Here is the logging generated by a PTE read from a hole: read-1075 [002] .... 62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 read-1075 [002] .... 62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE read-1075 [002] .... 62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_pfn_mkwrite()Ross Zwisler2-0/+5
Add tracepoints to dax_pfn_mkwrite(), following the same logging conventions as the rest of DAX. Here is an example PTE fault followed by a pfn_mkwrite: small_aligned-1094 [002] .... 374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 small_aligned-1094 [002] .... 374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE small_aligned-1094 [002] .... 374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08dax: add tracepoints to dax_iomap_pte_fault()Ross Zwisler2-4/+52
Patch series "second round of tracepoints for DAX". This second round of DAX tracepoint patches adds tracing to the PTE fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(), dax_insert_mapping()) and to the writeback path (dax_writeback_mapping_range(), dax_writeback_one()). The purpose of this tracing is to give us a high level view of what DAX is doing, whether faults are being serviced by PMDs or PTEs, and by real storage or by zero pages covering holes. I do have some patches nearly ready which also add tracing to grab_mapping_entry() and dax_insert_mapping_entry(). These are more targeted at logging how we are interacting with the radix tree, how we use empty entries for locking, whether we "downgrade" huge zero pages to 4k PTE sized allocations, etc. In the end it seemed to me that this might be too detailed to have as constantly present tracepoints, but if anyone sees value in having tracepoints like this in the DAX code permanently (Jan?), please let me know and I'll add those last two patches. All these tracepoints were done to be consistent with the style of the XFS tracepoints and with the existing DAX PMD tracepoints. This patch (of 6): Add tracepoints to dax_iomap_pte_fault(), following the same logging conventions as the rest of DAX. Here is an example fault that initially tries to be serviced by the PMD fault handler but which falls back to PTEs because the VMA isn't large enough to hold a PMD: small-1086 [005] .... 71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003 small-1086 [005] .... 71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 small-1086 [005] .... 71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK small-1086 [005] .... 71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1086 [005] .... 71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08mtd: nand: nandsim: convert to memalloc_noreclaim_*()Vlastimil Babka1-20/+9
Nandsim has own functions set_memalloc() and clear_memalloc() for robust setting and clearing of PF_MEMALLOC. Replace them by the new generic helpers. No functional change. Link: http://lkml.kernel.org/r/20170405074700.29871-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Richard Weinberger <richard@nod.at> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08treewide: convert PF_MEMALLOC manipulations to new helpersVlastimil Babka4-12/+16
We now have memalloc_noreclaim_{save,restore} helpers for robust setting and clearing of PF_MEMALLOC. Let's convert the code which was using the generic tsk_restore_flags(). No functional change. [vbabka@suse.cz: in net/core/sock.c the hunk is missing] Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Wouter Verhelst <w@uter.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08mm: introduce memalloc_noreclaim_{save,restore}Vlastimil Babka3-11/+29
The previous patch ("mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC") has shown that simply setting and clearing PF_MEMALLOC in current->flags can result in wrongly clearing a pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim. Let's introduce helpers that support proper nesting by saving the previous stat of the flag, similar to the existing memalloc_noio_* and memalloc_nofs_* helpers. Convert existing setting/clearing of PF_MEMALLOC within mm to the new helpers. There are no known issues with the converted code, but the change makes it more robust. Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08mm: prevent potential recursive reclaim due to clearing PF_MEMALLOCVlastimil Babka1-1/+2
Patch series "more robust PF_MEMALLOC handling" This series aims to unify the setting and clearing of PF_MEMALLOC, which prevents recursive reclaim. There are some places that clear the flag unconditionally from current->flags, which may result in clearing a pre-existing flag. This already resulted in a bug report that Patch 1 fixes (without the new helpers, to make backporting easier). Patch 2 introduces the new helpers, modelled after existing memalloc_noio_* and memalloc_nofs_* helpers, and converts mm core to use them. Patches 3 and 4 convert non-mm code. This patch (of 4): __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock during page migration by lock_page() (see the comment in __unmap_and_move()). Then it unconditionally clears the flag, which can clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim. This was not a problem until commit a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath"), because direct compation was called only after direct reclaim, which was skipped when PF_MEMALLOC flag was set. Even now it's only a theoretical issue, as the new callsite of __alloc_pages_direct_compact() is reached only for costly orders and when gfp_pfmemalloc_allowed() is true, which means either __GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true. There is no such known context, but let's play it safe and make __alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is already set. Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath") Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08mm/huge_memory.c: deposit a pgtable for DAX PMD faults when requiredOliver O'Halloran1-2/+18
Although all architectures use a deposited page table for THP on anonymous VMAs, some architectures (s390 and powerpc) require the deposited storage even for file backed VMAs due to quirks of their MMUs. This patch adds support for depositing a table in DAX PMD fault handling path for archs that require it. Other architectures should see no functional changes. Link: http://lkml.kernel.org/r/20170411174233.21902-3-oohall@gmail.com Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: linux-nvdimm@ml01.01.org Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08mm/huge_memory.c: use zap_deposited_table() moreOliver O'Halloran1-6/+2
Depending on the flags of the PMD being zapped there may or may not be a deposited pgtable to be freed. In two of the three cases this is open coded while the third uses the zap_deposited_table() helper. This patch converts the others to use the helper to clean things up a bit. Link: http://lkml.kernel.org/r/20170411174233.21902-2-oohall@gmail.com Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: linux-nvdimm@ml01.01.org Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08time: delete CURRENT_TIME_SEC and CURRENT_TIMEDeepa Dinamani1-3/+0
All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced by other time functions. These macros are also not y2038 safe. And, all their use cases can be fulfilled by y2038 safe ktime_get_* variants. Link: http://lkml.kernel.org/r/1491613030-11599-12-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08gfs2: replace CURRENT_TIME with current_timeStephen Rothwell1-2/+2
Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08apparmorfs: replace CURRENT_TIME with current_time()Deepa Dinamani1-1/+1
CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time(). This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08lustre: replace CURRENT_TIME macroDeepa Dinamani2-5/+5
CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time() for filesystem times, and ktime_get_* functions for others. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as lustre uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. Link: http://lkml.kernel.org/r/1491613030-11599-10-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: James Simmons <jsimmons@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fs: ubifs: replace CURRENT_TIME_SEC with current_timeDeepa Dinamani6-30/+26
CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned to use 64 bit time along with vfs in a separate patch. There is no plan to transition CURRENT_TIME_SEC to use y2038 safe time interfaces. current_time() returns timestamps according to the granularities set in the inode's super_block. The granularity check to call current_fs_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to update inode timestamp. Use timespec_trunc during file system creation, before the first inode is created. Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fs: ufs: use ktime_get_real_ts64() for birthtimeDeepa Dinamani1-2/+4
CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64(). Inode time formats are already 64 bit long and accommodates time64_t. Link: http://lkml.kernel.org/r/1491613030-11599-6-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fs: ceph: CURRENT_TIME with ktime_get_real_ts()Deepa Dinamani4-6/+10
CURRENT_TIME is not y2038 safe. The macro will be deleted and all the references to it will be replaced by ktime_get_* apis. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as ceph uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. The current_fs_time() api is being changed to use vfs struct inode* as an argument instead of struct super_block*. Set the new mds client request r_stamp field using ktime_get_real_ts() instead of using current_fs_time(). Also, since r_stamp is used as mtime on the server, use timespec_trunc() to truncate the timestamp, using the right granularity from the superblock. This api will be transitioned to be y2038 safe along with vfs. Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> M: Ilya Dryomov <idryomov@gmail.com> M: "Yan, Zheng" <zyan@redhat.com> M: Sage Weil <sage@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>