aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-03-06netfilter: nf_tables: cleanup nf_tables.hPatrick McHardy1-87/+87
The transaction related definitions are squeezed in between the rule and expression definitions, which are closely related and should be next to each other. The transaction definitions actually don't belong into that file at all since it defines the global objects and API and transactions are internal to nf_tables_api, but for now simply move them to a seperate section. Similar, the chain types are in between a set of registration functions, they belong to the chain section. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06netfilter: ipt_CLUSTERIP: deprecate it in favour of xt_clusterPablo Neira Ayuso1-0/+1
xt_cluster supersedes ipt_CLUSTERIP since it can be also used in gateway configurations (not only from the backend side). ipt_CLUSTER is also known to leak the netdev that it uses on device removal, which requires a rather large fix to workaround the problem: http://patchwork.ozlabs.org/patch/358629/ So let's deprecate this so we can probably kill code this in the future. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-05Merge branch 'suspend-to-idle'Rafael J. Wysocki1-2/+15
* suspend-to-idle: cpuidle / sleep: Use broadcast timer for states that stop local timer cpuidle: Clean up fallback handling in cpuidle_idle_call() cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too idle / sleep: Avoid excessive disabling and enabling interrupts
2015-03-05cpuidle / sleep: Use broadcast timer for states that stop local timerRafael J. Wysocki1-2/+15
Commit 381063133246 (PM / sleep: Re-implement suspend-to-idle handling) overlooked the fact that entering some sufficiently deep idle states by CPUs may cause their local timers to stop and in those cases it is necessary to switch over to a broadcast timer prior to entering the idle state. If the cpuidle driver in use does not provide the new ->enter_freeze callback for any of the idle states, that problem affects suspend-to-idle too, but it is not taken into account after the changes made by commit 381063133246. Fix that by changing the definition of cpuidle_enter_freeze() and re-arranging of the code in cpuidle_idle_call(), so the former does not call cpuidle_enter() any more and the fallback case is handled by cpuidle_idle_call() directly. Fixes: 381063133246 (PM / sleep: Re-implement suspend-to-idle handling) Reported-and-tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-03-05bridge: Extend Proxy ARP design to allow optional rules for Wi-FiJouni Malinen2-0/+2
This extends the design in commit 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") with optional set of rules that are needed to meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The previously added BR_PROXYARP behavior is left as-is and a new BR_PROXYARP_WIFI alternative is added so that this behavior can be configured from user space when required. In addition, this enables proxyarp functionality for unicast ARP requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible to use unicast as well as broadcast for these frames. The key differences in functionality: BR_PROXYARP: - uses the flag on the bridge port on which the request frame was received to determine whether to reply - block bridge port flooding completely on ports that enable proxy ARP BR_PROXYARP_WIFI: - uses the flag on the bridge port to which the target device of the request belongs - block bridge port flooding selectively based on whether the proxyarp functionality replied Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONETejun Heo1-1/+2
cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Cc: stable@vger.kernel.org Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com>
2015-03-05bcma: move internal function declarations to private headerRafał Miłecki5-36/+0
These functions are not exported nor used anywhere, so there is no reason to put them in public headers. Also drop unused bcma_chipco_(suspend|resume). Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-03-05bcma: make bcma_host_pci_(up|down) calls safe for every configRafał Miłecki1-0/+9
We were providing declarations but actual code was compiled only with CONFIG_BCMA_HOST_PCI set. This could result in: ERROR: "bcma_host_pci_down" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined! ERROR: "bcma_host_pci_up" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined! ERROR: "bcma_host_pci_down" [drivers/net/wireless/b43/b43.ko] undefined! ERROR: "bcma_host_pci_up" [drivers/net/wireless/b43/b43.ko] undefined! Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-03-04fib_trie: Make fib_table rcu safeAlexander Duyck2-31/+46
The fib_table was wrapped in several places with an rcu_read_lock/rcu_read_unlock however after looking over the code I found several spots where the tables were being accessed as just standard pointers without any protections. This change fixes that so that all of the proper protections are in place when accessing the table to take RCU replacement or removal of the table into account. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05drm/ttm: device address space != CPU address spaceAlex Deucher2-2/+2
We need to store device offsets in 64 bit as the device address space may be larger than the CPU's. Fixes GPU init failures on radeons with 4GB or more of vram on 32 bit kernels. We put vram at the start of the GPU's address space so the gart aperture starts at 4 GB causing all GPU addresses in the gart aperture to get truncated. bug: https://bugs.freedesktop.org/show_bug.cgi?id=89072 [airlied: fix warning on nouveau build] Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: thellstrom@vmware.com Acked-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-05drm/mm: Support 4 GiB and larger rangesThierry Reding1-26/+26
The current implementation is limited by the number of addresses that fit into an unsigned long. This causes problems on 32-bit Tegra where unsigned long is 32-bit but drm_mm is used to manage an IOVA space of 4 GiB. Given the 32-bit limitation, the range is limited to 4 GiB - 1 (or 4 GiB - 4 KiB for page granularity). This commit changes the start and size of the range to be an unsigned 64-bit integer, thus allowing much larger ranges to be supported. [airlied: fix i915 warnings and coloring callback] Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dave Airlie <airlied@redhat.com> fixupo
2015-03-04genirq / PM: Add flag for shared NO_SUSPEND interrupt linesRafael J. Wysocki2-0/+6
It currently is required that all users of NO_SUSPEND interrupt lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the WARN_ON_ONCE() in irq_pm_install_action() will trigger. That is done to warn about situations in which unprepared interrupt handlers may be run unnecessarily for suspended devices and may attempt to access those devices by mistake. However, it may cause drivers that have no technical reasons for using IRQF_NO_SUSPEND to set that flag just because they happen to share the interrupt line with something like a timer. Moreover, the generic handling of wakeup interrupts introduced by commit 9ce7a25849e8 (genirq: Simplify wakeup mechanism) only works for IRQs without any NO_SUSPEND users, so the drivers of wakeup devices needing to use shared NO_SUSPEND interrupt lines for signaling system wakeup generally have to detect wakeup in their interrupt handlers. Thus if they happen to share an interrupt line with a NO_SUSPEND user, they also need to request that their interrupt handlers be run after suspend_device_irqs(). In both cases the reason for using IRQF_NO_SUSPEND is not because the driver in question has a genuine need to run its interrupt handler after suspend_device_irqs(), but because it happens to share the line with some other NO_SUSPEND user. Otherwise, the driver would do without IRQF_NO_SUSPEND just fine. To make it possible to specify that condition explicitly, introduce a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND, that, when set, will indicate to the IRQ core that the interrupt user is generally fine with suspending the IRQ, but it also can tolerate handler invocations after suspend_device_irqs() and, in particular, it is capable of detecting system wakeup and triggering it as appropriate from its interrupt handler. That will allow us to work around a problem with a shared timer interrupt line on at91 platforms. Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2 Link: http://marc.info/?t=142252775300011&r=1&w=2 Link: https://lkml.org/lkml/2014/12/15/552 Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com>
2015-03-04netfilter: nf_tables: fix userdata length overflowPatrick McHardy1-3/+19
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8 to store its size. Introduce a struct nft_userdata which contains a length field and indicate its presence using a single bit in the rule. The length field of struct nft_userdata is also a u8, however we don't store zero sized data, so the actual length is udata->len + 1. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04mpls: Multicast route table change notificationsEric W. Biederman1-0/+2
Unlike IPv4 this code notifies on all cases where mpls routes are added or removed and it never automatically removes routes. Avoiding both the userspace confusion that is caused by omitting route updates and the possibility of a flood of netlink traffic when an interface goes doew. For now reserved labels are handled automatically and userspace is not notified. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Netlink commands to add, remove, and dump routesEric W. Biederman1-0/+8
This change adds two new netlink routing attributes: RTA_VIA and RTA_NEWDST. RTA_VIA specifies the specifies the next machine to send a packet to like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it includes the address family of the address of the next machine to send a packet to. Currently the MPLS code supports addresses in AF_INET, AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac address is acquired from the neighbour table. For AF_PACKET the destination mac_address is specified in the netlink configuration. I think raw destination mac address support with the family AF_PACKET will prove useful. There is MPLS-TP which is defined to operate on machines that do not support internet packets of any flavor. Further seem to be corner cases where it can be useful. At this point I don't care much either way. RTA_NEWDST specifies the destination address to forward the packet with. MPLS typically changes it's destination address at every hop. For a swap operation RTA_NEWDST is specified with a length of one label. For a push operation RTA_NEWDST is specified with two or more labels. For a pop operation RTA_NEWDST is not specified or equivalently an emtpy RTAN_NEWDST is specified. Those new netlink attributes are used to implement handling of rt-netlink RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the MPLS label table. rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message, verify no unhandled attributes or unhandled values are present and sets up the data structures for mpls_route_add and mpls_route_del. I did my best to match up with the existing conventions with the caveats that MPLS addresses are all destination-specific-addresses, and so don't properly have a scope. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Add a sysctl to control the size of the mpls label tableEric W. Biederman1-0/+2
This sysctl gives two benefits. By defaulting the table size to 0 mpls even when compiled in and enabled defaults to not forwarding any packets. This prevents unpleasant surprises for users. The other benefit is that as mpls labels are allocated locally a dense table a small dense label table may be used which saves memory and is extremely simple and efficient to implement. This sysctl allows userspace to choose the restrictions on the label table size userspace applications need to cope with. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04mpls: Basic routing supportEric W. Biederman3-0/+21
This change adds a new Kconfig option MPLS_ROUTING. The core of this change is the code to look at an mpls packet received from another machine. Look that packet up in a routing table and forward the packet on. Support of MPLS over ATM is not considered or attempted here. This implemntation follows RFC3032 and implements the MPLS shim header that can pass over essentially any network. What RFC3021 refers to as the as the Incoming Label Map (ILM) I call net->mpls.platform_label[]. What RFC3031 refers to as the Next Label Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the label fordwarding information base (lfib) might also be valid. Further the implemntation forwards packets as described in RFC3032. There is no need and given the original motivation for MPLS a strong discincentive to have a flexible label forwarding path. In essence the logic is the topmost label is read, looked up, removed, and replaced by 0 or more new lables and the sent out the specified interface to it's next hop. Quite a few optional features are not implemented here. Among them are generation of ICMP errors when the TTL is exceeded or the packet is larger than the next hop MTU (those conditions are detected and the packets are dropped instead of generating an icmp error). The traffic class field is always set to 0. The implementation focuses on IP over MPLS and does not handle egress of other kinds of protocols. Instead of implementing coordination with the neighbour table and sorting out how to input next hops in a different address family (for which there is value). I was lazy and implemented a next hop mac address instead. The code is simpler and there are flavor of MPLS such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is appropriate so a next hop by mac address would need to be implemented at some point. Two new definitions AF_MPLS and PF_MPLS are exposed to userspace. Decoding the mpls header must be done by first byeswapping a 32bit bit endian word into the local cpu endian and then bit shifting to extract the pieces. There is no C bit-field that can represent a wire format mpls header on a little endian machine as the low bits of the 20bit label wind up in the wrong half of third byte. Therefore internally everything is deal with in cpu native byte order except when writing to and reading from a packet. For management simplicity if a label is configured to forward out an interface that is down the packet is dropped early. Similarly if an network interface is removed rt_dev is updated to NULL (so no reference is preserved) and any packets for that label are dropped. Keeping the label entries in the kernel allows the kernel label table to function as the definitive source of which labels are allocated and which are not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04neigh: Add helper function neigh_xmitEric W. Biederman1-0/+3
For MPLS I am building the code so that either the neighbour mac address can be specified or we can have a next hop in ipv4 or ipv6. The kind of next hop we have is indicated by the neighbour table pointer. A neighbour table pointer of NULL is a link layer address. A non-NULL neighbour table pointer indicates which neighbour table and thus which address family the next hop address is in that we need to look up. The code either sends a packet directly or looks up the appropriate neighbour table entry and sends the packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04neigh: Factor out ___neigh_lookup_norefEric W. Biederman3-33/+57
While looking at the mpls code I found myself writing yet another version of neigh_lookup_noref. We currently have __ipv4_lookup_noref and __ipv6_lookup_noref. So to make my work a little easier and to make it a smidge easier to verify/maintain the mpls code in the future I stopped and wrote ___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and __ipv6_lookup_noref in terms of this new function. I tested my new version by verifying that the same code is generated in ip_finish_output2 and ip6_finish_output2 where these functions are inlined. To get to ___neigh_lookup_noref I added a new neighbour cache table function key_eq. So that the static size of the key would be available. I also added __neigh_lookup_noref for people who want to to lookup a neighbour table entry quickly but don't know which neibhgour table they are going to look up. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller74-850/+4232
Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds5-19/+9
Pull networking fixes from David Miller: 1) If an IPVS tunnel is created with a mixed-family destination address, it cannot be removed. Fix from Alexey Andriyanov. 2) Fix module refcount underflow in netfilter's nft_compat, from Pablo Neira Ayuso. 3) Generic statistics infrastructure can reference variables sitting on a released function stack, therefore use dynamic allocation always. Fix from Ignacy Gawędzki. 4) skb_copy_bits() return value test is inverted in ip_check_defrag(). 5) Fix network namespace exit in openvswitch, we have to release all of the per-net vports. From Pravin B Shelar. 6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter. 7) Fix rhashtable grow/shrink behavior, only expand during inserts and shrink during deletes. From Daniel Borkmann. 8) Netdevice names with semicolons should never be allowed, because they serve as a separator. From Matthew Thode. 9) Use {,__}set_current_state() where appropriate, from Fabian Frederick. 10) Revert byte queue limits support in r8169 driver, it's causing regressions we can't figure out. 11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to measure packets in flight, properly use tcp_packets_in_flight() instead. From Neal Cardwell. 12) Fix accidental removal of support for bluetooth in CSR based Intel wireless cards. From Marcel Holtmann. 13) We accidently added a behavioral change between native and compat tasks, wrt testing the MSG_CMSG_COMPAT bit. Just ignore it if the user happened to set it in a native binary as that was always the behavior we had. From Catalin Marinas. 14) Check genlmsg_unicast() return valud in hwsim netlink tx frame handling, from Bob Copeland. 15) Fix stale ->radar_required setting in mac80211 that can prevent starting new scans, from Eliad Peller. 16) Fix memory leak in nl80211 monitor, from Johannes Berg. 17) Fix race in TX index handling in xen-netback, from David Vrabel. 18) Don't enable interrupts in amx-xgbe driver until all software et al. state is ready for the interrupt handler to run. From Thomas Lendacky. 19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric W Biederman. 20) The amount of header space needed in macvtap was not calculated properly, fix it otherwise we splat past the beginning of the packet. From Eric Dumazet. 21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin. 22) Don't raw initialize or mod timers, use setup_timer() and mod_timer() instead. From Vaishali Thakkar. 23) Fix software maintained statistics in bcmgenet and systemport drivers, from Florian Fainelli. 24) DMA descriptor updates in sh_eth need proper memory barriers, from Ben Hutchings. 25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal Kubecek. 26) Openvswitch's non-masked set actions aren't constructed properly into netlink messages, fix from Joe Stringer. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) openvswitch: Fix serialization of non-masked set actions. gianfar: Reduce logging noise seen due to phy polling if link is down ibmveth: Add function to enable live MAC address changes net: bridge: add compile-time assert for cb struct size udp: only allow UFO for packets from SOCK_DGRAM sockets sh_eth: Really fix padding of short frames on TX Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790" sh_eth: Fix RX recovery on R-Car in case of RX ring underrun sh_eth: Ensure proper ordering of descriptor active bit write/read net/mlx4_en: Disbale GRO for incoming loopback/selftest packets net/mlx4_core: Fix wrong mask and error flow for the update-qp command net: systemport: fix software maintained statistics net: bcmgenet: fix software maintained statistics rxrpc: don't multiply with HZ twice rxrpc: terminate retrans loop when sending of skb fails net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface. net: pasemi: Use setup_timer and mod_timer net: stmmac: Use setup_timer and mod_timer net: 8390: axnet_cs: Use setup_timer and mod_timer net: 8390: pcnet_cs: Use setup_timer and mod_timer ...
2015-03-03ax25: Stop using magic neighbour cache operations.Eric W. Biederman1-4/+1
Before the ax25 stack calls dev_queue_xmit it always calls ax25_type_trans which sets skb->protocol to ETH_P_AX25. Which means that by looking at the protocol type it is possible to detect IP packets that have not been munged by the ax25 stack in ndo_start_xmit and call a function to munge them. Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and value to be appropriate for an ndo_start_xmit function. Update all of the ax25 devices to test the protocol type for ETH_P_IP and return ax25_ip_xmit as the first thing they do. This preserves the existing semantics of IP packet processing, but the timing will be a little different as the IP packets now pass through the qdisc layer before reaching the ax25 ip packet processing. Remove the now unnecessary ax25 neighbour table operations. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03NFS: Fix a regression in the read() syscallTrond Myklebust1-0/+1
When invalidating the page cache for a regular file, we want to first sync all dirty data to disk and then call invalidate_inode_pages2(). The latter relies on nfs_launder_page() and nfs_release_page() to deal respectively with dirty pages, and unstable written pages. When commit 9590544694bec ("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") changed the behaviour of nfs_release_page(), then it made it possible for invalidate_inode_pages2() to fail with an EBUSY. Unfortunately, that error is then propagated back to read(). Let's therefore work around the problem for now by protecting the call to sync the data and invalidate_inode_pages2() so that they are atomic w.r.t. the addition of new writes. Later on, we can revisit whether or not we still need nfs_launder_page() and nfs_release_page(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-03spi: fix a typo in comment.Marcin Bis1-1/+1
alway -> always Signed-off-by: Marcin Bis <marcin@bis.org.pl> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-03-03netfilter: reject: don't send icmp error if csum is invalidFlorian Westphal2-14/+3
tcp resets are never emitted if the packet that triggers the reject/reset has an invalid checksum. For icmp error responses there was no such check. It allows to distinguish icmp response generated via iptables -I INPUT -p udp --dport 42 -j REJECT and those emitted by network stack (won't respond if csum is invalid, REJECT does). Arguably its possible to avoid this by using conntrack and only using REJECT with -m conntrack NEW/RELATED. However, this doesn't work when connection tracking is not in use or when using nf_conntrack_checksum=0. Furthermore, sending errors in response to invalid csums doesn't make much sense so just add similar test as in nf_send_reset. Validate csum if needed and only send the response if it is ok. Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-02Merge branch 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermalLinus Torvalds1-2/+54
Pull thermal management fixes from Eduardo Valentin: "Specifics: - Several fixes in tmon tool. - Fixes in intel int340x for _ART and _TRT tables. - Add id for Avoton SoC into powerclamp driver. - Fixes in RCAR thermal driver to remove race conditions and fix fail path - Fixes in TI thermal driver: removal of unnecessary code and build fix if !CONFIG_PM_SLEEP - Cleanups in exynos thermal driver - Add stubs for include/linux/thermal.h. Now drivers using thermal calls but that also work without CONFIG_THERMAL will be able to compile for systems that don't care about thermal. Note: I am sending this pull on Rui's behalf while he fixes issues in his Linux box" * 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: thermal: int340x_thermal: Ignore missing _ART, _TRT tables thermal/intel_powerclamp: add id for Avoton SoC tools/thermal: tmon: silence 'set but not used' warnings tools/thermal: tmon: use pkg-config to determine library dependencies tools/thermal: tmon: support cross-compiling tools/thermal: tmon: add .gitignore tools/thermal: tmon: fixup tui windowing calculations tools/thermal: tmon: tui: don't hard-code dialog window size assumptions tools/thermal: tmon: add min/max macros tools/thermal: tmon: add --target-temp parameter thermal: exynos: Clean-up code to use oneline entry for exynos compatible table thermal: rcar: Make error and remove paths symmetrical with init thermal: rcar: Fix race condition between init and interrupt thermal: Introduce dummy functions when thermal is not defined ti-soc-thermal: Delete an unnecessary check before the function call "cpufreq_cooling_unregister" thermal: ti-soc-thermal: bandgap: Fix build warning if !CONFIG_PM_SLEEP
2015-03-02neigh: Don't require dst in neigh_hh_initEric W. Biederman1-0/+1
- Add protocol to neigh_tbl so that dst->ops->protocol is not needed - Acquire the device from neigh->dev This results in a neigh_hh_init that will cache the samve values regardless of the packets flowing through it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02arp: Kill arp_findEric W. Biederman1-1/+0
There are no more callers so kill this function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: Kill dev_rebuild_headerEric W. Biederman2-12/+1
Now that there are no more users kill dev_rebuild_header and all of it's implementations. This is long overdue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02neigh: Move neigh_compat_output into ax25_ip.cEric W. Biederman1-1/+0
The only caller is now is ax25_neigh_construct so move neigh_compat_output into ax25_ip.c make it static and rename it ax25_neigh_output. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02ax25: Refactor to use private neighbour operations.Eric W. Biederman1-0/+5
AX25 already has it's own private arp cache operations to isolate it's abuse of dev_rebuild_header to transmit packets. Add a function ax25_neigh_construct that will allow all of the ax25 devices to force using these operations, so that the generic arp code does not need to. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02ax25: Make ax25_header and ax25_rebuild_header staticEric W. Biederman1-3/+0
The only user is in ax25_ip.c so stop exporting these functions. Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net/mlx4_core: Fix wrong mask and error flow for the update-qp commandOr Gerlitz1-1/+1
The bit mask for currently supported driver features (MLX4_UPDATE_QP_SUPPORTED_ATTRS) of the update-qp command was defined twice (using enum value and pre-processor define directive) and wrong. The return value of the call to mlx4_update_qp() from within the SRIOV resource-tracker was wrongly voided down. Fix both issues. issue: none Fixes: 09e05c3f78e9 ('net/mlx4: Set vlan stripping policy by the right command') Fixes: ce8d9e0d6746 ('net/mlx4_core: Add UPDATE_QP SRIOV wrapper support') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02ebpf: move CONFIG_BPF_SYSCALL-only function declarationsDaniel Borkmann1-9/+9
Masami noted that it would be better to hide the remaining CONFIG_BPF_SYSCALL-only function declarations within the BPF header ifdef, w/o else path dummy alternatives since these functions are not supposed to have a user outside of CONFIG_BPF_SYSCALL. Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reference: http://article.gmane.org/gmane.linux.kernel.api/8658 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2-18/+50
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A small batch with accumulated updates in nf-next, mostly IPVS updates, they are: 1) Add 64-bits stats counters to IPVS, from Julian Anastasov. 2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker seem to require this, from Anton Blanchard. 3) Use boolean instead of numeric value in set_match_v*(), from coccinelle via Fengguang Wu. 4) Allows rescheduling of new connections in IPVS when port reuse is detected, from Marcelo Ricardo Leitner. 5) Add missing bits to support arptables extensions from nft_compat, from Arturo Borrero. Patrick is preparing a large batch to enhance the set infrastructure, named expressions among other things, that should follow up soon after this batch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller5-105/+70
Johan Hedberg says: ==================== pull request: bluetooth-next 2015-03-02 Here's the first bluetooth-next pull request targeting the 4.1 kernel: - ieee802154/6lowpan cleanups - SCO routing to host interface support for the btmrvl driver - AMP code cleanups - Fixes to AMP HCI init sequence - Refactoring of the HCI callback mechanism - Added shutdown routine for Intel controllers in the btusb driver - New config option to enable/disable Bluetooth debugfs information - Fix for early data reception on L2CAP fixed channels Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: Remove iocb argument from sendmsg and recvmsgYing Xue8-35/+27
After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02bcma: support bringing up bus hosted on PCIe Gen 2Rafał Miłecki1-0/+2
Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-03-02bcma: change IRQ control function to accept bus as an argumentRafał Miłecki1-1/+1
It doesn't operate on PCI core, but PCI host device, so there is no point of passing core related struct. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-03-02bcma: add helpers bringing PCIe hosted bus up / downRafał Miłecki2-2/+3
Bringing PCIe hosted bus up requires operating on host-related core. Since we plan to support PCIe Gen 2 devices we should provide a helper picking the correct one (PCIE or PCIE2). Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-03-02net: move skb->dropcount to skb->cb[]Eyal Birger2-4/+16
Commit 977750076d98 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: add common accessor for setting dropcount on packetsEyal Birger1-0/+6
As part of an effort to move skb->dropcount to skb->cb[], use a common function in order to set dropcount in struct sk_buff. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: use common macro for assering skb->cb[] available size in protocol familiesEyal Birger1-0/+3
As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fieldsEyal Birger1-3/+3
Convert boolean fields incoming and req_start to bit fields and move force_active in order save space in bt_skb_cb in an effort to use a portion of skb->cb[] for storing skb->dropcount. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrlEyal Birger1-7/+3
struct hci_req_ctrl is never used outside of struct bt_skb_cb; Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing the addition of more ancillary data. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02pppoe: Use workqueue to die properly when a PADT is receivedSimon Farnsworth1-0/+2
When a PADT frame is received, the socket may not be in a good state to close down the PPP interface. The current implementation handles this by simply blocking all further PPP traffic, and hoping that the lack of traffic will trigger the user to investigate. Use schedule_work to get to a process context from which we clear down the PPP interface, in a fashion analogous to hangup on a TTY-based PPP interface. This causes pppd to disconnect immediately, and allows tools to take immediate corrective action. Note that pppd's rp_pppoe.so plugin has code in it to disable the session when it disconnects; however, as a consequence of this patch, the session is already disabled before rp_pppoe.so is asked to disable the session. The result is a harmless error message: Failed to disconnect PPPoE socket: 114 Operation already in progress This message is safe to ignore, as long as the error is 114 Operation already in progress; in that specific case, it means that the PPPoE session has already been disabled before pppd tried to disable it. Signed-off-by: Simon Farnsworth <simon@farnz.org.uk> Tested-by: Dan Williams <dcbw@redhat.com> Tested-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01NFS: Add attribute update barriers to NFS writebacksTrond Myklebust1-0/+1
Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Add attribute update barriers to nfs_setattr_update_inode()Trond Myklebust1-1/+1
Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped. The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01NFS: Add a helper to set attribute barriersTrond Myklebust1-0/+1
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01cls_bpf: add initial eBPF support for programmable classifiersDaniel Borkmann1-0/+2
This work extends the "classic" BPF programmable tc classifier by extending its scope also to native eBPF code! This allows for user space to implement own custom, 'safe' C like classifiers (or whatever other frontend language LLVM et al may provide in future), that can then be compiled with the LLVM eBPF backend to an eBPF elf file. The result of this can be loaded into the kernel via iproute2's tc. In the kernel, they can be JITed on major archs and thus run in native performance. Simple, minimal toy example to demonstrate the workflow: #include <linux/ip.h> #include <linux/if_ether.h> #include <linux/bpf.h> #include "tc_bpf_api.h" __section("classify") int cls_main(struct sk_buff *skb) { return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos)); } char __license[] __section("license") = "GPL"; The classifier can then be compiled into eBPF opcodes and loaded via tc, for example: clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o tc filter add dev em1 parent 1: bpf cls.o [...] As it has been demonstrated, the scope can even reach up to a fully fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c). For tc, maps are allowed to be used, but from kernel context only, in other words, eBPF code can keep state across filter invocations. In future, we perhaps may reattach from a different application to those maps e.g., to read out collected statistics/state. Similarly as in socket filters, we may extend functionality for eBPF classifiers over time depending on the use cases. For that purpose, cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so we can allow additional functions/accessors (e.g. an ABI compatible offset translation to skb fields/metadata). For an initial cls_bpf support, we allow the same set of helper functions as eBPF socket filters, but we could diverge at some point in time w/o problem. I was wondering whether cls_bpf and act_bpf could share C programs, I can imagine that at some point, we introduce i) further common handlers for both (or even beyond their scope), and/or if truly needed ii) some restricted function space for each of them. Both can be abstracted easily through struct bpf_verifier_ops in future. The context of cls_bpf versus act_bpf is slightly different though: a cls_bpf program will return a specific classid whereas act_bpf a drop/non-drop return code, latter may also in future mangle skbs. That said, we can surely have a "classify" and "action" section in a single object file, or considered mentioned constraint add a possibility of a shared section. The workflow for getting native eBPF running from tc [1] is as follows: for f_bpf, I've added a slightly modified ELF parser code from Alexei's kernel sample, which reads out the LLVM compiled object, sets up maps (and dynamically fixes up map fds) if any, and loads the eBPF instructions all centrally through the bpf syscall. The resulting fd from the loaded program itself is being passed down to cls_bpf, which looks up struct bpf_prog from the fd store, and holds reference, so that it stays available also after tc program lifetime. On tc filter destruction, it will then drop its reference. Moreover, I've also added the optional possibility to annotate an eBPF filter with a name (e.g. path to object file, or something else if preferred) so that when tc dumps currently installed filters, some more context can be given to an admin for a given instance (as opposed to just the file descriptor number). Last but not least, bpf_prog_get() and bpf_prog_put() needed to be exported, so that eBPF can be used from cls_bpf built as a module. Thanks to 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only") I think this is of no concern since anything wanting to alter eBPF opcode after verification stage would crash the kernel. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>