aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/workqueue.txt (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-08-10net: resolve symbol conflicts with generic hashtable.hJiri Kosina5-28/+28
This is a preparatory patch for converting qdisc linked list into a hashtable. As we'll need to include hashtable.h in netdevice.h, we first have to make sure that this will not introduce symbol conflicts for any of the netdevice.h users. Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10ravb: use proper names for suspend/resume functionsNiklas Söderlund1-3/+3
The patch 'ravb: add sleep PM suspend/resume support' used incorrect function names containing 'runtime' for the suspend and resume functions. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10net: ipconfig: fix use after freeUwe Kleine-König1-8/+9
ic_close_devs() calls kfree() for all devices's ic_device. Since commit 2647cffb2bc6 ("net: ipconfig: Support using "delayed" DHCP replies") the active device's ic_device is still used however to print the ipconfig summary which results in an oops if the memory is already changed. So delay freeing until after the autoconfig results are reported. Fixes: 2647cffb2bc6 ("net: ipconfig: Support using "delayed" DHCP replies") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09ravb: add sleep PM suspend/resume supportNiklas Söderlund1-8/+64
The interface would not function after the system had been woken up after have been suspended (echo mem > /sys/power/state) cycle. The reason for this is that all device registers have been reset to its default values. This patch adds sleep suspend and resume functions that detached the interface at suspend and restore the registers and reattach the interface at resume. Only the registers that are only configured at probe time needs to be explicitly restored by the resume handler. All other registers are reconfigured by either reopening the device in the resume handler (if the device was running when the system was suspended) or when the interface is opened by a user at a later time. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09net: dsa: b53: constify b53_io_ops structuresJulia Lawall6-6/+8
The b53_io_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09net: Remove fib_local variableDavid Ahern2-8/+0
After commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") fib_local is set but not used. Remove it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09ppp: build ifname using unit identifier for rtnl based devicesGuillaume Nault1-0/+9
Userspace programs generally need to know the name of the ppp devices they create. Both ioctl and rtnl interfaces use the ppp<suffix> sheme to name them. But although the suffix used by the ioctl interface can be known by userspace (it's the PPP unit identifier returned by the PPPIOCGUNIT ioctl), the one used by the rtnl is only known by the kernel. This patch brings more consistency between ioctl and rtnl based ppp devices by generating device names using the PPP unit identifer as suffix in both cases. This way, userspace can always infer the name of the devices they create. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08RDS: add __printf format attribute to error reporting functionsNicolas Iooss2-0/+2
This is helpful to detect at compile-time errors related to format strings. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08Microsemi VSC 8531/41 PHY DriverRaju Lakkaraju3-0/+167
Hello, I added all review comments and re-sending for review. >From a5017f5878a92d2acec86a6a29b1498c457cb73a Mon Sep 17 00:00:00 2001 From: Nagaraju Lakkaraju <Raju.Lakkaraju@microsemi.com> Date: Wed, 3 Aug 2016 18:28:24 +0530 Subject: [PATCH v2] net: phy: Add drivers for Microsemi PHYs Signed-off-by: Nagaraju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net/fsl: use of_property_read_boolJulia Lawall1-5/+2
Use of_property_read_bool to check for the existence of a property. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e1,e2,x; @@ - if (of_get_property(e1,e2,NULL)) - x = true; - else - x = false; + x = of_property_read_bool(e1,e2); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08hv_netvsc: Add handler for physical link speed changeHaiyang Zhang3-7/+25
On Hyper-V host 2016 and later, VMs gets an event message of the physical link speed when vSwitch is changed. This patch handles this message, so the updated link speed can be reported by ethtool. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08hv_netvsc: Add query for initial physical link speedHaiyang Zhang2-2/+26
The physical link speed value will be reported by ethtool command. The real speed is available from Windows 2016 host or later. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: ti: cpdma: remove used_desc counterGrygorii Strashko1-11/+7
The struct cpdma_desc_pool->used_desc field can be safely removed from CPDMA driver (and hot patch) because used_descs counter is used just for pool consistency check at CPDMA deinitialization and now this check can be re-implemnted using gen_pool_size(pool->gen_pool) != gen_pool_avail(pool->gen_pool). More over, this will allow to get rid of warnings in cpdma_desc_pool_destro()-> WARN_ON(pool->used_desc) which may happen because the used_descs is used unprotected, since CPDMA has been switched to use genalloc, and may get wrong values on SMP. Hence, remove used_desc from struct cpdma_desc_pool. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net/sched/sch_hfsc.c: remove unused cl_myfadjMichal Soltys1-5/+2
The code using this variable has been commented out in the past as it was causing issues in upperlimited link-sharing scenarios. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net/sched/sch_hfsc.c: keep fsc and virtual times in sync; fix an old bugMichal Soltys1-32/+12
This patch simplifies how we update fsc and calculate vt from it - while keeping the expected functionality identical with how hfsc behaves curently. It also fixes a certain issue introduced with a very old patch. The idea is, that instead of correcting cl_vt before fsc curve update (rtsc_min) and correcting cl_vt after calculation (rtsc_y2x) to keep cl_vt local to the current period - we can simply rely on virtual times and curve values always being in sync - analogously to how rsc and usc function, except that we use virtual time here. Why hasn't it been done since the beginning this way ? The likely scenario (basing on the code trying to correct curves whenever possible) was to keep the virtual times as small as possible - as they have tendency to "gallop" forward whenever their siblings and other fair sharing subtrees are idling. On top of that, current code is subtly bugged, so cumulative time (without any corrections) is always kept and used in init_vf() when a new backlog period begins (using cl_cvtoff). Is cumulative value safe ? Generally yes, though corner cases are easy to create. For example consider: 1gbit interface some 100kbit leaf, everything else idle With current tick (64ns) 1s is 15625000 ticks, but the leaf is alone and it's virtual time, so in reality it's 10000 times more. ITOW 38 bits are needed to hold 1 second. 54 - 1 day, 59 - 1 month, 63 - 1 year (all logarithms rounded up). It's getting somewhat dangerous, but also requires setup excusing this kind of values not mentioning permanently backlogged class for a year. In near most extreme case (10gbit, 10kbit leaf), we have "enough" to hold ~13.6 days in 64 bits. Well, the issue remains mostly theoretical and cl_cvtoff has been working fine for all those years. Sensible configuration are de-facto immune to this issue, and not so sensible can solve it with a cronjob and its period inversely proportional to the insanity of such setup =) Now let's explain the subtle bug mentioned earlier. The issue is related to how offsets are kept and how we calculate virtual times and update fair service curve(s). The issue itself is subtle, but easy to observe with long m1 segments. It was introduced in rather old patch: Commit 99296150c7: "[NET_SCHED]: O(1) children vtoff adjustment in HFSC scheduler" (available in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git) Originally when a new backlog period was started, cl_vtoff of each sibling was updated with cl_cvtmax from past period - naturally moving all cl_vt to proper starting point. That patch adjusted it so cumulative offset is kept in the parent, and there is no need for traversing the list (as any subsequent child activation derives new vt from already active sibling(s)). But with this change, cl_vtoff (of each sibling) is no longer persistent across the inactivity periods, as it's calculated from parent's cl_cvtoff on a new backlog period, conflicting with the following curve correction from the previous period: if (cl->cl_virtual.x == vt) { cl->cl_virtual.x -= cl->cl_vtoff; cl->cl_vtoff = 0; } This essentially tries to keep curve as if it was local to the period and resets cl_vtoff (cumulative vt offset of the class) to 0 when possible (read: when we have an intersection or if a new curve is below the old one). But then it's recalculated from cl_cvtoff on next active period. Then rtsc_min() call preceding the above if() doesn't really do what we expect it to do in such scenario - as it calculates the minimum of corrected curve (from the previous backlog period) and the new uncorrected curve (with offset derived from cl_cvtoff). Example: tc class add dev $ife parent 1:0 classid 1:1 hfsc ls m2 100mbit ul m2 100mbit tc class add dev $ife parent 1:1 classid 1:10 hfsc ls m1 80mbit d 10s m2 20mbit tc class add dev $ife parent 1:1 classid 1:11 hfsc ls m2 20mbit start B, keep it backlogged, let it run 6s (30s worth of vt as A is idle) pause B briefly to force cl_cvtoff update in parent (whole 1:1 going idle) start A, let it run 10s pause A briefly to force rtsc_min() At this point we would expect A to continue at 20mbit after a brief moment of 80mbit. But instead A will use 80mbit for full 10s again. It's the effect of first correcting A (during 'start A'), and then - after unpausing - calculating rtsc_min() from old corrected and new uncorrected curve. The patch fixes this bug and keepis vt and fsc in sync (virtual times are cumulative, not local to the backlog period). Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08qed: Use DEFINE_SPINLOCK() for spinlockWei Yongjun1-7/+1
spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net/multicast: should not send source list records when have filter mode changeHangbin Liu2-0/+20
Based on RFC3376 5.1 and RFC3810 6.1 If the per-interface listening change that triggers the new report is a filter mode change, then the next [Robustness Variable] State Change Reports will include a Filter Mode Change Record. This applies even if any number of source list changes occur in that period. Old State New State State Change Record Sent --------- --------- ------------------------ INCLUDE (A) EXCLUDE (B) TO_EX (B) EXCLUDE (A) INCLUDE (B) TO_IN (B) So we should not send source-list change if there is a filter-mode change. Here are two scenarios: 1. Group deleted and filter mode is EXCLUDE, which means we need send a TO_IN { }. 2. Not group deleted, but has pcm->crcount, which means we need send a normal filter-mode-change. At the same time, if the type is ALLOW or BLOCK, and have psf->sf_crcount, we stop add records and decrease sf_crcount directly Reference: https://www.ietf.org/mail-archive/web/magma/current/msg01274.html Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: marvell: mvneta: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-21/+13
The ethtool api {get|set}_settings is deprecated. We move the mvneta driver to new api {get|set}_link_ksettings. We use the generic function phy_ethtool_get_link_ksettings, and update old mvneta_ethtool_set_settings to the new api. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: marvell: mvneta: use phydev from struct net_devicePhilippe Reynes1-18/+16
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phy_dev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: greth: use phy_ethtool_{get|set}_link_ksettingsPhilippe Reynes1-21/+2
There are two generics functions phy_ethtool_{get|set}_link_ksettings, so we can use them instead of defining the same code in the driver. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: greth: use phydev from struct net_devicePhilippe Reynes2-13/+11
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phy in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: octeon: use phy_ethtool_{get|set}_link_ksettingsPhilippe Reynes1-21/+2
There are two generics functions phy_ethtool_{get|set}_link_ksettings, so we can use them instead of defining the same code in the driver. There was a check on CAP_NET_ADMIN in cvm_oct_set_settings, but this check is already done in dev_ethtool, so no need to repeat it before calling the generic function. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ethernet: octeon: use phydev from struct net_devicePhilippe Reynes4-37/+26
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phydev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bna: remove global bnad_list_mutexIvan Vecera1-20/+0
Remove global bnad_list_mutex as it is not used anymore. This makes bnad_add_to_list() and bnad_remove_from_list() empty so remove them too. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bna: change type of bna_id to atomic_tIvan Vecera1-2/+2
Change type of bna_id to atomic_t. The bnad_list_mutex is used to prevent a race when bna_id is incremented. After the change the mutex can be removed in the next step. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bna: remove useless linked listIvan Vecera2-4/+0
Remove global variable bnad_list and bnad->list_entry that are used as list of bna driver instances. It is not necessary and useless. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ipconfig: drop inter-device timeoutUwe Kleine-König1-4/+5
Now that ipconfig learned to handle "delayed replies" in the previous commit, there is no reason any more to delay sending a first request per device. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ipconfig: Support using "delayed" DHCP repliesUwe Kleine-König1-19/+10
The dhcp code only waits 1s between sending DHCP requests on different devices and only accepts an answer for the device that sent out the last request. Only the timeout at the end of a loop is increased iteratively which favours only the last device. This makes it impossible to work with a dhcp server that takes little more than 1s connected to a device that is not the last one. Instead of also increasing the inter-device timeout, teach the code to handle delayed replies. To accomplish that, make *ic_dev track the current ic_device instead of the current net_device and adapt all users accordingly. The relevant change then is to reset d to ic_dev on a reply to assert that the followup request goes through the right device. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: ipconfig: Add device name to debug messagesUwe Kleine-König1-6/+6
This simplifies understanding what happens when there is more than one device. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08be2net: replace polling with sleeping in the FW completion pathSathya Perla3-158/+327
The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with BHs disabled. The driver currently issues the required cmds to the FW in these contexts and polls on completions from the FW, while BHs remain disabled. This can cause either packet loss or packet reception to be delayed on that CPU. This patch defers processing of the above cmds to a separate workqueue. With this change, FW cmds are now issued only in process context. Now that the FW cmds are issued only in process context, they can sleep waiting for a completion instead of polling. All the spin_lock_bh(mcc_lock) calls are now replaced with mutex calls. Also a new rx_filter_lock is now needed to protect the RX filtering fields like vids[] between be_vlan_add/rem_vid() and __be_set_rx_mode() contexts. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08be2net: Avoid unnecessary firmware updates of multicast listSriharsha Basavapatna2-41/+130
Eachtime the ndo_set_rx_mode() routine is called, the driver programs the multicast list in the adapter without checking if there are any changes to the list. This leads to a flood of RX_FILTER cmds when a number of vlan interfaces are configured over the device, as the ndo_ gets called for each vlan interface. To avoid this, we now use __dev_mc_sync() and __dev_uc_sync() API, but only to detect if there is a change in the mc/uc lists. Now that we use this API, the code has to be-designed to issue these API calls for each invocation of the be_set_rx_mode() call. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08be2net: do not remove vids from driver table if be_vid_config() fails.Sathya Perla1-7/+1
The driver currently removes a new vid from the adapter->vids[] array if be_vid_config() returns an error, which occurs when there is an error in HW/FW. This is wrong. After the HW/FW error is recovered from, we need the complete vids[] array to re-program the vlan list. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08be2net: clear vlan-promisc setting before programming the vlan listSomnath Kotur1-2/+5
The Lancer FW has a bug due to which in some cases vlan-promisc setting is cleared eventhough the vlan-list programming did not succeed (via VLAN_CONFIG) cmd. The driver has no way of knowing if the vlan-promisc mode was cleared or not when this cmd fails. To work around this issue, this patch first explicitly clears the vlan-promisc mode via RX_FILTER cmd and then tries to program the vlan list. Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08neigh: allow admin to set NUD_STALEJulian Anastasov1-1/+2
Admin should be able to set any state. Currently, this fails when lladdr is not changed and state is changed from NUD_CONNECTED to NUD_STALE: ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0 ip neigh show to 192.168.8.1 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0 ip neigh show to 192.168.8.1 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT Problem may be from 2.1.X days. Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp: use event->chunk when it's validXin Long1-2/+2
Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available") used event->chunk->head_skb to get the head_skb in sctp_ulpevent_set_owner(). But at that moment, the event->chunk was NULL, as it cloned the skb in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really work. This patch is to move the event->chunk initialization before calling sctp_ulpevent_receive_data() so that it uses event->chunk when it's valid. Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: vxlan: lwt: Fix vxlan local traffic.pravin shelar1-2/+2
vxlan driver has bypass for local vxlan traffic, but that depends on information about all VNIs on local system in vxlan driver. This is not available in case of LWT. Therefore following patch disable encap bypass for LWT vxlan traffic. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net: vxlan: lwt: Use source ip address during route lookup.pravin shelar1-12/+18
LWT user can specify destination as well as source ip address for given tunnel endpoint. But vxlan is ignoring given source ip address. Following patch uses both ip address to route the tunnel packet. This consistent with other LWT implementations, like GENEVE and GRE. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bpf: fix checksum for vlan push/pop helperDaniel Borkmann1-0/+12
When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't push rcsum of mac header back in and after BPF run back pull out again as opposed to some other subsystems (ovs, for example). For cases like q-in-q, meaning when a vlan tag for offloading is already present and we're about to push another one, then skb_vlan_push() pushes the inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing the 4 bytes vlan header diff. Likewise, for the reverse operation in skb_vlan_pop() for the case where vlan header needs to be pulled out of the skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes rcsum of the vlan header that was removed. However mangling the rcsum here will lead to hw csum failure for BPF case, since we're pulling or pushing data that was not part of the current rcsum. Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN() is also not really an option since current behaviour is ABI by now, but apart from that would also mean to do quite a bit of useless work in the sense that usually 12 bytes need to be rcsum pushed/pulled also when we don't need to touch this vlan related corner case. One way to fix it would be to push the necessary rcsum fixup down into vlan helpers that are (mostly) slow-path anyway. Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bpf: fix checksum fixups on bpf_skb_store_bytesDaniel Borkmann2-21/+35
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way. Where we ran into an issue with bpf_skb_store_bytes() is when we did a single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The underlying issue has been tracked down to a buffer alignment issue. Meaning, that csum_partial() computations via skb_postpull_rcsum() and skb_postpush_rcsum() pair invoked had a wrong result since they operated on an odd address for the hoplimit, while other computations were done on an even address. This mix doesn't work as-is with skb_postpull_rcsum(), skb_postpush_rcsum() pair as it always expects at least half-word alignment of input buffers, which is normally the case. Thus, instead of these helpers using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(), csum_block_add(), respectively. For unaligned offsets, they rotate the sum to align it to a half-word boundary again, otherwise they work the same as csum_sub() and csum_add(). Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers use a 0 constant for offset so that the compiler optimizes the offset & 1 test away and generates the same code as with csum_sub()/_add(). Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bpf: also call skb_postpush_rcsum on xmit occasionsDaniel Borkmann1-3/+10
Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect locations which need CHECKSUM_COMPLETE fixups on ingress. For the same reasons as described in f8ffad69c9f8 already, we of course also need this here, since dev_queue_xmit() on a veth device will let us end up in the dev_forward_skb() helper again to cross namespaces. Latter then calls into skb_postpull_rcsum() to pull out L2 header, so that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers. Also here we have to address bpf_redirect() and bpf_clone_redirect(). Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08net/ethernet: tundra: fix dump_eth_one warning in tsi108_ethPaul Gortmaker1-0/+2
The call site for this function appears as: #ifdef DEBUG data->msg_enable = DEBUG; dump_eth_one(dev); #endif ...leading to the following warning for !DEBUG builds: drivers/net/ethernet/tundra/tsi108_eth.c:169:13: warning: 'dump_eth_one' defined but not used [-Wunused-function] static void dump_eth_one(struct net_device *dev) ^ ...when using the arch/powerpc/configs/mpc7448_hpc2_defconfig Put the function definition under the same #ifdef as the call site to avoid the warning. Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08mlxsw: spectrum: Add missing DCB rollback in error pathIdo Schimmel1-0/+1
We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but I missed its rollback in the error path of port creation, so add it. Fixes: f00817df2b42 ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08mlxsw: spectrum: Do not override PAUSE settingsIdo Schimmel1-0/+2
The PFCC register is used to configure both PAUSE and PFC frames. Therefore, when PFC frames are disabled we must make sure we don't mistakenly also disable PAUSE frames (which might be enabled). Fix this by packing the PFCC register with the current PAUSE settings. Note that this register is also accessed via ethtool ops, but there we are guaranteed to have PFC disabled. Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08mlxsw: spectrum: Do not assume PAUSE frames are disabledIdo Schimmel1-4/+4
When ieee_setpfc() gets called, PAUSE frames are not necessarily disabled on the port. Check if PAUSE frames are disabled or enabled and configure the port's headroom buffer accordingly. Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08rhashtable-test: Fix max_size parameter descriptionPhil Sutter1-1/+1
Looks like a simple copy'n'paste error. Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp_diag: Respect ss adding TCPF_CLOSE to idiag_statesPhil Sutter1-2/+2
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't rely upon TCPF_LISTEN flag solely being present when listening sockets are requested. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp_diag: Fix T3_rtx timer exportPhil Sutter1-4/+10
The asoc's timer value is not kept in asoc->timeouts array but in it's primary transport instead. Furthermore, we must export the timer only if it is pending, otherwise the value will underrun when stored in an unsigned variable and user space will only see a very large timeout value. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp: Export struct sctp_info to userspacePhil Sutter2-64/+64
This is required to correctly interpret INET_DIAG_INFO messages exported by sctp_diag module. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06net: macb: Correct CAPS maskHarini Katakam1-1/+1
USRIO and JUMBO CAPS have the same mask. Fix the same. Fixes: ce721a702197 ("net: ethernet: cadence-macb: Add disabled usrio caps") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Harini Katakam <harinik@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06samples/bpf: add bpf_map_update_elem() testsAlexei Starovoitov1-2/+13
increase test coverage to check previously missing 'update when full' Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>