linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-09-28	net: do not export sk_stream_write_space	Eric Dumazet	1	-1/+0
	Since commit 900f65d361d3 ("tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()") we no longer need to export sk_stream_write_space() From: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	tcp: Change txhash on every SYN and RTO retransmit	Lawrence Brakmo	1	-0/+4
	The current code changes txhash (flowlables) on every retransmitted SYN/ACK, but only after the 2nd retransmitted SYN and only after tcp_retries1 RTO retransmits. With this patch: 1) txhash is changed with every SYN retransmits 2) txhash is changed with every RTO. The result is that we can start re-routing around failed (or very congested paths) as soon as possible. Otherwise application health checks may fail and the connection may be terminated before we start to change txhash. v4: Removed sysctl, txhash is changed for all RTOs v3: Removed text saying default value of sysctl is 0 (it is 100) v2: Added sysctl documentation and cleaned code Tested with packetdrill tests Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	net/sched: pkt_cls: change tc actions order to be as the user sets	Hadar Hen Zion	1	-1/+1
	Currently the created tc actions list is reversed against the order set by the user. Change the actions list order to be the same as was set by the user. This patch doesn't affect dump actions behavior. For dumping, action->order parameter is used so the list order doesn't matter. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	sh_eth: add R8A7743/5 support	Sergei Shtylyov	3	-1/+5
	Add support for the first two members of the Renesas RZ/G family, RZ/G1M/E (also known as R8A7743/5). The Ether core is the same as in the R-Car gen2 SoCs, so will share the code/data with them... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	doc: update switchdev L3 section	Jiri Pirko	1	-13/+14
	This is to reflect the change of FIB offload infrastructure from switchdev objects to FIB notifier. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	switchdev: remove FIB offload infrastructure	Jiri Pirko	6	-341/+1
	Since this is now taken care of by FIB notifier, remove the code, with all unused dependencies. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	rocker: use FIB notifications instead of switchdev calls	Jiri Pirko	3	-65/+185
	Until now, in order to offload a FIB entry to HW we use switchdev op. Use the newly introduced FIB notifier infrasturucture to process FIB entries to offload the in HW. Abort mechanism is now handled within the rocker driver. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls	Jiri Pirko	3	-195/+260
	Until now, in order to offload a FIB entry to HW we use switchdev op. However that has limits. Mainly in case we need to make the HW aware of all route prefixes configured in kernel. HW needs to know those in order to properly trap appropriate packets and pass the to kernel to do the forwarding. Abort mechanism is now handled within the mlxsw driver. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	fib: introduce FIB info offload flag helpers	Jiri Pirko	2	-2/+15
	These helpers are to be used in case someone offloads the FIB entry. The result is that if the entry is offloaded to at least one device, the offload flag is set. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	fib: introduce FIB notification infrastructure	Jiri Pirko	4	-14/+108
	This allows to pass information about added/deleted FIB entries/rules to whoever is interested. This is done in a very similar way as devinet notifies address additions/removals. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28	net/sched: cls_flower: Use a proper mask value for enc key id parameter	Hadar Hen Zion	1	-2/+2
	The current code use the encapsulation key id value as the mask of that parameter which is wrong. Fix that by using a full mask. Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	lib: clean up put_cpu_var usage	Shaohua Li	1	-2/+2
	put_cpu_var takes the percpu data, not the data returned from get_cpu_var. This doesn't change the behavior. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	bpf: clean up put_cpu_var usage	Shaohua Li	1	-1/+1
	put_cpu_var takes the percpu data, not the data returned from get_cpu_var. This doesn't change the behavior. Cc: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	igb: restore PPS signal on igb_ptp_reset	Jacob Keller	2	-1/+5
	When a reset occurs, the PPS SYS_WRAP interrupt was not re-enabled which resulted in disabling of the PPS signaling. Fix this by recording when the interrupt is on and ensuring that we re-enable it every time we reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-27	igb: bump version to igb-5.4.0	Todd Fujinaka	1	-1/+1
	Bump igb version to match other igb drivers. Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-27	igbvf: bump version to igbvf-2.4.0	Todd Fujinaka	1	-1/+1
	Bump version to match other igbvf drivers. Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-27	igb: fix non static symbol warning	Wei Yongjun	1	-2/+2
	Fixes the following sparse warning: drivers/net/ethernet/intel/igb/igb_ethtool.c:2707:5: warning: symbol 'igb_rxnfc_write_vlan_prio_filter' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-27	bnx2x: free the mac filter group list before freeing the cmd	jbaron@akamai.com	1	-1/+1
	The group list must be freed prior to freeing the command otherwise we have a use-after-free. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Yuval Mintz <Yuval.Mintz@qlogic.com> Cc: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	net: ethernet: mediatek: bug fix to disable HW LRO	Nelson Chang	2	-2/+3
	(1) Modify the register settings for LRO relinquishments (2) Jump out from the waiting loop while LRO relinquishments are done Signed-off-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	net: ethernet: mediatek: add to stop PDMA while stopping the frame engine	Nelson Chang	1	-0/+1
	Stop PDMA while the frame engine is going to stop. Signed-off-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	net: bcmgenet: use new api ethtool_{get\|set}_link_ksettings	Philippe Reynes	1	-8/+8
	The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	Revert "net: ethernet: bcmgenet: use phydev from struct net_device"	Florian Fainelli	3	-31/+39
	This reverts commit 62469c76007e ("net: ethernet: bcmgenet: use phydev from struct net_device") because it causes GENETv1/2/3 adapters to expose the following behavior after an ifconfig down/up sequence: PING fainelli-linux (10.112.156.244): 56 data bytes 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!) 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!) 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!) 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!) 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!) 64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!) This was previously fixed by commit 5dbebbb44a6a ("net: bcmgenet: Software reset EPHY after power on") but the commit we are reverting was essentially making this previous commit void, here is why. Without commit 62469c76007e we would have the following scenario after an ifconfig down then up sequence: - bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is initialized before we get to initialize the UniMAC, this is critical to ensure the PHY is in a correct state, priv->phydev is valid, this code executes fine - second time from bcmgenet_mii_probe(), through the normal phy_init_hw() call (which arguably could be optimized out) Everything is fine in that case. With commit 62469c76007e, we would have the following scenario to happen after an ifconfig down then up sequence: - bcmgenet_close() calls phy_disonnect() which makes dev->phydev become NULL - when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from bcmgenet_power_up() to initialize the internal PHY, the NULL check becomes true, so we do not reset the PHY, yet we keep going on and initialize the UniMAC, causing MAC activity to occur - we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is too late, the PHY is botched, and causes the above bogus pings/packets transmission/reception to occur Reported-by: Jaedon Shin <jaedon.shin@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	Revert "net: ethernet: bcmgenet: use new api ethtool_{get\|set}_link_ksettings"	Philippe Reynes	1	-8/+8
	This reverts commit 6b352ebccbcf ("net: ethernet: broadcom: bcmgenet: use new api ethtool_{get\|set}_link_ksettings"). We needs to revert the commit 62469c76007e ("net: ethernet: bcmgenet: use phydev from struct net_device"), because this commit add a regression. As the commit 6b352ebccbcf ("net: ethernet: broadcom: bcmgenet: use new api ethtool_{get\|set}_link_ksettings") depend on the first one, we also need to revert it first. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	bpf: Set register type according to is_valid_access()	Mickaël Salaün	1	-3/+2
	This prevent future potential pointer leaks when an unprivileged eBPF program will read a pointer value from its context. Even if is_valid_access() returns a pointer type, the eBPF verifier replace it with UNKNOWN_VALUE. The register value that contains a kernel address is then allowed to leak. Moreover, this fix allows unprivileged eBPF programs to use functions with (legitimate) pointer arguments. Not an issue currently since reg_type is only set for PTR_TO_PACKET or PTR_TO_PACKET_END in XDP and TC programs that can only be loaded as privileged. For now, the only unprivileged eBPF program allowed is for socket filtering and all the types from its context are UNKNOWN_VALUE. However, this fix is important for future unprivileged eBPF programs which could use pointers in their context. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	bpf samples: update tracex5 sample to use __seccomp_filter	Naveen N. Rao	2	-9/+10
	seccomp_phase1() does not exist anymore. Instead, update sample to use __seccomp_filter(). While at it, set max locked memory to unlimited. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27	bpf samples: fix compiler errors with sockex2 and sockex3	Naveen N. Rao	3	-11/+11
	These samples fail to compile as 'struct flow_keys' conflicts with definition in net/flow_dissector.h. Fix the same by renaming the structure used in the sample. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	net: tg3: use new api ethtool_{get\|set}_link_ksettings	Philippe Reynes	1	-50/+62
	The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	bnx2x: allocate mac filtering pending list in PAGE_SIZE increments	Jason Baron	1	-37/+86
	Currently, we can have high order page allocations that specify GFP_ATOMIC when configuring multicast MAC address filters. For example, we have seen order 2 page allocation failures with ~500 multicast addresses configured. Convert the allocation for the pending list to be done in PAGE_SIZE increments. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Yuval Mintz <Yuval.Mintz@qlogic.com> Cc: Ariel Elior <Ariel.Elior@qlogic.com> Acked-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments	Jason Baron	1	-28/+51
	Currently, we can have high order page allocations that specify GFP_ATOMIC when configuring multicast MAC address filters. For example, we have seen order 2 page allocation failures with ~500 multicast addresses configured. Convert the allocation for 'mcast_list' to be done in PAGE_SIZE increments. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Yuval Mintz <Yuval.Mintz@qlogic.com> Cc: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	nfp: bpf: improve handling for disabled BPF syscall	Arnd Bergmann	2	-10/+3
	I stumbled over a new warning during randconfig testing, with CONFIG_BPF_SYSCALL disabled: drivers/net/ethernet/netronome/nfp/nfp_net_offload.c: In function 'nfp_net_bpf_offload': drivers/net/ethernet/netronome/nfp/nfp_net_offload.c:263:3: error: '((void )&res+4)' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/net/ethernet/netronome/nfp/nfp_net_offload.c:263:3: error: 'res.n_instr' may be used uninitialized in this function [-Werror=maybe-uninitialized] As far as I can tell, this is a false positive caused by the compiler getting confused about a function that is partially inlined, but it's easy to avoid while improving the code: The nfp_bpf_jit() stub helper for that configuration is unusual as it is defined in a header file but not marked 'static inline'. By moving the compile-time check into the caller using the IS_ENABLED() macro, we can remove that stub and simplify the nfp_net_bpf_offload_prepare() function enough to unconfuse the compiler. Fixes: 7533fdc0f77f ("nfp: bpf: add hardware bpf offload") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	net: bcmgenet: remove unused function in bcmgenet.c	Baoyou Xie	1	-122/+0
	We get 1 warning when building kernel with W=1: drivers/net/ethernet/broadcom/genet/bcmgenet.c:2763:5: warning: no previous prototype for 'bcmgenet_hfb_add_filter' [-Wmissing-prototypes] In fact, this function is implemented in drivers/net/ethernet/broadcom/genet/bcmgenet.c, but be called by no one, thus can be removed. So this patch removes the unused functions. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	cxgb4: mark symbols static where possible	Baoyou Xie	2	-11/+14
	We get 10 warnings when building kernel with W=1: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:304:5: warning: no previous prototype for 'cxgb4_dcb_enabled' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:194:5: warning: no previous prototype for 'setup_sge_queues_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:241:6: warning: no previous prototype for 'free_sge_queues_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:268:5: warning: no previous prototype for 'cfg_queues_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:344:6: warning: no previous prototype for 'free_queues_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:353:5: warning: no previous prototype for 'request_msix_queue_irqs_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:379:6: warning: no previous prototype for 'free_msix_queue_irqs_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:393:6: warning: no previous prototype for 'name_msix_vecs_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:433:6: warning: no previous prototype for 'enable_rx_uld' [-Wmissing-prototypes] drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:442:6: warning: no previous prototype for 'quiesce_rx_uld' [-Wmissing-prototypes] In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	net: mvneta: mark symbols static where possible	Baoyou Xie	1	-4/+6
	We get 2 warnings when building kernel with W=1: drivers/net/ethernet/marvell/mvneta.c:639:27: warning: no previous prototype for 'mvneta_get_stats64' [-Wmissing-prototypes] drivers/net/ethernet/marvell/mvneta.c:3529:5: warning: no previous prototype for 'mvneta_ethtool_set_link_ksettings' [-Wmissing-prototypes] In fact, these two functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	net: hip04: mark tx_done() static	Baoyou Xie	1	-1/+1
	We get 1 warning when building kernel with W=1: drivers/net/ethernet/hisilicon/hip04_eth.c:603:22: warning: no previous prototype for 'tx_done' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks this function with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-26	net: hisilicon: mark symbols static where possible	Baoyou Xie	1	-3/+3
	We get 2 warnings when building kernel with W=1: drivers/net/ethernet/hisilicon/hisi_femac.c:943:5: warning: no previous prototype for 'hisi_femac_drv_suspend' [-Wmissing-prototypes] drivers/net/ethernet/hisilicon/hisi_femac.c:960:5: warning: no previous prototype for 'hisi_femac_drv_resume' [-Wmissing-prototypes] In fact, these two functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25	net: dsa: mv88e6xxx: fix non static symbol warnings	Wei Yongjun	1	-4/+4
	Fixes the following sparse warnings: drivers/net/dsa/mv88e6xxx/chip.c:219:5: warning: symbol 'mv88e6xxx_port_read' was not declared. Should it be static? drivers/net/dsa/mv88e6xxx/chip.c:227:5: warning: symbol 'mv88e6xxx_port_write' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25	be2net: fix non static symbol warnings	Wei Yongjun	1	-2/+2
	Fixes the following sparse warnings: drivers/net/ethernet/emulex/benet/be_main.c:47:25: warning: symbol 'be_err_recovery_workq' was not declared. Should it be static? drivers/net/ethernet/emulex/benet/be_main.c:63:25: warning: symbol 'be_wq' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25	net: smc91x: take into account register shift	Robert Jarzmik	1	-0/+3
	This aligns smc91x with its cousin, namely smc911x.c. This also allows the driver to run also in a device-tree based lubbock board build, on which it was tested. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25	cxgb4: fix -ve error check on a signed iq	Colin Ian King	1	-5/+5
	iq is unsigned, so the error check for iq < 0 has no effect so errors can slip past this check. Fix this by making iq signed and also get_filter_steerq return a signed int so a -ve error can be returned. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25	netfilter: nf_log: get rid of XT_LOG_* macros	Liping Zhang	3	-12/+12
	nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros here is not appropriate. Replace them with NF_LOG_XXX. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: nft_log: complete NFTA_LOG_FLAGS attr support	Liping Zhang	10	-18/+32
	NFTA_LOG_FLAGS attribute is already supported, but the related NF_LOG_XXX flags are not exposed to the userspace. So we cannot explicitly enable log flags to log uid, tcp sequence, ip options and so on, i.e. such rule "nft add rule filter output log uid" is not supported yet. So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In order to keep consistent with other modules, change NF_LOG_MASK to refer to all supported log flags. On the other hand, add a new NF_LOG_DEFAULT_MASK to refer to the original default log flags. Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the userspace. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: nf_tables: add range expression	Pablo Neira Ayuso	5	-2/+178
	Inverse ranges != [a,b] are not currently possible because rules are composites of && operations, and we need to express this: data < a \|\| data > b This patch adds a new range expression. Positive ranges can be already through two cmp expressions: cmp(sreg, data, >=) cmp(sreg, data, <=) This new range expression provides an alternative way to express this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: xt_socket: fix transparent match for IPv6 request sockets	KOVACS Krisztian	2	-1/+1
	The introduction of TCP_NEW_SYN_RECV state, and the addition of request sockets to the ehash table seems to have broken the --transparent option of the socket match for IPv6 (around commit a9407000). Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the listener, the --transparent option tries to match on the no_srccheck flag of the request socket. Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req() by copying the transparent flag of the listener socket. This effectively causes '-m socket --transparent' not match on the ACK packet sent by the client in a TCP handshake. Based on the suggestion from Eric Dumazet, this change moves the code initializing no_srccheck to tcp_conn_request(), rendering the above scenario working again. Fixes: a940700003 ("netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support") Signed-off-by: Alex Badics <alex.badics@balabit.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: evict stale entries when user reads /proc/net/nf_conntrack	Florian Westphal	1	-0/+5
	Fabian reports a possible conntrack memory leak (could not reproduce so far), however, one minor issue can be easily resolved: > cat /proc/net/nf_conntrack \| wc -l = 5 > 4 minutes required to clean up the table. We should not report those timed-out entries to the user in first place. And instead of just skipping those timed-out entries while iterating over the table we can also zap them (we already do this during ctnetlink walks, but I forgot about the /proc interface). Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer") Reported-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: xt_hashlimit: Create revision 2 to support higher pps rates	Vishwanath Pai	2	-68/+285
	Create a new revision for the hashlimit iptables extension module. Rev 2 will support higher pps of upto 1 million, Version 1 supports only 10k. To support this we have to increase the size of the variables avg and burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2 and xt_hashlimit_mtinfo2 and also create newer versions of all the functions for match, checkentry and destroy. Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very similar in both rev1 and rev2 with only minor changes, so I have split those functions and moved all the common code to a *_common function. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: xt_hashlimit: Prepare for revision 2	Vishwanath Pai	1	-30/+31
	I am planning to add a revision 2 for the hashlimit xtables module to support higher packets per second rates. This patch renames all the functions and variables related to revision 1 by adding _v1 at the end of the names. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: nft_ct: report error if mark and dir specified simultaneously	Liping Zhang	1	-0/+2
	NFT_CT_MARK is unrelated to direction, so if NFTA_CT_DIRECTION attr is specified, report EINVAL to the userspace. This validation check was already done at nft_ct_get_init, but we missed it in nft_ct_set_init. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: nft_ct: unnecessary to require dir when use ct l3proto/protocol	Liping Zhang	1	-10/+9
	Currently, if the user want to match ct l3proto, we must specify the direction, for example: # nft add rule filter input ct original l3proto ipv4 ^^^^^^^^ Otherwise, error message will be reported: # nft add rule filter input ct l3proto ipv4 nft add rule filter input ct l3proto ipv4 <cmdline>:1:1-38: Error: Could not process rule: Invalid argument add rule filter input ct l3proto ipv4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Actually, there's no need to require NFTA_CT_DIRECTION attr, because ct l3proto and protocol are unrelated to direction. And for compatibility, even if the user specify the NFTA_CT_DIRECTION attr, do not report error, just skip it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack	Gao Feng	1	-8/+12
	It is valid that the TCP RST packet which does not set ack flag, and bytes of ack number are zero. But current seqadj codes would adjust the "0" ack to invalid ack number. Actually seqadj need to check the ack flag before adjust it for these RST packets. The following is my test case client is 10.26.98.245, and add one iptable rule: iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2: --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with tcp-reset This iptables rule could generate on TCP RST without ack flag. server:10.172.135.55 Enable the synproxy with seqadjust by the following iptables rules iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m tcp --syn -j CT --notrack iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460 iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT The following is my test result. 1. packet trace on client root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829, win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266, ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884, nop,wscale 7], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229, options [nop,nop,TS val 452367885 ecr 15643479], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226, options [nop,nop,TS val 15643479 ecr 452367885], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830, win 0, length 0 2. seqadj log on server [62873.867319] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.867644] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.869040] Adjusting sequence number from 3695959830->3695959830, ack from 0->55618628 To summarize, it is clear that the seqadj codes adjust the 0 ack when receive one TCP RST packet without ack. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25	netfilter: replace list_head with single linked list	Aaron Conole	10	-116/+167
	The netfilter hook list never uses the prev pointer, and so can be trimmed to be a simple singly-linked list. In addition to having a more light weight structure for hook traversal, struct net becomes 5568 bytes (down from 6400) and struct net_device becomes 2176 bytes (down from 2240). Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>