aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/xt_string.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-05-23netfilter: nf_nat: add nat type hooks to nat coreFlorian Westphal9-289/+232
Currently the packet rewrite and instantiation of nat NULL bindings happens from the protocol specific nat backend. Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type. Invocation looks like this (simplified): NF_HOOK() | `---iptable_nat | `---> nf_nat_l3proto_ipv4 -> nf_nat_packet | new packet? pass skb though iptables nat chain | `---> iptable_nat: ipt_do_table In nft case, this looks the same (nft_chain_nat_ipv4 instead of iptable_nat). This is a problem for two reasons: 1. Can't use iptables nat and nf_tables nat at the same time, as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a NULL binding if do_table() did not find a matching nat rule so we can detect post-nat tuple collisions). 2. If you use e.g. nft_masq, snat, redir, etc. uses must also register an empty base chain so that the nat core gets called fro NF_HOOK() to do the reverse translation, which is neither obvious nor user friendly. After this change, the base hook gets registered not from iptable_nat or nftables nat hooks, but from the l3 nat core. iptables/nft nat base hooks get registered with the nat core instead: NF_HOOK() | `---> nf_nat_l3proto_ipv4 -> nf_nat_packet | new packet? pass skb through iptables/nftables nat chains | +-> iptables_nat: ipt_do_table +-> nft nat chain x `-> nft nat chain y The nat core deals with null bindings and reverse translation. When no mapping exists, it calls the registered nat lookup hooks until one creates a new mapping. If both iptables and nftables nat hooks exist, the first matching one is used (i.e., higher priority wins). Also, nft users do not need to create empty nat hooks anymore, nat core always registers the base hooks that take care of reverse/reply translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_nat: add nat hook register functions to nf_natFlorian Westphal2-0/+161
This adds the infrastructure to register nat hooks with the nat core instead of the netfilter core. nat hooks are used to configure nat bindings. Such hooks are registered from ip(6)table_nat or by the nftables core when a nat chain is added. After next patch, nat hooks will be registered with nf_nat instead of netfilter core. This allows to use many nat lookup functions at the same time while doing the real packet rewrite (nat transformation) in one place. This change doesn't convert the intended users yet to ease review. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: core: export raw versions of add/delete hook functionsFlorian Westphal2-21/+59
This will allow the nat core to reuse the nf_hook infrastructure to maintain nat lookup functions. The raw versions don't assume a particular hook location, the functions get added/deleted from the hook blob that is passed to the functions. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_tables: allow chain type to override hook registerFlorian Westphal4-23/+47
Will be used in followup patch when nat types no longer use nf_register_net_hook() but will instead register with the nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: xtables: allow table definitions not backed by hook_opsFlorian Westphal2-2/+8
The ip(6)tables nat table is currently receiving skbs from the netfilter core, after a followup patch skbs will be coming from the netfilter nat core instead, so the table is no longer backed by normal hook_ops. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nf_nat: move common nat code to nat coreFlorian Westphal4-99/+81
Copy-pasted, both l3 helpers almost use same code here. Split out the common part into an 'inet' helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17netfilter: nft_hash: add map lookups for hashing operationsLaura Garcia Liebana2-1/+134
This patch creates new attributes to accept a map as argument and then perform the lookup with the generated hash accordingly. Both current hash functions are supported: Jenkins and Symmetric Hash. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17netfilter: nft_numgen: add map lookups for numgen random operationsLaura Garcia Liebana1-4/+72
This patch uses the map lookup already included to be applied for random number generation. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17netfilter: nf_tables: remove old nf_log based tracingFlorian Westphal1-22/+7
nfnetlink tracing is available since nft 0.6 (June 2016). Remove old nf_log based tracing to avoid rule counter in main loop. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17netfilter: fix fallout from xt/nf osf separationFlorian Westphal2-6/+8
Stephen Rothwell says: today's linux-next build (x86_64 allmodconfig) produced this warning: ./usr/include/linux/netfilter/nf_osf.h:25: found __[us]{8,16,32,64} type without #include <linux/types.h> Fix that up and also move kernel-private struct out of uapi (it was not exposed in any released kernel version). tested via allmodconfig build + make headers_check. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: bfb15f2a95cb ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-13net: ipv4: ipconfig: fix unused variableAnders Roxell1-3/+2
When CONFIG_PROC_FS isn't set, variable ipconfig_dir isn't used. net/ipv4/ipconfig.c:167:31: warning: ‘ipconfig_dir’ defined but not used [-Wunused-variable] static struct proc_dir_entry *ipconfig_dir; ^~~~~~~~~~~~ Move the declaration of ipconfig_dir inside the CONFIG_PROC_FS ifdef to fix the warning. Fixes: c04d2cb2009f ("ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net sched actions: fix refcnt leak in skbmodRoman Mashak1-1/+4
When application fails to pass flags in netlink TLV when replacing existing skbmod action, the kernel will leak refcnt: $ tc actions get action skbmod index 1 total acts 0 action order 0: skbmod pipe set smac 00:11:22:33:44:55 index 1 ref 1 bind 0 For example, at this point a buggy application replaces the action with index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags, however refcnt gets bumped: $ tc actions get actions skbmod index 1 total acts 0 action order 0: skbmod pipe set smac 00:11:22:33:44:55 index 1 ref 2 bind 0 $ Tha patch fixes this by calling tcf_idr_release() on existing actions. Fixes: 86da71b57383d ("net_sched: Introduce skbmod action") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: phy: DP83TC811: Introduce support for the DP83TC811 phyDan Murphy3-0/+353
Add support for the DP83811 phy. The DP83811 supports both rgmii and sgmii interfaces. There are 2 part numbers for this the DP83TC811R does not reliably support the SGMII interface but the DP83TC811S will. There is not a way to differentiate these parts from the hardware or register set. So this is controlled via the DT to indicate which phy mode is required. Or the part can be strapped to a certain interface. Data sheet can be found here: http://www.ti.com/product/DP83TC811S-Q1/description http://www.ti.com/product/DP83TC811R-Q1/description Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: sched: fix error path in tcf_proto_create() when modules are not configuredJiri Pirko1-1/+1
In case modules are not configured, error out when tp->ops is null and prevent later null pointer dereference. Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11cxgb4: avoid schedule while atomicGanesh Goudar1-2/+2
do not sleep while adding or deleting udp tunnel. Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11cxgb4: enable inner header checksum calculationGanesh Goudar3-22/+83
set cntrl bits to indicate whether inner header checksum needs to be calculated whenever the packet is an encapsulated packet and enable supported encap features. Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11cxgb4: Fix {vxlan/geneve}_port initializationArjun Vynipadath2-0/+23
adapter->rawf_cnt was not initialized, thereby ndo_udp_tunnel_{add/del} was returning immediately without initializing {vxlan/geneve}_port. Also initializes mps_encap_entry refcnt. Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks") Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11cxgb4: Add new T5 device idGanesh Goudar1-0/+1
Add 0x50ad device id for new T5 card. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: doc: fix spelling mistake: "modrobe.d" -> "modprobe.d"Tonghao Zhang1-1/+1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11bonding: use the skb_get/set_queue_mappingTonghao Zhang1-4/+4
Use the skb_get_queue_mapping, skb_set_queue_mapping and skb_rx_queue_recorded for skb queue_mapping in bonding driver, but not use it directly. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11bonding: replace the return value typeTonghao Zhang2-8/+12
The method ndo_start_xmit is defined as returning a netdev_tx_t, which is a typedef for an enum type, but the implementation in this driver returns an int. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11erspan: auto detect truncated ipv6 packets.William Tu2-0/+12
Currently the truncated bit is set only when 1) the mirrored packet is larger than mtu and 2) the ipv4 packet tot_len is larger than the actual skb->len. This patch adds another case for detecting whether ipv6 packet is truncated or not, by checking the ipv6 header payload_len and the skb->len. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11mlxsw: spectrum_span: Use a more fitting error codePetr Machata1-1/+1
ENOENT is suitable when an item is looked for in a collection and can't be found. The failure here is actually a depletion of a resource, where ENOBUFS is the more fitting error code. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11mlxsw: spectrum_span: Rename misnamed variable l3edevPetr Machata1-16/+16
Calling the variable l3edev was relevant when neighbor lookup was the last stage in the simulated pipeline. Now that mlxsw handles bridges and vlan devices as well, calling it "L3" is a misnomer. Thus in mlxsw_sp_span_dmac(), rename to "dev", because that function is just a service routine where the distinction between tunnel and egress device isn't necessary. In mlxsw_sp_span_entry_tunnel_parms_common(), rename to "edev" to emphasize that the routine traces packet egress. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net sched actions: fix invalid pointer dereferencing if skbedit flags missingRoman Mashak1-1/+2
When application fails to pass flags in netlink TLV for a new skbedit action, the kernel results in the following oops: [ 8.307732] BUG: unable to handle kernel paging request at 0000000000021130 [ 8.309167] PGD 80000000193d1067 P4D 80000000193d1067 PUD 180e0067 PMD 0 [ 8.310595] Oops: 0000 [#1] SMP PTI [ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw [ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357 [ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140 [ 8.316203] RSP: 0018:ffffa0718038f840 EFLAGS: 00010246 [ 8.317123] RAX: 0000000000000001 RBX: 0000000000021100 RCX: 0000000000000000 [ 8.319831] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000021100 [ 8.321181] RBP: 0000000000000000 R08: 000000000004adf8 R09: 0000000000000122 [ 8.322645] R10: 0000000000000000 R11: ffffffff9e5b01ed R12: 0000000000000000 [ 8.324157] R13: ffffffff9e0d3cc0 R14: 0000000000000000 R15: 0000000000000000 [ 8.325590] FS: 00007f591292e700(0000) GS:ffff8fcf5bc40000(0000) knlGS:0000000000000000 [ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.327987] CR2: 0000000000021130 CR3: 00000000180e6004 CR4: 00000000001606a0 [ 8.329289] Call Trace: [ 8.329735] tcf_skbedit_init+0xa7/0xb0 [ 8.330423] tcf_action_init_1+0x362/0x410 [ 8.331139] ? try_to_wake_up+0x44/0x430 [ 8.331817] tcf_action_init+0x103/0x190 [ 8.332511] tc_ctl_action+0x11a/0x220 [ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0 [ 8.333902] ? _cond_resched+0x16/0x40 [ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0 [ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0 [ 8.336178] netlink_rcv_skb+0xdb/0x110 [ 8.336855] netlink_unicast+0x167/0x220 [ 8.337550] netlink_sendmsg+0x2a7/0x390 [ 8.338258] sock_sendmsg+0x30/0x40 [ 8.338865] ___sys_sendmsg+0x2c5/0x2e0 [ 8.339531] ? pagecache_get_page+0x27/0x210 [ 8.340271] ? filemap_fault+0xa2/0x630 [ 8.340943] ? page_add_file_rmap+0x108/0x200 [ 8.341732] ? alloc_set_pte+0x2aa/0x530 [ 8.342573] ? finish_fault+0x4e/0x70 [ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0 [ 8.344337] ? __sys_sendmsg+0x53/0x80 [ 8.345040] __sys_sendmsg+0x53/0x80 [ 8.345678] do_syscall_64+0x4f/0x100 [ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.347206] RIP: 0033:0x7f591191da67 [ 8.347831] RSP: 002b:00007fff745abd48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 8.349179] RAX: ffffffffffffffda RBX: 00007fff745abe70 RCX: 00007f591191da67 [ 8.350431] RDX: 0000000000000000 RSI: 00007fff745abdc0 RDI: 0000000000000003 [ 8.351659] RBP: 000000005af35251 R08: 0000000000000001 R09: 0000000000000000 [ 8.352922] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 8.354183] R13: 00007fff745afed0 R14: 0000000000000001 R15: 00000000006767c0 [ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00 00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 <8b> 53 30 74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c [ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP: ffffa0718038f840 [ 8.359770] CR2: 0000000000021130 [ 8.360438] ---[ end trace 60c66be45dfc14f0 ]--- The caller calls action's ->init() and passes pointer to "struct tc_action *a", which later may be initialized to point at the existing action, otherwise "struct tc_action *a" is still invalid, and therefore dereferencing it is an error as happens in tcf_idr_release, where refcnt is decremented. So in case of missing flags tcf_idr_release must be called only for existing actions. v2: - prepare patch for net tree Fixes: 5e1567aeb7fe ("net sched: skbedit action fix late binding") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11ixgbe: fix memory leak on ipsec allocationColin Ian King1-1/+1
The error clean up path kfree's adapter->ipsec and should be instead kfree'ing ipsec. Fix this. Also, the err1 error exit path does not need to kfree ipsec because this failure path was for the failed allocation of ipsec. Detected by CoverityScan, CID#146424 ("Resource Leak") Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-11ixgbevf: fix ixgbevf_xmit_frame()'s return typeLuc Van Oostenryck1-1/+1
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'. Fix this by returning 'netdev_tx_t' in this driver too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-11ixgbe: return error on unsupported SFP module when resettingEmil Tantilov1-0/+3
Add check for unsupported module and return the error code. This fixes a Coverity hit due to unused return status from setup_sfp. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-11ice: Set rq_last_status when cleaning rqJeff Shaw1-1/+1
Prior to this commit, the rq_last_status was only set when hardware responded with an error. This leads to rq_last_status being invalid in the future when hardware eventually responds without error. This commit resolves the issue by unconditionally setting rq_last_status with the value returned in the descriptor. Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-11Change Trond's email address in MAINTAINERSTrond Myklebust1-1/+1
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-11sh: switch to NO_BOOTMEMRob Herring4-82/+7
Commit 0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc") inadvertently switched the DT unflattening allocations from memblock to bootmem which doesn't work because the unflattening happens before bootmem is initialized. Swapping the order of bootmem init and unflattening could also fix this, but removing bootmem is desired. So enable NO_BOOTMEM on SH like other architectures have done. Fixes: 0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc") Reported-by: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Rich Felker <dalias@libc.org>
2018-05-11mmap: introduce sane default mmap limitsLinus Torvalds1-0/+32
The internal VM "mmap()" interfaces are based on the mmap target doing everything using page indexes rather than byte offsets, because traditionally (ie 32-bit) we had the situation that the byte offset didn't fit in a register. So while the mmap virtual address was limited by the word size of the architecture, the backing store was not. So we're basically passing "pgoff" around as a page index, in order to be able to describe backing store locations that are much bigger than the word size (think files larger than 4GB etc). But while this all makes a ton of sense conceptually, we've been dogged by various drivers that don't really understand this, and internally work with byte offsets, and then try to work with the page index by turning it into a byte offset with "pgoff << PAGE_SHIFT". Which obviously can overflow. Adding the size of the mapping to it to get the byte offset of the end of the backing store just exacerbates the problem, and if you then use this overflow-prone value to check various limits of your device driver mmap capability, you're just setting yourself up for problems. The correct thing for drivers to do is to do their limit math in page indices, the way the interface is designed. Because the generic mmap code _does_ test that the index doesn't overflow, since that's what the mmap code really cares about. HOWEVER. Finding and fixing various random drivers is a sisyphean task, so let's just see if we can just make the core mmap() code do the limiting for us. Realistically, the only "big" backing stores we need to care about are regular files and block devices, both of which are known to do this properly, and which have nice well-defined limits for how much data they can access. So let's special-case just those two known cases, and then limit other random mmap users to a backing store that still fits in "unsigned long". Realistically, that's not much of a limit at all on 64-bit, and on 32-bit architectures the only worry might be the GPU drivers, which can have big physical address spaces. To make it possible for drivers like that to say that they are 64-bit clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the file flags to allow drivers to mark their file descriptors as safe in the full 64-bit mmap address space. [ The timing for doing this is less than optimal, and this should really go in a merge window. But realistically, this needs wide testing more than it needs anything else, and being main-line is the only way to do that. So the earlier the better, even if it's outside the proper development cycle - Linus ] Cc: Kees Cook <keescook@chromium.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11udp: avoid refcount_t saturation in __udp_gso_segment()Eric Dumazet1-3/+11
For some reason, Willem thought that the issue we fixed for TCP in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()") was not relevant for UDP GSO. But syzbot found its way. refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 panic+0x22f/0x4de kernel/panic.c:184 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78 RSP: 0018:ffff880196db6b90 EFLAGS: 00010282 RAX: 0000000000000026 RBX: 00000000ffffff01 RCX: ffffc900040d9000 RDX: 0000000000004a29 RSI: ffffffff8160f6f1 RDI: ffff880196db66f0 RBP: ffff880196db6c78 R08: ffff8801b33d6740 R09: 0000000000000002 R10: ffff8801b33d6740 R11: 0000000000000000 R12: 0000000000000000 R13: 00000000ffffffff R14: ffff880196db6c50 R15: 0000000000020101 refcount_add+0x1b/0x70 lib/refcount.c:102 __udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272 udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301 inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4050 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579 dev_queue_xmit+0x17/0x20 net/core/dev.c:3620 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401 neigh_output include/net/neighbour.h:483 [inline] ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229 ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:277 [inline] ip_output+0x21b/0x850 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124 ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434 udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825 udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853 udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105 udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: ad405857b174 ("udp: better wmem accounting on gso") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11tcp: switch pacing timer to softirq based hrtimerEric Dumazet3-46/+29
linux-4.16 got support for softirq based hrtimers. TCP can switch its pacing hrtimer to this variant, since this avoids going through a tasklet and some atomic operations. pacing timer logic looks like other (jiffies based) tcp timers. v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers() to correctly release reference on socket if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: bcm_sf2: Get rid of PHYLIB functionsFlorian Fainelli1-149/+0
Now that we have converted the bcm_sf2 driver to implement PHYLINK MAC operations, we can remove the PHYLIB callbacks: adjust_link() and fixed_link_update() which are no longer called by DSA. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: Plug in PHYLINK supportFlorian Fainelli3-132/+172
Add support for PHYLINK within the DSA subsystem in order to support more complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of complexity we had while probing fixed and non-fixed PHYs using Device Tree. Because PHYLINK separates the Ethernet MAC/port configuration into different stages, we let switch drivers implement those, and for now, we maintain functionality by calling dsa_slave_adjust_link() during phylink_mac_link_{up,down} which provides semantically equivalent steps. Drivers willing to take advantage of PHYLINK should implement the phylink_mac_* operations that DSA wraps. We cannot quite remove the adjust_link() callback just yet, because a number of drivers rely on that for configuring their "CPU" and "DSA" ports, this is done dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of() still. Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2) will need to implement phylink_mac_ops from now on to preserve functionality, since PHYLINK *does not* create a phy_device instance for fixed links. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: mv88e6xxx: add PHYLINK supportRussell King3-0/+125
Add rudimentary phylink support to mv88e6xxx. This allows the driver using user ports with fixed links to keep operating normally. User ports with normal PHYs are not affected since the switch automatically manages their link parameters. User facing ports which use a SFP/SFF with a non-fixed link mode might require a call to phylink_mac_change() to operate properly. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> [Andrew: fixed link setting after adding link polling] Signed-off-by: Andrew Lunn <andrew@lunn.ch> [florian: expand commit message] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: Eliminate dsa_slave_get_link()Florian Fainelli1-11/+1
Since we use PHYLIB to manage the per-port link indication, this will also be reflected correctly in the network device's carrier state, so we can use ethtool_op_get_link() instead. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: bcm_sf2: Implement phylink_mac_opsFlorian Fainelli1-8/+206
Make the bcm_sf2 driver implement phylink_mac_ops since it needs to support a wide variety of network interfaces: internal & external MDIO PHYs, fixed PHYs, MoCA with MMIO link status. A large amount of what needs to be done already exists under bcm_sf2_sw_adjust_link() so we are essentially breaking this down into the necessary operation for PHYLINK to work: mac_config, mac_link_up, mac_link_down and validate. We can now entirely get rid of most of what fixed_link_update() provided because only the link information is actually necessary. We still have to force DUPLEX_FULL for legacy Device Tree bindings that did not specify that before. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: dsa: Add PHYLINK switch operationsFlorian Fainelli3-1/+30
In preparation for adding support for PHYLINK within DSA, define a number of operations that we will need and that switch drivers can start implementing. Proper integration with PHYLINK will follow in subsequent patches. We start selecting PHYLINK (which implies PHYLIB) in net/dsa/Kconfig such that drivers can be guaranteed that this dependency is properly taken care of and can start referencing PHYLINK helper functions without requiring stubs or anything. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: phy: phylink: Poll link GPIOsRussell King1-0/+16
When using a fixed link with a link GPIO, we need to poll that GPIO to determine link state changes. This is consistent with what fixed_phy.c does. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: phy: phylink: Release link GPIOFlorian Fainelli1-0/+2
We are not releasing the link GPIO descriptor with gpiod_put() which results in subsequent probing to get -EBUSY when calling fwnode_get_named_gpiod(). Fix this by doing the release in phylink_destroy(). Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: phy: phylink: Use gpiod_get_value_cansleep()Florian Fainelli1-1/+1
The GPIO provider for the link GPIO line might require the use of the _cansleep() API, utilize that. This is safe to do since we run in workqueue context. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsgAndrey Ignatov2-4/+10
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed earlier in 919483096bfe. * udp_sendmsg one was there since the beginning when linux sources were first added to git; * ping_v4_sendmsg one was copy/pasted in c319b4d76b9e. Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options have to be freed if they were allocated previously. Add label so that future callers (if any) can use it instead of kfree() before return that is easy to forget. Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind) Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'Christophe JAILLET1-2/+2
Resources are not freed in the reverse order of the allocation. Labels are also mixed-up. Fix it and reorder code and labels in the error handling path of 'mlxsw_core_bus_device_register()' Fixes: ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11bonding: send learning packets for vlans on slaveDebabrata Banerjee3-5/+11
There was a regression at some point from the intended functionality of commit f60c3704e87d ("bonding: Fix alb mode to only use first level vlans.") Given the return value vlan_get_encap_level() we need to store the nest level of the bond device, and then compare the vlan's encap level to this. Without this, this check always fails and learning packets are never sent. In addition, this same commit caused a regression in the behavior of balance_alb, which requires learning packets be sent for all interfaces using the slave's mac in order to load balance properly. For vlan's that have not set a user mac, we can send after checking one bit. Otherwise we need send the set mac, albeit defeating rx load balancing for that vlan. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11bonding: do not allow rlb updates to invalid macDebabrata Banerjee1-1/+1
Make sure multicast, broadcast, and zero mac's cannot be the output of rlb updates, which should all be directed arps. Receive load balancing will be collapsed if any of these happen, as the switch will broadcast. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11tracing: Fix regex_match_front() to not over compare the test stringSteven Rostedt (VMware)1-0/+3
The regex match function regex_match_front() in the tracing filter logic, was fixed to test just the pattern length from testing the entire test string. That is, it went from strncmp(str, r->pattern, len) to strcmp(str, r->pattern, r->len). The issue is that str is not guaranteed to be nul terminated, and if r->len is greater than the length of str, it can access more memory than is allocated. The solution is to add a simple test if (len < r->len) return 0. Cc: stable@vger.kernel.org Fixes: 285caad415f45 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-10compat: fix 4-byte infoleak via uninitialized struct fieldJann Horn1-0/+1
Commit 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") removed the memset() in compat_get_timex(). Since then, the compat adjtimex syscall can invoke do_adjtimex() with an uninitialized ->tai. If do_adjtimex() doesn't write to ->tai (e.g. because the arguments are invalid), compat_put_timex() then copies the uninitialized ->tai field to userspace. Fix it by adding the memset() back. Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-10net/mlx5e: Err if asked to offload TC match on frag being firstRoi Dayan1-0/+4
The HW doesn't support matching on frag first/later, return error if we are asked to offload that. Fixes: 3f7d0eb42d59 ("net/mlx5e: Offload TC matching on packets being IP fragments") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>