aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-05-02bpf, x64: fix memleak when not converging after imageDaniel Borkmann1-2/+2
While reviewing x64 JIT code, I noticed that we leak the prior allocated JIT image in the case where proglen != oldproglen during the JIT passes. Prior to the commit e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") we would just break out of the loop, and using the image as the JITed prog since it could only shrink in size anyway. After e0ee9c12157d, we would bail out to out_addrs label where we free addrs and jit_data but not the image coming from bpf_jit_binary_alloc(). Fixes: e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf: minor fix to selftest test_stacktrace_build_id()Song Liu1-2/+2
1. remove useless parameter list to ./urandom_read 2. add missing "\n" to the end of an error message Fixes: 81f77fd0deeb ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-26bpf: fix uninitialized variable in bpf toolsJohn Fastabend1-2/+5
Here the variable cont is used as the saved_pointer for a call to strtok_r(). It is safe to use the value uninitialized in this context however and the later reference is only ever used if the strtok_r is successful. But, 'gcc-5' at least doesn't have all this knowledge so initialize cont to NULL. Additionally, do the natural NULL check before accessing just for completness. The warning is the following: ./bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’: ./bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized] } else if (matches(subcmd, "pcap") == 0) { Fixes: fd981e3c321a "filter: bpf_dbg: add minimal bpf debugger" Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-25bpf: fix for lex/yacc build error with gcc-5John Fastabend1-0/+2
Fix build error found with Ubuntu shipped gcc-5 ~/git/bpf/tools/bpf$ make all Auto-detecting system features: ... libbfd: [ OFF ] ... disassembler-four-args: [ OFF ] CC bpf_jit_disasm.o LINK bpf_jit_disasm CC bpf_dbg.o /home/john/git/bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’: /home/john/git/bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized] } else if (matches(subcmd, "pcap") == 0) { ^ LINK bpf_dbg CC bpf_asm.o make: *** No rule to make target `bpf_exp.yacc.o', needed by `bpf_asm'. Stop. Fixes: 5a8997f20715 ("tools: bpf: respect output directory during build") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-25rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qpDag Moxnes1-1/+2
The function rds_ib_setup_qp is calling rds_ib_get_client_data and should correspondingly call rds_ib_dev_put. This call was lost in the non-error path with the introduction of error handling done in commit 3b12f73a5c29 ("rds: ib: add error handle") Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25net/smc: keep clcsock reference in smc_tcp_listen_work()Ursula Braun1-4/+0
The internal CLC socket should exist till the SMC-socket is released. Function tcp_listen_worker() releases the internal CLC socket of a listen socket, if an smc_close_active() is called. This function is called for the final release(), but it is called for shutdown SHUT_RDWR as well. This opens a door for protection faults, if socket calls using the internal CLC socket are called for a shutdown listen socket. With the changes of commit 3d502067599f ("net/smc: simplify wait when closing listen socket") there is no need anymore to release the internal CLC socket in function tcp_listen_worker((). It is sufficient to release it in smc_release(). Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+9045fc589fcd196ef522@syzkaller.appspotmail.com Reported-by: syzbot+28a2c86cf19c81d871fa@syzkaller.appspotmail.com Reported-by: syzbot+9605e6cace1b5efd4a0a@syzkaller.appspotmail.com Reported-by: syzbot+cf9012c597c8379d535c@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25net: phy: allow scanning busses with missing physAlexandre Belloni1-1/+10
Some MDIO busses will error out when trying to read a phy address with no phy present at that address. In that case, probing the bus will fail because __mdiobus_register() is scanning the bus for all possible phys addresses. In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at this address and set the phy ID to 0xffffffff which is then properly handled in get_phy_device(). Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25bpf, x64: fix JIT emission for dead codeGianluca Borello1-1/+11
Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead code with a series of ja-1 instructions, for safety. That made JIT compilation much more complex for some BPF programs. One instance of such programs is, for example: bool flag = false ... /* A bunch of other code */ ... if (flag) do_something() In some cases llvm is not able to remove at compile time the code for do_something(), so the generated BPF program ends up with a large amount of dead instructions. In one specific real life example, there are two series of ~500 and ~1000 dead instructions in the program. When the verifier replaces them with a series of ja-1 instructions, it causes an interesting behavior at JIT time. During the first pass, since all the instructions are estimated at 64 bytes, the ja-1 instructions end up being translated as 5 bytes JMP instructions (0xE9), since the jump offsets become increasingly large (> 127) as each instruction gets discovered to be 5 bytes instead of the estimated 64. Starting from the second pass, the first N instructions of the ja-1 sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets become <= 127 this time. In particular, N is defined as roughly 127 / (5 - 2) ~= 42. So, each further pass will make the subsequent N JMP instructions shrink from 5 to 2 bytes, making the image shrink every time. This means that in order to have the entire program converge, there need to be, in the real example above, at least ~1000 / 42 ~= 24 passes just for translating the dead code. If we add this number to the passes needed to translate the other non dead code, it brings such program to 40+ passes, and JIT doesn't complete. Ultimately the userspace loader fails because such BPF program was supposed to be part of a prog array owner being JITed. While it is certainly possible to try to refactor such programs to help the compiler remove dead code, the behavior is not really intuitive and it puts further burden on the BPF developer who is not expecting such behavior. To make things worse, such programs are working just fine in all the kernel releases prior to the ja-1 fix. A possible approach to mitigate this behavior consists into noticing that for ja-1 instructions we don't really need to rely on the estimated size of the previous and current instructions, we know that a -1 BPF jump offset can be safely translated into a 0xEB instruction with a jump offset of -2. Such fix brings the BPF program in the previous example to complete again in ~9 passes. Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing") Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-25bpf: clear the ip_tunnel_info.William Tu1-0/+1
The percpu metadata_dst might carry the stale ip_tunnel_info and cause incorrect behavior. When mixing tests using ipv4/ipv6 bpf vxlan and geneve tunnel, the ipv6 tunnel info incorrectly uses ipv4's src ip addr as its ipv6 src address, because the previous tunnel info does not clean up. The patch zeros the fields in ip_tunnel_info. Signed-off-by: William Tu <u9012063@gmail.com> Reported-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-24ice: Fix insufficient memory issue in ice_aq_manage_mac_readMd Fahad Iqbal Polash1-5/+17
For the MAC read operation, the device can return up to two (LAN and WoL) MAC addresses. Without access to adequate memory, the device will return an error. Fixed this by allocating the right amount of memory. Also, logic to detect and copy the LAN MAC address into the port_info structure has been added. Note that the WoL MAC address is ignored currently as the WoL feature isn't supported yet. Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24sfc: ARFS filter IDsEdward Cree6-46/+337
Associate an arbitrary ID with each ARFS filter, allowing to properly query for expiry. The association is maintained in a hash table, which is protected by a spinlock. v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot). v2: fixed uninitialised variable (thanks davem and lkp-robot). Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24net: ethtool: Add missing kernel doc for FEC parametersFlorian Fainelli1-0/+2
While adding support for ethtool::get_fecparam and set_fecparam, kernel doc for these functions was missed, add those. Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24packet: fix bitfield update raceWillem de Bruijn2-21/+49
Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles. Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation. All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec). Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ice: Do not check INTEVENT bit for OICR interruptsBen Shelton2-6/+0
According to the hardware spec, checking the INTEVENT bit isn't a reliable way to detect if an OICR interrupt has occurred. This is because this bit can be cleared by the hardware/firmware before the interrupt service routine has run. So instead, just check for OICR events every time. Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ice: Fix incorrect comment for action typeAnirudh Venkataramanan1-1/+1
Action type 5 defines large action generic values. Fix comment to reflect that better. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ice: Fix initialization for num_nodes_addedAnirudh Venkataramanan1-2/+2
ice_sched_add_nodes_to_layer is used recursively, and so we start with num_nodes_added being 0. This way, in case of an error or if num_nodes is NULL, the function just returns 0 to indicate that no nodes were added. Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24igb: Fix the transmission mode of queue 0 for Qav modeVinicius Costa Gomes1-1/+16
When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ixgbevf: ensure xdp_ring resources are free'd on error exitColin Ian King1-1/+1
The current error handling for failed resource setup for xdp_ring data is a break out of the loop and returning 0 indicated everything was OK, when in fact it is not. Fix this by exiting via the error exit label err_setup_tx that will clean up the resources correctly and return and error status. Detected by CoverityScan, CID#1466879 ("Logically dead code") Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24team: fix netconsole setup over teamXin Long1-7/+12
The same fix in Commit dbe173079ab5 ("bridge: fix netconsole setup over bridge") is also needed for team driver. While at it, remove the unnecessary parameter *team from team_port_enable_netpoll(). v1->v2: - fix it in a better way, as does bridge. Fixes: 0fb52a27a04a ("team: cleanup netpoll clode") Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Only use the SFP supported transceiver signalsTom Lendacky1-17/+54
The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.) that it supports. Update the driver to include checking the eeprom data when deciding whether to use a transceiver signal. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Improve KR auto-negotiation and trainingTom Lendacky7-4/+160
Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks to improve the ability to successfully complete Clause 73 AN when running at 10gbps. Hardware can sometimes have issues with CDR lock when the AN DME page exchange is being performed. The AN and KR training hooks are used as follows: - The pre AN hook is used to disable CDR tracking in the PHY so that the DME page exchange can be successfully and consistently completed. - The post KR training hook is used to re-enable the CDR tracking so that KR training can successfully complete. - The post AN hook is used to check for an unsuccessful AN which will increase a CDR tracking enablement delay (up to a maximum value). Add two debugfs entries to allow control over use of the CDR tracking workaround. The debugfs entries allow the CDR tracking workaround to be disabled and determine whether to re-enable CDR tracking before or after link training has been initiated. Also, with these changes the receiver reset cycle that is performed during the link status check can be performed less often. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Add pre/post auto-negotiation phy hooksTom Lendacky2-2/+19
Add hooks to the driver auto-negotiation (AN) flow to allow the different phy implementations to perform any steps necessary to improve AN. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23pppoe: check sockaddr length in pppoe_connect()Guillaume Nault1-0/+4
We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data. Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23l2tp: check sockaddr length in pppol2tp_connect()Guillaume Nault1-0/+7
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net: phy: marvell: clear wol event before setting itJingju Hou1-0/+9
If WOL event happened once, the LED[2] interrupt pin will not be cleared unless we read the CSISR register. If interrupts are in use, the normal interrupt handling will clear the WOL event. Let's clear the WOL event before enabling it if !phy_interrupt_is_valid(). Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24bpf: sockmap, fix double page_put on ENOMEM error in redirect pathJohn Fastabend1-2/+1
In the case where the socket memory boundary is hit the redirect path returns an ENOMEM error. However, before checking for this condition the redirect scatterlist buffer is setup with a valid page and length. This is never unwound so when the buffers are released latter in the error path we do a put_page() and clear the scatterlist fields. But, because the initial error happens before completing the scatterlist buffer we end up with both the original buffer and the redirect buffer pointing to the same page resulting in duplicate put_page() calls. To fix this simply move the initial configuration of the redirect scatterlist buffer below the sock memory check. Found this while running TCP_STREAM test with netperf using Cilium. Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-24bpf: sockmap, sk_wait_event needed to handle blocking casesJohn Fastabend1-0/+44
In the recvmsg handler we need to add a wait event to support the blocking use cases. Without this we return zero and may confuse user applications. In the wait event any data received on the sk either via sk_receive_queue or the psock ingress list will wake up the sock. Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-24bpf: sockmap, map_release does not hold refcnt for pinned mapsJohn Fastabend4-6/+7
Relying on map_release hook to decrement the reference counts when a map is removed only works if the map is not being pinned. In the pinned case the ref is decremented immediately and the BPF programs released. After this BPF programs may not be in-use which is not what the user would expect. This patch moves the release logic into bpf_map_put_uref() and brings sockmap in-line with how a similar case is handled in prog array maps. Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-23bpf: sockmap sample use clang flag, -target bpfJohn Fastabend1-2/+5
Per Documentation/bpf/bpf_devel_QA.txt add the -target flag to the sockmap Makefile. Relevant text quoted here, Otherwise, you can use bpf target. Additionally, you _must_ use bpf target when: - Your program uses data structures with pointer or long / unsigned long types that interface with BPF helpers or context data structures. Access into these structures is verified by the BPF verifier and may result in verification failures if the native architecture is not aligned with the BPF architecture, e.g. 64-bit. An example of this is BPF_PROG_TYPE_SK_MSG require '-target bpf' Fixes: 69e8cc134bcb ("bpf: sockmap sample program") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-23bpf: Document sockmap '-target bpf' requirement for PROG_TYPE_SK_MSGJohn Fastabend1-1/+9
BPF_PROG_TYPE_SK_MSG programs use a 'void *' for both data and the data_end pointers. Additionally, the verifier ensures that every accesses into the values is a __u64 read. This correctly maps on to the BPF 64-bit architecture. However, to ensure that when building on 32bit architectures that clang uses correct types the '-target bpf' option _must_ be specified. To make this clear add a note to the Documentation. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-23bpf: disable and restore preemption in __BPF_PROG_RUN_ARRAYRoman Gushchin1-0/+2
Running bpf programs requires disabled preemption, however at least some* of the BPF_PROG_RUN_ARRAY users do not follow this rule. To fix this bug, and also to make it not happen in the future, let's add explicit preemption disabling/re-enabling to the __BPF_PROG_RUN_ARRAY code. * for example: [ 17.624472] RIP: 0010:__cgroup_bpf_run_filter_sk+0x1c4/0x1d0 ... [ 17.640890] inet6_create+0x3eb/0x520 [ 17.641405] __sock_create+0x242/0x340 [ 17.641939] __sys_socket+0x57/0xe0 [ 17.642370] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 17.642944] SyS_socket+0xa/0x10 [ 17.643357] do_syscall_64+0x79/0x220 [ 17.643879] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-23ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policyEric Dumazet1-0/+2
KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute. I also believe RTA_PREFSRC lacks a similar check. Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslaveXin Long1-2/+1
After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23tcp: don't read out-of-bounds opsizeJann Horn1-5/+2
The old code reads the "opsize" variable from out-of-bounds memory (first byte behind the segment) if a broken TCP segment ends directly after an opcode that is neither EOL nor NOP. The result of the read isn't used for anything, so the worst thing that could theoretically happen is a pagefault; and since the physmap is usually mostly contiguous, even that seems pretty unlikely. The following C reproducer triggers the uninitialized read - however, you can't actually see anything happen unless you put something like a pr_warn() in tcp_parse_md5sig_option() to print the opsize. ==================================== #define _GNU_SOURCE #include <arpa/inet.h> #include <stdlib.h> #include <errno.h> #include <stdarg.h> #include <net/if.h> #include <linux/if.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/in.h> #include <linux/if_tun.h> #include <err.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <assert.h> void systemf(const char *command, ...) { char *full_command; va_list ap; va_start(ap, command); if (vasprintf(&full_command, command, ap) == -1) err(1, "vasprintf"); va_end(ap); printf("systemf: <<<%s>>>\n", full_command); system(full_command); } char *devname; int tun_alloc(char *name) { int fd = open("/dev/net/tun", O_RDWR); if (fd == -1) err(1, "open tun dev"); static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI }; strcpy(req.ifr_name, name); if (ioctl(fd, TUNSETIFF, &req)) err(1, "TUNSETIFF"); devname = req.ifr_name; printf("device name: %s\n", devname); return fd; } #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24)) void sum_accumulate(unsigned int *sum, void *data, int len) { assert((len&2)==0); for (int i=0; i<len/2; i++) { *sum += ntohs(((unsigned short *)data)[i]); } } unsigned short sum_final(unsigned int sum) { sum = (sum >> 16) + (sum & 0xffff); sum = (sum >> 16) + (sum & 0xffff); return htons(~sum); } void fix_ip_sum(struct iphdr *ip) { unsigned int sum = 0; sum_accumulate(&sum, ip, sizeof(*ip)); ip->check = sum_final(sum); } void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) { unsigned int sum = 0; struct { unsigned int saddr; unsigned int daddr; unsigned char pad; unsigned char proto_num; unsigned short tcp_len; } fakehdr = { .saddr = ip->saddr, .daddr = ip->daddr, .proto_num = ip->protocol, .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4) }; sum_accumulate(&sum, &fakehdr, sizeof(fakehdr)); sum_accumulate(&sum, tcp, tcp->doff*4); tcp->check = sum_final(sum); } int main(void) { int tun_fd = tun_alloc("inject_dev%d"); systemf("ip link set %s up", devname); systemf("ip addr add 192.168.42.1/24 dev %s", devname); struct { struct iphdr ip; struct tcphdr tcp; unsigned char tcp_opts[20]; } __attribute__((packed)) syn_packet = { .ip = { .ihl = sizeof(struct iphdr)/4, .version = 4, .tot_len = htons(sizeof(syn_packet)), .ttl = 30, .protocol = IPPROTO_TCP, /* FIXUP check */ .saddr = IPADDR(192,168,42,2), .daddr = IPADDR(192,168,42,1) }, .tcp = { .source = htons(1), .dest = htons(1337), .seq = 0x12345678, .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4, .syn = 1, .window = htons(64), .check = 0 /*FIXUP*/ }, .tcp_opts = { /* INVALID: trailing MD5SIG opcode after NOPs */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 19 } }; fix_ip_sum(&syn_packet.ip); fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp); while (1) { int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet)); if (write_res != sizeof(syn_packet)) err(1, "packet write failed"); } } ==================================== Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22Linux 4.17-rc2Linus Torvalds1-1/+1
2018-04-22ibmvnic: Clean actual number of RX or TX poolsThomas Falcon1-2/+2
Avoid using value stored in the login response buffer when cleaning TX and RX buffer pools since these could be inconsistent depending on the device state. Instead use the field in the driver's private data that tracks the number of active pools. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: check on metadata lengthAlexander Aring1-0/+3
This patch checks if sk buffer is available to dererence ife header. If not then NULL will returned to signal an malformed ife packet. This avoids to crashing the kernel from outside. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: handle malformed tlv lengthAlexander Aring3-4/+41
There is currently no handling to check on a invalid tlv length. This patch adds such handling to avoid killing the kernel with a malformed ife packet. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: signal not finding metaidAlexander Aring1-1/+1
We need to record stats for received metadata that we dont know how to process. Have find_decode_metaid() return -ENOENT to capture this. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22strparser: Do not call mod_delayed_work with a timeout of LONG_MAXDoron Roberts-Kedes1-1/+1
struct sock's sk_rcvtimeo is initialized to LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling mod_delayed_work with a timeout of LONG_MAX causes spurious execution of the work function. timer->expires is set equal to jiffies + LONG_MAX. When timer_base->clk falls behind the current value of jiffies, the delta between timer_base->clk and jiffies + LONG_MAX causes the expiration to be in the past. Returning early from strp_start_timer if timeo == LONG_MAX solves this problem. Found while testing net/tls_sw recv path. Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pktsAhmed Abdelsalam1-1/+1
In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src() in order to set the src addr of outer IPv6 header. The net_device is required for set_tun_src(). However calling ip6_dst_idev() on dst_entry in case of IPv4 traffic results on the following bug. Using just dst->dev should fix this BUG. [ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0 [ 196.243329] Oops: 0000 [#1] SMP PTI [ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci [ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1 [ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300 [ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202 [ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000 [ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850 [ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800 [ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808 [ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200 [ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000 [ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0 [ 196.247804] Call Trace: [ 196.247972] seg6_do_srh+0x15b/0x1c0 [ 196.248156] seg6_output+0x3c/0x220 [ 196.248341] ? prandom_u32+0x14/0x20 [ 196.248526] ? ip_idents_reserve+0x6c/0x80 [ 196.248723] ? __ip_select_ident+0x90/0x100 [ 196.248923] ? ip_append_data.part.50+0x6c/0xd0 [ 196.249133] lwtunnel_output+0x44/0x70 [ 196.249328] ip_send_skb+0x15/0x40 [ 196.249515] raw_sendmsg+0x8c3/0xac0 [ 196.249701] ? _copy_from_user+0x2e/0x60 [ 196.249897] ? rw_copy_check_uvector+0x53/0x110 [ 196.250106] ? _copy_from_user+0x2e/0x60 [ 196.250299] ? copy_msghdr_from_user+0xce/0x140 [ 196.250508] sock_sendmsg+0x36/0x40 [ 196.250690] ___sys_sendmsg+0x292/0x2a0 [ 196.250881] ? _cond_resched+0x15/0x30 [ 196.251074] ? copy_termios+0x1e/0x70 [ 196.251261] ? _copy_to_user+0x22/0x30 [ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0 [ 196.251782] ? _cond_resched+0x15/0x30 [ 196.251972] ? mutex_lock+0xe/0x30 [ 196.252152] ? vvar_fault+0xd2/0x110 [ 196.252337] ? __do_fault+0x1f/0xc0 [ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0 [ 196.252727] ? __sys_sendmsg+0x63/0xa0 [ 196.252919] __sys_sendmsg+0x63/0xa0 [ 196.253107] do_syscall_64+0x72/0x200 [ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 196.253530] RIP: 0033:0x7fc4480b0690 [ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690 [ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003 [ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002 [ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070 [ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe [ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10 [ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60 [ 196.256445] CR2: 0000000000000000 [ 196.256676] ---[ end trace 71af7d093603885c ]--- Fixes: 8936ef7604c11 ("ipv6: sr: fix NULL pointer dereference when setting encap source address") Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22llc: fix NULL pointer deref for SOCK_ZAPPEDCong Wang1-9/+12
For SOCK_ZAPPED socket, we don't need to care about llc->sap, so we should just skip these refcount functions in this case. Fixes: f7e43672683b ("llc: hold llc_sap before release_sock()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: ethernet: ti: cpsw: fix tx vlan priority mappingIvan Khoronzhuk1-1/+1
The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping register and basically replaces vlan pcp field for tagged packets. So, set it to be 1:1 mapping. Otherwise, it will cause unexpected change of egress vlan tagged packets, like prio 2 -> prio 5. Fixes: e05107e6b747 ("net: ethernet: ti: cpsw: add multi queue support") Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22llc: delete timers synchronously in llc_sk_free()Cong Wang3-9/+23
The connection timers of an llc sock could be still flying after we delete them in llc_sk_free(), and even possibly after we free the sock. We could just wait synchronously here in case of troubles. Note, I leave other call paths as they are, since they may not have to wait, at least we can change them to synchronously when needed. Also, move the code to net/llc/llc_conn.c, which is apparently a better place. Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22l2tp: fix {pppol2tp, l2tp_dfs}_seq_stop() in case of seq_file overflowGuillaume Nault2-2/+8
Commit 0e0c3fee3a59 ("l2tp: hold reference on tunnels printed in pppol2tp proc file") assumed that if pppol2tp_seq_stop() was called with non-NULL private data (the 'v' pointer), then pppol2tp_seq_start() would not be called again. It turns out that this isn't guaranteed, and overflowing the seq_file's buffer in pppol2tp_seq_show() is a way to get into this situation. Therefore, pppol2tp_seq_stop() needs to reset pd->tunnel, so that pppol2tp_seq_start() won't drop a reference again if it gets called. We also have to clear pd->session, because the rest of the code expects a non-NULL tunnel when pd->session is set. The l2tp_debugfs module has the same issue. Fix it in the same way. Fixes: 0e0c3fee3a59 ("l2tp: hold reference on tunnels printed in pppol2tp proc file") Fixes: f726214d9b23 ("l2tp: hold reference on tunnels printed in l2tp/tunnels debugfs file") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22s390/qeth: use Read device to query hypervisor for MACJulian Wiedmann1-1/+1
For z/VM NICs, qeth needs to consider which of the three CCW devices in an MPC group it uses for requesting a managed MAC address. On the Base device, the hypervisor returns a default MAC which is pre-assigned when creating the NIC (this MAC is also returned by the READ MAC primitive). Querying any other device results in the allocation of an additional MAC address. For consistency with READ MAC and to avoid using up more addresses than necessary, it is preferable to use the NIC's default MAC. So switch the the diag26c over to using a NIC's Read device, which should always be identical to the Base device. Fixes: ec61bd2fd2a2 ("s390/qeth: use diag26c to get MAC address on L2") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22s390/qeth: fix request-side race during cmd IO timeoutJulian Wiedmann3-27/+40
Submitting a cmd IO request (usually on the WRITE device, but for IDX also on the READ device) is currently done with ccw_device_start() and a manual timeout in the caller. On timeout, the caller cleans up the related resources (eg. IO buffer). But 1) the IO might still be active and utilize those resources, and 2) when the IO completes, qeth_irq() will attempt to clean up the same resources again. Instead of introducing additional resource locking, switch to ccw_device_start_timeout() to ensure IO termination after timeout, and let the IRQ handler alone deal with cleaning up after a request. This also removes a stray write->irq_pending reset from clear_ipacmd_list(). The routine doesn't terminate any pending IO on the WRITE device, so this should be handled properly via IO timeout in the IRQ handler. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22s390/qeth: fix MAC address update sequenceJulian Wiedmann1-24/+31
When changing the MAC address on a L2 qeth device, current code first unregisters the old address, then registers the new one. If HW rejects the new address (or the IO fails), the device ends up with no operable address at all. Re-order the code flow so that the old address only gets dropped if the new address was registered successfully. While at it, add logic to catch some corner-cases. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22s390/qeth: handle failure on workqueue creationJulian Wiedmann1-1/+7
Creating the global workqueue during driver init may fail, deal with it. Also, destroy the created workqueue on any subsequent error. Fixes: 0f54761d167f ("qeth: Support VEPA mode") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22s390/qeth: avoid control IO completion stallsJulian Wiedmann2-19/+5
For control IO, qeth currently tracks the index of the buffer that it expects to complete the next IO on each qeth_channel. If the channel presents an IRQ while this buffer has not yet completed, no completion processing for _any_ completed buffer takes place. So if the 'next buffer' is skipped for any sort of reason* (eg. when it is released due to error conditions, before the IO is started), the buffer obviously won't switch to PROCESSED until it is eventually allocated for a _different_ IO and completes. Until this happens, all completion processing on that channel stalls and pending requests possibly time out. As a fix, remove the whole 'next buffer' logic and simply process any IO buffer right when it completes. A channel will never have more than one IO pending, so there's no risk of processing out-of-sequence. *Note: currently just one location in the code really handles this problem, by advancing the 'next' index manually. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>