aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-05-28nexthop: Add support for nexthop groupsDavid Ahern1-23/+481
Allow the creation of nexthop groups which reference other nexthop objects to create multipath routes: +--------------+ +------------+ +--------------+ | | nh nh_grp --->| nh_grp_entry |-+ +------------+ +---------|----+ ^ | | +------------+ +----------------+ +--->| nh, weight | nh_parent +------------+ A group entry points to a nexthop with a weight for that hop within the group. The nexthop has a list_head, grp_list, for tracking which groups it is a member of and the group entry has a reference back to the parent. The grp_list is used when a nexthop is deleted - to efficiently remove it from groups using it. If a nexthop group spec is given, no other attributes can be set. Each nexthop id in a group spec must already exist. Similar to single nexthops, the specification of a nexthop group can be updated so that data is managed with rcu locking. Add path selection function to account for multiple paths and add ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is in a good state. Update NETDEV event handling to rebalance multipath nexthop groups if a nexthop is deleted due to a link event (down or unregister). When a nexthop is removed any groups using it are updated. Groups using a nexthop a tracked via a grp_list. Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the request. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for lwt encapsDavid Ahern1-1/+36
Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code for lwtunnel within fib_nh_common, so the only change needed is handling the attributes in the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for IPv6 gatewaysDavid Ahern1-0/+56
Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6, NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in nh_config to hold the address, add fib6_nh to nh_info to leverage the ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6 address. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for IPv4 nexthopsDavid Ahern1-0/+208
Add support for IPv4 nexthops. If nh_family is set to AF_INET, then NHA_GATEWAY is expected to be an IPv4 address. Register for netdev events to be notified of admin up/down changes as well as deletes. A hash table is used to track nexthop per devices to quickly convert device events to the affected nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28net: Initial nexthop codeDavid Ahern2-1/+723
Barebones start point for nexthops. Implementation for RTM commands, notifications, management of rbtree for holding nexthops by id, and kernel side data structures for nexthops and nexthop config. Nexthops are maintained in an rbtree sorted by id. Similar to routes, nexthops are configured per namespace using netns_nexthop struct added to struct net. Nexthop notifications are sent when a nexthop is added or deleted, but NOT if the delete is due to a device event or network namespace teardown (which also involves device events). Applications are expected to use the device down event to flush nexthops and any routes used by the nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28inet: frags: fix use-after-free read in inet_frag_destroy_rcuEric Dumazet1-2/+18
As caught by syzbot [1], the rcu grace period that is respected before fqdir_rwork_fn() proceeds and frees fqdir is not enough to prevent inet_frag_destroy_rcu() being run after the freeing. We need a proper rcu_barrier() synchronization to replace the one we had in inet_frags_fini() We also have to fix a potential problem at module removal : inet_frags_fini() needs to make sure that all queued work queues (fqdir_rwork_fn) have completed, otherwise we might call kmem_cache_destroy() too soon and get another use-after-free. [1] BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch kernel/rcu/tree.c:2092 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline] rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291 __do_softirq+0x25c/0x94c kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99 RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571 default_idle_call+0x36/0x90 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x377/0x560 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354 start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 8877: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555 kmalloc include/linux/slab.h:547 [inline] kzalloc include/linux/slab.h:742 [inline] fqdir_init include/net/inet_frag.h:115 [inline] ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206 ksys_unshare+0x440/0x980 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 17: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at ffff88806ed47a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 24 bytes inside of 512-byte region [ffff88806ed47a00, ffff88806ed47c00) The buggy address belongs to the page: page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940 raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28inet: frags: uninline fqdir_init()Eric Dumazet1-0/+19
fqdir_init() is not fast path and is getting bigger. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26ipv4: remove redundant assignment to nColin Ian King1-1/+0
The pointer n is being assigned a value however this value is never read in the code block and the end of the code block continues to the next loop iteration. Clean up the code by removing the redundant assignment. Fixes: 1bff1a0c9bbda ("ipv4: Add function to send route updates") Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26inet: frags: rework rhashtable dismantleEric Dumazet1-12/+37
syszbot found an interesting use-after-free [1] happening while IPv4 fragment rhashtable was destroyed at netns dismantle. While no insertions can possibly happen at the time a dismantling netns is destroying this rhashtable, timers can still fire and attempt to remove elements from this rhashtable. This is forbidden, since rhashtable_free_and_destroy() has no synchronization against concurrent inserts and deletes. Add a new fqdir->dead flag so that timers do not attempt a rhashtable_remove_fast() operation. We also have to respect an RCU grace period before starting the rhashtable_free_and_destroy() from process context, thus we use rcu_work infrastructure. This is a refinement of a prior rough attempt to fix this bug : https://marc.info/?l=linux-netdev&m=153845936820900&w=2 Since the rhashtable cleanup is now deferred to a work queue, netns dismantles should be slightly faster. [1] BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:194 [inline] BUG: KASAN: use-after-free in rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212 Read of size 8 at addr ffff8880a6497b70 by task kworker/0:0/5 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events rht_deferred_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 __read_once_size include/linux/compiler.h:194 [inline] rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212 rht_deferred_worker+0x111/0x2030 lib/rhashtable.c:411 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 32687: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc_node mm/slab.c:3620 [inline] __kmalloc_node+0x4e/0x70 mm/slab.c:3627 kmalloc_node include/linux/slab.h:590 [inline] kvmalloc_node+0x68/0x100 mm/util.c:431 kvmalloc include/linux/mm.h:637 [inline] kvzalloc include/linux/mm.h:645 [inline] bucket_table_alloc+0x90/0x480 lib/rhashtable.c:178 rhashtable_init+0x3f4/0x7b0 lib/rhashtable.c:1057 inet_frags_init_net include/net/inet_frag.h:109 [inline] ipv4_frags_init_net+0x182/0x410 net/ipv4/ip_fragment.c:683 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206 ksys_unshare+0x440/0x980 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 kvfree+0x61/0x70 mm/util.c:460 bucket_table_free+0x69/0x150 lib/rhashtable.c:108 rhashtable_free_and_destroy+0x165/0x8b0 lib/rhashtable.c:1155 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:152 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:695 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at ffff8880a6497b40 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 48 bytes inside of 1024-byte region [ffff8880a6497b40, ffff8880a6497f40) The buggy address belongs to the page: page:ffffea0002992580 refcount:1 mapcount:0 mapping:ffff8880aa400ac0 index:0xffff8880a64964c0 compound_mapcount: 0 flags: 0x1fffc0000010200(slab|head) raw: 01fffc0000010200 ffffea0002916e88 ffffea000218fe08 ffff8880aa400ac0 raw: ffff8880a64964c0 ffff8880a6496040 0000000100000005 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a6497a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a6497a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8880a6497b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net: dynamically allocate fqdir structuresEric Dumazet3-18/+19
Following patch will add rcu grace period before fqdir rhashtable destruction, so we need to dynamically allocate fqdir structures to not force expensive synchronize_rcu() calls in netns dismantle path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net: add a net pointer to struct fqdirEric Dumazet1-13/+7
fqdir will soon be dynamically allocated. We need to reach the struct net pointer from fqdir, so add it, and replace the various container_of() constructs by direct access to the new field. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net: rename inet_frags_init_net() to fdir_init()Eric Dumazet1-2/+1
And pass an extra parameter, since we will soon dynamically allocate fqdir structures. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]Eric Dumazet1-12/+6
(struct net *)->ipv4.fqdir will soon be a pointer, so make sure ip4_frags_ns_ctl_table[] does not reference init_net. ip4_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net: rename struct fqdir fieldsEric Dumazet2-28/+28
Rename the @frags fields from structs netns_ipv4, netns_ipv6, netns_nf_frag and netns_ieee802154_lowpan to @fqdir Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net: rename inet_frags_exit_net() to fqdir_exit()Eric Dumazet2-4/+4
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26inet: rename netns_frags to fqdirEric Dumazet2-36/+36
1) struct netns_frags is renamed to struct fqdir This structure is really holding many frag queues in a hash table. 2) (struct inet_frag_queue)->net field is renamed to fqdir since net is generally associated to a 'struct net' pointer in networking stack. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22net: Set strict_start_type for routes and rulesDavid Ahern1-0/+1
New userspace on an older kernel can send unknown and unsupported attributes resulting in an incompelete config which is almost always wrong for routing (few exceptions are passthrough settings like the protocol that installed the route). Set strict_start_type in the policies for IPv4 and IPv6 routes and rules to detect new, unsupported attributes and fail the route add. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: Rename and export nh_update_mtuDavid Ahern1-2/+2
Rename nh_update_mtu to fib_nhc_update_mtu and export for use by the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_info_update_nh_saddrDavid Ahern1-6/+5
Add scope as input argument versus relying on fib_info reference in fib_nh, and export fib_info_update_nh_saddr. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_flushDavid Ahern1-1/+1
As nexthops are deleted, fib entries referencing it are marked dead. Export fib_flush so those entries can be removed in a timely manner. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: export fib_check_nhDavid Ahern1-6/+6
Change fib_check_nh to take net, table and scope as input arguments over struct fib_config and export for use by nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22ipv4: Add function to send route updatesDavid Ahern1-0/+72
Add fib_info_notify_update to walk the fib and send RTM_NEWROUTE notifications with NLM_F_REPLACE set for entries linked to a fib_info that have nh_updated flag set. This helper will be used by the nexthop code to notify userspace of routes that are impacted when a nexthop config is updated via replace. The new function and its helper are similar to how fib_flush and fib_table_flush work for address delete and link down events. This notification is needed for legacy apps that do not understand the new nexthop object. Apps that are nexthop aware can use the RTA_NH_ID attribute in the route notification to just ignore it. In the future this should be wrapped in a sysctl to allow OS'es that are fully updated to avoid the notificaton storm. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-21Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds43-30/+43
Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13Thomas Gleixner2-26/+2
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details [based] [from] [clk] [highbank] [c] you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 355 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3Thomas Gleixner1-4/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 or later as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 9 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154040.848507137@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner3-0/+3
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner27-0/+27
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner10-0/+10
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-19net: Treat sock->sk_drops as an unsigned int when printingPatrick Talbert3-3/+3
Currently, procfs socket stats format sk_drops as a signed int (%d). For large values this will cause a negative number to be printed. We know the drop count can never be a negative so change the format specifier to %u. Signed-off-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16net: bpfilter: fallback to netfilter if failed to load bpfilter kernel moduleKonstantin Khlebnikov1-4/+2
If bpfilter is not available return ENOPROTOOPT to fallback to netfilter. Function request_module() returns both errors and userspace exit codes. Just ignore them. Rechecking bpfilter_ops is enough. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-3/+4
Daniel Borkmann says: ==================== pull-request: bpf 2019-05-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a use after free in __dev_map_entry_free(), from Eric. 2) Several sockmap related bug fixes: a splat in strparser if it was never initialized, remove duplicate ingress msg list purging which can race, fix msg->sg.size accounting upon skb to msg conversion, and last but not least fix a timeout bug in tcp_bpf_wait_data(), from John. 3) Fix LRU map to avoid messing with eviction heuristics upon syscall lookup, e.g. map walks from user space side will then lead to eviction of just recently created entries on updates as it would mark all map entries, from Daniel. 4) Don't bail out when libbpf feature probing fails. Also various smaller fixes to flow_dissector test, from Stanislav. 5) Fix missing brackets for BTF_INT_OFFSET() in UAPI, from Gary. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-16bpf, tcp: correctly handle DONT_WAIT flags and timeo == 0John Fastabend1-1/+4
The tcp_bpf_wait_data() routine needs to check timeo != 0 before calling sk_wait_event() otherwise we may see unexpected stalls on receiver. Arika did all the leg work here I just formatted, posted and ran a few tests. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: Arika Chen <eaglesora@gmail.com> Suggested-by: Arika Chen <eaglesora@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-15tcp: do not recycle cloned skbsEric Dumazet1-1/+1
It is illegal to change arbitrary fields in skb_shared_info if the skb is cloned. Before calling skb_zcopy_clear() we need to ensure this rule, therefore we need to move the test from sk_stream_alloc_skb() to sk_wmem_free_skb() Fixes: 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14tcp: fix retrans timestamp on passive Fast OpenYuchung Cheng1-0/+3
Commit c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") sets the start of SYNACK retransmission time on passive Fast Open in "retrans_stamp". However the timestamp is not reset upon the handshake has completed. As a result, future data packet retransmission may not update it in tcp_retransmit_skb(). This may lead to socket aborting earlier unexpectedly by retransmits_timed_out() since retrans_stamp remains the SYNACK rtx time. This bug only manifests on passive TFO sender that a) suffered SYNACK timeout and then b) stalls on very first loss recovery. Any successful loss recovery would reset the timestamp to avoid this issue. Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14bpf: sockmap remove duplicate queue freeJohn Fastabend1-2/+0
In tcp bpf remove we free the cork list and purge the ingress msg list. However we do this before the ref count reaches zero so it could be possible some other access is in progress. In this case (tcp close and/or tcp_unhash) we happen to also hold the sock lock so no path exists but lets fix it otherwise it is extremely fragile and breaks the reference counting rules. Also we already check the cork list and ingress msg queue and free them once the ref count reaches zero so its wasteful to check twice. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-09net/tcp: use deferred jump label for TCP acked data hookJakub Kicinski1-5/+11
User space can flip the clean_acked_data_enabled static branch on and off with TLS offload when CONFIG_TLS_DEVICE is enabled. jump_label.h suggests we use the delayed version in this case. Deferred branches now also don't take the branch mutex on decrement, so we avoid potential locking issues. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08ipv4: Fix raw socket lookup for local trafficDavid Ahern1-2/+2
inet_iif should be used for the raw socket lookup. inet_iif considers rt_iif which handles the case of local traffic. As it stands, ping to a local address with the '-I <dev>' option fails ever since ping was changed to use SO_BINDTODEVICE instead of cmsg + IP_PKTINFO. IPv6 works fine. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds42-1549/+1295
Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+1
Minor conflict with the DSA legacy code removal. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-06Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull RCU updates from Ingo Molnar: "This cycles's RCU changes include: - a couple of straggling RCU flavor consolidation updates - SRCU updates - RCU CPU stall-warning updates - torture-test updates - an LKMM commit adding support for synchronize_srcu_expedited() - documentation updates - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) net/ipv4/netfilter: Update comment from call_rcu_bh() to call_rcu() tools/memory-model: Add support for synchronize_srcu_expedited() doc/kprobes: Update obsolete RCU update functions torture: Suppress false-positive CONFIG_INITRAMFS_SOURCE complaint locktorture: NULL cxt.lwsa and cxt.lrsa to allow bad-arg detection rcuperf: Fix cleanup path for invalid perf_type strings rcutorture: Fix cleanup path for invalid torture_type strings rcutorture: Fix expected forward progress duration in OOM notifier rcutorture: Remove ->ext_irq_conflict field rcutorture: Make rcutorture_extend_mask() comment match the code tools/.../rcutorture: Convert to SPDX license identifier torture: Don't try to offline the last CPU rcu: Fix nohz status in stall warning rcu: Move forward-progress checkers into tree_stall.h rcu: Move irq-disabled stall-warning checking to tree_stall.h rcu: Organize functions in tree_stall.h rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h rcu: Inline RCU stall-warning info helper functions rcu: Move rcu_print_task_exp_stall() to tree_exp.h rcu: Inline RCU task stall-warning helper functions ...
2019-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2-2/+2
Pablo Neira Ayuso says: =================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next, they are: 1) Move nft_expr_clone() to nft_dynset, from Paul Gortmaker. 2) Do not include module.h from net/netfilter/nf_tables.h, also from Paul. 3) Restrict conntrack sysctl entries to boolean, from Tonghao Zhang. 4) Several patches to add infrastructure to autoload NAT helper modules from their respective conntrack helper, this also includes the first client of this code in OVS, patches from Flavio Leitner. 5) Add support to match for conntrack ID, from Brett Mastbergen. 6) Spelling fix in connlabel, from Colin Ian King. 7) Use struct_size() from hashlimit, from Gustavo A. R. Silva. 8) Add optimized version of nf_inet_addr_mask(), from Li RongQing. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05net: use indirect calls helpers at early demux stagePaolo Abeni1-1/+4
So that we avoid another indirect call per RX packet, if early demux is enabled. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05net: use indirect calls helpers for L3 handler hooksPaolo Abeni1-1/+5
So that we avoid another indirect call per RX packet in the common case. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05ipv4: Move exception bucket to nh_commonDavid Ahern2-30/+23
Similar to the cached routes, make IPv4 exceptions accessible when using an IPv6 nexthop struct with IPv4 routes. Simplify the exception functions by passing in fib_nh_common since that is all it needs, and then cleanup the call sites that have extraneous fib_nh conversions. As with the cached routes this is a change in location only, from fib_nh up to fib_nh_common; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05ipv4: Pass fib_nh_common to rt_cache_routeDavid Ahern1-10/+10
Now that the cached routes are in fib_nh_common, pass it to rt_cache_route and simplify its callers. For rt_set_nexthop, the tclassid becomes the last user of fib_nh so move the container_of under the #ifdef CONFIG_IP_ROUTE_CLASSID. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05ipv4: Move cached routes to fib_nh_commonDavid Ahern2-26/+28
While the cached routes, nh_pcpu_rth_output and nh_rth_input, are IPv4 specific, a later patch wants to make them accessible for IPv6 nexthops with IPv4 routes using a fib6_nh. Move the cached routes from fib_nh to fib_nh_common and update references. Initialization of the cached entries is moved to fib_nh_common_init, and free is moved to fib_nh_common_release. Change in location only, from fib_nh up to fib_nh_common; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04ipmr_base: Do not reset index in mr_table_dumpDavid Ahern1-2/+1
e is the counter used to save the location of a dump when an skb is filled. Once the walk of the table is complete, mr_table_dump needs to return without resetting that index to 0. Dump of a specific table is looping because of the reset because there is no way to indicate the walk of the table is done. Move the reset to the caller so the dump of each table starts at 0, but the loop counter is maintained if a dump fills an skb. Fixes: e1cedae1ba6b0 ("ipmr: Refactor mr_rtm_dumproute") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-18/+49
Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01udp: fix GRO packet of deathEric Dumazet1-3/+10
syzbot was able to crash host by sending UDP packets with a 0 payload. TCP does not have this issue since we do not aggregate packets without payload. Since dev_gro_receive() sets gso_size based on skb_gro_len(skb) it seems not worth trying to cope with padded packets. BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826 Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889 CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133 skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826 udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline] call_gro_receive include/linux/netdevice.h:2349 [inline] udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414 udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478 inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510 dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581 napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843 tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027 call_write_iter include/linux/fs.h:1866 [inline] do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681 do_iter_write fs/read_write.c:957 [inline] do_iter_write+0x184/0x610 fs/read_write.c:938 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002 do_writev+0x15e/0x370 fs/read_write.c:1037 __do_sys_writev fs/read_write.c:1110 [inline] __se_sys_writev fs/read_write.c:1107 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441cc0 Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00 RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0 RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0 RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668 R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9 R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 5143: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505 slab_post_alloc_hook mm/slab.h:437 [inline] slab_alloc mm/slab.c:3393 [inline] kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555 mm_alloc+0x1d/0xd0 kernel/fork.c:1030 bprm_mm_init fs/exec.c:363 [inline] __do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791 do_execveat_common fs/exec.c:1865 [inline] do_execve fs/exec.c:1882 [inline] __do_sys_execve fs/exec.c:1958 [inline] __se_sys_execve fs/exec.c:1953 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5351: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3499 [inline] kmem_cache_free+0x86/0x260 mm/slab.c:3765 __mmdrop+0x238/0x320 kernel/fork.c:677 mmdrop include/linux/sched/mm.h:49 [inline] finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746 context_switch kernel/sched/core.c:2880 [inline] __schedule+0x81b/0x1cc0 kernel/sched/core.c:3518 preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745 retint_kernel+0x1b/0x2d arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline] kmem_cache_free+0xab/0x260 mm/slab.c:3766 anon_vma_chain_free mm/rmap.c:134 [inline] unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401 free_pgtables+0x1af/0x2f0 mm/memory.c:394 exit_mmap+0x2d1/0x530 mm/mmap.c:3144 __mmput kernel/fork.c:1046 [inline] mmput+0x15f/0x4c0 kernel/fork.c:1067 exec_mmap fs/exec.c:1046 [inline] flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279 load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864 search_binary_handler fs/exec.c:1656 [inline] search_binary_handler+0x17f/0x570 fs/exec.c:1634 exec_binprm fs/exec.c:1698 [inline] __do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818 do_execveat_common fs/exec.c:1865 [inline] do_execve fs/exec.c:1882 [inline] __do_sys_execve fs/exec.c:1958 [inline] __se_sys_execve fs/exec.c:1953 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88808893f7c0 which belongs to the cache mm_struct of size 1496 The buggy address is located 600 bytes to the right of 1496-byte region [ffff88808893f7c0, ffff88808893fd98) The buggy address belongs to the page: page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0 flags: 0x1fffc0000010200(slab|head) raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0 raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01ipv4: ip_do_fragment: Preserve skb_iif during fragmentationShmulik Ladkani1-0/+1
Previously, during fragmentation after forwarding, skb->skb_iif isn't preserved, i.e. 'ip_copy_metadata' does not copy skb_iif from given 'from' skb. As a result, ip_do_fragment's creates fragments with zero skb_iif, leading to inconsistent behavior. Assume for example an eBPF program attached at tc egress (post forwarding) that examines __sk_buff->ingress_ifindex: - the correct iif is observed if forwarding path does not involve fragmentation/refragmentation - a bogus iif is observed if forwarding path involves fragmentation/refragmentatiom Fix, by preserving skb_iif during 'ip_copy_metadata'. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>