path: root/kernel
author	Alexei Starovoitov <ast@kernel.org>	2020-09-03 17:36:41 -0700
committer	Alexei Starovoitov <ast@kernel.org>	2020-09-03 17:40:40 -0700
commit	e6135df45e21f1815a5948f452593124b1544a3e (patch)
tree	1b6e2a0484ce01d82c58a29824b8c251a334fc8d /kernel
parent	libbpf: Remove arch-specific include path in Makefile (diff)
parent	selftests/bpf: Add bpf_{update, delete}_map_elem in hashmap iter program (diff)
Merge branch 'hashmap_iter_bucket_lock_fix'
Yonghong Song says:

====================

Currently, the bpf hashmap iterator takes a bucket_lock, a spin_lock, before visiting each element in the bucket. This will cause a deadlock if a map update/delete operates on an element in the same bucket as the one being visited.

To avoid the deadlock, let us just use rcu_read_lock instead of bucket_lock. This may result in visiting stale elements, missing some elements, or repeating some elements, if a concurrent map delete/update happens on the same map. I think using rcu_read_lock is a reasonable compromise. Users who care about stale/missing/repeated elements can use the bpf map batch access syscall interface instead.

Note that another approach is, during the bpf_iter link stage, to check whether the iter program might be able to do update/delete on the visited map, and to reject the link_create if so. The verifier would need to record whether an update/delete operation happens for each map for this approach. I just feel this checking is too specialized, hence still prefer the rcu_read_lock approach.

Patch #1 has the kernel implementation and Patch #2 adds a selftest which can trigger the deadlock without Patch #1.

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
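The deadlock scenario from the cover letter is easy to picture: the iterator takes the bucket spin_lock before running the BPF program for each element, and a bpf_map_update_elem() from inside the program, targeting an element that hashes to that same bucket, tries to take the lock again. A minimal sketch of such an iterator program, in the spirit of the selftest added in Patch #2 (map layout, names, and key/value types here are illustrative assumptions, not the actual selftest code):

/* SPDX-License-Identifier: GPL-2.0 */
/* Hypothetical sketch: a hashmap iterator program that updates the map
 * it is iterating over. Map definition and names are assumptions. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u64);
	__type(value, __u64);
} hmap SEC(".maps");

SEC("iter/bpf_map_elem")
int dump_and_update(struct bpf_iter__bpf_map_elem *ctx)
{
	__u64 *key = ctx->key;
	__u64 *val = ctx->value;
	__u64 new_val;

	/* NULL key/value marks the end of the iteration */
	if (!key || !val)
		return 0;

	/* With the old bucket_lock scheme this update could try to take
	 * the bucket spin_lock already held on the iterator's behalf and
	 * deadlock; under rcu_read_lock it is safe. */
	new_val = *val + 1;
	bpf_map_update_elem(&hmap, key, &new_val, BPF_ANY);
	return 0;
}

char _license[] SEC("license") = "GPL";

Attaching this program to a map and read()ing the resulting iterator file is what drives the walk; the userspace side is sketched after the diff below.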
Diffstat (limited to 'kernel')
-rw-r--r--	kernel/bpf/hashtab.c | 15
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 78dfff6a501b..7df28a45c66b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1622,7 +1622,6 @@ struct bpf_iter_seq_hash_map_info {
struct bpf_map *map;
struct bpf_htab *htab;
void *percpu_value_buf; // non-zero means percpu hash
- unsigned long flags;
u32 bucket_id;
u32 skip_elems;
};
@@ -1632,7 +1631,6 @@ bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,
struct htab_elem *prev_elem)
{
const struct bpf_htab *htab = info->htab;
- unsigned long flags = info->flags;
u32 skip_elems = info->skip_elems;
u32 bucket_id = info->bucket_id;
struct hlist_nulls_head *head;
@@ -1656,19 +1654,18 @@ bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,
/* not found, unlock and go to the next bucket */
b = &htab->buckets[bucket_id++];
- htab_unlock_bucket(htab, b, flags);
+ rcu_read_unlock();
skip_elems = 0;
}
for (i = bucket_id; i < htab->n_buckets; i++) {
b = &htab->buckets[i];
- flags = htab_lock_bucket(htab, b);
+ rcu_read_lock();
count = 0;
head = &b->head;
hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
if (count >= skip_elems) {
- info->flags = flags;
info->bucket_id = i;
info->skip_elems = count;
return elem;
@@ -1676,7 +1673,7 @@ bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,
count++;
}
- htab_unlock_bucket(htab, b, flags);
+ rcu_read_unlock();
skip_elems = 0;
}
@@ -1754,14 +1751,10 @@ static int bpf_hash_map_seq_show(struct seq_file *seq, void *v)
static void bpf_hash_map_seq_stop(struct seq_file *seq, void *v)
{
- struct bpf_iter_seq_hash_map_info *info = seq->private;
-
if (!v)
(void)__bpf_hash_map_seq_show(seq, NULL);
else
- htab_unlock_bucket(info->htab,
- &info->htab->buckets[info->bucket_id],
- info->flags);
+ rcu_read_unlock();
}
static int bpf_iter_init_hash_map(void *priv_data,
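For completeness, a hedged sketch of the userspace side that attaches an iter/bpf_map_elem program to a specific map and drives the walk via read(). The libbpf calls (bpf_program__attach_iter, bpf_iter_create, bpf_link__fd) are the public API; the wrapper function and variable names are assumptions:

/* Userspace sketch: attach a map-element iterator and read it out.
 * Only the libbpf calls are real API; names here are hypothetical. */
#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int run_map_elem_iter(struct bpf_program *prog, int map_fd)
{
	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
	union bpf_iter_link_info linfo = {};
	struct bpf_link *link;
	char buf[4096];
	ssize_t n;
	int iter_fd;

	/* Tell the kernel which map this iterator instance walks */
	linfo.map.map_fd = map_fd;
	opts.link_info = &linfo;
	opts.link_info_len = sizeof(linfo);

	link = bpf_program__attach_iter(prog, &opts);
	if (libbpf_get_error(link))
		return -1;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0) {
		bpf_link__destroy(link);
		return -1;
	}

	/* Each read() walks the hash map; before this fix, an in-program
	 * bpf_map_update_elem() could deadlock against the bucket_lock
	 * taken here on the iterator's behalf. */
	while ((n = read(iter_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);

	close(iter_fd);
	bpf_link__destroy(link);
	return n < 0 ? -1 : 0;
}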