author | Jakub Kicinski <kuba@kernel.org> | 2022-10-28 22:07:47 -0700 |
---|---|---|
committer | Jakub Kicinski <kuba@kernel.org> | 2022-10-28 22:07:48 -0700 |
commit | 02a97e02c64fb3245b84835cbbed1c3a3222e2f1 | |
tree | eaca7ff82e341c0d6381f56ff571f19eae3af17f /drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c | |
parent | Merge branch 'net-ipa-start-adding-ipa-v5-0-functionality' | |
parent | net/mlx5: DR, Remove the buddy used_list | |
Merge tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-10-24
SW steering updates from Yevgeny Kliteynik:
1) The first four patches are small fixes and optimizations for SW steering:
- Patch 1: Don't abort the destroy flow if destroying the table fails;
continue and free everything else.
- Patches 2 and 3 deal with fast teardown:
+ Skip sync during fast teardown, as the PCI device is no longer there.
+ Check device state when polling CQ - otherwise SW steering keeps polling
the CQ forever, because nobody is there to flush it.
- Patch 4: Remove an unneeded function argument.
2) Deal with the hiccups that we get during rule insertion/deletion,
which sometimes reach 1/4 of a second. While improving the insertion/deletion
rate was not the focus here, it is still a by-product of removing these
hiccups.
Another by-product is the reduced standard deviation in measuring the duration
of rules insertion/deletion bursts.
In the testing we add K rules (warm-up phase), and then continuously do
insertion/deletion bursts of N rules.
During the test execution, the driver measures hiccups (amount and duration)
and total time for insertion/deletion of a batch of rules.
Here are some numbers, before and after these patches:
+---------------------------------------------+-----------------+-----------------+
|                                             | Create rules    | Delete rules    |
|                                             +--------+--------+--------+--------+
|                                             | Before | After  | Before | After  |
+---------------------------------------------+--------+--------+--------+--------+
| Max hiccup [msec]                           | 253    | 42     | 254    | 68     |
+---------------------------------------------+--------+--------+--------+--------+
| Avg duration of 10K rules add/remove [msec] | 140.07 | 124.32 | 106.99 | 99.51  |
+---------------------------------------------+--------+--------+--------+--------+
| Num of hiccups per 100K rules add/remove    | 7.77   | 7.97   | 12.60  | 11.57  |
+---------------------------------------------+--------+--------+--------+--------+
| Avg hiccup duration [msec]                  | 36.92  | 33.25  | 36.15  | 33.74  |
+---------------------------------------------+--------+--------+--------+--------+
- Patch 5: Allocate a short array on the stack instead of dynamically, since
it is destroyed at the end of the function.
- Patch 6: Rather than cleaning the corresponding chunk's section of
ste_arrays on chunk deletion, initialize these areas upon chunk creation.
Chunk destruction tends to come in large batches (during pool syncing),
so instead of doing huge memory initialization during pool sync,
we amortize this by doing small initializations on chunk creation.
- Patch 7: To simplify the error flow and allow cleaner addition
of new pools, handle creation/destruction of all the domain's memory pools
and other memory-related fields in separate init/uninit functions.
- Patch 8: During rehash, write each table row immediately instead of waiting
for the whole table to be ready and writing it all - saves allocations
of ste_send_info structures and improves performance.
- Patch 9: Instead of allocating/freeing send info objects dynamically,
manage them in a pool. The number of send info objects doesn't depend on
the number of rules, so after pre-populating the pool with an initial batch
of send info objects, the pool is not expected to grow.
This way we save alloc/free during writing STEs to ICM, which by itself can
sometimes take up to 40 msec.
- Patch 10: Allocate icm_chunks from their own slab allocator, which lowered
the alloc/free "hiccups" frequency.
- Patch 11: Similar to patch 10, allocate htbl from its own slab allocator.
- Patch 12: Lower sync threshold for ICM hot memory - set the threshold for
sync to 1/4 of the pool instead of 1/2 of the pool. Although we will have
more syncs, each sync will be shorter and will help with insertion rate
stability. Also, note that the overall number of hiccups did not increase,
thanks to all the other patches.
- Patch 13: Keep track of hot ICM chunks in an array instead of list.
After steering sync, we traverse the hot list and finally free all the
chunks. It appears that traversing a long list takes an unusually long time
due to cache misses on many entries, which causes a big "hiccup" during
rule insertion. This patch replaces the list with a pre-allocated array that
stores only the bookkeeping information that is needed to later free the
chunks in its buddy allocator.
- Patch 14: Remove the unneeded buddy used_list - we don't need to have the
list of used chunks, we only need the total amount of used memory.
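The array-based bookkeeping described for patch 13 can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the driver's code: `hot_chunk`, `hot_pool` and the `hot_pool_*` helpers are hypothetical names, and returning memory to the buddy allocator is reduced to bumping a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: only what is needed to free a chunk later. */
struct hot_chunk {
	unsigned int seg;         /* position of the chunk in its buddy */
	unsigned int num_entries; /* size of the chunk, in entries */
};

/* Hypothetical pool that tracks hot chunks in one contiguous array. */
struct hot_pool {
	struct hot_chunk *hot;  /* pre-allocated array, no per-chunk list nodes */
	size_t capacity;
	size_t count;
	size_t freed_entries;   /* stand-in for memory returned to the buddy */
};

static int hot_pool_init(struct hot_pool *pool, size_t capacity)
{
	assert(capacity > 0);
	pool->hot = calloc(capacity, sizeof(*pool->hot));
	if (!pool->hot)
		return -1;
	pool->capacity = capacity;
	pool->count = 0;
	pool->freed_entries = 0;
	return 0;
}

/* Remember a chunk that was released but may still be referenced by HW. */
static int hot_pool_add(struct hot_pool *pool, unsigned int seg,
			unsigned int num_entries)
{
	if (pool->count == pool->capacity)
		return -1; /* caller is expected to sync before this happens */

	pool->hot[pool->count].seg = seg;
	pool->hot[pool->count].num_entries = num_entries;
	pool->count++;
	return 0;
}

/* After a sync: one linear pass over the array releases every hot chunk. */
static size_t hot_pool_sync(struct hot_pool *pool)
{
	size_t released = pool->count;

	for (size_t i = 0; i < pool->count; i++)
		pool->freed_entries += pool->hot[i].num_entries;

	pool->count = 0;
	return released;
}

static void hot_pool_destroy(struct hot_pool *pool)
{
	free(pool->hot);
	pool->hot = NULL;
}
```

Because the records sit next to each other in memory, the pass in `hot_pool_sync()` touches far fewer cache lines than chasing pointers through a long linked list would, which is the effect the patch description attributes to the change.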
* tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: DR, Remove the buddy used_list
net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
net/mlx5: DR, Lower sync threshold for ICM hot memory
net/mlx5: DR, Allocate htbl from its own slab allocator
net/mlx5: DR, Allocate icm_chunks from their own slab allocator
net/mlx5: DR, Manage STE send info objects in pool
net/mlx5: DR, In rehash write the line in the entry immediately
net/mlx5: DR, Handle domain memory resources init/uninit separately
net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically
net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
net/mlx5: DR, Check device state when polling CQ
net/mlx5: DR, Fix the SMFS sync_steering for fast teardown
net/mlx5: DR, In destroy flow, free resources even if FW command failed
====================
Link: https://lore.kernel.org/r/20221027145643.6618-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
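The pooling idea from patch 9 (pre-populate once, then recycle objects instead of calling the allocator per STE write) can be illustrated with a small user-space free-list pool. This is a sketch under assumptions, not the mlx5 implementation: the internals of `mlx5dr_send_info_alloc()` and `mlx5dr_send_info_free()` are not part of this file's diff, and `send_info`, `send_info_pool`, `POOL_BATCH` and the `pool_*` helpers below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POOL_BATCH 4 /* arbitrary batch size chosen for the example */

/* Hypothetical pooled object; the payload size is arbitrary. */
struct send_info {
	struct send_info *next_free;
	unsigned char data[64];
};

struct send_info_pool {
	struct send_info *free_list;
	size_t num_allocated; /* objects obtained from malloc() so far */
};

/* Add one batch of objects to the pool. */
static int pool_fill(struct send_info_pool *pool)
{
	for (int i = 0; i < POOL_BATCH; i++) {
		struct send_info *obj = malloc(sizeof(*obj));

		if (!obj)
			return -1;
		obj->next_free = pool->free_list;
		pool->free_list = obj;
		pool->num_allocated++;
	}
	return 0;
}

/* Pop an object from the free list; grow the pool only if it is empty. */
static struct send_info *pool_alloc(struct send_info_pool *pool)
{
	struct send_info *obj;

	if (!pool->free_list)
		pool_fill(pool);
	if (!pool->free_list)
		return NULL;

	obj = pool->free_list;
	pool->free_list = obj->next_free;
	memset(obj, 0, sizeof(*obj)); /* hand out zeroed memory, like kzalloc */
	return obj;
}

/* Return an object to the pool instead of freeing it. */
static void pool_free(struct send_info_pool *pool, struct send_info *obj)
{
	obj->next_free = pool->free_list;
	pool->free_list = obj;
}

/* Release every object that is currently in the pool. */
static void pool_destroy(struct send_info_pool *pool)
{
	while (pool->free_list) {
		struct send_info *obj = pool->free_list;

		pool->free_list = obj->next_free;
		free(obj);
	}
}
```

In steady state `pool_alloc()` and `pool_free()` are a few pointer operations each, so the cost of the general-purpose allocator is paid only when the pool is first filled.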
Diffstat (limited to 'drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c')
-rw-r--r-- | drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c | 92 |
1 files changed, 60 insertions, 32 deletions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
index 91ff19f67695..7879991048ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -3,13 +3,16 @@
 
 #include "dr_types.h"
 
-#define DR_RULE_MAX_STE_CHAIN (DR_RULE_MAX_STES + DR_ACTION_MAX_STES)
+#define DR_RULE_MAX_STES_OPTIMIZED 5
+#define DR_RULE_MAX_STE_CHAIN_OPTIMIZED (DR_RULE_MAX_STES_OPTIMIZED + DR_ACTION_MAX_STES)
 
-static int dr_rule_append_to_miss_list(struct mlx5dr_ste_ctx *ste_ctx,
+static int dr_rule_append_to_miss_list(struct mlx5dr_domain *dmn,
+				       enum mlx5dr_domain_nic_type nic_type,
 				       struct mlx5dr_ste *new_last_ste,
 				       struct list_head *miss_list,
 				       struct list_head *send_list)
 {
+	struct mlx5dr_ste_ctx *ste_ctx = dmn->ste_ctx;
 	struct mlx5dr_ste_send_info *ste_info_last;
 	struct mlx5dr_ste *last_ste;
 
@@ -17,7 +20,7 @@ static int dr_rule_append_to_miss_list(struct mlx5dr_ste_ctx *ste_ctx,
 	last_ste = list_last_entry(miss_list, struct mlx5dr_ste, miss_list_node);
 	WARN_ON(!last_ste);
 
-	ste_info_last = kzalloc(sizeof(*ste_info_last), GFP_KERNEL);
+	ste_info_last = mlx5dr_send_info_alloc(dmn, nic_type);
 	if (!ste_info_last)
 		return -ENOMEM;
 
@@ -120,7 +123,7 @@ dr_rule_handle_one_ste_in_update_list(struct mlx5dr_ste_send_info *ste_info,
 		goto out;
 
 out:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	return ret;
 }
 
@@ -191,8 +194,8 @@ dr_rule_rehash_handle_collision(struct mlx5dr_matcher *matcher,
 	new_ste->htbl->chunk->miss_list = mlx5dr_ste_get_miss_list(col_ste);
 
 	/* Update the previous from the list */
-	ret = dr_rule_append_to_miss_list(dmn->ste_ctx, new_ste,
-					  mlx5dr_ste_get_miss_list(col_ste),
+	ret = dr_rule_append_to_miss_list(dmn, nic_matcher->nic_tbl->nic_dmn->type,
+					  new_ste, mlx5dr_ste_get_miss_list(col_ste),
 					  update_list);
 	if (ret) {
 		mlx5dr_dbg(dmn, "Failed update dup entry\n");
@@ -278,7 +281,8 @@ dr_rule_rehash_copy_ste(struct mlx5dr_matcher *matcher,
 	new_htbl->ctrl.num_of_valid_entries++;
 
 	if (use_update_list) {
-		ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+		ste_info = mlx5dr_send_info_alloc(dmn,
+						  nic_matcher->nic_tbl->nic_dmn->type);
 		if (!ste_info)
 			goto err_exit;
 
@@ -357,6 +361,15 @@ static int dr_rule_rehash_copy_htbl(struct mlx5dr_matcher *matcher,
 						     update_list);
 		if (err)
 			goto clean_copy;
+
+		/* In order to decrease the number of allocated ste_send_info
+		 * structs, send the current table row now.
+		 */
+		err = dr_rule_send_update_list(update_list, matcher->tbl->dmn, false);
+		if (err) {
+			mlx5dr_dbg(matcher->tbl->dmn, "Failed updating table to HW\n");
+			goto clean_copy;
+		}
 	}
 
 clean_copy:
@@ -387,7 +400,8 @@ dr_rule_rehash_htbl(struct mlx5dr_rule *rule,
 	nic_matcher = nic_rule->nic_matcher;
 	nic_dmn = nic_matcher->nic_tbl->nic_dmn;
 
-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		return NULL;
 
@@ -473,13 +487,13 @@ free_ste_list:
 	list_for_each_entry_safe(del_ste_info, tmp_ste_info,
 				 &rehash_table_send_list, send_list) {
 		list_del(&del_ste_info->send_list);
-		kfree(del_ste_info);
+		mlx5dr_send_info_free(del_ste_info);
 	}
 
 free_new_htbl:
 	mlx5dr_ste_htbl_free(new_htbl);
 free_ste_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	mlx5dr_info(dmn, "Failed creating rehash table\n");
 	return NULL;
 }
@@ -512,11 +526,11 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 			 struct list_head *send_list)
 {
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
-	struct mlx5dr_ste_ctx *ste_ctx = dmn->ste_ctx;
 	struct mlx5dr_ste_send_info *ste_info;
 	struct mlx5dr_ste *new_ste;
 
-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		return NULL;
 
@@ -524,8 +538,8 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 	if (!new_ste)
 		goto free_send_info;
 
-	if (dr_rule_append_to_miss_list(ste_ctx, new_ste,
-					miss_list, send_list)) {
+	if (dr_rule_append_to_miss_list(dmn, nic_matcher->nic_tbl->nic_dmn->type,
+					new_ste, miss_list, send_list)) {
 		mlx5dr_dbg(dmn, "Failed to update prev miss_list\n");
 		goto err_exit;
 	}
@@ -541,7 +555,7 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 err_exit:
 	mlx5dr_ste_free(new_ste, matcher, nic_matcher);
 free_send_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	return NULL;
 }
 
@@ -721,8 +735,8 @@ static int dr_rule_handle_action_stes(struct mlx5dr_rule *rule,
 		list_add_tail(&action_ste->miss_list_node,
 			      mlx5dr_ste_get_miss_list(action_ste));
 
-		ste_info_arr[k] = kzalloc(sizeof(*ste_info_arr[k]),
-					  GFP_KERNEL);
+		ste_info_arr[k] = mlx5dr_send_info_alloc(dmn,
+							 nic_matcher->nic_tbl->nic_dmn->type);
 		if (!ste_info_arr[k])
 			goto err_exit;
 
@@ -772,7 +786,8 @@ static int dr_rule_handle_empty_entry(struct mlx5dr_matcher *matcher,
 
 	ste->ste_chain_location = ste_location;
 
-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		goto clean_ste_setting;
 
@@ -793,7 +808,7 @@ static int dr_rule_handle_empty_entry(struct mlx5dr_matcher *matcher,
 	return 0;
 
 clean_ste_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 clean_ste_setting:
 	list_del_init(&ste->miss_list_node);
 	mlx5dr_htbl_put(cur_htbl);
@@ -1089,6 +1104,7 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 			size_t num_actions,
 			struct mlx5dr_action *actions[])
 {
+	u8 hw_ste_arr_optimized[DR_RULE_MAX_STE_CHAIN_OPTIMIZED * DR_STE_SIZE] = {};
 	struct mlx5dr_ste_send_info *ste_info, *tmp_ste_info;
 	struct mlx5dr_matcher *matcher = rule->matcher;
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
@@ -1098,6 +1114,7 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 	struct mlx5dr_ste_htbl *cur_htbl;
 	struct mlx5dr_ste *ste = NULL;
 	LIST_HEAD(send_ste_list);
+	bool hw_ste_arr_is_opt;
 	u8 *hw_ste_arr = NULL;
 	u32 new_hw_ste_arr_sz;
 	int ret, i;
@@ -1109,9 +1126,23 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 					 rule->flow_source))
 		return 0;
 
-	hw_ste_arr = kzalloc(DR_RULE_MAX_STE_CHAIN * DR_STE_SIZE, GFP_KERNEL);
-	if (!hw_ste_arr)
-		return -ENOMEM;
+	ret = mlx5dr_matcher_select_builders(matcher,
+					     nic_matcher,
+					     dr_rule_get_ipv(&param->outer),
+					     dr_rule_get_ipv(&param->inner));
+	if (ret)
+		return ret;
+
+	hw_ste_arr_is_opt = nic_matcher->num_of_builders <= DR_RULE_MAX_STES_OPTIMIZED;
+	if (likely(hw_ste_arr_is_opt)) {
+		hw_ste_arr = hw_ste_arr_optimized;
+	} else {
+		hw_ste_arr = kzalloc((nic_matcher->num_of_builders + DR_ACTION_MAX_STES) *
+				     DR_STE_SIZE, GFP_KERNEL);
+
+		if (!hw_ste_arr)
+			return -ENOMEM;
+	}
 
 	mlx5dr_domain_nic_lock(nic_dmn);
 
@@ -1119,13 +1150,6 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 	if (ret)
 		goto free_hw_ste;
 
-	ret = mlx5dr_matcher_select_builders(matcher,
-					     nic_matcher,
-					     dr_rule_get_ipv(&param->outer),
-					     dr_rule_get_ipv(&param->inner));
-	if (ret)
-		goto remove_from_nic_tbl;
-
 	/* Set the tag values inside the ste array */
 	ret = mlx5dr_ste_build_ste_arr(matcher, nic_matcher, param, hw_ste_arr);
 	if (ret)
@@ -1187,7 +1211,8 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 
 	mlx5dr_domain_nic_unlock(nic_dmn);
 
-	kfree(hw_ste_arr);
+	if (unlikely(!hw_ste_arr_is_opt))
+		kfree(hw_ste_arr);
 
 	return 0;
 
@@ -1196,7 +1221,7 @@ free_rule:
 	/* Clean all ste_info's */
 	list_for_each_entry_safe(ste_info, tmp_ste_info, &send_ste_list, send_list) {
 		list_del(&ste_info->send_list);
-		kfree(ste_info);
+		mlx5dr_send_info_free(ste_info);
 	}
 
 remove_from_nic_tbl:
@@ -1205,7 +1230,10 @@ remove_from_nic_tbl:
 
 free_hw_ste:
 	mlx5dr_domain_nic_unlock(nic_dmn);
-	kfree(hw_ste_arr);
+
+	if (unlikely(!hw_ste_arr_is_opt))
+		kfree(hw_ste_arr);
+
 	return ret;
 }
 
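The stack-or-heap pattern that the last hunks introduce in `dr_rule_create_rule_nic()` can be shown in isolation with a user-space sketch. The names `build_entries`, `ENTRY_SIZE` and `MAX_ENTRIES_OPTIMIZED` are hypothetical stand-ins chosen for this example (the sizes are arbitrary, not the driver's values), and `calloc()`/`free()` take the place of `kzalloc()`/`kfree()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENTRY_SIZE 64            /* arbitrary entry size for the example */
#define MAX_ENTRIES_OPTIMIZED 5  /* short chains fit in the stack buffer */

/*
 * Fill n entries and return the sum of their first bytes, or -1 if the
 * heap allocation fails. *used_heap reports which buffer was used.
 */
static int build_entries(size_t n, int *used_heap)
{
	unsigned char stack_buf[MAX_ENTRIES_OPTIMIZED * ENTRY_SIZE] = { 0 };
	unsigned char *buf;
	int on_stack = n <= MAX_ENTRIES_OPTIMIZED;
	int sum = 0;

	assert(used_heap != NULL);

	if (on_stack) {
		buf = stack_buf;             /* common case: no allocation */
	} else {
		buf = calloc(n, ENTRY_SIZE); /* rare case: long chain */
		if (!buf)
			return -1;
	}

	/* Entry i is filled with the byte value i + 1. */
	for (size_t i = 0; i < n; i++)
		memset(buf + i * ENTRY_SIZE, (int)(i + 1), ENTRY_SIZE);
	for (size_t i = 0; i < n; i++)
		sum += buf[i * ENTRY_SIZE];

	/* Only the heap buffer needs an explicit free, on every exit path. */
	if (!on_stack)
		free(buf);

	*used_heap = !on_stack;
	return sum;
}
```

The flag that records which buffer is in use has to be checked on every exit path, which is why the diff guards both `kfree(hw_ste_arr)` calls with `hw_ste_arr_is_opt`.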