path: root/drivers/staging/lustre/TODO
blob: 94446487748aa1a3c4067e02b26cf3a4bd85dfbf
Currently all the work directed toward the lustre upstream client is tracked
at the following link:

https://jira.hpdd.intel.com/browse/LU-9679

Under this ticket you will see the following work items that need to be
addressed:

******************************************************************************
* libcfs cleanup
*
* https://jira.hpdd.intel.com/browse/LU-9859
*
* Track all the cleanups and simplification of the libcfs module. Remove
* functions the kernel provides. Possibly integrate some of the functionality
* into the kernel proper.
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-100086

LNET_MINOR conflicts with USERIO_MINOR

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8130

Fix and simplify libcfs hash handling

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8703

The current way we handle SMP is wrong. Platforms like ARM and KNL can have
CPU and NUMA topologies that include NUMA nodes with no cores. We need to
handle such cases. This work also greatly simplifies the lustre SMP code.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9019

Replace the libcfs time API with standard kernel APIs. Also migrate away from
jiffies. We found that jiffies can vary between nodes, which can lead to
corner cases that break the file system because nodes behave inconsistently.
So move to time64_t and ktime_t as much as possible.
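
The following is a minimal user-space sketch of the problem, not Lustre or
kernel code. It assumes the inconsistency comes from nodes running with
different tick rates (HZ), which the ticket text above does not state
explicitly. The helper name ticks_to_ns() is hypothetical. The same tick count
maps to different wall-clock durations depending on the tick rate, while a
value kept in nanoseconds (as ktime_t does in the kernel) means the same thing
on every node.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: convert a tick count to nanoseconds for a given
 * tick rate. Assumes hz divides NSEC_PER_SEC evenly (e.g. 100, 250, 1000).
 */
static int64_t ticks_to_ns(int64_t ticks, int64_t hz)
{
	return ticks * (NSEC_PER_SEC / hz);
}

/*
 * Hypothetical helper: compare two timeouts expressed as raw tick counts
 * on nodes with different tick rates. Returns nonzero if they describe the
 * same real duration.
 */
static int same_duration(int64_t ticks_a, int64_t hz_a,
			 int64_t ticks_b, int64_t hz_b)
{
	return ticks_to_ns(ticks_a, hz_a) == ticks_to_ns(ticks_b, hz_b);
}
```

With these helpers, a timeout of 1000 ticks is one second at 1000 ticks per
second but four seconds at 250 ticks per second, so exchanging or comparing
raw tick counts between such nodes is not meaningful.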

******************************************************************************
* Proper IB support for ko2iblnd
******************************************************************************
https://jira.hpdd.intel.com/browse/LU-9179

Poor performance for the ko2iblnd driver. This is related to many of the
patches below that are missing from the linux client.
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9886

Crash in upstream kiblnd_handle_early_rxs()
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10394 / LU-10526 / LU-10089

Default to using MEM_REG
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10459

throttle tx based on queue depth
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9943

correct WR fast reg accounting
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10291

remove concurrent_sends tunable
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10213

calculate qp max_send_wrs properly
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9810

use fewer CQ entries for each connection
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10129 / LU-9180

rework map_on_demand behavior
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10129

query device capabilities
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10015

fix race at kiblnd_connect_peer
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9983

allow for discontiguous fragments
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9500

Don't Page Align remote_addr with FastReg
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9448

handle empty CPTs
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9507

Don't Assert On Reconnect with MultiQP
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9472

Fix FastReg map/unmap for MLX5
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9425

Turn on 2 sges by default
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8943

Enable Multiple OPA Endpoints between Nodes
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-5718

multiple sges for work request
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9094

kill timedout txs from ibp_tx_queue
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9094

reconnect peer for REJ_INVALID_SERVICE_ID
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8752

Stop MLX5 triggering a dump_cqe
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8874

Move ko2iblnd to latest RDMA changes
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8875 / LU-8874

Change to new RDMA done callback mechanism

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9164 / LU-8874

Incorporate RDMA map/unmap APIs into ko2iblnd

******************************************************************************
* sysfs/debugfs fixes
*
* https://jira.hpdd.intel.com/browse/LU-8066
*
* The original migration to sysfs was done in haste without properly working
* utilities to test the changes. This covers the work to restore the proper
* behavior. Huge project to make this right.
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-9431

The function class_process_proc_param was used for our mass updates of proc
tunables. It didn't work with sysfs and it was just ugly so it was removed.
In the process the ability to mass update thousands of clients was lost. This
work restores this in a sane way.

------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9091

One of the major requests from users is the ability to pass parameters to a
sysfs file in various units. For example, we can set max_pages_per_rpc, but
its value can vary between platforms because of different page sizes. So you
could set it like max_pages_per_rpc=16MiB. The original code to handle this was
written before the string helpers were created, so it doesn't follow that
format, but it would be easy to move over. Currently the string helpers do the
reverse of what we need: they change bytes to a string. We need to change a
string to bytes.
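
As a rough illustration of the string-to-bytes direction, here is a user-space
sketch. It is not the Lustre code and not a kernel string helper. The function
name parse_size_to_bytes() and the accepted syntax (a decimal number followed
by an optional K/M/G/T/P/E prefix and an optional "iB" or "B") are assumptions
made for this example, and every prefix is treated as a power of 1024.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a Lustre or kernel API): parse a size string
 * such as "4096", "16MiB" or "1G" into a byte count.
 * Returns 0 on success or a negative errno value on failure.
 */
static int parse_size_to_bytes(const char *str, uint64_t *bytes)
{
	static const char units[] = "KMGTPE"; /* each step is a factor of 1024 */
	unsigned long long val;
	unsigned int shift = 0;
	const char *p;
	char *end;

	if (!str || !isdigit((unsigned char)*str))
		return -EINVAL;	/* also rejects signs and leading spaces */

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	if (*end != '\0') {
		p = strchr(units, toupper((unsigned char)*end));
		if (!p)
			return -EINVAL;	/* unknown unit prefix */
		shift = 10 * (unsigned int)(p - units + 1);
		end++;
		/* accept an optional "iB" or "B" after the prefix letter */
		if (strcmp(end, "iB") == 0 || strcmp(end, "B") == 0)
			end += strlen(end);
		if (*end != '\0')
			return -EINVAL;	/* trailing garbage */
	}

	if (shift && val > (UINT64_MAX >> shift))
		return -ERANGE;	/* result would not fit in 64 bits */

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

A caller handling a write to a tunable such as max_pages_per_rpc could pass the
user's string through a parser like this and then divide the byte count by the
local page size, so that "16MiB" means the same amount of data on every
platform.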

******************************************************************************
* Proper user land to kernel space interface for Lustre
*
* https://jira.hpdd.intel.com/browse/LU-9680
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-8915

Don't use linux list structure as user land arguments for lnet selftest.
This code is pretty poor quality and really needs to be reworked.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8834

The lustre ioctl LL_IOC_FUTIMES_3 is very generic. Need to either work with
other file systems with similar functionality and make a common syscall
interface or rework our server code to automagically do it for us.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-6202

Clean up ioctl handling. We have many obsolete ioctls. Also the way we do
ioctls can be changed over to netlink. This also has the benefit of working
better with HPC systems that do IO forwarding. Such systems don't like ioctls
very well.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9667

More cleanups by making our utilities use sysfs instead of ioctls for LNet.
Also it has been requested to move the remaining ioctls to the netlink API.

******************************************************************************
* Misc
******************************************************************************

------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9855

Clean up obdclass preprocessor code. One of the major eyesores is the various
pointer redirections and macros used by the obdclass. This makes the code very
difficult to understand. Al Viro requested that this be cleaned up before we
leave staging.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9633

Migrate to sphinx kernel-doc style comments. Add documents in Documentation.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-6142

Possible remaining coding style fixes. Remove dead code. Enforce kernel code
style. Other minor misc cleanups...

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8837

Separate client/server functionality. Functions only used by server can be
removed from client. Most of this has been done but we need an inspection of
the code to make sure.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8964

Lustre client readahead/writeback control needs to better fit what the kernel
provides. This is currently being explored. We could end up replacing the CLIO
readahead abstraction with the kernel's own version.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9862

Patch that landed for LU-7890 leads to static checker errors
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9868

dcache/namei fixes for lustre
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10467

use standard Linux wait_event macros (work by Neil Brown)

------------------------------------------------------------------------------

Please send any patches to Greg Kroah-Hartman <greg@kroah.com>, Andreas Dilger
<andreas.dilger@intel.com>, James Simmons <jsimmons@infradead.org> and
Oleg Drokin <oleg.drokin@intel.com>.