<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/kernel/trace/ring_buffer.c, branch jd/bump-compilers</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>https://git.zx2c4.com/wireguard-linux/atom/kernel/trace/ring_buffer.c?h=jd%2Fbump-compilers</id>
<link rel='self' href='https://git.zx2c4.com/wireguard-linux/atom/kernel/trace/ring_buffer.c?h=jd%2Fbump-compilers'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/'/>
<updated>2024-08-08T00:26:44Z</updated>
<entry>
<title>ring-buffer: Remove unused function ring_buffer_nr_pages()</title>
<updated>2024-08-08T00:26:44Z</updated>
<author>
<name>Jianhui Zhou</name>
<email>912460177@qq.com</email>
</author>
<published>2024-08-05T11:36:31Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=58f7e4d7ba32758b861807e77535853cacc1f426'/>
<id>urn:sha1:58f7e4d7ba32758b861807e77535853cacc1f426</id>
<content type='text'>
ring_buffer_nr_pages() is not an inline function and its users access
buffer-&gt;buffers[cpu]-&gt;nr_pages directly, so remove the now-unused
function ring_buffer_nr_pages().
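
A minimal sketch of the pattern being removed (simplified; the former
helper and the struct layout live in kernel/trace/ring_buffer.c):

 /* the removed helper only returned the per-CPU sub-buffer count ... */
 size_t ring_buffer_nr_pages(struct trace_buffer *buffer, int cpu)
 {
 	return buffer-&gt;buffers[cpu]-&gt;nr_pages;
 }

 /* ... so its user reads the field directly instead */
 nr_pages = buffer-&gt;buffers[cpu]-&gt;nr_pages;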

Signed-off-by: Jianhui Zhou &lt;912460177@qq.com&gt;
Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Fix a race between readers and resize checks</title>
<updated>2024-05-21T23:03:35Z</updated>
<author>
<name>Petr Pavlu</name>
<email>petr.pavlu@suse.com</email>
</author>
<published>2024-05-17T13:40:08Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=c2274b908db05529980ec056359fae916939fdaa'/>
<id>urn:sha1:c2274b908db05529980ec056359fae916939fdaa</id>
<content type='text'>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old-&gt;list.prev-&gt;next to point it to the
new page. Following that, if the operation is successful,
old-&gt;list.next-&gt;prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent: page-&gt;prev-&gt;next or
page-&gt;next-&gt;prev might not point back to page for some page in the
ring buffer.

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  &lt;TASK&gt;
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  &lt;/TASK&gt;
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") made this always
the case, which makes it more likely to hit this issue.

The window to hit this race is nonetheless very small. To help
reproduce it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer-&gt;reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i &lt; 1U &lt;&lt; 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader-&gt;list.next)-&gt;prev = &amp;cpu_buffer-&gt;reader_page-&gt;list;

.. and then run the following commands on the target system:

 echo 1 &gt; /sys/kernel/tracing/events/sched/sched_switch/enable
 while true; do
 	echo 16 &gt; /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 &gt; /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &amp;
 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer-&gt;reader_lock.
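
A rough sketch of the fix around the rb_check_pages() call in
ring_buffer_resize() (simplified; "flags" is a local unsigned long and
the surrounding resize logic is elided):

 /* only reached when recording on the parent trace_buffer is disabled */
 raw_spin_lock_irqsave(&amp;cpu_buffer-&gt;reader_lock, flags);
 rb_check_pages(cpu_buffer);
 raw_spin_unlock_irqrestore(&amp;cpu_buffer-&gt;reader_lock, flags);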

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu &lt;petr.pavlu@suse.com&gt;
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Correct stale comments related to non-consuming readers</title>
<updated>2024-05-21T23:03:35Z</updated>
<author>
<name>Petr Pavlu</name>
<email>petr.pavlu@suse.com</email>
</author>
<published>2024-05-17T13:40:07Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=ea70a9628e93168bb5a07b35f7b9a162a65afe9e'/>
<id>urn:sha1:ea70a9628e93168bb5a07b35f7b9a162a65afe9e</id>
<content type='text'>
Adjust the following code documentation:

* Kernel-doc comments for ring_buffer_read_prepare() and
  ring_buffer_read_finish() mention that recording to the ring buffer is
  disabled when the read is active. Remove mention of this restriction
  because it was already lifted in commit 1039221cc278 ("ring-buffer: Do
  not disable recording when there is an iterator").

* Function ring_buffer_read_finish() performs a self-check of the
  ring-buffer by locking cpu_buffer-&gt;reader_lock and then calling
  rb_check_pages(). The preceding comment explains that the lock is
  needed because rb_check_pages() clears the HEAD flag required by
  readers which might be running in parallel. Remove this explanation
  because commit 8843e06f67b1 ("ring-buffer: Handle race between
  rb_move_tail and rb_check_pages") simplified the function so it no
  longer resets the mentioned flag. Nonetheless, the lock is still
  needed because a reader swapping a page into the ring buffer can make
  the underlying doubly-linked list temporarily inconsistent.

This is a non-functional change.

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-2-petr.pavlu@suse.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Petr Pavlu &lt;petr.pavlu@suse.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Add cast to unsigned long addr passed to virt_to_page()</title>
<updated>2024-05-15T05:28:42Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-05-15T05:05:58Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=b9c6820f029abaabbc37646093866aa730ca0928'/>
<id>urn:sha1:b9c6820f029abaabbc37646093866aa730ca0928</id>
<content type='text'>
The sub-buffer pages are held in an unsigned long array, and a cast is
needed when an entry is passed to virt_to_page().
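
A minimal sketch of the resulting call (assuming the subbuf_ids[] array
from the mapping patch; names approximate):

 /* subbuf_ids[] holds kernel virtual addresses as unsigned long,
  * so cast before handing one to virt_to_page() */
 struct page *page = virt_to_page((void *)cpu_buffer-&gt;subbuf_ids[s]);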

Link: https://lore.kernel.org/all/20240515124808.06279d04@canb.auug.org.au/
Link: https://lore.kernel.org/linux-trace-kernel/20240515010558.4abaefdd@rorschach.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Fixes: 117c39200d9d ("ring-buffer: Introducing ring-buffer mapping functions")
Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Have mmapped ring buffer keep track of missed events</title>
<updated>2024-05-13T23:55:22Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-03-12T21:54:05Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=fe832be05a8eee5f1488cbcc2c562dd82d079fd6'/>
<id>urn:sha1:fe832be05a8eee5f1488cbcc2c562dd82d079fd6</id>
<content type='text'>
While testing libtracefs on the mmapped ring buffer, the test that checks
if missed events are accounted for failed when using the mapped buffer.
This is because the mapped page does not update the missed events that
were dropped because the writer filled up the ring buffer before the
reader could catch up.

Add the missed events to the reader page/sub-buffer when the IOCTL is done
and a new reader page is acquired.

Note that all accesses to the reader_page via rb_page_commit() had to be
switched to rb_page_size(); rb_page_size() used to be just a copy of
rb_page_commit(), but it now masks out the RB_MISSED bits. This is needed
because the mapped reader page is still active in the ring buffer code,
and wherever it reads the commit field of the bpage for the size, it must
now mask it, otherwise the missed bits that are now set would corrupt the
returned size.
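
A minimal sketch of the masking idea (the helper body is an
approximation; RB_MISSED_EVENTS and RB_MISSED_STORED are the existing
missed-event flag bits carried in the commit field):

 static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
 {
 	/* strip the RB_MISSED_* flag bits so only the data size remains */
 	return rb_page_commit(bpage) &amp; ~(RB_MISSED_EVENTS | RB_MISSED_STORED);
 }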

Link: https://lore.kernel.org/linux-trace-kernel/20240312175405.12fb6726@gandalf.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Introducing ring-buffer mapping functions</title>
<updated>2024-05-13T22:09:56Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vdonnefort@google.com</email>
</author>
<published>2024-05-10T14:04:31Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=117c39200d9d760cbd5944bb89efb7b9c51965aa'/>
<id>urn:sha1:117c39200d9d760cbd5944bb89efb7b9c51965aa</id>
<content type='text'>
In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

  ring_buffer_{map,unmap}()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer; currently subbufs are
  only identified through their in-kernel VA.

  A meta-page, where ring-buffer statistics and a description of the
  current reader are stored.

The linear mapping exposes the meta-page and each subbuf of the
ring-buffer, ordered by their unique ID, which is assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice-enabling functions will in
reality simply memcpy the data instead of swapping subbufs.
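
A rough sketch of how a hypothetical caller might wire these up (the
ring_buffer_map(buffer, cpu, vma) style prototypes are assumed from this
patch; the mmap handler and its private_data plumbing are made up for
illustration):

 /* map one CPU's ring buffer into the vma of an mmap() request */
 static int my_trace_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct trace_buffer *buffer = filp-&gt;private_data;	/* assumed */
 	int cpu = 0;	/* per-CPU buffer chosen by the caller */

 	return ring_buffer_map(buffer, cpu, vma);
 }

 /* later: advance the reader so user-space sees a fresh sub-buffer,
  * then tear the mapping down when the last user goes away */
 ring_buffer_map_get_reader(buffer, cpu);
 ring_buffer_unmap(buffer, cpu);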

Link: https://lore.kernel.org/linux-trace-kernel/20240510140435.3550353-3-vdonnefort@google.com

CC: &lt;linux-mm@kvack.org&gt;
Signed-off-by: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Allocate sub-buffers with __GFP_COMP</title>
<updated>2024-05-13T22:09:55Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vdonnefort@google.com</email>
</author>
<published>2024-05-10T14:04:30Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=c09d4167b550f91ecf5c3db883eea314edc7f532'/>
<id>urn:sha1:c09d4167b550f91ecf5c3db883eea314edc7f532</id>
<content type='text'>
In preparation for the ring-buffer memory mapping, allocate compound
pages for the ring-buffer sub-buffers to enable us to map them to
user-space with vm_insert_pages().
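
A hedged sketch of the allocation change (the exact flag combination and
call site in ring_buffer.c may differ slightly):

 struct page *page;
 int order = 0;	/* sub-buffer order; taken from the buffer config in practice */

 /* __GFP_COMP makes the sub-buffer a compound page, which is what
  * vm_insert_pages() expects when the range is mapped to user-space later */
 page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_COMP | __GFP_ZERO, order);
 if (!page)
 	return -ENOMEM;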

Link: https://lore.kernel.org/linux-trace-kernel/20240510140435.3550353-2-vdonnefort@google.com

Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Only update pages_touched when a new page is touched</title>
<updated>2024-04-11T21:49:57Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-04-09T19:13:09Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=ffe3986fece696cf65e0ef99e74c75f848be8e30'/>
<id>urn:sha1:ffe3986fece696cf65e0ef99e74c75f848be8e30</id>
<content type='text'>
The "buffer_percent" logic is used by the ring buffer splice code so that
tasks which find no data wait and are only woken up once the buffer is
filled to the percentage set in the "buffer_percent" file. It depends on
three variables that determine the amount of data in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.

It was observed that the amount read from the splice kept decreasing the
longer the trace was running. That is, if one asked for 60%, it would read
over 60% when tracing first starts, but then it would be woken up at under
60%, and the amount of data read after being woken up would slowly
decrease until it was much less than the buffer percent.

This was due to incorrect accounting of the pages_touched increment. This
value is incremented whenever a writer moves to a new sub-buffer, but the
place where it was incremented was wrong. If a writer overflowed the
current sub-buffer, it would go to the next one. If it got preempted by an
interrupt at that time, and the interrupt performed a trace, the interrupt
would also end up going to the next sub-buffer. Only one of them should
increment the counter, but unfortunately that was not the case.

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
way the counter is incremented only once, when the writer moves to a new
sub-buffer, and not twice when there is a race and a writer and its
preempting writer both move to the same new sub-buffer.
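
A minimal sketch of the idea in the tail-page switch (variable names
approximate the ring-buffer code; surrounding context is elided):

 /* only the writer that actually switches the tail page accounts for it */
 if (try_cmpxchg(&amp;cpu_buffer-&gt;tail_page, &amp;tail_page, next_page))
 	local_inc(&amp;cpu_buffer-&gt;pages_touched);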

Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Make wake once of ring_buffer_wait() more robust</title>
<updated>2024-03-18T14:11:43Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-03-15T10:31:15Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=b70f2938242a028f8e9473781ede175486a59dc8'/>
<id>urn:sha1:b70f2938242a028f8e9473781ede175486a59dc8</id>
<content type='text'>
The default behavior of ring_buffer_wait() when passed a NULL "cond"
parameter is to exit the function the first time it is woken up. The
current implementation uses a counter that starts at zero, and once the
counter is greater than one, it exits the wait_event_interruptible().

But this relies on the internal workings of wait_event_interruptible(), as
that code basically has:

  if (cond)
    return;
  prepare_to_wait();
  if (!cond)
    schedule();
  finish_wait();

That is, cond is called twice before it sleeps. The default cond of
ring_buffer_wait() needs to account for that and wait for its counter to
increment twice before exiting.

Instead, use the seq/atomic_inc logic that is used by the tracing code
that calls this function. Add an atomic_t seq to rb_irq_work and when cond
is NULL, have the default callback take a descriptor as its data that
holds the rbwork and the value of the seq when it started.

The wakeups will now increment the rbwork-&gt;seq and the cond callback will
simply check if that number is different, and no longer have to rely on
the implementation of wait_event_interruptible().
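
A rough sketch of the new default condition (structure and names
approximate the change; the atomic read/acquire details are simplified):

 struct rb_wait_data {
 	struct rb_irq_work	*irq_work;
 	int			seq;	/* rbwork-&gt;seq sampled when the wait started */
 };

 /* default "cond": return true once a wakeup has bumped rbwork-&gt;seq */
 static bool rb_wait_once(void *data)
 {
 	struct rb_wait_data *rdata = data;

 	return atomic_read(&amp;rdata-&gt;irq_work-&gt;seq) != rdata-&gt;seq;
 }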

Link: https://lore.kernel.org/linux-trace-kernel/20240315063115.6cb5d205@gandalf.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Fixes: 7af9ded0c2ca ("ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: use READ_ONCE() to read cpu_buffer-&gt;commit_page in concurrent environment</title>
<updated>2024-03-17T11:58:53Z</updated>
<author>
<name>linke li</name>
<email>lilinke99@qq.com</email>
</author>
<published>2024-03-02T04:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=f1e30cb6369251c03f63c564006f96a54197dcc4'/>
<id>urn:sha1:f1e30cb6369251c03f63c564006f96a54197dcc4</id>
<content type='text'>
In ring_buffer_iter_empty(), cpu_buffer-&gt;commit_page is read while other
threads may change it. This can cause the time_stamp read on the next line
to come from a different page. Use READ_ONCE() to avoid having to reason
about compiler optimizations now and in the future.
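
A minimal sketch of the pattern (simplified from ring_buffer_iter_empty()):

 /* snapshot the commit page once so the page pointer and the timestamp
  * below are taken from the same page */
 commit_page = READ_ONCE(cpu_buffer-&gt;commit_page);
 commit_ts = commit_page-&gt;page-&gt;time_stamp;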

Link: https://lore.kernel.org/linux-trace-kernel/tencent_DFF7D3561A0686B5E8FC079150A02505180A@qq.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: linke li &lt;lilinke99@qq.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
