<feed xmlns='http://www.w3.org/2005/Atom'>
<title>qemu/migration, branch master</title>
<subtitle>QEMU development tree</subtitle>
<id>https://git.zx2c4.com/qemu/atom/migration?h=master</id>
<link rel='self' href='https://git.zx2c4.com/qemu/atom/migration?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/'/>
<updated>2024-08-20T15:44:13Z</updated>
<entry>
<title>migration/multifd: Free MultiFDRecvParams::data</title>
<updated>2024-08-20T15:44:13Z</updated>
<author>
<name>Peter Maydell</name>
<email>peter.maydell@linaro.org</email>
</author>
<published>2024-08-20T14:44:29Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=4c107870e8b2ba3951ee0c46123f1c3b5d3a19d3'/>
<id>urn:sha1:4c107870e8b2ba3951ee0c46123f1c3b5d3a19d3</id>
<content type='text'>
In multifd_recv_setup() we allocate (among other things)
 * a MultiFDRecvData struct to multifd_recv_state::data
 * a MultiFDRecvData struct to each multfd_recv_state-&gt;params[i].data

(Then during execution we might swap these pointers around.)

But in multifd_recv_cleanup() we free multifd_recv_state-&gt;data
in multifd_recv_cleanup_state() but we don't ever free the
multifd_recv_state-&gt;params[i].data. This results in a memory
leak reported by LeakSanitizer:

(cd build/asan &amp;&amp; \
   ASAN_OPTIONS="fast_unwind_on_malloc=0:strip_path_prefix=/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../" \
   QTEST_QEMU_BINARY=./qemu-system-x86_64 \
   ./tests/qtest/migration-test --tap -k -p /x86_64/migration/multifd/file/mapped-ram )
[...]
Direct leak of 72 byte(s) in 3 object(s) allocated from:
    #0 0x561cc0afcfd8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218efd8) (BuildId: be72e086d4e47b172b0a72779972213fd9916466)
    #1 0x7f89d37acc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
    #2 0x561cc1e9c83c in multifd_recv_setup migration/multifd.c:1606:19
    #3 0x561cc1e68618 in migration_ioc_process_incoming migration/migration.c:972:9
    #4 0x561cc1e3ac59 in migration_channel_process_incoming migration/channel.c:45:9
    #5 0x561cc1e4fa0b in file_accept_incoming_migration migration/file.c:132:5
    #6 0x561cc30f2c0c in qio_channel_fd_source_dispatch io/channel-watch.c:84:12
    #7 0x7f89d37a3c43 in g_main_dispatch debian/build/deb/../../../glib/gmain.c:3419:28
    #8 0x7f89d37a3c43 in g_main_context_dispatch debian/build/deb/../../../glib/gmain.c:4137:7
    #9 0x561cc3b21659 in glib_pollfds_poll util/main-loop.c:287:9
    #10 0x561cc3b1ff93 in os_host_main_loop_wait util/main-loop.c:310:5
    #11 0x561cc3b1fb5c in main_loop_wait util/main-loop.c:589:11
    #12 0x561cc1da2917 in qemu_main_loop system/runstate.c:801:9
    #13 0x561cc3796c1c in qemu_default_main system/main.c:37:14
    #14 0x561cc3796c67 in main system/main.c:48:12
    #15 0x7f89d163bd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #16 0x7f89d163be3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #17 0x561cc0a79fa4 in _start (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x210bfa4) (BuildId: be72e086d4e47b172b0a72779972213fd9916466)

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x561cc0afcfd8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218efd8) (BuildId: be72e086d4e47b172b0a72779972213fd9916466)
    #1 0x7f89d37acc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
    #2 0x561cc1e9bed9 in multifd_recv_setup migration/multifd.c:1588:32
    #3 0x561cc1e68618 in migration_ioc_process_incoming migration/migration.c:972:9
    #4 0x561cc1e3ac59 in migration_channel_process_incoming migration/channel.c:45:9
    #5 0x561cc1e4fa0b in file_accept_incoming_migration migration/file.c:132:5
    #6 0x561cc30f2c0c in qio_channel_fd_source_dispatch io/channel-watch.c:84:12
    #7 0x7f89d37a3c43 in g_main_dispatch debian/build/deb/../../../glib/gmain.c:3419:28
    #8 0x7f89d37a3c43 in g_main_context_dispatch debian/build/deb/../../../glib/gmain.c:4137:7
    #9 0x561cc3b21659 in glib_pollfds_poll util/main-loop.c:287:9
    #10 0x561cc3b1ff93 in os_host_main_loop_wait util/main-loop.c:310:5
    #11 0x561cc3b1fb5c in main_loop_wait util/main-loop.c:589:11
    #12 0x561cc1da2917 in qemu_main_loop system/runstate.c:801:9
    #13 0x561cc3796c1c in qemu_default_main system/main.c:37:14
    #14 0x561cc3796c67 in main system/main.c:48:12
    #15 0x7f89d163bd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #16 0x7f89d163be3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #17 0x561cc0a79fa4 in _start (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x210bfa4) (BuildId: be72e086d4e47b172b0a72779972213fd9916466)

SUMMARY: AddressSanitizer: 96 byte(s) leaked in 4 allocation(s).

Free the params[i].data too.

Cc: qemu-stable@nongnu.org
Fixes: d117ed0699d41 ("migration/multifd: Allow receiving pages without packets")
Signed-off-by: Peter Maydell &lt;peter.maydell@linaro.org&gt;
Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>savevm: Fix load_snapshot error path crash</title>
<updated>2024-08-16T13:04:19Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2024-08-13T20:23:26Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=97d2b66dcd8c771065807b4acfd0002dac4385be'/>
<id>urn:sha1:97d2b66dcd8c771065807b4acfd0002dac4385be</id>
<content type='text'>
An error path missed setting *errp, which can cause a NULL deref.

Reviewed-by: Alex Bennée &lt;alex.bennee@linaro.org&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Message-Id: &lt;20240813050638.446172-11-npiggin@gmail.com&gt;
Signed-off-by: Alex Bennée &lt;alex.bennee@linaro.org&gt;
Message-Id: &lt;20240813202329.1237572-19-alex.bennee@linaro.org&gt;
</content>
</entry>
<entry>
<title>migration/multifd: Fix multifd_send_setup cleanup when channel creation fails</title>
<updated>2024-08-02T12:47:40Z</updated>
<author>
<name>Fabiano Rosas</name>
<email>farosas@suse.de</email>
</author>
<published>2024-08-01T17:41:01Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=0bd5b9284fa94a6242a0d27a46380d93e753488b'/>
<id>urn:sha1:0bd5b9284fa94a6242a0d27a46380d93e753488b</id>
<content type='text'>
When a channel fails to create, the code currently just returns. This
is wrong for two reasons:

1) Channel n+1 will not get to initialize it's semaphores, leading to
   an assert when terminate_threads tries to post to it:

 qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
 qemu_mutex_lock_impl: Assertion `mutex-&gt;initialized' failed.

2) (theoretical) If channel n-1 already started creation it will
   defeat the purpose of the channels_created logic which is in place
   to avoid migrate_fd_cleanup() to run while channels are still being
   created.

   This cannot really happen today because the current failure cases
   for multifd_new_send_channel_create() are all synchronous,
   resulting from qio_channel_file_new_path() getting a bad
   filename. This would hit all channels equally.

   But I don't want to set a trap for future people, so have all
   channels try to create (even if failing), and only fail after the
   channels_created semaphore has been posted.

While here, remove the error_report_err call. There's one already at
migrate_fd_cleanup later on.

Cc: qemu-stable@nongnu.org
Reported-by: Jim Fehlig &lt;jfehlig@suse.com&gt;
Fixes: b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration: Fix cleanup of iochannel in file migration</title>
<updated>2024-08-02T12:47:40Z</updated>
<author>
<name>Fabiano Rosas</name>
<email>farosas@suse.de</email>
</author>
<published>2024-08-01T17:41:00Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=84ac6fa12df3e96fdae8f3d992a7c2914c9a6ca5'/>
<id>urn:sha1:84ac6fa12df3e96fdae8f3d992a7c2914c9a6ca5</id>
<content type='text'>
The QIOChannelFile object already has its reference decremented by
g_autoptr. Trying to unref an extra time causes:

ERROR:../qom/object.c:1241:object_unref: assertion failed: (obj-&gt;ref &gt; 0)

Fixes: a701c03dec ("migration: Drop reference to QIOChannel if file seeking fails")
Fixes: 6d3279655a ("migration: Fix file migration with fdset")
Reported-by: Jim Fehlig &lt;jfehlig@suse.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration: Free removed SaveStateEntry</title>
<updated>2024-08-02T12:47:39Z</updated>
<author>
<name>Akihiko Odaki</name>
<email>akihiko.odaki@daynix.com</email>
</author>
<published>2024-06-27T13:37:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=c80e22517f6cfbbbed20e859f146d331694e6488'/>
<id>urn:sha1:c80e22517f6cfbbbed20e859f146d331694e6488</id>
<content type='text'>
This fixes LeakSanitizer warnings.

Signed-off-by: Akihiko Odaki &lt;akihiko.odaki@daynix.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration/postcopy: Add postcopy-recover-setup phase</title>
<updated>2024-06-21T12:47:59Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2024-06-19T22:30:40Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=4146b77ec7640d3c30d42558e13423594b114385'/>
<id>urn:sha1:4146b77ec7640d3c30d42558e13423594b114385</id>
<content type='text'>
This patch adds a migration state on src called "postcopy-recover-setup".
The new state will describe the intermediate step starting from when the
src QEMU received a postcopy recovery request, until the migration channels
are properly established, but before the recovery process take place.

The request came from Libvirt where Libvirt currently rely on the migration
state events to detect migration state changes.  That works for most of the
migration process but except postcopy recovery failures at the beginning.

Currently postcopy recovery only has two major states:

  - postcopy-paused: this is the state that both sides of QEMU will be in
    for a long time as long as the migration channel was interrupted.

  - postcopy-recover: this is the state where both sides of QEMU handshake
    with each other, preparing for a continuation of postcopy which used to
    be interrupted.

The issue here is when the recovery port is invalid, the src QEMU will take
the URI/channels, noticing the ports are not valid, and it'll silently keep
in the postcopy-paused state, with no event sent to Libvirt.  In this case,
the only thing Libvirt can do is to poll the migration status with a proper
interval, however that's less optimal.

Considering that this is the only case where Libvirt won't get a
notification from QEMU on such events, let's add postcopy-recover-setup
state to mimic what we have with the "setup" state of a newly initialized
migration, describing the phase of connection establishment.

With that, postcopy recovery will have two paths to go now, and either path
will guarantee an event generated.  Now the events will look like this
during a recovery process on src QEMU:

  - Initially when the recovery is initiated on src, QEMU will go from
    "postcopy-paused" -&gt; "postcopy-recover-setup".  Old QEMUs don't have
    this event.

  - Depending on whether the channel re-establishment is succeeded:

    - In succeeded case, src QEMU will move from "postcopy-recover-setup"
      to "postcopy-recover".  Old QEMUs also have this event.

    - In failure case, src QEMU will move from "postcopy-recover-setup" to
      "postcopy-paused" again.  Old QEMUs don't have this event.

This guarantees that Libvirt will always receive a notification for
recovery process properly.

One thing to mention is, such new status is only needed on src QEMU not
both.  On dest QEMU, the state machine doesn't change.  Hence the events
don't change either.  It's done like so because dest QEMU may not have an
explicit point of setup start.  E.g., it can happen that when dest QEMUs
doesn't use migrate-recover command to use a new URI/channel, but the old
URI/channels can be reused in recovery, in which case the old ports simply
can work again after the network routes are fixed up.

Add a new helper postcopy_is_paused() detecting whether postcopy is still
paused, taking RECOVER_SETUP into account too.  When using it on both
src/dst, a slight change is done altogether to always wait for the
semaphore before checking the status, because for both sides a sem_post()
will be required for a recovery.

Cc: Jiri Denemark &lt;jdenemar@redhat.com&gt;
Cc: Prasad Pandit &lt;ppandit@redhat.com&gt;
Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Buglink: https://issues.redhat.com/browse/RHEL-38485
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration: Cleanup incoming migration setup state change</title>
<updated>2024-06-21T12:47:59Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2024-06-19T22:30:39Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=4dd5f7b8d568116b3ce594b0055a47c6db50f49c'/>
<id>urn:sha1:4dd5f7b8d568116b3ce594b0055a47c6db50f49c</id>
<content type='text'>
Destination QEMU can setup incoming ports for two purposes: either a fresh
new incoming migration, in which QEMU will switch to SETUP for channel
establishment, or a paused postcopy migration, in which QEMU will stay in
POSTCOPY_PAUSED until kicking off the RECOVER phase.

Now the state machine worked on dest node for the latter, only because
migrate_set_state() implicitly will become a noop if the current state
check failed.  It wasn't clear at all.

Clean it up by providing a helper migration_incoming_state_setup() doing
proper checks over current status.  Postcopy-paused will be explicitly
checked now, and then we can bail out for unknown states.

Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration: Use MigrationStatus instead of int</title>
<updated>2024-06-21T12:47:59Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2024-06-19T22:30:38Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=a5c24e13e9f176901058b460e61425756322f3e8'/>
<id>urn:sha1:a5c24e13e9f176901058b460e61425756322f3e8</id>
<content type='text'>
QEMU uses "int" in most cases even if it stores MigrationStatus.  I don't
know why, so let's try to do that right and see what blows up..

Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration: Rename thread debug names</title>
<updated>2024-06-21T12:47:59Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2024-06-19T22:30:37Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=60ce47675d74ddae3f13a32767d097d9fecbda4b'/>
<id>urn:sha1:60ce47675d74ddae3f13a32767d097d9fecbda4b</id>
<content type='text'>
The postcopy thread names on dest QEMU are slightly confusing, partly I'll
need to blame myself on 36f62f11e4 ("migration: Postcopy preemption
preparation on channel creation").  E.g., "fault-fast" reads like a fast
version of "fault-default", but it's actually the fast version of
"postcopy/listen".

Taking this chance, rename all the migration threads with proper rules.
Considering we only have 15 chars usable, prefix all threads with "mig/",
meanwhile identify src/dst threads properly this time.  So now most thread
names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except
the bg-snapshot thread which doesn't have a direction.

For multifd threads, making them "mig/{src|dst}/{send|recv}_%d".

We used to have "live_migration" thread for a very long time, now it's
called "mig/src/main".  We may hope to have "mig/dst/main" soon but not
yet.

Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Reviewed-by: Zhijian Li (Fujitsu) &lt;lizhijian@fujitsu.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
<entry>
<title>migration/multifd: Avoid the final FLUSH in complete()</title>
<updated>2024-06-21T12:47:59Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2024-06-19T22:30:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/qemu/commit/?id=637280aeb242517ede480aa2d5ba1c29d41eac11'/>
<id>urn:sha1:637280aeb242517ede480aa2d5ba1c29d41eac11</id>
<content type='text'>
We always do the flush when finishing one round of scan, and during
complete() phase we should scan one more round making sure no dirty page
existed.  In that case we shouldn't need one explicit FLUSH at the end of
complete(), as when reaching there all pages should have been flushed.

Reviewed-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Tested-by: Fabiano Rosas &lt;farosas@suse.de&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Fabiano Rosas &lt;farosas@suse.de&gt;
</content>
</entry>
</feed>
