qemu/block, branch master

block/blkio: use FUA flag on write zeroes only if supported

2024-08-12T15:41:29Z

libblkio supports BLKIO_REQ_FUA with write zeros requests only since version 1.4.0, so let's inform the block layer that the blkio driver supports it only in this case. Otherwise we can have runtime errors as reported in https://issues.redhat.com/browse/RHEL-32878 Fixes: fd66dbd424 ("blkio: add libblkio block driver") Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-32878 Signed-off-by: Stefano Garzarella Reviewed-by: Eric Blake Reviewed-by: Philippe Mathieu-Daudé Message-id: 20240808080545.40744-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi

nbd/server: CVE-2024-7409: Cap default max-connections to 100

2024-08-08T21:02:23Z

Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely needs to open lots of sockets without closing them, until qemu no longer has any more fds available to allocate. For qemu-nbd, we default to allowing only 1 connection unless more are explicitly asked for (-e or --shared); this was historically picked as a nice default (without an explicit -t, a non-persistent qemu-nbd goes away after a client disconnects, without needing any additional follow-up commands), and we are not going to change that interface now (besides, someday we want to point people towards qemu-storage-daemon instead of qemu-nbd). But for qemu proper, and the newer qemu-storage-daemon, the QMP nbd-server-start command has historically had a default of unlimited number of connections, in part because unlike qemu-nbd it is inherently persistent until nbd-server-stop. Allowing multiple client sockets is particularly useful for clients that can take advantage of MULTI_CONN (creating parallel sockets to increase throughput), although known clients that do so (such as libnbd's nbdcopy) typically use only 8 or 16 connections (the benefits of scaling diminish once more sockets are competing for kernel attention). Picking a number large enough for typical use cases, but not unlimited, makes it slightly harder for a malicious client to perform a denial of service merely by opening lots of connections withot progressing through the handshake. This change does not eliminate CVE-2024-7409 on its own, but reduces the chance for fd exhaustion or unlimited memory usage as an attack surface. On the other hand, by itself, it makes it more obvious that with a finite limit, we have the problem of an unauthenticated client holding 100 fds opened as a way to block out a legitimate client from being able to connect; thus, later patches will further add timeouts to reject clients that are not making progress. This is an INTENTIONAL change in behavior, and will break any client of nbd-server-start that was not passing an explicit max-connections parameter, yet expects more than 100 simultaneous connections. We are not aware of any such client (as stated above, most clients aware of MULTI_CONN get by just fine on 8 or 16 connections, and probably cope with later connections failing by relying on the earlier connections; libvirt has not yet been passing max-connections, but generally creates NBD servers with the intent for a single client for the sake of live storage migration; meanwhile, the KubeSAN project anticipates a large cluster sharing multiple clients [up to 8 per node, and up to 100 nodes in a cluster], but it currently uses qemu-nbd with an explicit --shared=0 rather than qemu-storage-daemon with nbd-server-start). We considered using a deprecation period (declare that omitting max-parameters is deprecated, and make it mandatory in 3 releases - then we don't need to pick an arbitrary default); that has zero risk of breaking any apps that accidentally depended on more than 100 connections, and where such breakage might not be noticed under unit testing but only under the larger loads of production usage. But it does not close the denial-of-service hole until far into the future, and requires all apps to change to add the parameter even if 100 was good enough. It also has a drawback that any app (like libvirt) that is accidentally relying on an unlimited default should seriously consider their own CVE now, at which point they are going to change to pass explicit max-connections sooner than waiting for 3 qemu releases. Finally, if our changed default breaks an app, that app can always pass in an explicit max-parameters with a larger value. It is also intentional that the HMP interface to nbd-server-start is not changed to expose max-connections (any client needing to fine-tune things should be using QMP). Suggested-by: Daniel P. Berrangé Signed-off-by: Eric Blake Message-ID: <20240807174943.771624-12-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé [ericb: Expand commit message to summarize Dan's argument for why we break corner-case back-compat behavior without a deprecation period] Signed-off-by: Eric Blake

vvfat: Fix reading files with non-continuous clusters

2024-08-06T18:12:39Z

When reading with `read_cluster` we get the `mapping` with `find_mapping_for_cluster` and then we call `open_file` for this mapping. The issue appear when its the same file, but a second cluster that is not immediately after it, imagine clusters `500 -> 503`, this will give us 2 mappings one has the range `500..501` and another `503..504`, both point to the same file, but different offsets. When we don't open the file since the path is the same, we won't assign `s->current_mapping` and thus accessing way out of bound of the file. From our example above, after `open_file` (that didn't open anything) we will get the offset into the file with `s->cluster_size*(cluster_num-s->current_mapping->begin)`, which will give us `0x2000 * (504-500)`, which is out of bound for this mapping and will produce some issues. Signed-off-by: Amjad Alsharafi Message-ID: <1f3ea115779abab62ba32c788073cdc99f9ad5dd.1721470238.git.amjadsharafi10@gmail.com> [kwolf: Simplified the patch based on Amjad's analysis and input] Signed-off-by: Kevin Wolf

vvfat: Fix wrong checks for cluster mappings invariant

2024-08-06T18:12:39Z

How this `abort` was intended to check for was: - if the `mapping->first_mapping_index` is not the same as `first_mapping_index`, which **should** happen only in one case, when we are handling the first mapping, in that case `mapping->first_mapping_index == -1`, in all other cases, the other mappings after the first should have the condition `true`. - From above, we know that this is the first mapping, so if the offset is not `0`, then abort, since this is an invalid state. The issue was that `first_mapping_index` is not set if we are checking from the middle, the variable `first_mapping_index` is only set if we passed through the check `cluster_was_modified` with the first mapping, and in the same function call we checked the other mappings. One approach is to go into the loop even if `cluster_was_modified` is not true so that we will be able to set `first_mapping_index` for the first mapping, but since `first_mapping_index` is only used here, another approach is to just check manually for the `mapping->first_mapping_index != -1` since we know that this is the value for the only entry where `offset == 0` (i.e. first mapping). Signed-off-by: Amjad Alsharafi Reviewed-by: Kevin Wolf Message-ID: Signed-off-by: Kevin Wolf

vvfat: Fix usage of `info.file.offset`

2024-08-06T18:12:39Z

The field is marked as "the offset in the file (in clusters)", but it was being used like this `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect. Signed-off-by: Amjad Alsharafi Reviewed-by: Kevin Wolf Message-ID: <72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf

vvfat: Fix bug in writing to middle of file

2024-08-06T18:12:39Z

Before this commit, the behavior when calling `commit_one_file` for example with `offset=0x2000` (second cluster), what will happen is that we won't fetch the next cluster from the fat, and instead use the first cluster for the read operation. This is due to off-by-one error here, where `i=0x2000 !< offset=0x2000`, thus not fetching the next cluster. Signed-off-by: Amjad Alsharafi Reviewed-by: Kevin Wolf Tested-by: Kevin Wolf Message-ID: Signed-off-by: Kevin Wolf

block-copy: Fix missing graph lock

2024-08-06T18:12:39Z

The graph lock needs to be held when calling bdrv_co_pdiscard(). Fix block_copy_task_entry() to take it for the call. WITH_GRAPH_RDLOCK_GUARD() was implemented in a weak way because of limitations in clang's Thread Safety Analysis at the time, so that it only asserts that the lock is held (which allows calling functions that require the lock), but we never deal with the unlocking (so even after the scope of the guard, the compiler assumes that the lock is still held). This is why the compiler didn't catch this locking error. Signed-off-by: Kevin Wolf Message-ID: <20240627181245.281403-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi Signed-off-by: Kevin Wolf

block/curl: rewrite http header parsing function

2024-07-17T11:04:15Z

Existing code was long, unclear and twisty. This also relaxes the rules a tiny bit: allows to have whitespace before header name and colon and makes the header value match to be case-insensitive. Signed-off-by: Michael Tokarev Reviewed-by: Vladimir Sementsov-Ogievskiy

Consider discard option when writing zeros

2024-07-11T09:06:36Z

When opening an image with discard=off, we punch hole in the image when writing zeroes, making the image sparse. This breaks users that want to ensure that writes cannot fail with ENOSPACE by using fully allocated images[1]. bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we opened the child without discard=unmap or discard=on. But we don't go through this function when accessing the top node. Move the check down to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths. This change implements the documented behavior, punching holes only when opening the image with discard=on or discard=unmap. This may not be the best default but can improve it later. The test depends on a file system supporting discard, deallocating the entire file when punching hole with the length of the entire file. Tested with xfs, ext4, and tmpfs. [1] https://lists.nongnu.org/archive/html/qemu-discuss/2024-06/msg00003.html Signed-off-by: Nir Soffer Message-id: 20240628202058.1964986-3-nsoffer@redhat.com Signed-off-by: Stefan Hajnoczi

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

2024-07-04T16:16:07Z

* meson: Pass objects and dependencies to declare_dependency(), not static_library() * meson: Drop the .fa library suffix * target/i386: drop AMD machine check bits from Intel CPUID * target/i386: add avx-vnni-int16 feature * target/i386: SEV bugfixes * target/i386: SEV-SNP -cpu host support * char: fix exit issues # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaGceoUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcpgf/XziKojGOTvYsE7xMijOUswYjCG5m # ZVLqxTug8Q0zO/9mGvluKBTWmh8KhRWOovX5iZL8+F0gPoYPG4ONpNhh3wpA9+S7 # H7ph4V6sDJBX4l3OrOK6htD8dO5D9kns1iKGnE0lY60PkcHl+pU8BNWfK1zYp5US # geiyzuRFRRtDmoNx5+o+w+D+W5msPZsnlj5BnPWM+O/ykeFfSrk2ztfdwHKXUhCB # 5FJcu2sWVx+wsdVzdjgT8USi5+VTK4vabq3SfccmNRxBRnJOCU5MrR63stMDceo4 # TswSB88I0WRV1848AudcGZRkjvKaXLyHJ+QTjg2dp7itEARJ3MGsvOpS5A== # =3kv7 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 04 Jul 2024 02:56:58 AM PDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini " [full] # gpg: aka "Paolo Bonzini " [full] * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: target/i386/SEV: implement mask_cpuid_features target/i386: add support for masking CPUID features in confidential guests char-stdio: Restore blocking mode of stdout on exit target/i386: add avx-vnni-int16 feature i386/sev: Fallback to the default SEV device if none provided in sev_get_capabilities() i386/sev: Fix error message in sev_get_capabilities() target/i386: do not include undefined bits in the AMD topoext leaf target/i386: SEV: fix formatting of CPUID mismatch message target/i386: drop AMD machine check bits from Intel CPUID target/i386: pass X86CPU to x86_cpu_get_supported_feature_word meson: Drop the .fa library suffix Revert "meson: Propagate gnutls dependency" meson: Pass objects and dependencies to declare_dependency() meson: merge plugin_ldflags into emulator_link_args meson: move block.syms dependency out of libblock meson: move shared_module() calls where modules are already walked Signed-off-by: Richard Henderson