cgit - A hyperfast web frontend for git repositories written in C.

	Commit message (Collapse)	Author	Age	Files	Lines
*	cache: truncate lock file before filling	Jason A. Donenfeld	2026-04-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lock_slot() opens the lock file with O_RDWR\|O_CREAT but without O_TRUNC. If a previous cgit process was killed between lock_slot() and unlock_slot() (e.g. by a CGI timeout or OOM), the stale lock file remains on disk with the old content, and the kernel releases the fcntl lock. The next process to claim the same cache slot then opens this stale lock file, acquires the fcntl lock, writes its key and generated content on top of the old bytes. If the new response is shorter than what was previously in the file, trailing bytes from the old response survive beyond the end of the new content. fstat() in fill_slot() reports the total file size (including the stale tail), and print_slot() faithfully sends all of it -- producing a response that is the correct page followed by a fragment of whatever previously occupied that lock file. Fix this by truncating the lock file after acquiring the lock and before writing the new key. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	cache: tolerate short writes in print_slot	Hristo Venev	2022-12-19	1	-20/+25
\| \| \| \| \| \| \| \| \| \| \|	sendfile() can return after a short read/write, so we may need to call it more than once. As suggested in the manual page, we fall back to read/write if sendfile fails with EINVAL or ENOSYS. On the read/write path, use write_in_full which deals with short writes. Signed-off-by: Hristo Venev <hristo@venev.name> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	git: update to v2.30.0	Christian Hesse	2020-12-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Update to git version v2.30.0, this requires changes for these upstream commits: * 88894aaeeae92e8cb41143cc2e045f50289dc790 blame: simplify 'setup_scoreboard' interface * 1fbfdf556f2abc708183caca53ae4e2881b46ae2 banned.h: mark non-reentrant gmtime, etc as banned Signed-off-by: Christian Hesse <mail@eworm.de>
*	cache: close race window when unlocking slots	John Keeping	2018-06-27	1	-23/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use POSIX advisory record locks to control access to cache slots, but these have an unhelpful behaviour in that they are released when any file descriptor referencing the file is closed by this process. Mostly this is okay, since we know we won't be opening the lock file anywhere else, but there is one place that it does matter: when we restore stdout we dup2() over a file descriptor referring to the file, thus closing that descriptor. Since we restore stdout before unlocking the slot, this creates a window during which the slot content can be overwritten. The fix is reasonably straightforward: simply restore stdout after unlocking the slot, but the diff is a bit bigger because this requires us to move the temporary stdout FD into struct cache_slot. Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Christian Hesse <mail@eworm.de>
*	global: spelling fixes	Ville Skyttä	2017-10-15	1	-1/+1
\| \| \| \|	Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
*	cache: flush stdio before restoring FDs	John Keeping	2017-10-03	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	As described in commit 2efb59e (ui-patch: Flush stdout after outputting data, 2014-06-11), we need to ensure that stdout is flushed before restoring the file descriptor when writing to the cache. It turns out that it's not just ui-patch that is affected by this but also raw diff which writes to stdout internally. Let's avoid risking more places doing this by ensuring that stdout is flushed after writing in fill_slot(). Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache: don't check for match with no key	John Keeping	2016-01-17	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \|	We call open_slot() from cache_ls() without a key since we simply want to read the path out of the header. Should the file happen to contain an empty key then we end up calling memcmp() with NULL and a non-zero length. Fix this by assigning slot->match only if a key is set, which is always will be in the code paths where we use slot->match. Coverity-id: 13807 Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache: use size_t for string lengths	John Keeping	2016-01-17	1	-2/+2
\| \| \| \| \| \| \|	Avoid integer truncation on 64-bit systems. Coverity-id: 13864 Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache: fix resource leak: close file handle before return	Christian Hesse	2015-10-10	1	-3/+9
\| \| \| \| \|	Coverity-id: 13910 Signed-off-by: Christian Hesse <mail@eworm.de>
*	cache.c: fix header order	John Keeping	2015-08-13	1	-3/+3
\| \| \| \| \| \| \| \|	git-compat-util.h may define values that affect how system headers are interpreted, so move sys/sendfile.h after cgit.h (which includes git-compat-util.h). Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache: don't use an integer as a NULL pointer	John Keeping	2015-03-09	1	-1/+1
\| \| \| \|	Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache: use F_SETLK to avoid stale lock files	John Keeping	2015-03-03	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If CGit is killed while it holds a lock on a cache slot (for example because it is taking too long to generate a page), the lock file will be left in place. This prevents any future attempt to use the same slot since it will fail to exclusively create the lock file. Since CGit is the only program that should be manipulating lock files, we can use advisory locking to detect whether another process is actually using the lock file or if it is now stale. I have confirmed that this works on Linux by setting a short TTL in a custom cgitrc and running the following with CGit patched to print a message to stderr if the fcntl(2) fails: $ export CGIT_CONFIG=$PWD/cgitrc $ export QUERY_STRING=url=cgit/tree/ui-shared.c $ ./cgit \| grep -v -e '^<div class=.footer.>' \ -e '^Last-Modified: ' \ -e ^'Expires: ' >expect $ seq 50000 \| dd bs=8192 \| parallel -j200 "diff -u expect <(./cgit \| grep -v -e '^<div class=.footer.>' \ -e '^Last-Modified: ' \ -e ^'Expires: ') \|\| echo BAD" This printed the fail message several times without ever printing "BAD". Signed-off-by: John Keeping <john@keeping.me.uk>
*	Skip cache slot when time-to-live is zero	Lukas Fleischer	2014-02-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	If time-to-live is set to zero, we don't need to regenerate the cache slots on every request. Instead, just skip the caching process and immediately provide the dynamically generated version of the page. Setting time-to-live to zero is useful when you want to disable caching for certain pages. Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
*	cache: use sendfile() instead of a pair of read() + write()	Sebastian Andrzej Siewior	2014-01-19	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \|	sendfile() does the same job and avoids to copy the content into userland and back. One has to define NO_SENDFILE in case the OS (kernel / libc) does not supported. It is disabled by default on non-linux environemnts. According to the glibc, sendfile64() was added in Linux 2.4 (so it has been there for a while) but after browsing over the mapage of FreeBSD's I noticed that the prototype is little different. Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
*	Switch to exclusively using global ctx	Lukas Fleischer	2014-01-17	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Drop the context parameter from the following functions (and all static helpers used by them) and use the global context instead: * cgit_print_http_headers() * cgit_print_docstart() * cgit_print_pageheader() Remove context parameter from all commands Drop the context parameter from the following functions (and all static helpers used by them) and use the global context instead: * cgit_get_cmd() * All cgit command functions. * cgit_clone_info() * cgit_clone_objects() * cgit_clone_head() * cgit_print_plain() * cgit_show_stats() In initialization routines, use the global context variable instead of passing a pointer around locally. Remove callback data parameter for cache slots This is no longer needed since the context is always read from the global context variable. Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
*	cache: don't leave cache_slot fields uninitialized	John Keeping	2014-01-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Valgrind says: ==18344== Conditional jump or move depends on uninitialised value(s) ==18344== at 0x406C83: open_slot (cache.c:63) ==18344== by 0x407478: cache_ls (cache.c:403) ==18344== by 0x404C9A: process_request (cgit.c:639) ==18344== by 0x406BD2: fill_slot (cache.c:190) ==18344== by 0x4071A0: cache_process (cache.c:284) ==18344== by 0x404461: main (cgit.c:952) ==18344== Uninitialised value was created by a stack allocation ==18344== at 0x40738B: cache_ls (cache.c:375) This is caused by the keylen field being used to calculate whether or not a slot is matched. We never then check the value of this and the length of data read depends on the key length read from the file so this isn't dangerous, but it's nice to avoid branching based on uninitialized data. Signed-off-by: John Keeping <john@keeping.me.uk>
*	Update copyright information	Lukas Fleischer	2014-01-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	* Name "cgit Development Team" as copyright holder to avoid listing every single developer. * Update copyright ranges. Signed-off-by: Lukas Fleischer <cgit@crytocrack.de>
*	cache.c: cache ls_cache output properly	John Keeping	2013-05-22	1	-6/+7
\| \| \| \| \| \| \| \| \|	By using the standard library's printf, cache_ls does not redirect its output to the cache when we change the process' stdout file descriptor to point to the cache file. Fix this by using "htmlf" in the same way that we do for writing HTTP headers. Signed-off-by: John Keeping <john@keeping.me.uk>
*	cache.c: fix cache_ls	John Keeping	2013-05-18	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \|	Commit fb3655d (use struct strbuf instead of static buffers, 2013-04-06) broke the logic in cache.c::cache_ls by failing to set slot->cache_name before calling open_slot. While fixing this, also free the strbufs added by that commit once we're done with them. Signed-off-by: John Keeping <john@keeping.me.uk>
*	use struct strbuf instead of static buffers	John Keeping	2013-04-08	1	-37/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use "struct strbuf" from Git to remove the limit on file path length. Notes on scan-tree: This is slightly involved since I decided to pass the strbuf into add_repo() and modify if whenever a new file name is required, which should avoid any extra allocations within that function. The pattern there is to append the filename, use it and then reset the buffer to its original length (retaining a trailing '/'). Notes on ui-snapshot: Since write_archive modifies the argv array passed to it we copy the argv_array values into a new array of char* and then free the original argv_array structure and the new array without worrying about what the values now look like. Signed-off-by: John Keeping <john@keeping.me.uk>
*	Mark several functions/variables static	Lukas Fleischer	2013-03-04	1	-1/+1
\| \| \| \| \| \|	Spotted by parsing the output of `gcc -Wmissing-prototypes [...]`. Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
*	White space around control verbs.	Jason A. Donenfeld	2013-03-04	1	-3/+3
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	Fix several whitespace errors	Lukas Fleischer	2013-03-04	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	* Remove whitespace at the end of lines. * Replace space indentation by tabs. * Add whitespace before/after several operators ("+", "-", "", ...) Add whitespace to assignments ("foo = bar;"). * Fix whitespace in parameter lists ("foobar(foo, bar, 42)"). Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
*	Fix some warnings to allow -Werror	Ramsay Jones	2008-11-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The type used to declare the st_size field of a 'struct stat' can be a 32- or 64-bit sized type, which can vary from one platform to another, or even from one compilation to another. In particular, on linux, if you include the following define: #define _FILE_OFFSET_BITS 64 prior to including certain system header files, then the type used for the st_size field will be __off64_t, otherwise it will be an __off_t. Note that the above define is included at the top of git-compat-util.h. In cache.c, the "%zd" format specifier expects a "signed size_t", another type which can vary, when an __off64_t or a __off_t is provided. To supress the warning, use the PRIuMAX format specifier and cast the st_size field to uintmax_t. This should work an any platform for which git currently compiles. In ui-plain.c, the size parameter of sha1_object_info() and read_sha1_file() is defined to be "unsigned long " not "size_t ". So, to supress the warning, simply declare size with the correct type. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache.c: use %zd for off_t argument	Lars Hjemli	2008-09-01	1	-1/+1
\| \| \| \|	Signed-off-by: Lars Hjemli <hjemli@gmail>
*	cache.c: fix error checking in print_slot()	Lars Hjemli	2008-05-20	1	-4/+7
\| \| \| \| \| \| \| \|	The change to print_slot() in cdc6b2f8e7a8d43dcfe0475a9d3498333ea686b8 made the function return correct errno for read errors while ignoring write errors, which is not what was intended. This patch tries to rectify things. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache.c: do not ignore errors from print_slot()	Lars Hjemli	2008-05-18	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \|	If print_slot() fails, the client will be served an inferior response. This patch makes sure that such an error will be returned to main(), which in turn will try to inform about the error in the response itself. The error is also printed to the cache_log, i.e. stderr, which will make the error message appear in error_log (atleast when httpd==apache). Noticed-by: Jim Meyering <jim@meyering.net> Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache.c: use xread()/xwrite() from libgit	Lars Hjemli	2008-05-18	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	These functions handles EINTR/EAGAIN errors during read/write operations, which is something cache.c didn't. While at it, fix a bug in print_slot() where errors during reading from the cache slot might go by unnoticed. Noticed-by: Jim Meyering <jim@meyering.net> Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache.c: make all io-related functions return errno on error	Lars Hjemli	2008-05-18	1	-9/+24
\| \| \| \| \| \| \| \|	We'll need proper return-values from these functions to make the cache behave correctly (which includes giving proper error messages). Noticed-by: Jim Meyering <jim@meyering.net> Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache.c: read(2) returns -1 on error, not 0	Lars Hjemli	2008-05-18	1	-1/+1
\| \| \| \| \|	Noticed-by: Jim Meyering <jim@meyering.net> Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Add page 'ls_cache'	Lars Hjemli	2008-04-28	1	-0/+68
\| \| \| \| \| \| \| \|	This new page will list all entries found in the current cache, which is useful when reviewing the new cache implementation. There are no links to the new page, but it's reachable by adding 'p=ls_cache' to any cgit url. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Redesign the caching layer	Lars Hjemli	2008-04-28	1	-66/+289
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original caching layer in cgit has no upper bound on the number of concurrent cache entries, so when cgit is traversed by a spider (like the googlebot), the cache might end up filling your disk. Also, if any error occurs in the cache layer, no content is returned to the client. This patch redesigns the caching layer to avoid these flaws by * giving the cache a bound number of slots * disabling the cache for the current request when errors occur The cache size limit is implemented by hashing the querystring (the cache lookup key) and generating a cache filename based on this hash modulo the cache size. In order to detect hash collisions, the full lookup key (i.e. the querystring) is stored in the cache file (separated from its associated content by ascii 0). The cache filename is the reversed 8-digit hexadecimal representation of hash(key) % cache_size which should make the filesystem lookup pretty fast (if directory content is indexed/sorted); reversing the representation avoids the problem where all keys have equal prefix. There is a new config option, cache-size, which sets the upper bound for the cache. Default value for this option is 0, which has the same effect as setting nocache=1 (hence nocache is now deprecated). Included in this patch is also a new testfile which verifies that the new option works as intended. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Add cache.h	Lars Hjemli	2008-03-27	1	-0/+1
\| \| \| \| \| \| \| \|	The functions found in cache.c are only used by cgit.c, so there's no point in rebuilding all object files when the cache interface is changed. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Move cgit_repo into cgit_context	Lars Hjemli	2008-02-16	1	-3/+3
\| \| \| \| \| \| \| \|	This removes the global variable which is used to keep track of the currently selected repository, and adds a new variable in the cgit_context structure. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Add all config variables into struct cgit_context	Lars Hjemli	2008-02-16	1	-5/+5
\| \| \| \| \| \| \| \|	This removes another big set of global variables, and introduces the cgit_prepare_context() function which populates a context-variable with compile-time default values. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Introduce struct cgit_context	Lars Hjemli	2008-02-16	1	-2/+2
\| \| \| \| \| \| \| \| \|	This struct will hold all the cgit runtime information currently found in a multitude of global variables. The first cleanup removes all querystring-related variables. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache_safe_filename() needs more buffers	Lars Hjemli	2007-05-18	1	-4/+9
\| \| \| \| \| \| \| \|	The single static buffer makes it impossible to use the result of two different calls to this function simultaneously. Fix it by using 4 buffers. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Enable url=value querystring parameter	Lars Hjemli	2007-05-18	1	-3/+6
\| \| \| \| \| \| \|	This makes is possible to use repo-urls like '/pub/scm/git/git.git' and even add path specifications, like '/pub/scm/git/git.git/log/documentation'. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Remove troublesome chars from cachefile names	Lars Hjemli	2007-01-12	1	-0/+16
\| \| \| \| \| \| \|	Add a funtion cache_safe_filename() which replaces possibly bad filename characters with '_'. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Move cache_prepare() to cgit	Lars Hjemli	2007-01-12	1	-22/+0
\| \| \| \| \| \|	This moves some cgit-specific stuff away from cache.c Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Allow relative paths for cgit_cache_root	Lars Hjemli	2006-12-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Make sure we chdir(2) back to the original getcwd(2) when a page has been generated. Also, if the cgit_cache_root do not exist, try to create it. This is a feature intended to ease testing/debugging. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	cache_lock: do xstrdup/free on lockfile	Lars Hjemli	2006-12-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since fmt() uses 8 alternating static buffers, and cache_lock might call cache_create_dirs() multiple times, which in turn might call fmt() twice, after four iterations lockfile would be overwritten by a cachedirectory path. In worst case, this could cause the cachedirectory to be unlinked and replaced by a cachefile. Fix: use xstrdup() on the result from fmt() before assigning to lockfile, and call free(lockfile) before exit. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Don't truncate valid cachefiles	Lars Hjemli	2006-12-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	An embarrassing thinko in cgit_check_cache() would truncate valid cachefiles in the following situation: 1) process A notices a missing/expired cachefile 2) process B gets scheduled, locks, fills and unlocks the cachefile 3) process A gets scheduled, locks the cachefile, notices that the cachefile now exist/is not expired anymore, and continues to overwrite it with an empty lockfile. Thanks to Linus for noticing (again). Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Avoid infinite loops in caching layer	Lars Hjemli	2006-12-11	1	-13/+22
\| \| \| \| \| \| \| \| \| \| \|	Add a global variable, cgit_max_lock_attemps, to avoid the possibility of infinite loops when failing to acquire a lockfile. This could happen on broken setups or under crazy server load. Incidentally, this also fixes a lurking bug in cache_lock() where an uninitialized returnvalue was used. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Fix cache algorithm loophole	Lars Hjemli	2006-12-11	1	-1/+5
\| \| \| \| \| \| \| \|	This closes the door for unneccessary calls to cgit_fill_cache(). Noticed by Linus. Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Add license file and copyright notices	Lars Hjemli	2006-12-10	1	-0/+8
\| \| \| \|	Signed-off-by: Lars Hjemli <hjemli@gmail.com>
*	Add caching infrastructure	Lars Hjemli	2006-12-10	1	-0/+86
	This enables internal caching of page output. Page requests are split into four groups: 1) repo listing (front page) 2) repo summary 3) repo pages w/symbolic references in query string 4) repo pages w/constant sha1's in query string Each group has a TTL specified in minutes. When a page is requested, a cached filename is stat(2)'ed and st_mtime is compared to time(2). If TTL has expired (or the file didn't exist), the cached file is regenerated. When generating a cached file, locking is used to avoid parallell processing of the request. If multiple processes tries to aquire the same lock, the ones who fail to get the lock serves the (expired) cached file. If the cached file don't exist, the process instead calls sched_yield(2) before restarting the request processing. Signed-off-by: Lars Hjemli <hjemli@gmail.com>