path: root/usr.bin/mandoc/preconv.c
Commit message [Author, Age, Files, Lines]
* Rewrite the low-level UTF-8 parser from scratch. [schwarze, 2014-12-19, 1 file, -76/+59]
  It accepted invalid byte sequences like 0xc080-c1bf, 0xe08080-e09fbf, 0xeda080-edbfbf, and 0xf0808080-f08fbfbf, produced valid roff Unicode escape sequences from them, and the algorithm contained strong defenses against any attempt to fix it.
  This cures an assertion failure in the terminal formatter caused by sneaking in ASCII 0x08 (backspace) by "encoding" it as an (invalid) multibyte UTF-8 sequence, found by jsg@ with afl.
  As a bonus, the new algorithm also reduces the code in the function by about 20%.
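  The byte ranges named in this message fall into two classes: overlong encodings (lead bytes 0xc0 and 0xc1, 0xe0 followed by 0x80-0x9f, 0xf0 followed by 0x80-0x8f) and UTF-16 surrogates (0xed followed by 0xa0-0xbf). The following is a minimal sketch of a strict decoder that rejects both classes. It is not the code from preconv.c; the function name utf8_decode_strict() and its interface are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the mandoc implementation.
 * Decode one UTF-8 sequence from buf[0..len).
 * On success, store the code point in *cp and return the number
 * of bytes consumed (1 to 4).  Return 0 for any invalid sequence.
 */
size_t
utf8_decode_strict(const unsigned char *buf, size_t len, uint32_t *cp)
{
	uint32_t accum;	/* code point assembled so far */
	uint32_t min;	/* smallest code point allowed for this length */
	size_t need;	/* number of continuation bytes expected */
	size_t i;

	if (len == 0)
		return 0;

	if (buf[0] < 0x80) {
		/* Plain ASCII, including control characters like 0x08. */
		*cp = buf[0];
		return 1;
	} else if ((buf[0] & 0xe0) == 0xc0) {
		need = 1; min = 0x80; accum = buf[0] & 0x1f;
	} else if ((buf[0] & 0xf0) == 0xe0) {
		need = 2; min = 0x800; accum = buf[0] & 0x0f;
	} else if ((buf[0] & 0xf8) == 0xf0) {
		need = 3; min = 0x10000; accum = buf[0] & 0x07;
	} else {
		/* Stray continuation byte or invalid lead byte. */
		return 0;
	}

	/* Truncated sequence. */
	if (len < need + 1)
		return 0;

	for (i = 1; i <= need; i++) {
		/* Every continuation byte must look like 10xxxxxx. */
		if ((buf[i] & 0xc0) != 0x80)
			return 0;
		accum = (accum << 6) | (buf[i] & 0x3f);
	}

	/* Overlong: the value would have fit into a shorter sequence. */
	if (accum < min)
		return 0;
	/* UTF-16 surrogates are not valid Unicode scalar values. */
	if (accum >= 0xd800 && accum <= 0xdfff)
		return 0;
	/* Beyond the end of the Unicode code space. */
	if (accum > 0x10ffff)
		return 0;

	*cp = accum;
	return need + 1;
}
```

  With a check of this kind, the two-byte sequence 0xc0 0x88 is rejected instead of being decoded to backspace (0x08), while the single byte 0x08 is still returned as ordinary ASCII for the caller to handle.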
* Add some missing OpenBSD RCS markers [schwarze, 2014-11-28, 1 file, -2/+1]
  and a few missing <sys/types.h> inclusions; no code change.
* Remove needless and harmful byte swapping on big endian architectures. [schwarze, 2014-11-14, 1 file, -22/+2]
  Problem found and patch provided by Martin Natano at bitrig, thanks!
  Tested on macppc by natano@ and on i386, amd64, and sparc64 myself.
* Refactor, no functional change: Remove the parse point from struct buf. [schwarze, 2014-11-01, 1 file, -11/+12]
  Some functions need multiple parse points, some none at all, and it varies whether any of them need to be passed around. So better pass them as a separate argument, and only when needed.
* support UTF-8 and ISO-8859-1 input by integrating modified parts [schwarze, 2014-10-30, 1 file, -0/+211]
  of kristaps@'s version of the preconv(1) utility into mandoc(1);
  positive feedback from bentley@ and no concern raised when shown on tech@