path: root/usr.bin/mandoc/chars.c
Commit message [Author, Age, Files, Lines]
* In terminal output, unify handling of Unicode and numbered character
  escape sequences just like it was earlier implemented for -Thtml.
  Do not let control characters other than ASCII 9 (horizontal tab)
  propagate to the output. groff allows them, but that really doesn't
  look like a great idea.
  Let mchars_num2char() return int such that we can distinguish invalid
  \N syntax from \N'0'. This also reduces the danger of signed char
  issues popping up.
  [schwarze, 2014-10-29, 1 file, -6/+4]
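The point about returning int can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not mandoc's mchars_num2char(); it assumes the argument is the decimal digit string from inside \N'...' and that valid values are 0 to 255. With an int return type, -1 can signal invalid syntax while 0 remains a legitimate result for \N'0'.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a \N'number' decoder (an illustration,
 * not mandoc's mchars_num2char()).  Returns the character code in
 * the range 0..255, or -1 if the argument is empty, contains a
 * non-digit, or is out of range.
 */
int
num_escape_to_char(const char *digits, size_t sz)
{
	size_t	 i;
	int	 val;

	if (sz == 0)
		return -1;
	val = 0;
	for (i = 0; i < sz; i++) {
		if (digits[i] < '0' || digits[i] > '9')
			return -1;
		val = val * 10 + (digits[i] - '0');
		if (val > 255)
			return -1;	/* stopping early also avoids overflow */
	}
	return val;
}
```

A char return type could not carry the -1 sentinel without colliding with a real character value, which is the signed char hazard the message mentions.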
* Make the character table available to libroff so it can check the
  validity of character escape names and warn about unknown ones.
  This requires mchars_spec2cp() to report unknown names again.
  Fortunately, that doesn't require changing the calling code because
  according to groff, invalid character escapes should not produce
  output anyway, and now that we warn about them, that's fine.
  [schwarze, 2014-10-28, 1 file, -2/+2]
* Tighten Unicode escape name parsing.
  Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0.
  This simplifies mchars_num2uc().
  [schwarze, 2014-10-28, 1 file, -9/+4]
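A minimal sketch of such a length-and-prefix check follows. It is not the code from chars.c: the function name is hypothetical, and it assumes the escape name arrives in groff's \[uXXXX] spelling (a leading 'u' followed by uppercase hexadecimal digits), which is how the three accepted shapes are interpreted here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical validator (not mandoc's mchars_num2uc()).
 * Accepts uXXXX, uYXXXX with Y != 0, and u10XXXX, where X and Y
 * are uppercase hexadecimal digits.  Returns the code point,
 * or -1 if the name does not have one of these shapes.
 */
int
parse_unicode_name(const char *name)
{
	size_t	 len, i;
	int	 cp, digit;

	if (name[0] != 'u')
		return -1;
	name++;
	len = strlen(name);
	if (len < 4 || len > 6)
		return -1;
	if (len == 5 && name[0] == '0')
		return -1;	/* no leading zero in the 5-digit form */
	if (len == 6 && (name[0] != '1' || name[1] != '0'))
		return -1;	/* the 6-digit form must start with 10 */
	cp = 0;
	for (i = 0; i < len; i++) {
		if (name[i] >= '0' && name[i] <= '9')
			digit = name[i] - '0';
		else if (name[i] >= 'A' && name[i] <= 'F')
			digit = name[i] - 'A' + 10;
		else
			return -1;
		cp = cp * 16 + digit;
	}
	return cp;
}
```

Because the shape check bounds the value at 0x10FFFF, no separate range test is needed afterwards, which is one way a stricter syntax can simplify the conversion step.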
* Fix a regression in term.c rev. 1.89 reported by bentley@:
  In UTF-8 output, do not print anything if mchars_spec2cp() returns 0.
  In particular, this repairs handling of zero-width spaces (\&).
  While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the
  character is not found, simplifying the calling code.
  In HTML output, do not print obfuscated ASCII characters and do not
  test for one-char escapes; mchars_spec2cp() already does that.
  [schwarze, 2014-10-27, 1 file, -2/+2]
* In -Tascii mode, provide approximations even for some Unicode escape
  sequences above codepoint 512 by doing a reverse lookup in the
  existing mandoc_char(7) character table.
  Again, groff isn't smart enough to do this and silently discards such
  escape sequences without printing anything.
  [schwarze, 2014-10-26, 1 file, -1/+14]
* Improve -Tascii output for Unicode escape sequences: For the first 512
  code points, provide ASCII approximations. This is already much
  better than what groff does, which prints nothing for most code points.
  A few minor fixes while here:
  * Handle Unicode escape sequences in the ASCII range.
  * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
    and the string "<?>" for -Tascii output.
  * Handle all one-character escape sequences in mchars_spec2{cp,str}()
    and remove the workarounds on the higher level.
  [schwarze, 2014-10-26, 1 file, -16/+7]
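The fallback behaviour described above can be sketched as follows. This is an illustration only: the function name, the struct, and the three table entries are hypothetical and are not taken from the mandoc_char(7) table; only the "<?>" error string comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical approximation table; the entries are illustrative. */
struct approx {
	int		 cp;	/* Unicode code point */
	const char	*ascii;	/* ASCII approximation */
};

static const struct approx approx_tab[] = {
	{ 0x00a9, "(C)" },
	{ 0x00ab, "<<" },
	{ 0x00bb, ">>" },
};

/*
 * Return an ASCII rendering for a code point: printable ASCII maps
 * to itself, known code points map to their approximation, and
 * anything else maps to the error string "<?>".
 * The result for printable ASCII lives in a static buffer that is
 * overwritten by the next call.
 */
const char *
ascii_approx(int cp)
{
	static char	 one[2];
	size_t		 i;

	if (cp >= 0x20 && cp < 0x7f) {
		one[0] = (char)cp;
		one[1] = '\0';
		return one;
	}
	for (i = 0; i < sizeof(approx_tab) / sizeof(approx_tab[0]); i++)
		if (approx_tab[i].cp == cp)
			return approx_tab[i].ascii;
	return "<?>";
}
```

A UTF-8 formatter would make the analogous choice with U+FFFD in place of "<?>".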
* Security fix:
  After decoding numeric (\N) and one-character (\<, \> etc.)
  character escape sequences, do not forget to HTML-encode the
  resulting ASCII character. Malicious manuals were able to smuggle
  XSS content by roff-escaping the HTML-special characters they need.
  That's a classic bug type in many web applications, actually... :-(
  Found myself while auditing the HTML formatter for safe output handling.
  [schwarze, 2014-07-23, 1 file, -2/+13]
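The class of bug described here is that escaping is applied before decoding instead of after it. A minimal sketch of the safe ordering, assuming the escape has already been decoded to a single character code c, could look like this. Both function names are hypothetical and are not mandoc's HTML formatter API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the HTML entity for an HTML-special character,
 * or NULL if the character needs no encoding.
 */
const char *
html_entity(int c)
{
	switch (c) {
	case '<':
		return "&lt;";
	case '>':
		return "&gt;";
	case '&':
		return "&amp;";
	case '"':
		return "&quot;";
	default:
		return NULL;
	}
}

/*
 * Write an already decoded character to fp, HTML-encoding it
 * if required.  Returns the number of bytes written.
 */
size_t
html_put_decoded(FILE *fp, int c)
{
	const char	*ent;

	ent = html_entity(c);
	if (ent != NULL) {
		fputs(ent, fp);
		return strlen(ent);
	}
	fputc(c, fp);
	return 1;
}
```

With this ordering, \N'60' decodes to 60 ('<') and is then emitted as "&lt;" rather than as a raw angle bracket.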
* KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,
  remove trailing whitespace and blanks before tabs, improve some
  indenting; no functional change
  [schwarze, 2014-04-20, 1 file, -9/+10]
* The files mandoc.c and mandoc.h contained both specialised low-level
  functions used for multiple languages (mdoc, man, roff), for example
  mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
  functions. Split the auxiliaries out into their own file and header.
  While here, do some #include cleanup.
  [schwarze, 2014-03-21, 1 file, -1/+2]
* Implement the \: (optional line break) escape sequence,
  documented in the Ossanna-Kernighan-Ritter troff manual
  and also supported by groff.
  Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
  [schwarze, 2014-01-22, 1 file, -2/+2]
* Improve handling of the roff(7) "\t" escape sequence:
  * Parsing macro arguments has to be done in copy mode,
    which implies replacing "\t" by a literal tab character.
  * Otherwise, render "\t" as the empty string, not as a 't' character.
  This fixes formatting of the distfile example in the oldrdist(1) manual.
  This also shows up in the unzip(1) manual as one of several issues
  preventing the removal of USE_GROFF from the archivers/unzip port.
  Thanks to espie@ for attracting my attention to the unzip(1) manual.
  [schwarze, 2013-06-20, 1 file, -2/+2]
* Even though the size of a pointer should not depend on the type of the
  data pointed to, pass the size of the right pointer type to calloc;
  cosmetic issue reported by Ulrich Spoerlein <uqs@spoerlein.net>
  found in Coverity Scan CID 978734.
  No binary change - ok cmp(1).
  [schwarze, 2013-05-18, 1 file, -2/+2]
* mark some arguments "const" that will not be changed; from kristaps@
  [schwarze, 2011-11-12, 1 file, -7/+10]
* sync to version 1.11.5:
  adding an implementation of the eqn(7) language by kristaps@.
  So far, only .EQ/.EN blocks are handled, in-line equations are not,
  and rendering is not yet very pretty, but the parser is fairly complete.
  [schwarze, 2011-09-18, 1 file, -4/+4]
* simplify: there's really no need for extra code to reorder
  the hash chain or an extra function for checking matches;
  from kristaps@
  [schwarze, 2011-07-08, 1 file, -61/+8]
* Fix two regressions introduced in 1.11.3:
  * Do not pass integers outside the ASCII range to isprint().
  * Make sure escaped characters are really printed verbatim
    when the escape sequence has no special meaning.
  [schwarze, 2011-05-29, 1 file, -3/+5]
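The first item matters because the behaviour of isprint() is undefined for arguments that are neither EOF nor representable as unsigned char, so a Unicode code point must be range-checked before the call. A minimal sketch, with a hypothetical wrapper name:

```c
#include <ctype.h>

/*
 * Return 1 if c is a printable ASCII character, 0 otherwise.
 * The range check runs first, so isprint() is never called with
 * a value outside 0..127.
 */
int
is_printable_ascii(int c)
{
	return c >= 0 && c <= 0x7f && isprint(c) != 0;
}
```

The short-circuit evaluation of && is what keeps out-of-range values such as 0x2014 away from isprint().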
* Merge release 1.11.3, almost all code by kristaps@:
  * Unicode output support (no Unicode input yet, though).
  * Refactoring: completely handle predefined strings in roff.c.
    - New function mandoc_escape() replaces a2roffdeco() and
      mandoc_special().
    - Start using mandoc_getarg() in mdoc_argv.c.
    - Clean up parsing of delimiters in mdoc(7).
  * And many minor fixes and lots of cleanup.
  [schwarze, 2011-05-29, 1 file, -85/+44]
* Merge version 1.11.1:
  Again lots of cleanup and maintenance work by kristaps@.
  - simplify error reporting: less function pointers, more mandoc_[v]msg
  - main: split document parsing out of main.c into read.c
  - roff, mdoc, man: improved recognition of control characters
  - roff: better handling of if/else stack overflows
  - roff: add some predefined strings for backward compatibility
  - mdoc, man: empty sections are not errors
  - mdoc: move delimiter handling to libmdoc
  - some header restructuring and some minor features and fixes
  This merge causes two minor regressions
  that I will fix in separate commits right afterwards.
  [schwarze, 2011-04-24, 1 file, -3/+3]
* Merge version 1.10.10:
  lots of cleanup and maintenance work by kristaps@.
  - move some main.c globals into struct curparse
  - move mandoc_*alloc to mandoc.h such that all code can use them
  - make mandoc_isdelim available to formatting frontends
  - dissolve mdoc_strings.c, move the code where it is used
  - make all error reporting functions void, their return values were
    useless
  - and various minor cleanups and fixes
  [schwarze, 2011-04-21, 1 file, -12/+3]
* Implement the \N'number' (numbered character) roff escape sequence.
  Don't use it in new manuals, it is inherently non-portable, but we
  need it for backward-compatibility with existing manuals, for example
  in Xenocara driver pages.
  ok kristaps@ matthieu@ jmc@
  [schwarze, 2011-01-30, 1 file, -1/+23]
* Merge kristaps@'s cleaner tbl integration, removing mine;
  there are still a few bugs, but fixing these will be easier in tree.
  [schwarze, 2011-01-04, 1 file, -12/+12]
* remove remaining pod2man escapes, mandoc now uses the standard preamble;
  from kristaps@
  [schwarze, 2010-09-20, 1 file, -4/+4]
* Parse and ignore the \k, \o, \w, and \z roff escapes, and recursively
  ignore embedded escapes and mathematical roff subexpressions.
  In roff copy mode, resolve "\\" to '\'.
  Allow ".xx\}" where xx is a macro to close roff conditional scope.
  Mandoc now handles the special character definitions in the pod2man(1)
  preamble, so remove the explicit redefinitions in chars.c/chars.in.
  From kristaps@.
  I have checked that this causes no relevant change to the Perl manuals.
  The only change introduced is that some non-ASCII characters rendered
  incorrectly before are now rendered incorrectly in a different way.
  For example, e with acute accent was "e", now is "e'", and c with
  cedilla was "c", now is "c,".
  [schwarze, 2010-09-13, 1 file, -2/+2]
* Implement a simple, consistent user interface for error handling.
  We now have sufficient practical experience to know what we want,
  so this is intended to be final:
  - provide -Wlevel (warning, error or fatal) to select what you care about
  - provide -Wstop to stop after parsing a file with warnings you care about
  - provide consistent exit status codes for those warnings you care about
  - fully document what warnings, errors and fatal errors mean
  - remove all other cruft from the user interface, less is more:
    - remove all -f knobs along with the whole -f option
    - remove the old -Werror because calling warnings "fatal" is silly
    - always finish parsing each file, unless fatal errors prevent that
  This commit also includes a couple of related simplifications behind
  the scenes regarding error handling.
  Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
  Sascha Wildner (DragonFly BSD) agree with the general direction.
  [schwarze, 2010-08-20, 1 file, -3/+3]
* Remove the standard pod2man \*(C+ predefined string ("C++").
  It is always defined in the preamble using .ds when used in manuals.
  Since we now support .ds, it is no longer necessary to provide it.
  Triggered by a bug report from Thomas Jeunet, patch by kristaps@.
  [schwarze, 2010-08-18, 1 file, -2/+2]
* Merge bsd.lv version 1.10.5: last larger batch of bug fixes before release.
  NOT including Kristaps' .Bd -literal changes which cause regressions.
  Features:
  * -Tpdf now fully working
  Bugfixes:
  * proper handling of quoted strings by .ds in roff(7)
  * allow empty .Dd
  * make .Sm start no-spacing after the first output word
  * underline .Ad
  * minor fixes in -Thtml and some optimisations in terminal output.
  [schwarze, 2010-07-31, 1 file, -10/+9]
* Sync to bsd.lv; in particular, pull in lots of bug fixes.
  new features:
  * support the .in macro in man(7)
  * support minimal PDF output
  * support .Sm in mdoc(7) HTML output
  * support .Vb and .nf in man(7) HTML output
  * complete the mdoc(7) manual
  bug fixes:
  * do not let mdoc(7) .Pp produce a newline before/after .Sh;
    reported by jmc@
  * avoid double blank lines related to man(7) .sp and .br
  * let man(7) .nf and .fi flush the line; reported by jsg@ and naddy@
  * let "\ " produce a non-breaking space; reported by deraadt@
  * discard \m colour escape sequences; reported by J.C. Roberts
  * map undefined 1-character-escapes to the literal character itself
  maintenance:
  * express mdoc(7) arguments in terms of an enum for additional
    type-safety
  * simplify mandoc_special() and a2roffdeco()
  * use strcspn in term_word() in place of a manual loop
  * minor optimisations in the -Tps and -Thtml formatting frontends
  [schwarze, 2010-07-25, 1 file, -28/+69]
* Merge bsd.lv version 1.10.1 (to be released soon).
  The main step forward is that this now has *much* better .Bl -column
  support, now supporting many manuals that previously errored out
  without producing any output.
  Other fixes include:
  * do not die from multiple list types, use the first and warn
  * in .Bl without a type, default to -item
  * various tweaks to .Dt
  * fix .In, .Fd, .Ft, .Fn and .Fo formatting
  * some documentation fixes and additions
  * and fix a couple of bugs reported by Ulrich Spoerlein:
    * better support for roff block-end "\}" without a preceding dot
    * .In must not break the line outside SYNOPSIS
    * spelling in some error messages
  While merging, fix one regression in .In spacing
  that needs to go to bsd.lv, too.
  [schwarze, 2010-06-06, 1 file, -2/+2]
* When a word does not fully fit onto the output line, but it contains
  at least one hyphen, we already had support for breaking the line
  at the last fitting hyphen. This patch improves this functionality
  by only breaking at hyphens in free-form text, and by not breaking
  at hyphens
  * at the beginning or end of a word or
  * immediately preceded or followed by another hyphen or
  * escaped by a preceding backslash.
  Before this patch, differences in break-at-hyphen support were one
  of the major sources of noise in automatic comparisons to mdoc(7)
  groff output. Now, the remaining differences are hard to find among
  the noise coming from other sources. Where there are still
  differences, what we do seems to be better than what groff does,
  see e.g. the chio(1) exchange and position commands for one of the
  now rare examples.
  idea and coding by kristaps@
  Besides, this was the last substantial code difference left between
  bsd.lv and openbsd.org. We are now in full sync.
  [schwarze, 2010-05-26, 1 file, -1/+2]
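The three exclusion rules listed above can be expressed as a single predicate. The sketch below is an illustration with a hypothetical function name, not the code in mandoc's terminal formatter; it assumes word is one NUL-terminated input word and does not model the "free-form text only" condition, which depends on macro context.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the hyphen at word[i] is an acceptable line break
 * point, 0 otherwise.  A hyphen is rejected at the beginning or end
 * of the word, next to another hyphen, or after a backslash.
 */
int
hyphen_is_breakable(const char *word, size_t i)
{
	size_t	 len;

	len = strlen(word);
	if (i >= len || word[i] != '-')
		return 0;
	if (i == 0 || i + 1 == len)
		return 0;	/* at the beginning or end of the word */
	if (word[i - 1] == '-' || word[i + 1] == '-')
		return 0;	/* adjacent to another hyphen */
	if (word[i - 1] == '\\')
		return 0;	/* escaped by a preceding backslash */
	return 1;
}
```

A line breaker would scan backwards from the right margin and break at the last position for which this predicate holds.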
* merge 1.9.17, keeping local patches
  * much improved pod2man support and low-level roff robustness
  * have -Tlint imply -Wall and -fstrict
  * use fewer macros and more enum in libman
  * and various bug fixes
  [schwarze, 2010-03-26, 1 file, -2/+2]
* sync to release 1.9.15:
  * corrected .Vt handling (spotted by Joerg Sonnenberger)
  * corrected .Xr argument handling (based on my patch)
  * removed \\ escape sequence (because it is for low-level roff only)
  * warn about trailing whitespace (suggested by jmc@)
  * -Txhtml support
  * and some general cleanup and doc improvements
  [schwarze, 2010-02-18, 1 file, -14/+2]
* sync to 1.9.14: rewrite escape sequence handling:
  - new function a2roffdeco
  - font modes (\f) only affect the current stack point
  - implement scaling (\s)
  - implement space suppression (\c)
  - implement non-breaking space (\~) in -Tascii
  - many manual improvements
  [schwarze, 2009-12-24, 1 file, -4/+5]
* sync to 1.9.12, mostly portability and refactoring:
  correctness/functionality:
  - bugfix: do not die when overstep hits the right margin
  - new option: -fign-escape
  - and various HTML features
  portability:
  - replace bzero(3) by memset(3), which is ANSI C
  - replace err(3)/warn(3) by perror(3)/exit(3), which is ANSI C
  - use argv[0] instead of __progname
  - add time.h to various files for FreeBSD compilation
  simplicity:
  - do not allocate header/footer data dynamically in *_term.c
  - provide and use malloc frontends that error out on failure
  for full changelogs, see http://bsd.lv/cgi-bin/cvsweb.cgi/
  [schwarze, 2009-12-22, 1 file, -7/+12]
* sync to 1.9.6: here is the sync of special characters to new groff
  as mentioned in the preceding manual commit (oops)
  [schwarze, 2009-10-19, 1 file, -2/+2]
* sync to 1.9.5: partial rewrite of special character and predefined string
  tables and the supporting infrastructure, mostly in preparation for
  HTML output support
  [schwarze, 2009-10-19, 1 file, -0/+204]