summaryrefslogtreecommitdiffstats
path: root/usr.bin/mandoc/html.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Rudimentary implementation of the roff(7) \o escape sequence (overstrike).schwarze2015-01-211-2/+9
| | | | | | This is of some relevance because the pod2man(1) preamble abuses it for the icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
* resolve some code duplication; no functional changeschwarze2014-12-201-27/+19
|
* Fix the implementation and documentation of \c (continue text input line).schwarze2014-12-021-2/+3
| | | | | In particular, make it work in no-fill mode, too. Reminded by Carsten dot Kunze at arcor dot de (Heirloom roff).
* The header libmandoc.h is part of the internal parser interface,schwarze2014-12-011-2/+1
| | | | | | but html.c is not part of the parser at all, so it cannot include that header, and actually, it doesn't need it. Found while auditing includes after Theo's recent *.h commit.
* In terminal output, unify handling of Unicode and numbered characterschwarze2014-10-291-4/+7
| | | | | | | | | | | escape sequences just like it was earlier implemented for -Thtml. Do not let control characters other than ASCII 9 (horizontal tab) propagate to the output, even though groff allows them; but that really doesn't look like a great idea. Let mchars_num2char() return int such that we can distinguish invalid \N syntax from \N'0'. This also reduces the danger of signed char issues popping up.
* Make the character table available to libroff so it can check theschwarze2014-10-281-22/+4
| | | | | | | | validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
* Handle output encoding for unicode, numbered and named escape sequencesschwarze2014-10-271-23/+11
| | | | | | | | in one common, safe way instead of three different ways. In particular, * skip NUL, it is used to mean "no output desired" * deny 0x01-0x1F and 0x7F-0x9F, print REPLACEMENT CHARACTER instead * print 0x20-0x7E literally or name-encoded, as required * print characters above 0x9F numerically
* Fix a regression in term.c rev. 1.89 reported by bentley@:schwarze2014-10-271-5/+6
| | | | | | | | | | In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&). While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the using code. In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes, mchars_spec2cp() already does that.
* Improve -Tascii output for Unicode escape sequences: For the first 512schwarze2014-10-261-3/+13
| | | | | | | | | | | | code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points. A few minor fixes while here: * Handle Unicode escape sequences in the ASCII range. * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output. * Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
* Add missing */ after $OpenBSD$ tagchl2014-10-131-1/+1
| | | | ok schwarze@
* Partial eqn(7) rewrite by kristaps@ in order to get operator precedence right.schwarze2014-10-101-1/+2
|
* parse and render "from" and "to" clauses in eqn, and render matrices;schwarze2014-10-091-1/+4
| | | | written by kristaps@ during EuroBSDCon
* initial bits of MathML rendering for eqn(7) -Thtml;schwarze2014-10-091-1/+16
| | | | written by kristaps@ during EuroBSDCon
* Switch HTML output to polyglot HTML5; have only one single -Thml mode.schwarze2014-10-071-58/+38
| | | | | | | | Replace hard-coded widths and alignments with a minimal embedded stylesheet. Do not use <p> because it cannot appear inside block macros. Remove the "summary" attribute because it is not HTML5. Written by kristaps@ some months ago, finished during EuroBSDCon.
* Revert previous, as requested by kristaps@.schwarze2014-08-141-2/+1
| | | | | | | | | | | | | The .Bf block can contain subblocks, so it has to render as an element that can contain flow content. But <em> cannot contain flow content, only phrasing content. Rendering .Em and .Bf differently would by unfortunate, and closing out .Bf before subblocks and re-opening it afterwards would merely complicate both the C code of the program and the generated HTML code. Besides, converting .Em to semantic HTML markup would require some content to be put into <em> and some into <i>, but we cannot automatically distinguish which is which, so strictly speaking, we can't use semantic HTML here but have to fall back to physical markup. Wonders of HTML...
* Begin cleanup of scaling units.schwarze2014-08-131-1/+3
| | | | | | | | | | | | Note that we use 240u := 1i for all devices, even -Tps and -Tpdf. Big fix of -Tascii rendering of f, m, and u. Small fix of -Tascii rendering of c. Big fix of -Thtml rendering of u. Big fix of -Tps rendering of m, p, and u. Clarify -Tps rendering of c. Correct documentation of scaling units, in particular with respect to u. This for example improves rendering of the OpenGL manuals. Joint work with kristaps@.
* Use <em> for .Em and .Bf -emphasis.schwarze2014-08-131-1/+2
| | | | | | | | | | | | | | | | | The vast majority of .Em in real-world manuals is stress emphasis, for which <em> is the correct markup. Admittedly, there are some instances of .Em usage for alternate quality, for which <i> would be a better match. Most of these are technical terms that neither allow semantic markup nor are keywords - for the latter, .Sy would be preferable. A typical example is that the shell breaks input into .Em words . Alternate voice or mood, which would also require <i>, is almost absent from manuals. We cannot satisfy both stress emphasis and alternate quality, so pick the one that fits more often and looks less wrong when off. Patch from Guy Harris <guy at alum dot mit dot edu>. ok bentley@ joerg@NetBSD
* Security fix:schwarze2014-07-231-27/+38
| | | | | | | | | | After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-( Found myself while auditing the HTML formatter for safe output handling.
* Security fix:schwarze2014-07-221-2/+5
| | | | | | | | | | The function print_encode() is used both for plain text and for quoted attribute values. Escape the '"' character such that malicious manuals cannot pull off XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe others) to trigger the latter case. In the former case, escaping does no harm. Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
* Audit strlcpy(3)/strlcat(3) usage.schwarze2014-04-231-1/+7
| | | | | | | | | | | | | * Repair three instances of silent truncation, use asprintf(3). * Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+... to use asprintf(3) instead to make them less error prone. * Cast the return value of four instances where the destination buffer is known to be large enough to (void). * Completely remove three useless instances of strlcpy(3)/strlcat(3). * Mark two places in -Thtml with XXX that can cause information loss and crashes but are not easy to fix, requiring design changes of some internal interfaces. * The file mandocdb.c remains to be audited.
* KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,schwarze2014-04-201-67/+59
| | | | | remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-levelschwarze2014-03-211-1/+2
| | | | | | | functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence,schwarze2014-01-221-3/+9
| | | | | | | documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Fix one case where a non-literal is used as format string.schwarze2014-01-051-2/+2
| | | | | Fix another case where a variable is formatted using the wrong type. Patch from Joerg Sonnenberger <joerg@NetBSD>.
* Implement the roff(7) font-escape sequence \f(BI "bold+italic".schwarze2013-08-081-10/+35
| | | | | This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
* Implement the roff \z escape sequence, intended to output the nextschwarze2012-05-281-25/+58
| | | | | | | | | | | | | | | | character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten. With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".". Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases.
* Sync to version 1.12.0; all code by kristaps@:schwarze2011-10-091-3/+7
| | | | | | | | Implement .Rv in -Tman. Let -man -Tman work a bit like cat(1). Add the -Ofragment option to -T[x]html. Minor fixes in -T[x]html. Lots of apropos(1) and -Tman code cleanup.
* clean up .HP, .IP, .TP, .nf, and \c handling in -T[x]html;schwarze2011-07-081-2/+4
| | | | from kristaps@
* Sync to bsd.lv (all coded by kristaps@):schwarze2011-07-051-2/+1
| | | | | | | | | | - mdoc(7): fix an assertion if the first line after .Bd -column starts with a blank, and some simplifications in mdoc_argv.c - man(7): literal mode ends at .SH and .SS (bug reported by naddy@) - allow .RS/.RE blocks to nest (bug reported by dcoppa@ and gsoares@) - improve vertical spacing of man(7) blocks - roff(7): clear user-defined strings when starting a new file - correct ID tags in -T[x]html
* Merge release 1.11.3, almost all code by kristaps@:schwarze2011-05-291-212/+129
| | | | | | | | | * Unicode output support (no Unicode input yet, though). * Refactoring: completely handle predefined strings in roff.c. - New function mandoc_escape() replaces a2roffdeco() and mandoc_special(). - Start using mandoc_getarg() in mdoc_argv.c. - Clean up parsing of delimiters in mdoc(7). * And many minor fixes and lots of cleanup.
* Merge version 1.11.1:schwarze2011-04-241-10/+1
| | | | | | | | | | | | | | Again lots of cleanup and maintenance work by kristaps@. - simplify error reporting: less function pointers, more mandoc_[v]msg - main: split document parsing out of main.c into read.c - roff, mdoc, man: improved recognition of control characters - roff: better handling of if/else stack overflows - roff: add some predefined strings for backward compatibility - mdoc, man: empty sections are not errors - mdoc: move delimiter handling to libmdoc - some header restructuring and some minor features and fixes This merge causes two minor regressions that i will fix in separate commits right afterwards.
* Merge version 1.10.10:schwarze2011-04-211-50/+9
| | | | | | | | | | lots of cleanup and maintenance work by kristaps@. - move some main.c globals into struct curparse - move mandoc_*alloc to mandoc.h such that all code can use them - make mandoc_isdelim available to formatting frontends - dissolve mdoc_strings.c, move the code where it is used - make all error reporting functions void, their return values were useless - and various minor cleanups and fixes
* Implement the \N'number' (numbered character) roff escape sequence.schwarze2011-01-301-1/+17
| | | | | | | Don't use it in new manuals, it is inherently non-portable, but we need it for backward-compatibility with existing manuals, for example in Xenocara driver pages. ok kristaps@ matthieu@ jmc@
* Merge from bsd.lv, original commit message by kristaps@:schwarze2011-01-161-2/+15
| | | | | | | | | | Change how -Thtml behaves with tables: use multiple rows, with widths set by COL, until an external macro is encountered. At this point in time, close out the table and process the macro. When the first table row is again re-encountered, re-start the table. This requires a bit of tracking added to "struct html", but the change is very small and follows the logic of meta-fonts. This all follows a bug-report by joerg@.
* In case an ID attribute is written in pieces, only protect the firstschwarze2010-12-271-10/+14
| | | | | | | | piece with a prepended 'x', not each piece, such that quoted and unquoted .Sh, .Ss, and .Sx arguments are compatible with each other. Fixing a bug reported by Nicolas Joly <njoly at NetBSD dot org>, avoiding a regression in my first patch as pointed out by njoly as well. "feel free to do so" kristaps@
* Yet another batch of -Thtml polishing from kristaps@:schwarze2010-12-251-29/+27
| | | | | | | | In particular, use <SMALL> for .SM and <CODE> for .Dl. Use <B> for bold and <I> for italic in general. Also call this mandoc 1.10.8 now, as it is functionally equivalent, even though one one set of refactoring patches has not been merged yet because it conflicts with our tbl(1) handling.
* More small -Thtml improvements by kristaps@,schwarze2010-12-221-15/+17
| | | | | | | | in particular, use <B>, <I> and <U> where appropriate. Provide relative widths for header and footer lines. Manuals: More concise short descriptions of output modes. Correct a few places still talking about CSS2 to say CSS1. Code examples should use .Dl, not .D1.
* Significant improvements to -Thtml by kristaps@:schwarze2010-12-191-8/+12
| | | | | | Use less <DIV>, use more <H1>, <H2>, <P>, <BR>, <PRE>, <UL>, <OL>, <DL> etc. Triggered by input from Will Backman. Remove CSS2 note in mandoc.1, which is no longer true.
* * need a space before .No even if it starts with a closing delimiterschwarze2010-10-011-1/+3
| | | | | | | * slightly simplify .Pf *_IGNDELIM code, and share part of it with .No * do not let opening delimiters fall out of the front of .Ns (from kristaps@) This fixes a few spacing issues in csh(1) and ksh(1). OK kristaps@
* Merge the last bits of 1.10.6 (released today), most were already in:schwarze2010-09-271-3/+3
| | | | | | | | | | | | * ignore double-.Pp * ignore .Pp before .Bd and .Bl (unless -compact in specified) * avoid double blank line upon .Pp, .br and friends in literal context * cast enums to int when passing them to exit(3) to please lint(1) While merging, fix a regression introduced by kristaps@: Outside literal mode, double blank lines must both be printed. To achieve this again after kristaps@ improvements in 1.10.6, treat such blank lines as .sp (instead of .Pp as in 1.10.5) and drop .Pp before .sp just like dropping .Pp before .Pp.
* Implement a simple, consistent user interface for error handling.schwarze2010-08-201-3/+3
| | | | | | | | | | | | | | | | | We now have sufficient practical experience to know what we want, so this is intended to be final: - provide -Wlevel (warning, error or fatal) to select what you care about - provide -Wstop to stop after parsing a file with warnings you care about - provide consistent exit status codes for those warnings you care about - fully document what warnings, errors and fatal errors mean - remove all other cruft from the user interface, less is more: - remove all -f knobs along with the whole -f option - remove the old -Werror because calling warnings "fatal" is silly - always finish parsing each file, unless fatal errors prevent that This commit also includes a couple of related simplifications behind the scenes regarding error handling. Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and Sascha Wildner (DragonFly BSD) agree with the general direction.
* Merge bsd.lv version 1.10.5: last larger batch of bug fixes before release.schwarze2010-07-311-1/+3
| | | | | | | | | | | | | NOT including Kristaps' .Bd -literal changes which cause regressions. Features: * -Tpdf now fully working Bugfixes: * proper handling of quoted strings by .ds in roff(7) * allow empty .Dd * make .Sm start no-spacing after the first output word * underline .Ad * minor fixes in -Thtml and some optimisations in terminal output.
* Sync to bsd.lv; in particular, pull in lots of bug fixes.schwarze2010-07-251-13/+31
| | | | | | | | | | | | | | | | | | | | | new features: * support the .in macro in man(7) * support minimal PDF output * support .Sm in mdoc(7) HTML output * support .Vb and .nf in man(7) HTML output * complete the mdoc(7) manual bug fixes: * do not let mdoc(7) .Pp produce a newline before/after .Sh; reported by jmc@ * avoid double blank lines related to man(7) .sp and .br * let man(7) .nf and .fi flush the line; reported by jsg@ and naddy@ * let "\ " produce a non-breaking space; reported by deraadt@ * discard \m colour escape sequences; reported by J.C. Roberts * map undefined 1-character-escapes to the literal character itself maintenance: * express mdoc(7) arguments in terms of an enum for additional type-safety * simplify mandoc_special() and a2roffdeco() * use strcspn in term_word() in place of a manual loop * minor optimisations in the -Tps and -Thtml formatting frontends
* Merge release 1.10.4 (all code by kristaps@), providing four new features:schwarze2010-07-131-13/+27
| | | | | | | | | | 1) Proper .Bk support: allow output line breaks at input line breaks, but keep input lines together in the output, finally fixing synopses like aucat(1), mail(1) and tmux(1). 2) Mostly finished -Tps (PostScript) output. 3) Implement -Thtml output for .Nm blocks and .Bk -words. 4) Allow iterative interpolation of user-defined roff(7) strings. Also contains some minor bugfixes and some performance improvements.
* Remove "pt" from struct roffsu, as CSS (the only reason it was there) isschwarze2010-06-271-6/+6
| | | | | | unclear about which units accept floats/integers, which leads me to assume that it handles either and rounds as appropriate. from kristaps@
* Merge more bits that will be going into 1.10.1:schwarze2010-06-081-3/+1
| | | | | | | | | | | | Clean up vertical spacing in the SYNOPSIS, making the code much more systematic; this doesn't solve all SYNOPSIS problems yet, in particular not those related to keeps, indentation and the low-level .nr roff instruction, but it's a nice step forward and i couldn't find relevant regressions. (from kristaps) Besides, * make the output width configurable (default: -Owidth=80) (kristaps) * use mmap with MAP_SHARED (from Joerg Sonnenberger)
* When a word does not fully fit onto the output line, but it containsschwarze2010-05-261-16/+15
| | | | | | | | | | | | | | | | | | | | | | | at least one hyphen, we already had support for breaking the line a the last fitting hyphen. This patch improves this functionality by only breaking at hyphens in free-form text, and by not breaking at hyphens * at the beginning or end of a word or * immediately preceded or followed by another hyphen or * escaped by a preceding backslash. Before this patch, differences in break-at-hyphen support were one of the major sources of noise in automatic comparisons to mdoc(7) groff output. Now, the remaining differences are hard to find among the noise coming from other sources. Where there are still differences, what we do seems to be better than what groff does, see e.g. the chio(1) exchange and position commands for one of the now rare examples. idea and coding by kristaps@ Besides, this was the last substantial code difference left between bsd.lv and openbsd.org. We are now in full sync.
* merge 1.9.24, keeping local patches; some changes:schwarze2010-05-141-2/+6
| | | | | | | | | * preserve multiple consecutive space characters in input * do not restrict .Cd and .Rv to certain sections (requested by Joerg) * do not run lookup() on quoted words * enum return types for mdoc_args and mdoc_argv * fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel) * various lint and manual fixes
* Merge the good parts of 1.9.23,schwarze2010-04-071-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | avoid the bad parts of 1.9.23, and keep local patches. Input in general: * Basic handling of roff-style font escapes \f, \F. * Quoted punctuation does not count as punctuation. mdoc(7) parser: * Make .Pf callable; noted by Claus Assmann. * Let .Bd and .Bl ignore unknown arguments; noted by deraadt@. * Do not warn when .Er is used outside certain sections. * Replace mdoc_node_free[list] by mdoc_node_delete. * Replace #define by enum for rew*() return values. man(7) parser: * When .TH is missing, use default section and date. Output in general: * Curly braces do not count as punctuation. * No space after .Fl w/o args when a macro follows on the same line. HTML output: * Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT(). * Print whitespace after, not before .Vt .Fn .Ft .Fo. Checked that all manuals in base still build.
* sync to release 1.9.15:schwarze2010-02-181-20/+112
| | | | | | | | | * corrected .Vt handling (spotted by Joerg Sonnenberger) * corrected .Xr argument handling (based on my patch) * removed \\ escape sequence (because it is for low-level roff only) * warn about trailing whitespace (suggested by jmc@) * -Txhtml support * and some general cleanup and doc improvements