| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
This is of some relevance because the pod2man(1) preamble abuses it
for the icelandic letter Thorn, instead of simply using \(TP and \(Tp.
Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
|
| |
|
|
|
|
|
| |
In particular, make it work in no-fill mode, too.
Reminded by Carsten dot Kunze at arcor dot de (Heirloom roff).
|
|
|
|
|
|
| |
but html.c is not part of the parser at all, so it cannot include
that header, and actually, it doesn't need it.
Found while auditing includes after Theo's recent *.h commit.
|
|
|
|
|
|
|
|
|
|
|
| |
escape sequences just like it was earlier implemented for -Thtml.
Do not let control characters other than ASCII 9 (horizontal tab)
propagate to the output, even though groff allows them; but that
really doesn't look like a great idea.
Let mchars_num2char() return int such that we can distinguish invalid \N
syntax from \N'0'. This also reduces the danger of signed char issues
popping up.
|
|
|
|
|
|
|
|
| |
validity of character escape names and warn about unknown ones.
This requires mchars_spec2cp() to report unknown names again.
Fortunately, that doesn't require changing the calling code because
according to groff, invalid character escapes should not produce
output anyway, and now that we warn about them, that's fine.
|
|
|
|
|
|
|
|
| |
in one common, safe way instead of three different ways. In particular,
* skip NUL, it is used to mean "no output desired"
* deny 0x01-0x1F and 0x7F-0x9F, print REPLACEMENT CHARACTER instead
* print 0x20-0x7E literally or name-encoded, as required
* print characters above 0x9F numerically
|
|
|
|
|
|
|
|
|
|
| |
In UTF-8 output, do not print anything if mchars_spec2cp() returns 0.
In particular, this repairs handling of zero-width spaces (\&).
While here, let mchars_spec2cp() return 0xFFFD instead of -1
if the character is not found, simplifying the using code.
In HTML output, do not print obfuscated ASCII characters and
do not test for one-char escapes, mchars_spec2cp() already does that.
|
|
|
|
|
|
|
|
|
|
|
|
| |
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.
A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
and remove the workarounds on the higher level.
|
|
|
|
| |
ok schwarze@
|
| |
|
|
|
|
| |
written by kristaps@ during EuroBSDCon
|
|
|
|
| |
written by kristaps@ during EuroBSDCon
|
|
|
|
|
|
|
|
| |
Replace hard-coded widths and alignments with a minimal embedded stylesheet.
Do not use <p> because it cannot appear inside block macros.
Remove the "summary" attribute because it is not HTML5.
Written by kristaps@ some months ago, finished during EuroBSDCon.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The .Bf block can contain subblocks, so it has to render as an
element that can contain flow content. But <em> cannot contain
flow content, only phrasing content. Rendering .Em and .Bf differently
would by unfortunate, and closing out .Bf before subblocks and
re-opening it afterwards would merely complicate both the C code
of the program and the generated HTML code. Besides, converting
.Em to semantic HTML markup would require some content to be put
into <em> and some into <i>, but we cannot automatically distinguish
which is which, so strictly speaking, we can't use semantic HTML
here but have to fall back to physical markup. Wonders of HTML...
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that we use 240u := 1i for all devices, even -Tps and -Tpdf.
Big fix of -Tascii rendering of f, m, and u.
Small fix of -Tascii rendering of c.
Big fix of -Thtml rendering of u.
Big fix of -Tps rendering of m, p, and u.
Clarify -Tps rendering of c.
Correct documentation of scaling units, in particular with respect to u.
This for example improves rendering of the OpenGL manuals.
Joint work with kristaps@.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vast majority of .Em in real-world manuals is stress emphasis,
for which <em> is the correct markup. Admittedly, there are some
instances of .Em usage for alternate quality, for which <i> would
be a better match. Most of these are technical terms that neither
allow semantic markup nor are keywords - for the latter, .Sy would
be preferable. A typical example is that the shell breaks input into
.Em words .
Alternate voice or mood, which would also require <i>, is almost
absent from manuals.
We cannot satisfy both stress emphasis and alternate quality, so
pick the one that fits more often and looks less wrong when off.
Patch from Guy Harris <guy at alum dot mit dot edu>.
ok bentley@ joerg@NetBSD
|
|
|
|
|
|
|
|
|
|
| |
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found myself while auditing the HTML formatter for safe output handling.
|
|
|
|
|
|
|
|
|
|
| |
The function print_encode() is used both for plain text
and for quoted attribute values.
Escape the '"' character such that malicious manuals cannot pull off
XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe
others) to trigger the latter case.
In the former case, escaping does no harm.
Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Repair three instances of silent truncation, use asprintf(3).
* Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+...
to use asprintf(3) instead to make them less error prone.
* Cast the return value of four instances where the destination
buffer is known to be large enough to (void).
* Completely remove three useless instances of strlcpy(3)/strlcat(3).
* Mark two places in -Thtml with XXX that can cause information loss
and crashes but are not easy to fix, requiring design changes of
some internal interfaces.
* The file mandocdb.c remains to be audited.
|
|
|
|
|
| |
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change
|
|
|
|
|
|
|
| |
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.
|
|
|
|
|
|
|
| |
documented in the Ossanna-Kernighan-Ritter troff manual
and also supported by groff.
Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
|
|
|
|
|
| |
Fix another case where a variable is formatted using the wrong type.
Patch from Joerg Sonnenberger <joerg@NetBSD>.
|
|
|
|
|
| |
This improves the formatting of about 40 base manuals
and reduces groff-mandoc formatting differences in base by about 5%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
character without advancing the cursor position; implement it to
simply skip the next character, as it will usually be overwritten.
With this change, the pod2man(1) preamble user-defined string \*:,
intended to render as a diaeresis or umlaut diacritic above the
preceding character, is rendered in a slightly less ugly way,
though still not correctly. It was rendered as "z.." and is now
rendered as ".".
Given that the definition of \*: uses elaborate manual \h positioning,
there is little chance for mandoc(1) to ever render it correctly,
but at least we can refrain from printing out a spurious "z", and
we can make the \z do something semi-reasonable for easier cases.
|
|
|
|
|
|
|
|
| |
Implement .Rv in -Tman.
Let -man -Tman work a bit like cat(1).
Add the -Ofragment option to -T[x]html.
Minor fixes in -T[x]html.
Lots of apropos(1) and -Tman code cleanup.
|
|
|
|
| |
from kristaps@
|
|
|
|
|
|
|
|
|
|
| |
- mdoc(7): fix an assertion if the first line after .Bd -column
starts with a blank, and some simplifications in mdoc_argv.c
- man(7): literal mode ends at .SH and .SS (bug reported by naddy@)
- allow .RS/.RE blocks to nest (bug reported by dcoppa@ and gsoares@)
- improve vertical spacing of man(7) blocks
- roff(7): clear user-defined strings when starting a new file
- correct ID tags in -T[x]html
|
|
|
|
|
|
|
|
|
| |
* Unicode output support (no Unicode input yet, though).
* Refactoring: completely handle predefined strings in roff.c.
- New function mandoc_escape() replaces a2roffdeco() and mandoc_special().
- Start using mandoc_getarg() in mdoc_argv.c.
- Clean up parsing of delimiters in mdoc(7).
* And many minor fixes and lots of cleanup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Again lots of cleanup and maintenance work by kristaps@.
- simplify error reporting: less function pointers, more mandoc_[v]msg
- main: split document parsing out of main.c into read.c
- roff, mdoc, man: improved recognition of control characters
- roff: better handling of if/else stack overflows
- roff: add some predefined strings for backward compatibility
- mdoc, man: empty sections are not errors
- mdoc: move delimiter handling to libmdoc
- some header restructuring and some minor features and fixes
This merge causes two minor regressions
that i will fix in separate commits right afterwards.
|
|
|
|
|
|
|
|
|
|
| |
lots of cleanup and maintenance work by kristaps@.
- move some main.c globals into struct curparse
- move mandoc_*alloc to mandoc.h such that all code can use them
- make mandoc_isdelim available to formatting frontends
- dissolve mdoc_strings.c, move the code where it is used
- make all error reporting functions void, their return values were useless
- and various minor cleanups and fixes
|
|
|
|
|
|
|
| |
Don't use it in new manuals, it is inherently non-portable, but we
need it for backward-compatibility with existing manuals, for example
in Xenocara driver pages.
ok kristaps@ matthieu@ jmc@
|
|
|
|
|
|
|
|
|
|
| |
Change how -Thtml behaves with tables: use multiple rows, with widths
set by COL, until an external macro is encountered. At this point in
time, close out the table and process the macro. When the first table
row is again re-encountered, re-start the table. This requires a bit of
tracking added to "struct html", but the change is very small and
follows the logic of meta-fonts. This all follows a bug-report by
joerg@.
|
|
|
|
|
|
|
|
| |
piece with a prepended 'x', not each piece, such that quoted and
unquoted .Sh, .Ss, and .Sx arguments are compatible with each other.
Fixing a bug reported by Nicolas Joly <njoly at NetBSD dot org>,
avoiding a regression in my first patch as pointed out by njoly as well.
"feel free to do so" kristaps@
|
|
|
|
|
|
|
|
| |
In particular, use <SMALL> for .SM and <CODE> for .Dl.
Use <B> for bold and <I> for italic in general.
Also call this mandoc 1.10.8 now, as it is functionally equivalent,
even though one one set of refactoring patches has not been merged
yet because it conflicts with our tbl(1) handling.
|
|
|
|
|
|
|
|
| |
in particular, use <B>, <I> and <U> where appropriate.
Provide relative widths for header and footer lines.
Manuals: More concise short descriptions of output modes.
Correct a few places still talking about CSS2 to say CSS1.
Code examples should use .Dl, not .D1.
|
|
|
|
|
|
| |
Use less <DIV>, use more <H1>, <H2>, <P>, <BR>, <PRE>, <UL>, <OL>, <DL> etc.
Triggered by input from Will Backman.
Remove CSS2 note in mandoc.1, which is no longer true.
|
|
|
|
|
|
|
| |
* slightly simplify .Pf *_IGNDELIM code, and share part of it with .No
* do not let opening delimiters fall out of the front of .Ns (from kristaps@)
This fixes a few spacing issues in csh(1) and ksh(1).
OK kristaps@
|
|
|
|
|
|
|
|
|
|
|
|
| |
* ignore double-.Pp
* ignore .Pp before .Bd and .Bl (unless -compact in specified)
* avoid double blank line upon .Pp, .br and friends in literal context
* cast enums to int when passing them to exit(3) to please lint(1)
While merging, fix a regression introduced by kristaps@:
Outside literal mode, double blank lines must both be printed.
To achieve this again after kristaps@ improvements in 1.10.6,
treat such blank lines as .sp (instead of .Pp as in 1.10.5)
and drop .Pp before .sp just like dropping .Pp before .Pp.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NOT including Kristaps' .Bd -literal changes which cause regressions.
Features:
* -Tpdf now fully working
Bugfixes:
* proper handling of quoted strings by .ds in roff(7)
* allow empty .Dd
* make .Sm start no-spacing after the first output word
* underline .Ad
* minor fixes in -Thtml
and some optimisations in terminal output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
new features:
* support the .in macro in man(7)
* support minimal PDF output
* support .Sm in mdoc(7) HTML output
* support .Vb and .nf in man(7) HTML output
* complete the mdoc(7) manual
bug fixes:
* do not let mdoc(7) .Pp produce a newline before/after .Sh; reported by jmc@
* avoid double blank lines related to man(7) .sp and .br
* let man(7) .nf and .fi flush the line; reported by jsg@ and naddy@
* let "\ " produce a non-breaking space; reported by deraadt@
* discard \m colour escape sequences; reported by J.C. Roberts
* map undefined 1-character-escapes to the literal character itself
maintenance:
* express mdoc(7) arguments in terms of an enum for additional type-safety
* simplify mandoc_special() and a2roffdeco()
* use strcspn in term_word() in place of a manual loop
* minor optimisations in the -Tps and -Thtml formatting frontends
|
|
|
|
|
|
|
|
|
|
| |
1) Proper .Bk support: allow output line breaks at input line breaks,
but keep input lines together in the output, finally fixing
synopses like aucat(1), mail(1) and tmux(1).
2) Mostly finished -Tps (PostScript) output.
3) Implement -Thtml output for .Nm blocks and .Bk -words.
4) Allow iterative interpolation of user-defined roff(7) strings.
Also contains some minor bugfixes and some performance improvements.
|
|
|
|
|
|
| |
unclear about which units accept floats/integers, which leads me to
assume that it handles either and rounds as appropriate.
from kristaps@
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up vertical spacing in the SYNOPSIS, making the code much more
systematic; this doesn't solve all SYNOPSIS problems yet, in particular
not those related to keeps, indentation and the low-level .nr roff
instruction, but it's a nice step forward and i couldn't find relevant
regressions. (from kristaps)
Besides,
* make the output width configurable (default: -Owidth=80) (kristaps)
* use mmap with MAP_SHARED (from Joerg Sonnenberger)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
at least one hyphen, we already had support for breaking the line a the
last fitting hyphen. This patch improves this functionality by only
breaking at hyphens in free-form text, and by not breaking at hyphens
* at the beginning or end of a word or
* immediately preceded or followed by another hyphen or
* escaped by a preceding backslash.
Before this patch, differences in break-at-hyphen support were one
of the major sources of noise in automatic comparisons to mdoc(7)
groff output. Now, the remaining differences are hard to find among
the noise coming from other sources.
Where there are still differences, what we do seems to be better than
what groff does, see e.g. the chio(1) exchange and position commands
for one of the now rare examples.
idea and coding by kristaps@
Besides, this was the last substantial code difference left
between bsd.lv and openbsd.org. We are now in full sync.
|
|
|
|
|
|
|
|
|
| |
* preserve multiple consecutive space characters in input
* do not restrict .Cd and .Rv to certain sections (requested by Joerg)
* do not run lookup() on quoted words
* enum return types for mdoc_args and mdoc_argv
* fix auto-closing of LINK tag in -Txhtml (from Daniel Friesel)
* various lint and manual fixes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
avoid the bad parts of 1.9.23, and keep local patches.
Input in general:
* Basic handling of roff-style font escapes \f, \F.
* Quoted punctuation does not count as punctuation.
mdoc(7) parser:
* Make .Pf callable; noted by Claus Assmann.
* Let .Bd and .Bl ignore unknown arguments; noted by deraadt@.
* Do not warn when .Er is used outside certain sections.
* Replace mdoc_node_free[list] by mdoc_node_delete.
* Replace #define by enum for rew*() return values.
man(7) parser:
* When .TH is missing, use default section and date.
Output in general:
* Curly braces do not count as punctuation.
* No space after .Fl w/o args when a macro follows on the same line.
HTML output:
* Unify PAIR_*_INIT macros, introduce new PAIR_ID_INIT().
* Print whitespace after, not before .Vt .Fn .Ft .Fo.
Checked that all manuals in base still build.
|
|
|
|
|
|
|
|
|
| |
* corrected .Vt handling (spotted by Joerg Sonnenberger)
* corrected .Xr argument handling (based on my patch)
* removed \\ escape sequence (because it is for low-level roff only)
* warn about trailing whitespace (suggested by jmc@)
* -Txhtml support
* and some general cleanup and doc improvements
|