summaryrefslogtreecommitdiffstats
path: root/usr.bin/mandoc/mandoc.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Rudimentary implementation of the roff(7) \o escape sequence (overstrike).schwarze2015-01-211-5/+7
| | | | | | This is of some relevance because the pod2man(1) preamble abuses it for the icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
* Fix a read buffer overrun triggered by trailing \s- or trailing \s+schwarze2015-01-011-4/+4
| | | | without the required subsequent argument; found by jsg@ with afl.
* Catch localtime() failure for additional safety;schwarze2014-12-151-1/+3
| | | | patch from Jan Stary <hans at stare dot cz> some time ago.
* Add some missing OpenBSD RCS markersschwarze2014-11-281-1/+1
| | | | and a few missing <sys/types.h> inclusions; no code change.
* Tighten Unicode escape name parsing.schwarze2014-10-281-5/+10
| | | | | Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0. This simplifies mchars_num2uc().
* Stricter syntax checking of Unicode character names:schwarze2014-10-131-13/+12
| | | | | | | Require exactly 4, 5 or 6 hex digits and allow nothing else. This avoids mishandling stuff like \[ua] and \C'uA' as Unicode and also fixes underlining in eqn(7) -Thtml output which uses \[ul]. Problem found and semantics suggested by kristaps@.
* kristaps@ found this with valgrind, merge his patch from bsd.lv:schwarze2014-08-181-2/+3
| | | | | | | Fix a corner case where \H<nil> (where <nil> is the \0 character) would cause mandoc_escape() to read past the end of an allocated string. Found when a script scanning of all Mac OSX manuals accidentally also scanned binary (gzip'd) files, discussed with schwarze@ on tech@mdocml.
* Clean up messages related to plain text and to escape sequences.schwarze2014-07-061-3/+3
| | | | | * Mention invalid escape sequences and string names, and fallbacks. * Hierarchical naming.
* Fix handling of escape sequences taking numeric arguments.schwarze2014-07-061-2/+4
| | | | | | | * Repair detection of invalid delimiters. * Discard the invalid delimiter together with the invalid sequence. Note to self: In general, strchr("\0...", c) is a thoroughly bad idea.
* Clean up the warnings related to document structure.schwarze2014-07-011-2/+2
| | | | | | | | | * Hierarchical naming of the related enum mandocerr items. * Mention the offending macro, section title, or string. While here, improve some wordings: * Descriptive instead of imperative style. * Uniform style for "missing" and "skipping". * Where applicable, mention the fallback used.
* Start systematic improvements of error reporting.schwarze2014-06-201-3/+3
| | | | | | | | | | | So far, this covers all WARNINGs related to the prologue. 1) hierarchical naming of MANDOCERR_* constants 2) mention the macro name in messages where that adds clarity 3) add one missing MANDOCERR_DATE_MISSING msg 4) fix the wording of one message related to the man(7) prologue Started on the plane back from Ottawa.
* KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */,schwarze2014-04-201-62/+62
| | | | | remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* Fully implement the \B (validate numerical expression) andschwarze2014-04-081-6/+3
| | | | | | | | | | | partially implement the \w (measure text width) escape sequence in a way that makes them usable in numerical expressions and in conditional requests, similar to how \n (interpolate number register) and \* (expand user-defined string) are implemented. This lets mandoc(1) handle the baroque low-level roff code found at the beginning of the ggrep(1) manual. Thanks to pascal@ for the report.
* Accept arbitrary argument delimiters for various roff(7) escape sequences.schwarze2014-04-071-5/+5
| | | | Needed for example by the new Perl pod2man(1) preamble.
* The files mandoc.c and mandoc.h contained both specialised low-levelschwarze2014-03-211-69/+2
| | | | | | | functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Remove duplicate const specifiers from the declaration of mandoc_escape().schwarze2013-12-301-2/+2
| | | | | Found by Thomas Klausner <wiz at NetBSD dot org> using clang. No functional change.
* Simplify: Remove an unused argument from the mandoc_eos() function.schwarze2013-12-301-5/+5
| | | | No functional change.
* I have no idea how it happened that \B, \H, \h, \L, and \l gotschwarze2013-12-261-8/+6
| | | | | | | | | | | mapped to ESCAPE_NUMBERED (which is for \N and only for \N), that made no sense at all. Properly remap them to ESCAPE_IGNORE. While here, move \B and \w from the group taking number arguments to the group taking string arguments; right now, that doesn't imply any functional change, but if we ever go ahead and implement a parser for roff(7) numerical expressions, it will suddenly start to matter, and cause confusion.
* Parse and ignore the roff(7) escape sequences \d (move half line down)schwarze2013-12-251-1/+9
| | | | | | und \u (move half line up). Found by bentley@ in some DocBook crap. Surprisingly, these two do actually occur in our terminfo(5), so this patch reduces groff-mandoc differences in base by 0.03%.
* s/[Nn]ull/NUL/ in comments where appropriate;schwarze2013-12-251-4/+4
| | | | suggested by Thomas Klausner <wiz @ NetBSD dot org>.
* Support the alternative syntax \C'uXXXX' for Unicode characters.schwarze2013-11-101-2/+5
| | | | | | | | | | It is already documented in the Heirloom troff manual, and groff handles it as well. Bug reported by Bjarni Ingi Gislason <bjarniig at rhi dot hi dot is> on <bug-groff at gnu dot org>. Well, admittedly, that bug was reported against groff, but mandoc was even more broken than groff with respect to this syntax...
* Cleanup suggested by gcc-4.8.1, following hints by Christos Zoulas:schwarze2013-10-051-2/+2
| | | | | | | - avoid bad qualifier casting in roff.c, roff_parsetext() by changing the mandoc_escape arguments to "const char const **" - avoid bad qualifier casting in mandocdb.c, index_merge() - garbage collect a few unused variables elsewhere
* Implement the roff(7) font-escape sequence \f(BI "bold+italic".schwarze2013-08-081-9/+15
| | | | | This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
* Improve handling of the roff(7) "\t" escape sequence:schwarze2013-06-201-6/+24
| | | | | | | | | | | * Parsing macro arguments has to be done in copy mode, which implies replacing "\t" by a literal tab character. * Otherwise, render "\t" as the empty string, not as a 't' character. This fixes formatting of the distfile example in the oldrdist(1) manual. This also shows up in the unzip(1) manual as one of several issues preventing the removal of USE_GROFF from the archivers/unzip port. Thanks to espie@ for attracting my attention to the unzip(1) manual.
* Support the .cc request; code by kristaps@, tests by me.schwarze2012-07-071-27/+1
| | | | Needed for sqlite3(1) as reported by espie@.
* While i already got my fingers dirty on mandoc_escape(),schwarze2012-05-281-68/+65
| | | | | | | | | | | profit of the occasion to pull out some spaghetti, that is, three confusing variables and fourteen pointless assignments among them; instead, always operate on the official pointers **start, **end, and *sz, each of which conveys an obvious meaning. No functional change intended, and the new tests confirm that everything still (err...) "works", as far as that word can be applied to the kind of roff(7) mock-up code i'm polishing here.
* Make recursive parsing of roff(7) escapes actually work in the general case,schwarze2012-05-281-118/+37
| | | | | | | | | | | | in particular when the inner escapes are preceded or followed by other terms. While doing so, remove lots of bogus code that was trying to make pointless distinctions between numeric and non-numeric escape sequences, while both actually share the same syntax and we ignore the semantics anyway. This prevents some of the strings defined in the pod2man(1) preamble from producing garbage output, in particular in scandinavian words. Of course, proper rendering of scandinavian national characters cannot be expected even with these fixes.
* Implement the roff \z escape sequence, intended to output the nextschwarze2012-05-281-2/+12
| | | | | | | | | | | | | | | | character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten. With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".". Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases.
* ISO style "%Y-%m-%d" dates are common in man(7) .TH.schwarze2011-11-171-3/+4
| | | | | | | | | | | They have been considered valid in the past, but were reformatted to the mdoc(7) "Month day, year" style. To make page footers more similar to groff, no longer reformat them, just print them as they are. This doesn't change anything with respect to what's considered valid or what is warned about. Putting this in now such that i can improve the unit test suite.
* Parse and ignore the C font family in \f escapes.schwarze2011-11-121-2/+9
| | | | | | Helps with some real-world manuals that (incorrectly) assume that the C font family means "constant width". Suggested by Andreas Vogele, patch by kristaps@.
* Handle \N numbered character escapes the same way as groff:schwarze2011-10-241-7/+23
| | | | | | | | | | | | | | | | If \N is followed by a digit, ignore \N and the digit. If \N is followed by a non-digit, the next non-digit ends the character number; the two delimiters need not match. Kristaps calls that "gross, but not our fault". This fixes most of src/regress/usr.bin/mandoc/char/N/basic.in, except that handling of non-printable characters still differs from groff. For now, i'm fixing \N only. Other escapes taking numeric arguments may or may not need similar handling, but \N is by far the most important for practical purposes. ok kristaps@
* sync to version 1.11.7 from kristaps@schwarze2011-09-181-33/+6
| | | | | | | | main new feature: support the roff(7) .tr request plus various bugfixes and some refactoring regressions are so minor that it's better to get this in and fix them in the tree
* sync to version 1.11.5:schwarze2011-09-181-7/+16
| | | | | | | | adding an implementation of the eqn(7) language by kristaps@ So far, only .EQ/.EN blocks are handled, in-line equations are not, and rendering is not yet very pretty, but the parser is fairly complete.
* Merge release 1.11.3, almost all code by kristaps@:schwarze2011-05-291-146/+339
| | | | | | | | | * Unicode output support (no Unicode input yet, though). * Refactoring: completely handle predefined strings in roff.c. - New function mandoc_escape() replaces a2roffdeco() and mandoc_special(). - Start using mandoc_getarg() in mdoc_argv.c. - Clean up parsing of delimiters in mdoc(7). * And many minor fixes and lots of cleanup.
* Merge version 1.11.1:schwarze2011-04-241-51/+26
| | | | | | | | | | | | | | Again lots of cleanup and maintenance work by kristaps@. - simplify error reporting: less function pointers, more mandoc_[v]msg - main: split document parsing out of main.c into read.c - roff, mdoc, man: improved recognition of control characters - roff: better handling of if/else stack overflows - roff: add some predefined strings for backward compatibility - mdoc, man: empty sections are not errors - mdoc: move delimiter handling to libmdoc - some header restructuring and some minor features and fixes This merge causes two minor regressions that i will fix in separate commits right afterwards.
* Merge version 1.10.10:schwarze2011-04-211-8/+54
| | | | | | | | | | lots of cleanup and maintenance work by kristaps@. - move some main.c globals into struct curparse - move mandoc_*alloc to mandoc.h such that all code can use them - make mandoc_isdelim available to formatting frontends - dissolve mdoc_strings.c, move the code where it is used - make all error reporting functions void, their return values were useless - and various minor cleanups and fixes
* my $buf = "string"; return $string; is cool in Perl, but not in C;schwarze2011-03-151-17/+23
| | | | | found by Ulrich Spoerlein <uqs at freebsd> using the clang static analyzer; "ok, but please document the numbers" kristaps@
* Clean up date handling,schwarze2011-03-071-28/+47
| | | | | | | | | | | | as a first step to get rid of the frequent petty warnings in this area: - always store dates as strings, not as seconds since the Epoch - for input, try the three most common formats everywhere - for unrecognized format, just pass the date though verbatim - when there is no date at all, still use the current date Originally triggered by a one-line patch from Tim van der Molen, <tbvdm at xs4all dot nl>, which is included here. Feedback and OK on manual parts from jmc@. "please check this in" kristaps@
* Unify roff macro argument parsing (in roff.c, roff_userdef()) and man macroschwarze2011-01-031-3/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | argument parsing (in man_argv.c, man_args()), both having different bugs, to use one common macro argument parser (in mandoc.c, mandoc_getarg()), because from the point of view of roff, man macros are just roff macros, hence their arguments are parsed in exactly the same way. While doing so, fix these bugs: * Escaped blanks (i.e. those preceded by an odd number of backslashes) were mishandled as argument separators in unquoted arguments to user-defined roff macros. * Unescaped blanks preceded by an even number of backslashes were not recognized as argument separators in unquoted arguments to man macros. * Escaped backslashes (i.e. pairs of backslashes) were not reduced to single backslashes both in unquoted and quoted arguments both to user-defined roff macros and to man macros. * Escaped quotes (i.e. pairs of quotes inside quoted arguments) were not reduced to single quotes in man macros. OK kristaps@ Note that mdoc macro argument parsing is yet another beast for no good reason and is probably afflicted by similar bugs. But i don't attempt to fix that right now because it is intricately entangled with lots of unrelated high-level mdoc(7) functionality, like delimiter handling and column list phrase handling. Disentagling that would waste too much time now.
* Merge the last bits of 1.10.6 (released today), most were already in:schwarze2010-09-271-6/+6
| | | | | | | | | | | | * ignore double-.Pp * ignore .Pp before .Bd and .Bl (unless -compact in specified) * avoid double blank line upon .Pp, .br and friends in literal context * cast enums to int when passing them to exit(3) to please lint(1) While merging, fix a regression introduced by kristaps@: Outside literal mode, double blank lines must both be printed. To achieve this again after kristaps@ improvements in 1.10.6, treat such blank lines as .sp (instead of .Pp as in 1.10.5) and drop .Pp before .sp just like dropping .Pp before .Pp.
* Parse and ignore the \k, \o, \w, and \z roff escapes, and recursivelyschwarze2010-09-131-6/+40
| | | | | | | | | | | | | | | ignore embedded escapes and mathematical roff subexpressions. In roff copy mode, resolve "\\" to '\'. Allow ".xx\}" where xx is a macro to close roff conditional scope. Mandoc now handles the special character definitions in the pod2man(1) preamble, so remove the explicit redefinitions in chars.c/chars.in. From kristaps@. I have checked that this causes no relevant change to the Perl manuals. The only change introduced is that some non-ASCII characters rendered incorrectly before are now rendered incorrectly in a different way. For example, e accent aigu was "e", now is "e'" and c cedille was "c", now is "c,".
* Implement a simple, consistent user interface for error handling.schwarze2010-08-201-5/+5
| | | | | | | | | | | | | | | | | We now have sufficient practical experience to know what we want, so this is intended to be final: - provide -Wlevel (warning, error or fatal) to select what you care about - provide -Wstop to stop after parsing a file with warnings you care about - provide consistent exit status codes for those warnings you care about - fully document what warnings, errors and fatal errors mean - remove all other cruft from the user interface, less is more: - remove all -f knobs along with the whole -f option - remove the old -Werror because calling warnings "fatal" is silly - always finish parsing each file, unless fatal errors prevent that This commit also includes a couple of related simplifications behind the scenes regarding error handling. Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and Sascha Wildner (DragonFly BSD) agree with the general direction.
* Ignore \h (local horizontal motion) and \v (local vertical motion) escapesschwarze2010-08-181-10/+12
| | | | | | | just in the same way as \s (font size) escapes, and handle \s escapes not having an explicit plus or minus sign. This fixes the very ugly rendering of "C++" in gcc(1). Reported by Thomas Jeunet, fixed by kristaps@.
* Sync to bsd.lv; in particular, pull in lots of bug fixes.schwarze2010-07-251-136/+111
| | | | | | | | | | | | | | | | | | | | | new features: * support the .in macro in man(7) * support minimal PDF output * support .Sm in mdoc(7) HTML output * support .Vb and .nf in man(7) HTML output * complete the mdoc(7) manual bug fixes: * do not let mdoc(7) .Pp produce a newline before/after .Sh; reported by jmc@ * avoid double blank lines related to man(7) .sp and .br * let man(7) .nf and .fi flush the line; reported by jsg@ and naddy@ * let "\ " produce a non-breaking space; reported by deraadt@ * discard \m colour escape sequences; reported by J.C. Roberts * map undefined 1-character-escapes to the literal character itself maintenance: * express mdoc(7) arguments in terms of an enum for additional type-safety * simplify mandoc_special() and a2roffdeco() * use strcspn in term_word() in place of a manual loop * minor optimisations in the -Tps and -Thtml formatting frontends
* Text ending in a full stop, exclamation mark or question markschwarze2010-07-161-11/+13
| | | | | | | | | | | | | | | | | | should not flag the end of a sentence if: 1) The punctuation is followed by closing delimiters and not preceded by alphanumeric characters, like in "There is no full stop (.) in this sentence" or 2) The punctuation is a child of a macro and not preceded by alphanumeric characters, like in "There is no full stop .Pq \&. in this sentence" jmc@ and sobrado@ like this
* merge release 1.10.2schwarze2010-06-261-31/+55
| | | | | | | | | | | | * bug fixes: - interaction of ASCII_HYPH with special chars (found by Ulrich Spoerlein) - handling of roff conditionals (found by Ulrich Spoerlein) - .Bd -offset will no more default to 6n * maintenance: - more caching of .Bd and .Bl arguments for efficiency - deconstify man(7) validation routines - add FreeBSD library names (provided by Ulrich Spoerlein) * start PostScript font-switching
* Merge bsd.lv version 1.10.1 (to be released soon).schwarze2010-06-061-1/+3
| | | | | | | | | | | | | | | | | | | | The main step forward is that this now has *much* better .Bl -column support, now supporting many manuals that previously errored out without producing any output. Other fixes include: * do not die from multiple list types, use the first and warn * in .Bl without a type, default to -item * various tweaks to .Dt * fix .In, .Fd, .Ft, .Fn and .Fo formatting * some documentation fixes and additions * and fix a couple of bugs reported by Ulrich Spoerlein: * better support for roff block-end "\}" without a preceding dot * .In must not break the line outside SYNOPSIS * spelling in some error messages While merging, fix one regression in .In spacing that needs to go to bsd.lv, too.
* When a word does not fully fit onto the output line, but it containsschwarze2010-05-261-1/+29
| | | | | | | | | | | | | | | | | | | | | | | at least one hyphen, we already had support for breaking the line a the last fitting hyphen. This patch improves this functionality by only breaking at hyphens in free-form text, and by not breaking at hyphens * at the beginning or end of a word or * immediately preceded or followed by another hyphen or * escaped by a preceding backslash. Before this patch, differences in break-at-hyphen support were one of the major sources of noise in automatic comparisons to mdoc(7) groff output. Now, the remaining differences are hard to find among the noise coming from other sources. Where there are still differences, what we do seems to be better than what groff does, see e.g. the chio(1) exchange and position commands for one of the now rare examples. idea and coding by kristaps@ Besides, this was the last substantial code difference left between bsd.lv and openbsd.org. We are now in full sync.
* more end-of-sentence (EOS) handling:schwarze2010-05-151-13/+29
| | | | | | * recognize the end of quoted sentences, and of those in parantheses * detect EOS in append_delims, so it works after all macros by kristaps@
* block-implicit macros now up-propogate end-of-sentence spacing;schwarze2010-05-151-2/+3
| | | | from bsd.lv mandoc.c 1.13 and mdoc_macro.c 1.64