path: root/regress/usr.bin/mandoc/char/unicode
Commit message | Author | Age | Files | Lines
* Fix this test after the recent Unicode update in OpenBSD base. | schwarze | 2020-02-27 | 1 | -1/+1
| The test uses U+07FF NKO TAMAN SIGN because it is the highest code point having a two-byte UTF-8 representation. This character is a new single-width punctuation character in Unicode 11, such that mandoc now does correct horizontal spacing. We already used the code point for the test before it was assigned, which resulted in weird spacing because wcwidth(3) returns -1 for unassigned code points.
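The choice of U+07FF follows from the UTF-8 length boundaries. A minimal Python sketch of those boundaries (the function name `utf8_length` is made up for illustration and is not part of mandoc):

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs to encode one code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates have no UTF-8 encoding")
    if code_point < 0x80:
        return 1  # ASCII
    if code_point < 0x800:
        return 2  # U+0080 to U+07FF
    if code_point < 0x10000:
        return 3  # U+0800 to U+FFFF
    return 4      # U+10000 to U+10FFFF


# U+07FF is the last two-byte code point; U+0800 is the first three-byte one.
print(utf8_length(0x07FF), utf8_length(0x0800))
```

Python's own encoder agrees: `chr(0x7FF).encode("utf-8")` yields the two bytes 0xDF 0xBF.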
* adapt to new <p> output logic (html.c rev. 1.131) | schwarze | 2019-09-03 | 7 | -14/+0
* Represent mdoc(7) .Pp (and .sp, and some SYNOPSIS and .Rs features) | schwarze | 2019-01-07 | 1 | -7/+7
| by the <p> HTML element and use the html_fillmode() mechanism for .Bd -unfilled, just like it was done for man(7) earlier, finally getting rid of both the horrible <div class="Pp"></div> hack and the worst HTML syntax violations caused by nested displays.
| Care is needed because in some situations, paragraphs have to remain open across several subsequent macros, whereas in other situations, they must get closed together with a block containing them.
| Some implementation details include:
|   * Always close paragraphs before emitting HTML flow content.
|   * Let html_close_paragraph() also close <pre> for extra safety.
|   * Drop the old, now unused function print_paragraph().
|   * Minor adjustments in the top-level man(7) node formatter for symmetry.
|   * Bugfix: .Ss heads suspend no-fill mode, even though .Ss doesn't end it.
|   * Bugfix: give up on .Op semantic markup for now, see the comment.
* Improve the ASCII rendering of \(Po (Pound Sterling) | schwarze | 2018-08-21 | 16 | -90/+86
| and of the playing card suits to match groff, using feedback from Ralph Corderoy <ralph at inputplus dot co dot uk>.
* Fix some issues found looking at groff_char(7): | schwarze | 2018-08-21 | 8 | -12/+12
|   * Add two missing characters, \('Y and \('y.
|   * The Weierstrass p is not capital, see http://unicode.org/notes/tn27/.
|   * Add a groff-compatible ASCII transliteration for U+02DC: "~".
* Improve ASCII rendering of a few rare character escape sequences | schwarze | 2017-08-23 | 1 | -5/+5
| that can be changed unilaterally because groff fails to render them at all.
* catch up with ASCII renderings in chars.c rev. 1.42 | schwarze | 2017-08-23 | 16 | -103/+103
* adapt to hex format of character entities, | schwarze | 2017-07-14 | 6 | -322/+322
| committed by bentley@ in html.c rev. 1.86
* Messages of the -Wbase level now print STYLE:. Since this | schwarze | 2017-07-04 | 15 | -95/+100
| causes horrible churn anyway, take the opportunity to stop excessive testing, such that this is hopefully the last instance of such churn. Consistently use OpenBSD RCS tags, blank .Os, blank fourth .TH argument, and Mdocdate like everywhere else. Use -Ios=OpenBSD for platform-independent predictable output.
* cope with changes in BASE messages | schwarze | 2017-06-25 | 2 | -3/+4
* churn related to the new style message about RCS ids | schwarze | 2017-06-17 | 2 | -0/+2
* add the \(ru (0.5m baseline ruler) character escape sequence, | schwarze | 2017-06-14 | 4 | -4/+4
| abused by mail/nmh; groff_char(7) confirms that this really exists
* Implement automatic line breaking | schwarze | 2017-06-12 | 1 | -1/+1
| inside individual table cells that contain text blocks. This cures overlong lines in various Xenocara manuals.
* churn caused by the new Mdocdate messages, no easy way to avoid this :( | schwarze | 2017-06-11 | 1 | -0/+1
* add about 15 missing character escape sequences found in groff_char(7); | schwarze | 2017-06-02 | 12 | -24/+64
| triggered by multimedia/mkvtoolnix mkvmerge(1) using \(S2
* Fix -man -Thtml formatting after .nf (which has nothing to do | schwarze | 2017-01-26 | 6 | -339/+1
| with "literal", by the way, it means "no fill"):
|   * Use <pre> such that whitespace is preserved.
|   * Preserve line breaks.
|   * For font alternating macros, avoid node recursion which required scary juggling with the fill state. Instead, simply print the text children directly.
| Missing feature first noticed by kristaps@ in 2011, then again reported by afresh1@ in 2016, and finally reported here: https://github.com/Debian/debiman/issues/21 , which I only found because of Shane Kerr's comment here: https://plus.google.com/110314300533310775053/posts/H1eaw9Yskoc
* Implement line breaking of the generated HTML code at space characters | schwarze | 2017-01-19 | 1 | -1/+1
| in filled text. This does not affect HTML semantics, but makes the HTML code even more humanly readable. While here,
|   - collapse multiple consecutive space characters in filled text
|   - and insert a blank between style entries.
* Make HTML output more human readable by overhauling line break logic | schwarze | 2017-01-18 | 7 | -345/+683
| around tags and by introducing some simple indentation. No change of HTML semantics intended.
* Fix the mandoc test suite after afresh1@ changed wcwidth(3) in libc | schwarze | 2015-12-02 | 1 | -1/+1
| for the private use area starting at U+E000. Sometimes, even I'm surprised how much stuff these tests keep track of. Originally, they were only intended to catch regressions in mandoc... Issue noticed by daniel@, thanks!
* The recent update to /usr/share/locale/UTF-8/LC_CTYPE by afresh1@ | schwarze | 2015-11-06 | 2 | -22/+22
| fixed wcwidth(3) for various unusual characters.
* Reject the escape sequences \[uD800] to \[uDFFF] in the parser. | schwarze | 2015-10-13 | 3 | -4/+6
| These surrogates are not valid Unicode codepoints, so treat them just like any other undefined character escapes: Warn about them and do not produce output. Issue noticed while talking to stsp@, semarie@, and bentley@.
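The range check described above is small enough to sketch in Python. The helper name `is_valid_scalar` is hypothetical, and the sketch omits the warning the commit message says the parser issues:

```python
def is_valid_scalar(code_point: int) -> bool:
    """True if code_point is a Unicode scalar value, i.e. inside the
    code space and not in the surrogate range U+D800 to U+DFFF."""
    if not 0 <= code_point <= 0x10FFFF:
        return False
    return not (0xD800 <= code_point <= 0xDFFF)


# \[uD7FF] and \[uE000] sit just outside the rejected range.
for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
    print(f"U+{cp:04X}", is_valid_scalar(cp))
```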
* Apparently, some recent update of Unicode data in the base system | schwarze | 2015-09-08 | 5 | -31/+31
| changed the output of wcwidth(3) for some weird Unicode characters, causing harmless whitespace changes in mandoc(1) output; fix up the regression suite accordingly. The processing of the characters themselves still works correctly, as it did before, and that's what these tests are intended to make sure. They were never intended to check for whitespace issues. Problem reported by jsg@.
* now that groff handles \(bu properly, | schwarze | 2015-07-18 | 8 | -4/+4
| remove the special casing in the test suite
* Render \(lq and \(rq as '"' in -Tascii mode but leave the rendering | schwarze | 2015-02-17 | 8 | -8/+8
| of .Do/.Dc, .Dq, .Lb, and .St untouched. Reduces groff-mandoc differences in base by about 7%. Reminded of the issue by naddy@.
* Rewrite the low-level UTF-8 parser from scratch. | schwarze | 2014-12-19 | 5 | -4/+217
| The old parser accepted invalid byte sequences like 0xc080-c1bf, 0xe08080-e09fbf, 0xeda080-edbfbf, and 0xf0808080-f08fbfbf, produced valid roff Unicode escape sequences from them, and the algorithm contained strong defenses against any attempt to fix it.
| This cures an assertion failure in the terminal formatter caused by sneaking in ASCII 0x08 (backspace) by "encoding" it as an (invalid) multibyte UTF-8 sequence, found by jsg@ with afl.
| As a bonus, the new algorithm also reduces the code in the function by about 20%.
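The four invalid ranges listed above are overlong two-, three- and four-byte forms plus encoded surrogates. The following Python sketch shows one way a strict decoder can reject them; it illustrates the general technique and is not a transcription of mandoc's C parser, and the name `decode_utf8_char` is invented here:

```python
def decode_utf8_char(data: bytes, pos: int = 0):
    """Decode one UTF-8 sequence starting at data[pos].

    Returns (code_point, length) or None if the sequence is invalid."""
    if pos >= len(data):
        return None
    lead = data[pos]
    if lead < 0x80:
        return lead, 1
    if 0xC2 <= lead <= 0xDF:
        need, cp, minimum = 1, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        need, cp, minimum = 2, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF4:
        need, cp, minimum = 3, lead & 0x07, 0x10000
    else:
        return None  # stray continuation byte, 0xC0/0xC1, or above 0xF4
    if len(data) < pos + 1 + need:
        return None  # truncated sequence
    for byte in data[pos + 1:pos + 1 + need]:
        if byte & 0xC0 != 0x80:
            return None  # not a continuation byte
        cp = (cp << 6) | (byte & 0x3F)
    if cp < minimum:
        return None  # overlong form, e.g. 0xe08080 to 0xe09fbf
    if 0xD800 <= cp <= 0xDFFF:
        return None  # encoded surrogate, 0xeda080 to 0xedbfbf
    if cp > 0x10FFFF:
        return None  # beyond the Unicode code space
    return cp, need + 1


# Backspace smuggled in as the overlong pair 0xC0 0x88 is rejected.
print(decode_utf8_char(b"\xc0\x88"), decode_utf8_char(b"\xdf\xbf"))
```

Checking the decoded value against the minimum for its length catches every overlong form with one comparison, so the lead-byte test only has to exclude 0xC0 and 0xC1 up front.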
* correct -Tutf8 and -Thtml rendering of \(~= | schwarze | 2014-12-16 | 8 | -36/+8
| and change the name of \(-~ to \(|= to agree with groff; difference found by Carsten dot Kunze at arcor dot de
* correct some character names to match groff; | schwarze | 2014-12-15 | 8 | -20/+48
| reported by Carsten dot Kunze at arcor dot de
* SKIP_GROFF tests need to adapt to the changed rendering of \(bu, too | schwarze | 2014-11-10 | 1 | -1/+1
* test various recent improvements of special character rendering | schwarze | 2014-10-29 | 21 | -8/+708
* some new and/or updated regression tests for -Tascii, -Tutf8 | schwarze | 2014-10-28 | 21 | -4/+932
| and -Thtml rendering of character escape sequences
* Stricter syntax checking of Unicode character names: | schwarze | 2014-10-13 | 4 | -4/+14
| Require exactly 4, 5 or 6 hex digits and allow nothing else. This avoids mishandling stuff like \[ua] and \C'uA' as Unicode and also fixes underlining in eqn(7) -Thtml output which uses \[ul]. Problem found and semantics suggested by kristaps@.
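A rough Python sketch of the digit-count rule stated above. It checks only that rule; accepting both upper- and lower-case hex digits is an assumption made here, and the name `parse_unicode_name` is hypothetical:

```python
import re

# "u" followed by exactly 4, 5 or 6 hex digits and nothing else.
# Accepting lower-case digits is an assumption of this sketch.
_UNICODE_NAME = re.compile(r"u([0-9A-Fa-f]{4,6})")


def parse_unicode_name(name: str):
    """Return the code point for a name like 'u07FF', or None if the
    name does not have the uXXXX form and must be looked up as an
    ordinary character name instead."""
    match = _UNICODE_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1), 16)


# 'ua', 'uA' and 'ul' are too short, so they stay ordinary names.
for name in ("u07FF", "ua", "uA", "ul", "u1234567"):
    print(name, parse_unicode_name(name))
```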
* inevitable churn caused by the section title change | schwarze | 2014-08-26 | 2 | -2/+2
* Support the alternative syntax \C'uXXXX' for Unicode characters. | schwarze | 2013-11-10 | 4 | -12/+12
| It is already documented in the Heirloom troff manual, and groff handles it as well. Bug reported by Bjarni Ingi Gislason <bjarniig at rhi dot hi dot is> on <bug-groff at gnu dot org>. Well, admittedly, that bug was reported against groff, but mandoc was even more broken than groff with respect to this syntax...
* basic tests for the \[uXXXX] escape sequence | schwarze | 2013-11-10 | 5 | -0/+58