Subject: Re: Tidy and HTML tab spacing
From: Hiroki Sato (hr...@FreeBSD.org)
Date: Jan 18, 2012 3:44:11 pm
wb> HTML versions of FreeBSD documents are fed through tidy (www/tidy or
wb> www/tidy-devel) for cleanup. There's a bug in tidy that can cause
wb> tab stops to be wrong:
wb> http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
wb>
wb> Note how DISTNAME and EXTRACT_SUFX do not line up. They are correct
wb> in the source book.sgml.
wb>
wb> So what to do?
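A plausible explanation for the misalignment, offered here as an assumption and not confirmed anywhere in this thread, is that tab stops get computed from the start of the raw source line rather than from the start of the rendered text. Any markup sitting before the first character on that line (for example a lone `>` closing the PRE start tag) would then push every tab stop on that line by one column. The Python sketch below models only that hypothesis. `expand_from` is a hypothetical helper, and the two Makefile lines are made-up sample data.

```python
def expand_from(line: str, start_col: int, tabsize: int = 8) -> str:
    """Expand tabs in `line` as if it began at column `start_col`.

    Hypothetical model of the bug: if a tool counts columns from the start of
    the raw source line, any markup before the text (for example a lone '>')
    shifts every tab stop on that line.
    """
    padded = " " * start_col + line                # stand-in for the markup columns
    return padded.expandtabs(tabsize)[start_col:]  # drop them again after expansion


if __name__ == "__main__":
    # Hypothetical sample lines in the style of a port Makefile.
    first = "DISTNAME=\tfoo-1.0"
    second = "EXTRACT_SUFX=\t.tar.gz"
    print(expand_from(first, 0))   # value starts at column 16
    print(expand_from(first, 1))   # value starts at column 15: misaligned
    print(expand_from(second, 0))  # value starts at column 16
```

With `start_col=0` the function behaves exactly like `str.expandtabs`, so both values line up at column 16. A one-column offset on the first line alone is enough to produce the kind of misalignment described above.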
I lean toward fixing Tidy if possible. We use Tidy to fix the markup in the output rendered by tools like Jade, not (only) for human readability. Tidy's output is still not perfect from the viewpoint of standards conformance, but it is better than nothing, even though most modern web browsers can handle the rendered HTML directly.
Tidy is known to have some problems with entity dereferencing and white-space handling, as you also pointed out.
wb> 3. Tidy could be replaced with some other tool. However, the others
I tried xmlindent, xmlformat, and xmllint as replacements in the past, but they are intended for well-formed XML documents and are not sufficient for fixing malformed (sometimes broken) markup.
wb> 4. Add newlines to the HTML in the build process before it gets to
wb> tidy:
wb> s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/
I think this will break the output, because a newline just after ">" is recognized as character data (CDATA), that is, as part of the content.
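To make option 4 concrete, here is a minimal sketch of the substitution using Python's `re` module. The actual build would presumably use sed or a similar tool, so the choice of Python is an assumption. The sample markup is hypothetical and imitates Jade-style output, which places a newline before the closing `>` of a start tag. `move_newline` is a hypothetical helper name.

```python
import re

# Option 4: move the newline that sits inside the start tag (between the
# CLASS attribute and '>') so that it follows the '>' instead.
PATTERN = re.compile(r'CLASS="PROGRAMLISTING"\n>')
REPLACEMENT = 'CLASS="PROGRAMLISTING">\n'  # '\n' here is a real newline character


def move_newline(html: str) -> str:
    """Apply the option 4 substitution to every PROGRAMLISTING start tag."""
    return PATTERN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    # Hypothetical Jade-style output with the newline inside the start tag.
    sample = '<PRE\nCLASS="PROGRAMLISTING"\n>DISTNAME=\tfoo-1.0\n</PRE\n>'
    print(repr(move_newline(sample)))
```

After the substitution, the first character inside the PRE element is a newline. Whether a parser then keeps that newline as content is exactly the concern raised in the reply above. The substitution itself is idempotent, because the rewritten tag no longer matches the pattern.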
wb> 5. Don't tidy HTML files at all (suggested as an option by Benedict
wb> Reuschling). The unprocessed HTML is ugly, but few people are going
wb> to look at it directly. Files that haven't been through tidy are a
wb> little larger, about 4% in the case of the Porter's Handbook.
To eliminate Tidy, we would have to improve the standards conformance of the rendered output. I do not know the current situation precisely, since I last investigated it seven years ago, but I think it still has some glitches.