| From | Sent On | Attachments |
|---|---|---|
| Warren Block | Jan 18, 2012 2:49 pm | |
| Hiroki Sato | Jan 18, 2012 3:44 pm | |
| Warren Block | Jan 18, 2012 5:13 pm | |
| Hiroki Sato | Jan 18, 2012 10:57 pm | |
| Warren Block | Jan 20, 2012 12:05 pm | .c |
| Gabor Kovesdan | Jan 21, 2012 3:16 pm | |
| Warren Block | Jan 21, 2012 4:29 pm | |
| Gabor Kovesdan | Jan 23, 2012 9:07 am | |
| Warren Block | Jan 23, 2012 11:38 am | .py |
| Hiroki Sato | Jan 24, 2012 5:23 pm | |
| Gabor Kovesdan | Jan 24, 2012 6:15 pm | |
| Hiroki Sato | Jan 24, 2012 6:18 pm | |
| Warren Block | Jan 26, 2012 10:20 am | |
| Warren Block | Jan 26, 2012 10:22 am | .diff |
| Hiroki Sato | Jan 26, 2012 7:45 pm | |
| Warren Block | Jan 26, 2012 9:46 pm | .diff |
| Hiroki Sato | Jan 26, 2012 10:50 pm | .diff |
| Hiroki Sato | Jan 27, 2012 5:24 am | .diff |
| Warren Block | Jan 27, 2012 7:53 am | .diff |
| Hiroki Sato | Jan 27, 2012 8:58 am | .diff |
| Warren Block | Jan 27, 2012 11:43 am | |
| Hiroki Sato | Jan 28, 2012 12:57 am | |
| Warren Block | Jan 28, 2012 2:47 pm | |
| Hiroki Sato | Jan 28, 2012 10:24 pm | |
| Subject: | Re: Tidy and HTML tab spacing | |
|---|---|---|
| From: | Warren Block (wbl...@wonkity.com) | |
| Date: | Jan 23, 2012 11:38:50 am | |
| List: | org.freebsd.freebsd-doc | |
| Attachments: | .py | |
On Mon, 23 Jan 2012, Gabor Kovesdan wrote:
> On 2012.01.22. 1:30, Warren Block wrote:
>> On Sun, 22 Jan 2012, Gabor Kovesdan wrote:
>>> On 2012.01.18. 23:49, Warren Block wrote:
>>>> 5. Don't tidy HTML files at all (suggested as an option by Benedict Reuschling). The unprocessed HTML is ugly, but few people are going to look at it directly. Files that haven't been through tidy are a little larger, about 4% in the case of the Porter's Handbook.
>>> I also think tidy should be removed. As hrs wrote, new standards should be evaluated and probably they are much better. (I think they are.) If there are some nits, then we should process it with a custom script or something, instead of this crapware.
>> Tidy does a lot; it would be a lot of work to recreate.
> Tidy is also the reason that our webpages are not valid HTML.

A new version of Tidy is supposed to be out soonish. Whether it will solve the problems, I don't know.
What about lxml? It's available in ports (devel/py-lxml) and is reputed to be good at parsing problem HTML and producing good XHTML. A quick test showed that it seems to do okay with <pre> elements.
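The <pre> behavior can be checked in isolation before running lxml over a whole book. This is a minimal sketch, assuming lxml is installed; the sample markup and the roundtrip() helper are hypothetical and only illustrate the idea. It parses HTML with tab characters inside <pre> and serializes it as pretty-printed XML, the same way the script below does, so you can confirm the tabs come through untouched.

```python
from lxml import etree

# Hypothetical sample: tab-indented lines inside <pre>, as in a Makefile excerpt.
PRE_TEXT = 'all:\n\tmake install\n\tmake clean'
SAMPLE = '<html><body><pre>' + PRE_TEXT + '</pre></body></html>'


def roundtrip(html):
    """Parse HTML with lxml and serialize it as pretty-printed XML text."""
    root = etree.HTML(html)
    return etree.tostring(root, pretty_print=True, method='xml',
                          encoding='unicode')


if __name__ == '__main__':
    out = roundtrip(SAMPLE)
    print(out)
    # pretty_print does not reindent text content, so the tabs inside
    # <pre> should survive the round trip unchanged.
    print('tabs preserved:', PRE_TEXT in out)
```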
A quick script to generate a test version is attached. The W3C validator reports eight errors for this version of the Porter's Handbook, versus six errors and five warnings for the Tidy version. (The ugly special case that rewrites compact="COMPACT" as compact="compact" drops the lxml version to five errors.)
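#!/usr/bin/env python3
# Convert book.html to XHTML with lxml instead of Tidy; output goes to lxml.html.
from lxml import etree

# Read raw bytes and strip carriage returns before parsing.
with open('book.html', 'rb') as infile:
    inhtml = infile.read().replace(b'\r', b'')

tree = etree.HTML(inhtml)
# Serialize each child of <html> (normally <head> and <body>) as XML.
# tostring() defaults to ASCII output, with non-ASCII as character references.
outxhtml = '\n'.join(etree.tostring(stree, pretty_print=True, method='xml').decode('ascii')
                     for stree in tree)
# Special case: XHTML only accepts the lowercase value compact="compact".
outxhtml = outxhtml.replace('compact="COMPACT"', 'compact="compact"')

# Wrap the serialized children in an XHTML 1.0 Transitional document.
with open('lxml.html', 'w') as f:
    f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n')
    f.write('<html xmlns="http://www.w3.org/1999/xhtml">\n')
    f.write(outxhtml)
    f.write('</html>\n')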
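Instead of patching the serialized string, the compact attribute could also be normalized on the parsed tree before serialization, which avoids matching on exact attribute text. This is a minimal sketch, assuming lxml's XPath support; normalize_compact() is a hypothetical helper, not part of the attached script.

```python
from lxml import etree


def normalize_compact(root):
    """Set every compact attribute to the lowercase value XHTML expects.

    Hypothetical helper: fixes the attribute on the parsed tree rather
    than rewriting the serialized output with str.replace().
    """
    for el in root.xpath('//*[@compact]'):
        el.set('compact', 'compact')
    return root


if __name__ == '__main__':
    root = etree.HTML('<html><body><ul compact="COMPACT"><li>item</li></ul>'
                      '</body></html>')
    normalize_compact(root)
    print(etree.tostring(root, method='xml', encoding='unicode'))
```

Calling normalize_compact(tree) right after etree.HTML() would make the str.replace() special case unnecessary.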
_______________________________________________
free...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-doc
To unsubscribe, send any mail to "free...@freebsd.org"