Subject: Re: Tidy and HTML tab spacing
From: Warren Block (wbl@wonkity.com)
Date: Jan 23, 2012 11:38:50 am
List: org.freebsd.freebsd-doc
Attachments:

On Mon, 23 Jan 2012, Gabor Kovesdan wrote:

> On 2012.01.22. 1:30, Warren Block wrote:
>
>> On Sun, 22 Jan 2012, Gabor Kovesdan wrote:
>>
>>> On 2012.01.18. 23:49, Warren Block wrote:
>>>
>>>> 5. Don't tidy HTML files at all (suggested as an option by Benedict Reuschling). The unprocessed HTML is ugly, but few people are going to look at it directly. Files that haven't been through tidy are a little larger, about 4% in the case of the Porter's Handbook.
>>>
>>> I also think tidy should be removed. As hrs wrote, new standards should be evaluated and probably they are much better. (I think they are.) If there are some nits, then we should process it with a custom script or something, instead of this crapware.
>>
>> Tidy does a lot; it would be a lot of work to recreate.
>
> Tidy is also the reason that our webpages are not valid HTML.

A new version of Tidy is supposed to be out soonish. Whether it will solve the problems, I don't know.

What about lxml? It is available in ports (devel/py-lxml) and is reputed to be good at parsing problem HTML and producing clean XHTML. A quick test showed that it seems to handle <pre> elements okay.
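For anyone who wants to check the <pre> behaviour without building the whole Handbook, here is a minimal sketch. It assumes Python 3 and lxml, and the sample markup is made up for illustration. It parses deliberately sloppy HTML and shows that the tab spacing inside <pre> comes back unchanged after serializing as XML.

```python
from lxml import etree

# Made-up, deliberately sloppy input: the <p> is never closed, and the
# <pre> block relies on literal tabs and newlines for its layout.
sloppy = "<html><body><p>Intro<pre>make\tinstall\n\tclean</pre></body></html>"

# etree.HTML() uses libxml2's forgiving HTML parser, so this does not
# fail the way a strict XML parser would on the unclosed <p>.
root = etree.HTML(sloppy)
pre = root.find(".//pre")

# pretty_print indents element-only content but leaves elements that
# contain text alone, so the tabs inside <pre> are written back as-is.
xhtml = etree.tostring(root, pretty_print=True, method="xml", encoding="unicode")
print(xhtml)
```

This is only a spot check on a tiny input; it says nothing about how lxml handles every construct the DocBook toolchain emits.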

A quick script to generate a test version is attached. The W3C validator reports eight errors for this version of the Porter's Handbook, versus six errors and five warnings for the Tidy version. (The ugly special case in line 12, which rewrites compact="COMPACT" to compact="compact", drops the lxml version to five errors.)

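#!/usr/bin/env python
from lxml import etree

# Read raw bytes so lxml can detect the document's encoding; drop CRs.
with open('book.html', 'rb') as infile:
    tree = etree.HTML(infile.read().replace(b'\r', b''))

# Serialize each child of <html> (<head>, <body>) as pretty-printed XML.
outxhtml = b'\n'.join(etree.tostring(stree, pretty_print=True, method="xml")
                      for stree in tree)
# Ugly special case: the XHTML 1.0 DTD only accepts compact="compact".
outxhtml = outxhtml.replace(b'compact="COMPACT"', b'compact="compact"')

# Write the XHTML 1.0 Transitional wrapper around the serialized children.
with open('lxml.html', 'wb') as f:
    f.write(b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
            b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n')
    f.write(b'<html xmlns="http://www.w3.org/1999/xhtml">\n')
    f.write(outxhtml)
    f.write(b'</html>\n')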
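The compact="COMPACT" string replacement could also be done on the parsed tree before serializing, which would catch other minimized attributes too. This is a sketch that has not been run against the Handbook. The BOOLEAN_ATTRS set is an assumption taken from the HTML 4.01 boolean attributes, and normalize_boolean_attrs is a hypothetical helper, not part of lxml.

```python
from lxml import etree

# Assumed list: the boolean attributes of HTML 4.01. XHTML 1.0 does not
# allow attribute minimization, so each must be written as name="name".
BOOLEAN_ATTRS = {
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
}


def normalize_boolean_attrs(root):
    """Hypothetical helper: set every boolean attribute's value to its name."""
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # skip comments and processing instructions
        for name in list(el.attrib):
            if name.lower() in BOOLEAN_ATTRS:
                el.set(name, name.lower())
    return root


# Usage: the same kind of markup that needed the string replacement.
demo = etree.HTML('<html><body><dl compact="COMPACT"><dt>term</dt></dl></body></html>')
normalize_boolean_attrs(demo)
print(etree.tostring(demo, method="xml", encoding="unicode"))
```

Working on the tree avoids matching attribute text that happens to appear inside element content, at the cost of walking every element once.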