Subject: [ python-Bugs-921657 ] HTMLParser ParseError in start tag
From: SourceForge.net (nore@sourceforge.net)
Date: Mar 23, 2004 5:17:27 am
List: org.python.python-bugs-list

Bugs item #921657, was opened at 2004-03-23 10:17
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=921657&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Bernd Zimmermann (bernd_zedv)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser ParseError in start tag

Initial Comment:
When this (obviously correct) HTML is parsed:

<a href=mailto:xy@domain.com>xyz</a>

this exception is raised:

HTMLParseError: junk characters in start tag: '@domain.com>', at line 1, column 1
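A simplified re-creation of the failure, using only the re module: the attribute pattern stops at the first character outside its unquoted-value class, and whatever remains before the end of the tag is what gets reported as junk. This is a sketch under two assumptions: that the stock Python 2.3 pattern is the workaround pattern that follows minus the '@', and that the hypothetical helper leftover_after_attrs roughly approximates how the parser walks a start tag. It is not the library's own code.

```python
import re

# Assumption: the stock Python 2.3 attribute pattern, i.e. the workaround
# pattern without '@' in the unquoted-value character class.
STOCK_ATTRFIND = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~]*))?')

# Simplified tag-name pattern, used only to skip past '<a'.
TAGNAME = re.compile(r'<[a-zA-Z][-.a-zA-Z0-9:_]*')


def leftover_after_attrs(starttag, attrfind):
    """Hypothetical helper: match attributes one after another, starting
    just after the tag name, and return the text none of them consumed."""
    pos = TAGNAME.match(starttag).end()
    while True:
        m = attrfind.match(starttag, pos)
        if m is None:  # next character cannot start an attribute name
            break
        pos = m.end()  # always advances: a match needs at least one name char
    return starttag[pos:]


# The unquoted value stops at '@', so '@domain.com>' is left over; a parser
# that expects only '>' or '/>' at this point would call it junk.
print(repr(leftover_after_attrs('<a href=mailto:xy@domain.com>', STOCK_ATTRFIND)))
```

With the stock class, the leftover is '@domain.com>', the same text quoted in the error message above.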

I work around this by adding '@' to the class of characters allowed in unquoted attribute values:

import re
import HTMLParser

# Rebind the module-level pattern so unquoted attribute values may contain '@'.
HTMLParser.attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

myparser = HTMLParser.HTMLParser()
myparser.feed('<a ... ')
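A quick check of the workaround pattern, again with the re module alone: it now captures the whole mailto: value and stops at the following whitespace. For comparison, a hypothetical variant that also allows a space in the unquoted-value class runs on into the next attribute, which is why whitespace has to stay out of that class. The sample tag text (including the title attribute) is made up for illustration.

```python
import re

# Workaround pattern: '@' added to the unquoted-value character class.
PATCHED = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

# Hypothetical variant with a literal space also inside the class.
WITH_SPACE = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\) _#=~@]*))?')

# Start-tag text after the tag name (made-up sample).
tag_rest = ' href=mailto:xy@domain.com title=x>'

m = PATCHED.match(tag_rest)
print(m.group(1), m.group(3))  # href mailto:xy@domain.com

m = WITH_SPACE.match(tag_rest)
print(m.group(1), m.group(3))  # href mailto:xy@domain.com title=x
```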

----------------------------------------------------------------------