Bugs item #921657, was opened at 2004-03-23 10:17
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=921657&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Bernd Zimmermann (bernd_zedv)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser ParseError in start tag
Initial Comment:
When this (obviously correct) HTML is parsed:
<a href=mailto:xy...@domain.com>xyz</a>
this exception is raised:
HTMLParseError: junk characters in start tag: '@domain.com>', at line 1, column 1
I work around this by adding '@' to the allowed
character class for unquoted attribute values:
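import re
import HTMLParser

# Replace the module-level attribute pattern with one whose
# unquoted-value character class also accepts '@'.
HTMLParser.attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

myparser = HTMLParser.HTMLParser()
myparser.feed('<a ... ')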
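
The effect of the extra '@' can be checked with the re module alone, without HTMLParser. The following is only a sketch: STOCK_ATTRFIND is assumed to be the workaround pattern with '@' removed (the report implies this is the shipped pattern, but it is not quoted here), and attribute_value, ATTRIBUTE_TEXT and the address user@example.com are hypothetical names chosen for illustration.

```python
import re

# Pattern from the workaround above, with '@' in the unquoted-value class.
PATCHED_ATTRFIND = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

# Assumption: the stock pattern is the same expression without '@'.
STOCK_ATTRFIND = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~]*))?')


def attribute_value(pattern, text):
    """Return the attribute value the pattern captures at the start of text."""
    match = pattern.match(text)
    return match.group(3)


# Hypothetical attribute text; the address is not the one from the report.
ATTRIBUTE_TEXT = ' href=mailto:user@example.com'

print(attribute_value(STOCK_ATTRFIND, ATTRIBUTE_TEXT))
print(attribute_value(PATCHED_ATTRFIND, ATTRIBUTE_TEXT))
```

Under that assumption, the first pattern stops capturing just before the '@', which leaves the rest of the address behind as the "junk characters" named in the exception, while the second pattern captures the whole unquoted value.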
----------------------------------------------------------------------