Subject: [ python-Bugs-914148 ] xml.sax segfault on error
From: SourceForge.net (nore@sourceforge.net)
Date: Mar 15, 2004 5:26:35 am
List: org.python.python-bugs-list

Bugs item #914148, was opened at 2004-03-11 06:14
Message generated for change (Comment added) made by moraes
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=914148&group_id=5470

Category: XML
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Adam Sampson (adamsampson)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.sax segfault on error

Initial Comment:
While (mistakenly) using Mark Pilgrim's feedparser module to parse data from <http://www.gothamist.com/archives/news_nyc/index.php>, Python segfaults when it should invoke an error handler for invalid XML. The attached code demonstrates the problem; it occurs with Python 2.2.3 and 2.3.3 on my system. I've tried to chop the example data down as far as possible, but reducing it any further doesn't exhibit the problem (it's currently just above 64k, which might be a coincidence).

The gdb traceback I get from the example is as follows:

#0 normal_updatePosition (enc=0x404a4fc0, ptr=0x40682000 <Address 0x40682000 out of bounds>, end=0x81e87e0 "a></div>\n\n<div id=\"content\">\n\n<div class=\"blog\">\n<!--\n<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n xmlns:trackback=\"http://madskills.com/public/xml/rss/module/trackback/\"\n"..., pos=0x81e7dac) at /120g/gar/python/python23/work/Python-2.3.3/Modules/expat/xmltok_impl.c:1745
#1 0x40484288 in XML_GetCurrentLineNumber (parser=0x81e7c18) at /120g/gar/python/python23/work/Python-2.3.3/Modules/expat/xmlparse.c:1605
#2 0x40481fc5 in set_error (self=0x0, code=XML_ERROR_TAG_MISMATCH) at /120g/gar/python/python23/work/Python-2.3.3/Modules/pyexpat.c:124
#3 0x40480ae7 in xmlparse_Parse (self=0x402fddac, args=0x0) at /120g/gar/python/python23/work/Python-2.3.3/Modules/pyexpat.c:888
#4 0x080fc25a in PyCFunction_Call (func=0x402faa0c, arg=0x402f338c, kw=0xfffffffb) at Objects/methodobject.c:108
#5 0x080aa674 in call_function (pp_stack=0xbffff03c, oparg=0) at Python/ceval.c:3439
#6 0x080a8a2e in eval_frame (f=0x816e45c) at Python/ceval.c:2116
#7 0x080a95bc in PyEval_EvalCodeEx (co=0x40303de0, globals=0xfffffffb, locals=0x0, args=0x816e5a8, argcount=2, kws=0x816a9fc, kwcount=0, defs=0x40321678, defcount=1, closure=0x0) at Python/ceval.c:2663
#8 0x080aa729 in fast_function (func=0xfffffffb, pp_stack=0xbffff1bc, n=2, na=0, nk=135703028) at Python/ceval.c:3529
#9 0x080aa56c in call_function (pp_stack=0xbffff1bc, oparg=0) at Python/ceval.c:3458
#10 0x080a8a2e in eval_frame (f=0x816a894) at Python/ceval.c:2116
#11 0x080a95bc in PyEval_EvalCodeEx (co=0x402fd2a0, globals=0xfffffffb, locals=0x0, args=0x402f3318, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2663
#12 0x080fbda7 in function_call (func=0x4030617c, arg=0x402f330c, kw=0x0) at Objects/funcobject.c:504
#13 0x0805b899 in PyObject_Call (func=0x40682000, arg=0x0, kw=0x0) at Objects/abstract.c:1755
#14 0x08062288 in instancemethod_call (func=0x4030617c, arg=0x402f330c, kw=0x0) at Objects/classobject.c:2433
#15 0x0805b899 in PyObject_Call (func=0x40682000, arg=0x0, kw=0x0) at Objects/abstract.c:1755
#16 0x080aa892 in do_call (func=0x4032025c, pp_stack=0x402f330c, na=0, nk=0) at Python/ceval.c:3644
#17 0x080aa4f9 in call_function (pp_stack=0xbffff5fc, oparg=0) at Python/ceval.c:3460
#18 0x080a8a2e in eval_frame (f=0x818b414) at Python/ceval.c:2116
#19 0x080aa7ad in fast_function (func=0xfffffffb, pp_stack=0xbffff71c, n=2, na=0, nk=1076865996) at Python/ceval.c:3518
#20 0x080aa56c in call_function (pp_stack=0xbffff71c, oparg=0) at Python/ceval.c:3458
#21 0x080a8a2e in eval_frame (f=0x8183814) at Python/ceval.c:2116
#22 0x080a95bc in PyEval_EvalCodeEx (co=0x402ed2a0, globals=0xfffffffb, locals=0x0, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2663
#23 0x080abdb9 in PyEval_EvalCode (co=0x0, globals=0x0, locals=0x0) at Python/ceval.c:537
#24 0x080d7d2b in run_node (n=0x402bb79c, filename=0x0, globals=0x0, locals=0x0, flags=0x0) at Python/pythonrun.c:1265
#25 0x080d74df in PyRun_SimpleFileExFlags (fp=0x8139050, filename=0xbffffa4d "testexpat.py", closeit=-1073743283, flags=0xbffff878) at Python/pythonrun.c:862
#26 0x08054dd5 in Py_Main (argc=1, argv=0xbffff8f4) at Modules/main.c:415
#27 0x0805492b in main (argc=0, argv=0x0) at Modules/python.c:23

----------------------------------------------------------------------

Comment By: Mark Moraes (moraes)
Date: 2004-03-15 02:26

Message:
Logged In: YES
user_id=390363

#! /usr/bin/env python

dhead = """<?xml version="1.0" encoding="ISO-8859-1" ?>
<item><title>&#187</title></item>
<item><title>
"""
dtail = """</title></item>
"""

import xml.sax
from cStringIO import StringIO as _StringIO

class _StrictFeedParser:
    def _err(self, errtype, exc):
        print errtype, exc.getMessage(), 'line', \
            exc.getLineNumber(), 'column', exc.getColumnNumber()
    def fatalError(self, exc):
        self._err('fatalError', exc)
        # raise exc # avoids the problem
    def error(self, exc):
        self._err('error', exc)
    def warning(self, exc):
        self._err('warning', exc)

def parse(data):
    feedparser = _StrictFeedParser()
    saxparser = xml.sax.make_parser(["drv_libxml2"])
    saxparser.setErrorHandler(feedparser)
    source = xml.sax.xmlreader.InputSource()
    source.setByteStream(_StringIO(data))
    saxparser.parse(source)

if __name__ == '__main__':
    for i in xrange(65427,66000,1):
        print i
        parse(dhead + 'x'*i + dtail)

----------------------------------------------------------------------

Comment By: Mark Moraes (moraes)
Date: 2004-03-15 02:22

Message:
Logged In: YES
user_id=390363

I ran into this as well, and it turns out that the 64k is relevant. I have a simpler script that reproduces the problem: create an unterminated character reference such as "&#171" (without the trailing semicolon) and add roughly 64k of data after it. The crash occurs if the SAX parser has an ErrorHandler set whose fatalError() method returns normally instead of raising the exception.

As a defensive measure, I suggest that any call to the fatalError method be followed by a raise of the exception if fatalError returns.
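
On the caller's side, the same defensive idea can be approximated without touching the parser. What follows is only a minimal sketch, written for a current Python 3 interpreter rather than the 2.2/2.3 releases discussed in this report. The names RaisingErrorHandler, RecordingHandler and parse_strict are hypothetical and exist only for illustration; they are not part of xml.sax. The wrapper forwards every callback to the user's handler and then re-raises the exception whenever fatalError() returns normally.

```python
import io
import xml.sax
import xml.sax.handler


class RaisingErrorHandler(xml.sax.handler.ErrorHandler):
    """Hypothetical wrapper: delegate to another handler, but never let
    fatalError() return normally to the parser."""

    def __init__(self, inner):
        self._inner = inner

    def error(self, exc):
        self._inner.error(exc)

    def warning(self, exc):
        self._inner.warning(exc)

    def fatalError(self, exc):
        # Give the wrapped handler its chance to log or record the problem.
        self._inner.fatalError(exc)
        # If it returned instead of raising, stop the parse here.
        raise exc


class RecordingHandler(xml.sax.handler.ErrorHandler):
    """Hypothetical handler that only records events, like the reporter's
    handler whose fatalError() returns normally."""

    def __init__(self):
        self.seen = []

    def error(self, exc):
        self.seen.append(("error", exc.getMessage()))

    def warning(self, exc):
        self.seen.append(("warning", exc.getMessage()))

    def fatalError(self, exc):
        self.seen.append(("fatalError", exc.getMessage()))


def parse_strict(data, inner):
    """Parse a bytes document, guarding `inner` with RaisingErrorHandler."""
    parser = xml.sax.make_parser()
    parser.setErrorHandler(RaisingErrorHandler(inner))
    parser.parse(io.BytesIO(data))
```

With this wrapper, a document containing an unterminated character reference such as "&#187" makes parse_strict() raise xml.sax.SAXParseException after the inner handler has recorded the fatal error, so the parser is never asked to continue past it. This does not fix the underlying crash in pyexpat; it only keeps application code off that path.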

----------------------------------------------------------------------