Subject: [ python-Bugs-639311 ] urllib.basejoin() mishandles ''
From: SourceForge.net (nore@sourceforge.net)
Date: Mar 23, 2004 5:02:19 pm
List: org.python.python-bugs-list

Bugs item #639311, was opened at 2002-11-16 04:34
Message generated for change (Comment added) made by bcannon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=639311&group_id=5470

Category: Python Library
Group: Python 2.2.1
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Mike Brown (mike_j_brown)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.basejoin() mishandles ''

Initial Comment: It's not entirely clear whether urllib.basejoin() intends to implement RFC 2396's "resolution of relative URI references to absolute form" faithfully, but it seems to behave improperly when given an empty string as the relative URI to make absolute.

>>> from urllib import basejoin
>>> basejoin('http://host/foo/bar.xml','')
'http://host/foo/'

I believe it should return the base as-is, because the empty string is a reference to the document that contains that reference... and presumably the document's URI is what you're passing in as the base.
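For comparison, here is a minimal sketch of the behavior the report asks for. It assumes Python 3, where urllib.basejoin no longer exists, and uses urllib.parse.urljoin as the nearest standard-library equivalent. That substitution is an assumption made for illustration; the report itself concerns Python 2.2.1.

```python
from urllib.parse import urljoin

base = 'http://host/foo/bar.xml'

# An empty reference: Python 3's urljoin hands back the base unchanged,
# which is the result the reporter expected from basejoin().
empty_result = urljoin(base, '')
print(empty_result)

# A non-empty relative reference replaces the last path segment of the base.
sibling_result = urljoin(base, 'baz.xml')
print(sibling_result)
```

Only the empty-string case is at issue in this report; the second call is included to show ordinary relative resolution against the same base.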

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)

Date: 2004-03-23 13:37

Message: Logged In: YES user_id=357491

I disagree with what you are expecting. For instance, if you run ``urllib.basejoin("http://python.org/index.html", "/doc")`` it returns "http://python.org/doc", which makes sense. So special-casing the empty string would not strictly match how the function behaves when given any other string.

And on top of things the function is not even documented, so you really shouldn't be expecting any specific behavior.

Closing this as invalid.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)

Date: 2003-05-21 20:14

Message: Logged In: YES user_id=357491

Perhaps urllib.basejoin (which is not documented) should just become a wrapper for urlparse.urljoin? It won't solve this bug but it would cut back on unneeded code.
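A rough sketch of what such a wrapper could look like, assuming Python 3's urllib.parse.urljoin in place of urlparse.urljoin. The basejoin defined here is a hypothetical stand-in written for illustration, not the library's code.

```python
from urllib.parse import urljoin


def basejoin(base, url):
    """Hypothetical thin wrapper: delegate all resolution to urljoin."""
    return urljoin(base, url)


# A root-relative reference replaces the whole path of the base.
root_relative = basejoin('http://python.org/index.html', '/doc')
print(root_relative)

# With this delegation, an empty reference simply follows urljoin's behavior.
empty_ref = basejoin('http://host/foo/bar.xml', '')
print(empty_ref)
```

Under this sketch the wrapper inherits whatever urljoin does for the empty string, so the two functions can no longer disagree on that input.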

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)

Date: 2002-11-26 02:41

Message: Logged In: YES user_id=371366

I was partly mistaken; the document's URI is not necessarily the base. A reference with an empty path (e.g., an empty string or just a fragment identifier) is a reference to the current document, regardless of the base URI you are resolving against. A base URI is only for resolving relative URIs that are not referencing the current document. See some discussion at http://lists.w3.org/Archives/Public/uri/2002Jan/0015.html

So neither urllib.basejoin() nor urlparse.urljoin() fully implements the RFC 2396 "resolution to absolute form", since there would need to be a way to indicate "current document" other than returning the base.
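One way to picture the missing "current document" signal is sketched below, assuming Python 3's urllib.parse. The helper names is_same_document_ref and resolve are hypothetical, and the tuple return value is just one possible design, not anything proposed in this thread.

```python
from urllib.parse import urljoin, urlsplit


def is_same_document_ref(ref):
    """True if ref has no scheme, authority, path, or query.

    Such a reference (an empty string or a bare fragment) points at the
    current document rather than at something relative to the base.
    """
    parts = urlsplit(ref)
    return not (parts.scheme or parts.netloc or parts.path or parts.query)


def resolve(base, ref):
    """Return (is_current_document, absolute_uri_or_None)."""
    if is_same_document_ref(ref):
        # Signal "current document" explicitly instead of returning the base.
        return (True, None)
    return (False, urljoin(base, ref))


print(resolve('http://host/foo/bar.xml', ''))
print(resolve('http://host/foo/bar.xml', '#sec2'))
print(resolve('http://host/foo/bar.xml', 'baz.xml'))
```

The caller, which knows the URI of the document containing the reference, can then decide what a current-document result should map to.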

Nevertheless, basejoin()'s behavior differs from urlparse.urljoin()'s when presented with the empty string, and it's not clear whether that is intentional.

----------------------------------------------------------------------