83 messages in org.w3.www-tag: Re: whenToUseGet-7 counter-proposal
Subject: Re: whenToUseGet-7 counter-proposal
From: Tim Bray (tbr@textuality.com)
Date: Apr 23, 2002 11:56:54 pm
List: org.w3.www-tag

Joshua Allen wrote:

> I wasn't claiming that crawlers *can't* crawl querystrings, but any crawlers I have used require you to deliberately turn this on or specify in a filter which querystrings are "safe". I run a crawler internally at Microsoft which crawls pages with querystrings, in fact. But I deliberately configured it to do so, and only with pages that I know to be "safe". I could show you search results that index URLs with querystrings, but that certainly doesn't mean that I consider *all* URLs with querystrings to be "safe" to GET.

I have written two very large-scale high-performance web crawlers that were deployed in production, processing hundreds of millions of web pages. Yes, any such beast has a bunch of heuristics for staying away from dangerous pages. But the existence of a '?' just isn't good enough. When you run a large public robot you get 2 classes of complaint:

1. "you moron, your robot went in my off-limits area and now I'm going to get fired and they'll turn off my child's iron lung"
2. "you moron, why aren't you indexing my pages, because if I don't get more traffic to my website I'll go bankrupt and they'll turn off my child's iron lung."

The Robot Exclusion Protocol helps. Intelligent self-defense helps. But robots really do live & die on the assumption that if it's a URI and there's no keep-off sign, you can do a GET on it.
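To make the "no keep-off sign" rule concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It is only an illustration, not code from either crawler mentioned above: the robots.txt content, the `example-bot` agent name, the `may_get` helper, and the example.org URLs are all assumptions. The point is that the decision rests on the exclusion rules, not on whether the URL contains a '?'.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def may_get(url: str, robots_lines: list[str], agent: str = "example-bot") -> bool:
    """Return True when no robots.txt rule (keep-off sign) covers the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already in memory; no network access
    return parser.can_fetch(agent, url)


robots_lines = ROBOTS_TXT.splitlines()

# A query string alone does not make the URL off-limits.
print(may_get("http://example.org/search?q=iron+lung", robots_lines))

# A path under a Disallow rule is off-limits, with or without a '?'.
print(may_get("http://example.org/admin/delete?id=7", robots_lines))
```

A real robot would layer its own self-defense heuristics on top of this check; the sketch covers only the exclusion-protocol part.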

> There is no way to guarantee that all URLs will be free of GET side-effects, and it would be misleading to tell people that such a guarantee exists.

No, but if someone posts a URL for which doing a GET produces a side-effect you can legitimately (and I believe in a court of law) tell 'em to take a flying leap if they come after you for the consequences of doing a GET.

-Tim