Subject: RE: [bloggerDev] Re: Problem With Feed's Images In Production
From: Marcelo Calbucci (marc@sampa.com)
Date: 09/20/2006 04:48:27 PM
List: com.googlegroups.bloggerdev

- Spawning a separate thread to get images faster might not be a good idea. If a crawler such as Googlebot starts hitting your site, it might be as aggressive as one request per second, and you might end up with many images in your queue that will never be requested, because the crawler only fetches the HTML.

- For the file name, I simply replace every "/" with a "_" before saving it to disk (plus a few security checks to guarantee the name is safe). So this file: http://photos1.blogger.com/blogger/2382/774/400/DSC_0516.jpg would be saved as "2382_774_400_DSC_0516.jpg".

I'm pretty sure that file names can recur; that is, "DSC_0516.jpg" can appear (and it does) on many different blogs, and those are different images.
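
The naming rule above could be sketched roughly as follows in Python. This is only an illustration: the function name local_name_for, the "/blogger/" prefix handling, and the specific safety checks are assumptions, since the message does not say which checks are actually used.

```python
import re
from urllib.parse import urlparse

# Assumed whitelist of characters for a saved file name.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def local_name_for(url: str, prefix: str = "/blogger/") -> str:
    """Turn a photo URL into a flat file name by replacing "/" with "_"."""
    path = urlparse(url).path
    if not path.startswith(prefix):
        raise ValueError(f"unexpected path: {path!r}")
    name = path[len(prefix):].replace("/", "_")
    # Hypothetical safety checks: reject empty names, hidden files,
    # parent-directory sequences, and any character outside the whitelist.
    if not _SAFE_NAME.fullmatch(name) or name.startswith(".") or ".." in name:
        raise ValueError(f"unsafe file name: {name!r}")
    return name


print(local_name_for(
    "http://photos1.blogger.com/blogger/2382/774/400/DSC_0516.jpg"))
# prints 2382_774_400_DSC_0516.jpg
```

Because the numeric path segments stay in the name, two different "DSC_0516.jpg" files from different blogs end up with different names on disk.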

If you want to see my implementation running, sign up at www.sampa.com and after you create your site, select to add a new "Blogger Connector".

Good luck.

-----Original Message-----
From: blog@googlegroups.com [mailto:blog@googlegroups.com] On Behalf Of Colin
Sent: Wednesday, September 20, 2006 3:41 PM
To: Blogger Data API
Subject: [bloggerDev] Re: Problem With Feed's Images In Production

"The secret for a good implementation is to save those files on disk once you get them from blogger so you don't spend too much on bandwidth."

That was pretty much my implementation, though I take it one step further because of a detail about how my site is structured: my users see a summary of entries on screen X before they ever see the actual blog post on page Y. Working under that constraint, I am able to spawn a separate thread when they view page X which starts downloading the pics to serve on page Y (if they haven't been downloaded previously).

With this asynchronous processing, the response time for page X isn't adversely affected even if no pictures have been downloaded yet. That matters because downloading the images for all entries took about 5-15 seconds in my trials, and I didn't want any user to have to wait for that.
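
A minimal Python sketch of that kind of background prefetch is shown below. It is not the code from my site: prefetch_images, fetch, and name_for are hypothetical names, and the download function is passed in so the sketch does not assume any particular HTTP client.

```python
import threading
from pathlib import Path
from typing import Callable, Iterable


def prefetch_images(urls: Iterable[str], cache_dir: Path,
                    fetch: Callable[[str], bytes],
                    name_for: Callable[[str], str]) -> threading.Thread:
    """Start a background thread that downloads images not yet on disk."""
    pending = list(urls)  # snapshot so the caller can reuse its iterable

    def worker() -> None:
        for url in pending:
            target = cache_dir / name_for(url)
            if target.exists():
                continue  # already downloaded on an earlier visit
            try:
                data = fetch(url)
            except Exception:
                continue  # skip this image; a later visit can try again
            # Write to a temporary name first, then rename, so page Y
            # never serves a half-written file.
            tmp = target.with_name(target.name + ".part")
            tmp.write_bytes(data)
            tmp.replace(target)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread  # the page X handler returns without joining
```

The handler for page X would call prefetch_images and return right away. For a real download, fetch could be something like lambda u: urllib.request.urlopen(u, timeout=10).read(). This sketch does not guard against two visitors triggering the same download at once.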

I also found the process of retrieving the feed rather expensive (1-4 seconds) so I cache a blogger's feed for several hours on the server. In my case, there are a very limited number of users who may have blogs to display so caching a whole XmlDocument isn't unreasonable.
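
A time-limited feed cache along those lines might look like the Python sketch below. The class name FeedCache, the four-hour default, and the injected fetch function are assumptions for illustration; the cache stores whatever the fetch function returns (a string here rather than a parsed XmlDocument).

```python
import threading
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Keep each fetched feed in memory for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 4 * 3600,  # assumed "several hours"
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self._lock = threading.Lock()

    def get(self, feed_url: str) -> str:
        """Return the cached feed, refetching it once it is older than the TTL."""
        now = self._clock()
        with self._lock:
            entry = self._entries.get(feed_url)
            if entry is not None and now - entry[0] < self._ttl:
                return entry[1]
        body = self._fetch(feed_url)  # slow call, made outside the lock
        with self._lock:
            self._entries[feed_url] = (self._clock(), body)
        return body
```

With only a handful of bloggers, holding every feed in memory like this stays small. The trade-off is that a new post can take up to the TTL to show up.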

Anyway, the only thing that freaks me out about the file names is that I don't know if image names are guaranteed to be unique if judging solely by blogger id and the image name part of the URL.

In other words, I don't know if blog post A can have pic Z.jpg while blog post B has a completely different Z.jpg. This is plausible, if unlikely, if the middle part of the URL ("/2382/774/400/" for example) is associated with a particular blog post. Then again, I'm also nervous about using the middle part of the URL in my naming because I don't know how static the structure is.
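
One way to sidestep both worries, which is not something anyone in this thread has said they use, would be to name the file after a hash of the entire URL, so nothing depends on how the path is structured. A rough Python sketch, with hashed_name_for as a hypothetical helper and an assumed list of image extensions:

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of extensions worth preserving on the saved file.
_KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def hashed_name_for(url: str) -> str:
    """Derive a file name from the whole URL rather than from its parts."""
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    if ext not in _KNOWN_EXTENSIONS:
        ext = ""  # drop anything unexpected instead of trusting it
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return digest + ext
```

Two posts that both contain a Z.jpg get different names as long as their full URLs differ, and the same URL always maps to the same name. The cost is that the names on disk are no longer human-readable.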

Oh well.

Thanks again for the help!