Subject: Re: Metadata only crawl for images
From: Jeff Ling (jeff@google.com)
Date: 05/29/2008 12:34:03 PM
List: com.googlegroups.google-enterprise-developer

Anthony,

You probably wanna contact support - I don't know the answer.

Jeff

On Thu, May 29, 2008 at 12:30 PM, Anthony Smith <anth@frontlinelogic.com> wrote:

I'm sending my reply to this message in case it didn't come across.

Yes, the site is protected. We're using metadata-and-URL feeds, and the GSA is doing the crawling. Crawler Access has been configured and seems to be working properly: PDF and DOC files are getting indexed just fine. The exact error we're getting is "Error: Other 4xx HTTP response code." This error only happens on image files (PNG, JPEG, GIF, etc.).

Any ideas?

*Anthony Smith*, Developer Frontline Logic, Inc. http://www.frontlinelogic.com

Office Number +1 765-854-0739 Mobile Number +1 765-461-5254
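
For readers unfamiliar with the feed type mentioned above, here is a minimal sketch of what a metadata-and-URL feed for a single image might look like. The element names (gsafeed, header, datasource, feedtype, group, record, metadata, meta) follow the GSA feeds format as I recall it, so verify them against the feeds documentation for your appliance version. The datasource name, URL, and metadata values are hypothetical, and a connector would normally generate this XML for you rather than build it by hand.

```python
import xml.etree.ElementTree as ET


def build_metadata_and_url_feed(datasource, records):
    """Build a metadata-and-url feed body as a string.

    records: iterable of (url, mimetype, metadata_dict) tuples.
    Element names are assumed from the GSA feeds format; verify before use.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(root, "group")
    for url, mimetype, metadata in records:
        # With this feed type the appliance still fetches the URL itself,
        # so the image must be reachable by the crawler.
        record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
        meta_parent = ET.SubElement(record, "metadata")
        for name, content in metadata.items():
            ET.SubElement(meta_parent, "meta", name=name, content=content)
    return ET.tostring(root, encoding="unicode")


# Hypothetical datasource, URL, and metadata values for illustration only.
feed_xml = build_metadata_and_url_feed(
    "example_connector",
    [(
        "http://repository.example.com/images/logo.png",
        "image/png",
        {"title": "Company logo", "department": "Marketing"},
    )],
)
print(feed_xml)
```

Because the appliance fetches each URL in a feed of this type, a 4xx response on the image URL would stop the record from being indexed even though the metadata itself was delivered.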

On May 23, 2008, at 4:10 PM, Jeff Ling wrote:

I guess the site is protected? Are you using meta-url feeds or content feeds? Is GSA doing the crawling (it seems so)? Have you configured Crawler Access if that's the case?

On Fri, May 23, 2008 at 11:58 AM, Anthony Smith <anth@frontlinelogic.com> wrote:

Yes, I have tried and yes it does work.

On May 23, 2008, at 2:47 PM, Jeff Ling wrote:

Have you tried to access the same URL from a browser? Does it work?
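
A browser test does not necessarily reproduce what the crawler sees, since the server may respond differently depending on the User-Agent header or the credentials sent. As a rough sketch of one way to compare, the helper below requests a URL with a chosen User-Agent and returns the HTTP status code. The function name probe is hypothetical, and the "gsa-crawler" string in the usage comment is my assumption about the appliance's default User-Agent; check the value configured on your appliance. It also sends no credentials, so it only approximates a crawl of a protected site.

```python
import urllib.error
import urllib.request


def probe(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this URL and User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx and 5xx responses; the status code is on the exception.
        return err.code


# Example usage (hypothetical URL; not executed here):
#   probe("http://repository.example.com/images/logo.png", "Mozilla/5.0")
#   probe("http://repository.example.com/images/logo.png", "gsa-crawler")
```

If the two calls return different codes for the same image URL, the server or an access layer in front of it is likely treating the crawler differently from a browser.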

On Fri, May 23, 2008 at 11:24 AM, Anthony Smith <anth@frontlinelogic.com> wrote:

Thanks for the reply, Jeff. We've commented out the lines pertaining to images (jpeg, gif, png, etc.) in the crawl exception patterns. The GSA is now trying to crawl them, but we're getting an *Error: Other 4xx HTTP Response Code* (or something along those lines). The items are still not searchable. Is there something more I need to do GSA-wise, or something I might be missing in our connector implementation? Thanks!

On May 23, 2008, at 10:33 AM, Jeff Ling wrote:

You could definitely do that - make sure files with image extensions are not matched by the exclusion patterns under "Crawl & Index" - by default they are.
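
To illustrate the check described above, here is a small sketch that reports which exclusion patterns would match a given image URL. It models only the simple suffix form (a pattern ending in `$`), which is a simplification and not the appliance's full pattern syntax. The function name matching_exclusions, the pattern list, and the URL are hypothetical; compare against the patterns actually configured on your appliance.

```python
def matching_exclusions(url, patterns):
    """Return the suffix patterns (written as 'suffix$') that match the URL.

    Matching here is case-insensitive as a conservative check; the appliance's
    own matching rules may differ.
    """
    hits = []
    for pattern in patterns:
        if pattern.endswith("$") and url.lower().endswith(pattern[:-1].lower()):
            hits.append(pattern)
    return hits


# Hypothetical exclusion list; check the appliance's actual patterns.
exclusions = [".gif$", ".jpg$", ".jpeg$", ".png$", ".exe$"]
hits = matching_exclusions(
    "http://repository.example.com/images/logo.PNG", exclusions
)
print(hits)  # ['.png$']
```

An empty result for every image URL you care about suggests the exclusion patterns are no longer what is keeping those files out of the index.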

On Fri, May 23, 2008 at 5:09 AM, Anthony Smith <anth@frontlinelogic.com> wrote:

I forgot to mention the most important part! This is for connector development. We have images coming out of our repository with metadata attached to them and we'd like them to be searchable based on those metadata values.

Hey Folks,

Is there a way to "crawl" images only for metadata? We understand that an image can't be full-text indexed but we're still sending metadata information on the images that we'd still like to search on. Any help would be appreciated!