9 messages in com.googlegroups.google-enterprise-developer
Re: GSA Content feed for MS Word files

From          Sent On
ekovalev      05 Dec 2006 06:43
jdandrea      06 Dec 2006 08:47
jdandrea      06 Dec 2006 09:03
ekovalev      06 Dec 2006 09:51
Jeff Ragusa   07 Dec 2006 12:29
ekovalev      08 Dec 2006 07:41
jdandrea      22 Dec 2006 06:05
ekovalev      29 Dec 2006 07:18
jdandrea      13 Feb 2007 08:41
Subject: Re: GSA Content feed for MS Word files
From: jdandrea (jdan@gmail.com)
Date: 12/22/2006 06:05:15 AM
List: com.googlegroups.google-enterprise-developer

Greetings! Been away for a li'l bit, but I'm catching up now ...

ekovalev wrote:

> I just tried a test content feed with an embedded MS Word document, encoded with mmencode. Works fine.

Terrific!
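For anyone following along, here is a minimal Python sketch of that kind of test feed built programmatically: the Word file's bytes are base64-encoded (the same job mmencode does) and placed in a single content record. The datasource name, URL, and sample bytes are hypothetical placeholders, and the element layout reflects the GSA feed format as I understand it, so check it against the Feeds Protocol documentation before relying on it.

```python
import base64
import xml.etree.ElementTree as ET

# DOCTYPE line used by GSA feeds (as far as I recall from the feed samples).
FEED_DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_content_feed(datasource: str, url: str, mimetype: str, data: bytes) -> str:
    """Return feed XML carrying `data`, base64-encoded, in a single record."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "full"
    group = ET.SubElement(feed, "group")
    record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
    content = ET.SubElement(record, "content", encoding="base64binary")
    # Binary bytes -> base64 text, i.e. what mmencode produces.
    content.text = base64.b64encode(data).decode("ascii")
    body = ET.tostring(feed, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + FEED_DOCTYPE + "\n" + body


if __name__ == "__main__":
    # In practice `data` would come from open("report.doc", "rb").read().
    print(build_content_feed(
        "sample_datasource",              # hypothetical datasource name
        "http://example.com/docs/report",  # hypothetical record URL
        "application/msword",
        b"\xd0\xcf\x11\xe0 placeholder bytes",
    ))
```

Building the XML with ElementTree rather than string concatenation keeps attribute values (such as URLs containing `&`) properly escaped.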

> One thing: when I tried to include three Word documents in one feed, I got an internal error on the GSA, but when I uploaded each document in a separate feed, it worked without problems.

Hmm. Can you post the XML on a web site (private link) for download/inspection? I'm curious.
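Before posting the XML, a quick local sanity check might help narrow down where a multi-record feed goes wrong. The following is a minimal Python sketch; `check_feed` is a hypothetical helper, and its checks (required attributes, duplicate URLs, decodable base64) are assumptions about what could trip the appliance, not a reproduction of whatever validation the GSA actually performs.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def check_feed(xml_bytes: bytes) -> list[str]:
    """Return a list of problems found in a content feed (empty list = looks OK)."""
    problems = []
    root = ET.fromstring(xml_bytes)
    if root.tag != "gsafeed":
        problems.append(f"root element is <{root.tag}>, expected <gsafeed>")
    records = root.findall("./group/record")
    if not records:
        problems.append("no <record> elements under <group>")
    seen_urls = set()
    for i, rec in enumerate(records, start=1):
        url = rec.get("url")
        if not url:
            problems.append(f"record {i}: missing url attribute")
        elif url in seen_urls:
            problems.append(f"record {i}: duplicate url {url}")
        else:
            seen_urls.add(url)
        # Assumption: content records are expected to carry a mimetype.
        if not rec.get("mimetype"):
            problems.append(f"record {i}: missing mimetype attribute")
        content = rec.find("content")
        if content is None:
            continue
        if content.get("encoding") == "base64binary":
            # Strip whitespace so line-wrapped base64 (as from mmencode) is accepted.
            text = "".join((content.text or "").split())
            try:
                base64.b64decode(text, validate=True)
            except binascii.Error as exc:
                problems.append(f"record {i}: bad base64 ({exc})")
    return problems
```

Running it over the three-document feed and over each single-document feed would at least show whether the combined file differs in some obvious way.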

> One more question. If my files ever become web-accessible, I have another problem: I don't have file extensions on the server side.

Ahh. That could pose a challenge when saving the files to disk, and in some cases when viewing them within the browser (depending on the server-side MIME type or Content-Type settings).

[Reads more of the thread.] Yep, what JeffR said.
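Without extensions, whatever serves the files has to pick a Content-Type some other way. One common approach (not something the GSA is known to do) is to sniff the first few bytes for a known file signature. A minimal Python sketch, assuming only a handful of formats matter; `sniff_mimetype` and `sniff_file` are hypothetical helpers.

```python
# Leading-byte signatures for a few document formats.
SIGNATURES = [
    # OLE2 compound file: used by .doc, but also by .xls and .ppt.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/msword"),
    (b"%PDF-", "application/pdf"),
    (b"{\\rtf", "application/rtf"),
    # Zip container (newer Office formats are zip-based too).
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mimetype(head: bytes, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from a file's first bytes; fall back to `default`."""
    for magic, mimetype in SIGNATURES:
        if head.startswith(magic):
            return mimetype
    return default


def sniff_file(path: str) -> str:
    """Read just enough of the file to compare against every signature."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as f:
        return sniff_mimetype(f.read(longest))
```

Because the OLE2 signature is shared by Word, Excel and PowerPoint files, mapping it straight to `application/msword` only makes sense when, as in this thread, the extensionless files are known to be Word documents.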

> It seems the GSA uses the mimetype from the feed only to show the document type in front of the title on the results page.

Ahh. Interesting. So then if you use this:

    <record url="..." mimetype="text/plain">
      <content encoding="base64binary">Zm9vIGJhcgo=</content>
    </record>
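For reference, the payload in that record is just the string "foo bar" plus a newline. Its standard base64 form ends with an "=" pad character, and strict decoders such as Python's `base64.b64decode` reject the unpadded form. A small Python check; `decode_lenient` is a hypothetical helper, not something the GSA is known to do.

```python
import base64
import binascii

payload = b"foo bar\n"
encoded = base64.b64encode(payload).decode("ascii")
print(encoded)  # Zm9vIGJhcgo=

try:
    base64.b64decode("Zm9vIGJhcgo")  # padding dropped
except binascii.Error as exc:
    print("strict decode failed:", exc)


def decode_lenient(text: str) -> bytes:
    """Decode base64 that may be line-wrapped or missing its '=' padding."""
    text = "".join(text.split())
    text += "=" * (-len(text) % 4)  # restore padding to a multiple of 4 chars
    return base64.b64decode(text)


print(decode_lenient("Zm9vIGJhcgo"))  # b'foo bar\n'
```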

You're saying there's no "is this Word or PDF?" distinction beyond that label ... interesting.

So Jeff, what's happening is that we have a feed posting content that isn't available via HTTP, is encoded using base64, and doesn't have an explicit MIME type (other than text/plain, which I presume is used alongside the base64binary encoding). The question then becomes ... how is the _real_ MIME type determined? Since I've never indexed content this way, hmm ... I'm gonna have to try this. :)