| Subject: | Re: GSA Content feed for MS Word files |
|---|---|
| From: | jdandrea (jdan...@gmail.com) |
| Date: | 12/22/2006 06:05:15 AM |
| List: | com.googlegroups.google-enterprise-developer |
Greetings! Been away for a li'l bit, but I'm catching up now ...
ekovalev wrote:
I just tried a test content feed with an embedded MS Word document encoded with mmencode. Works fine.
Terrific!
One thing: when I tried to include three Word documents in one feed, I got an internal error on the GSA, but when I uploaded each document in a separate feed it worked without a problem.
Hmm. Can you post the XML on a web site (private link) for download/inspection? I'm curious.
One more question. If my files ever become web accessible, I have another problem: I don't have file extensions on the server side.
Ahh. That could pose a challenge when saving the files to disk - and in some cases when viewing them within the browser (depending on the server-side MIME Type or Content Type settings).
[Reads more of the thread.] Yep, what JeffR said.
It seems the GSA uses the mimetype from the feed only to show the doc type in front of the title on the results page.
Ahh. Interesting. So then if you use this:
<record url="..." mimetype="text/plain"> <content encoding="base64binary">Zm9vIGJhcgo</content> </record>
You're saying there's no distinction of "is this Word or PDF" ... interesting.
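For anyone wanting to reproduce a record like the one above, here is a minimal Python sketch that builds it from raw bytes. The element and attribute names (record, url, mimetype, content, encoding="base64binary") come straight from the snippet in this thread; the function name build_record and the example.invalid URL are made up for illustration, and the rest of the feed envelope is left out.

```python
import base64
import xml.etree.ElementTree as ET


def build_record(url: str, mimetype: str, payload: bytes) -> ET.Element:
    """Build one <record> element whose content is base64-encoded.

    build_record is a hypothetical helper, not part of any Google tool.
    """
    record = ET.Element("record", {"url": url, "mimetype": mimetype})
    content = ET.SubElement(record, "content", {"encoding": "base64binary"})
    # b64encode returns bytes; decode to str so it can be used as element text.
    content.text = base64.b64encode(payload).decode("ascii")
    return record


if __name__ == "__main__":
    # Placeholder URL and payload, for illustration only.
    rec = build_record("http://example.invalid/doc1", "text/plain", b"foo bar\n")
    print(ET.tostring(rec, encoding="unicode"))
```

Note that the standard encoder emits the trailing "=" padding, so the payload b"foo bar\n" comes out as Zm9vIGJhcgo= rather than the unpadded form shown in the record above.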
So Jeff - what's happening is we have a feed posting content that isn't available via HTTP, is encoded using base64, and doesn't have an explicit MIME type (other than text/plain, which I presume is used along with the base64binary encoding). The question then becomes ... how is the _real_ MIME type determined? Since I've never crawled content this way, hmm ... I'm gonna have to try this. :)
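One way any consumer of such a feed could guess the real type is to decode the payload and look at its leading bytes. Nothing in this thread establishes that the GSA does this, so treat the following purely as an illustration of content sniffing. The function name guess_type is hypothetical, and the signatures checked are the well-known PDF header and the OLE2 container header used by legacy Office files.

```python
import base64

# Leading bytes of an OLE2 compound file (legacy .doc, but also .xls and .ppt).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Leading bytes of a PDF file.
PDF_MAGIC = b"%PDF-"


def guess_type(b64_text: str) -> str:
    """Guess a MIME type from base64-encoded content (hypothetical helper)."""
    compact = "".join(b64_text.split())      # drop whitespace and line breaks
    compact += "=" * (-len(compact) % 4)     # restore any missing padding
    data = base64.b64decode(compact)
    if data.startswith(PDF_MAGIC):
        return "application/pdf"
    if data.startswith(OLE2_MAGIC):
        # Assumption: treat every OLE2 container as Word, which is only a guess.
        return "application/msword"
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


if __name__ == "__main__":
    print(guess_type("Zm9vIGJhcgo"))  # the payload from the record in this thread
```

The OLE2 check cannot tell Word from Excel or PowerPoint by itself, so a real implementation would need to look deeper into the container.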
-- Joe D'Andrea Liquid Joe LLC +1 (908) 781-0323




