Subject: Re: SG API not picking up rel=me links on my pages
From: Brad Fitzpatrick (brad@google.com)
Date: 07/03/2008 12:33:20 PM
List: com.googlegroups.social-graph-api

On Thu, Jul 3, 2008 at 12:20 PM, Stuart Langridge <si@kryogenix.org> wrote:

http://socialgraph.apis.google.com/lookup?q=http://www.kryogenix.org&fme=1&pretty=1

doesn't seem to be picking much up. I'd expect it to follow the /contact link, because it has rel="me" on it, and then from that page follow other rel="me" links to places like Twitter and Flickr and so on. How can I find out why the lookup code isn't picking up my links? If it's me in the wrong then I'm happy to change things around...
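
For anyone following along, here is a rough sketch of what "picking up rel=me links" means on the parsing side. It is not the SGAPI's parser, only an illustration using Python's standard library; the class name, the helper function, and the sample markup (including the Twitter URL) are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelMeParser(HTMLParser):
    """Collect href values from <a> and <link> tags whose rel contains "me"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rel_tokens and href:
            # Resolve relative links such as "/contact" against the page URL
            self.links.append(urljoin(self.base_url, href))


def rel_me_links(html, base_url):
    """Return absolute URLs of all rel="me" links found in the markup."""
    parser = RelMeParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, not the real contents of kryogenix.org
sample = '''
<a rel="me" href="/contact">Contact</a>
<a rel="me nofollow" href="http://twitter.com/example">Twitter</a>
<a href="http://example.org/">Not mine</a>
'''
print(rel_me_links(sample, "http://www.kryogenix.org/"))
```

A crawler would then repeat the same extraction on each page it reaches this way, which is the "follow /contact, then follow its rel=me links" behaviour described above.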

There's an API for running pages through the API for testing purposes: http://code.google.com/apis/socialgraph/docs/testparse.html

But having said that, your pages do seem to be being parsed as expected:

[snip test]

Yep. Hence my puzzlement :)

It is reassuring to know that it's not just that I've got it wrong, anyway. Is socialgraph.apis just running an older version of the code?

I put up the /testparse interface so people can tell the difference between the parsers sucking versus the crawl coverage sucking.

My goal's been to get the parsers as good as possible first, then I'm going to start addressing the crawl coverage issues. Googlebot doesn't necessarily care about crawling the same things that the SGAPI would like. I need to give it steering directions.

There's also a lot of data I'm not using yet. I also want to work on latency. From the time Googlebot hits your site, I want it in the SGAPI index within minutes (if not sooner), not the hours/days/more it can take now. The main data source I use now is the web index, which has a bunch of stuff in it that I don't need, e.g. PageRank. So I should be using a lower-latency, lower-level data source for day-to-day stuff, and just using the web index for back-fill and to learn about gaps that I should steer Googlebot towards.

Short-term I want to build a public / open source regression test suite for the parsers (not the parsers themselves, though -- too inseparable and not that interesting) and let everybody see everything that is and should be parsed. Then others could in theory maintain that and report bugs about things the parsers/canonicalization miss while I switch gears to working mainly on coverage issues.
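
To make the regression-suite idea concrete, here is a minimal sketch of the shape such a suite could take: a table of inputs paired with the output that should be produced, run against whatever implementation is under test. The canonicalization rules below are assumptions chosen for the example (lowercase host, drop default ports and fragments, "/" for an empty path), not the SGAPI's actual rules, and every name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url):
    """Hypothetical canonicalizer: lowercase scheme and host, drop default
    ports and fragments, and use "/" for an empty path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


# Each case pairs an input with the output the suite says it should produce.
CASES = [
    ("http://www.kryogenix.org", "http://www.kryogenix.org/"),
    ("HTTP://WWW.Kryogenix.ORG:80/", "http://www.kryogenix.org/"),
    ("http://www.kryogenix.org/contact#top", "http://www.kryogenix.org/contact"),
]


def run_suite(func, cases):
    """Return (input, expected, actual) for every failing case."""
    failures = []
    for given, expected in cases:
        actual = func(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


failures = run_suite(canonicalize, CASES)
print(f"{len(CASES) - len(failures)} passed, {len(failures)} failed")
```

Because the cases are plain data, people outside the project could add a failing case as a bug report without needing access to the implementation being tested.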

I might also put up a rate-limited, Google-login-required "Crawl my page and update the index for the SGAPI now" page.

- Brad