7 messages in com.googlegroups.social-graph-api: Re: HTTP URL clustering/truncating pr...

| From | Sent On | Attachments |
|---|---|---|
| Brad Fitzpatrick | 07 Jul 2008 08:25 | |
| Joseph Smarr | 07 Jul 2008 09:01 | |
| Brad Fitzpatrick | 07 Jul 2008 09:03 | |
| artemy tregubenko | 07 Jul 2008 09:23 | |
| Joseph Smarr | 07 Jul 2008 09:28 | |
| Martin Atkins | 07 Jul 2008 10:54 | |
| Brad Fitzpatrick | 07 Jul 2008 11:06 | |

| Subject: | Re: HTTP URL clustering/truncating proposal (based on me links) |
|---|---|
| From: | Joseph Smarr (jsm...@gmail.com) |
| Date: | 07/07/2008 09:01:39 AM |
| List: | com.googlegroups.social-graph-api |
I like it, and I think it's well-founded (I also heard this sub-path argument from Tantek, and I buy it). Couldn't you also Is that for domains like flickr? I think I could make the same argument there.

js
On Mon, Jul 7, 2008 at 8:26 AM, Brad Fitzpatrick <brad...@google.com> wrote:
While most URLs in the Social Graph API are canonicalized using the open source sgnodemapper code, a fair number of nodes in the graph aren't, and never will be. Namely, "vanity domains".
Currently, these are all unique nodes in the graph:
http://bradfitz.com/
http://bradfitz.com/foaf.xml
http://identi.ca/bradfitz
http://identi.ca/bradfitz/foaf
http://factoryjoe.com/
http://factoryjoe.com/hcard.html
http://factoryjoe.com/blog
http://factoryjoe.com/blog/2006/02/10/uspto-to-hold-open-source-meeting/
http://factoryjoe.com/blog/2006/07/25/hresume-plugin-now-available/
......
And so on.
I'd like to cluster the three logical sets above, truncated as follows:
http://bradfitz.com/
http://identi.ca/bradfitz
http://factoryjoe.com/
But where to do the truncation? Nobody likes brittle heuristics like hacky one-off regexp rules or similar.
Fortunately we have a much better data source: "me" links (whether they're XFN, an OpenID delegate tag, an RSS/Atom/FOAF link, etc.).
When I talked to Tantek Çelik a while back, he mentioned that there's an implicit me link from a URL to its parent (http://foo.com/bar/ ---me--> http://foo.com/) but not vice versa (which might seem more intuitive) because, as he roughly said, "A root must always be able to partition its namespace." Consider that if http://foo.com/ implied a me link to http://foo.com/users/attacker/, then user "attacker" could me-link back to foo.com and cluster the whole site together.
Unfortunately, I don't see an explanation of this at http://gmpg.org/xfn/11 so I'm afraid I might be remembering it wrong.
But ideally what I'd like to do, if I'm not grossly confused:
If a URL ${prefix} has a me link to URL ${prefix} + ${suffix}, and the number of path components in the latter URL is greater than that of the former, then truncate at ${prefix}.
That is, whenever a site http://foo.com/ has a me link (XFN or otherwise: RSS/Atom/FOAF) to http://foo.com/anything, we truncate at http://foo.com/ and any links in the graph to http://foo.com/* now become http://foo.com/
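The rule above could be sketched in Python roughly as follows. This is only an illustration, not the actual Social Graph API or sgnodemapper code: the function names `path_components`, `truncation_point`, and `collapse` are hypothetical, the prefix test is a literal string-prefix check taken from the wording of the rule, and the me link from http://bradfitz.com/ to http://bradfitz.com/foaf.xml is assumed for the sake of the example.

```python
from urllib.parse import urlsplit


def path_components(url):
    """Return the non-empty path segments of a URL (hypothetical helper)."""
    return [part for part in urlsplit(url).path.split("/") if part]


def truncation_point(source, target):
    """Return `source` if a me link source -> target satisfies the rule, else None.

    The rule as stated: target is source plus a suffix, and target has
    more path components than source.
    """
    if (target.startswith(source)
            and len(path_components(target)) > len(path_components(source))):
        return source
    return None


def collapse(url, points):
    """Rewrite `url` to the first truncation point that covers it, if any."""
    for point in points:
        if truncation_point(point, url) is not None:
            return point
    return url


# Assumed me link for illustration: the root page links to its FOAF file.
me_links = [("http://bradfitz.com/", "http://bradfitz.com/foaf.xml")]
points = [p for p in (truncation_point(s, t) for s, t in me_links) if p]

print(collapse("http://bradfitz.com/foaf.xml", points))
print(collapse("http://identi.ca/bradfitz", points))
```

With that single assumed link, URLs under http://bradfitz.com/ collapse to the root, while URLs on other hosts are left alone. A stricter variant might compare whole path segments instead of raw string prefixes; that refinement is not part of the rule as written here.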
The path component part is necessary because of all the sites which for what I imagine are aesthetic reasons have their URLs like this:
... instead of what one could argue is a bit more technically correct, like this:
So considering that people are going to use things like /username as the URL, we need to guard against this case:
http://foo.com/dude
http://foo.com/dude2_unrelated
If the rule were purely prefix-based, then the first dude, being naive or malicious, could "me"-link to dude2_unrelated and cluster with him, stealing all his outgoing and incoming edges, dirtying up the data.
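To make the guard concrete, here is a small Python sketch comparing a purely prefix-based test with the path-component version. The function names are hypothetical, the URLs are the placeholder ones from this message, and http://foo.com/dude/foaf is an assumed extra example of a legitimate sub-path.

```python
from urllib.parse import urlsplit


def component_count(url):
    """Count the non-empty path segments of a URL."""
    return len([part for part in urlsplit(url).path.split("/") if part])


def prefix_only_rule(source, target):
    """Naive rule: any me link to a URL sharing the string prefix clusters."""
    return target.startswith(source)


def guarded_rule(source, target):
    """Prefix rule plus the requirement that target is strictly deeper."""
    return (target.startswith(source)
            and component_count(target) > component_count(source))


dude = "http://foo.com/dude"
victim = "http://foo.com/dude2_unrelated"

# The naive rule lets "dude" cluster with an unrelated user.
print(prefix_only_rule(dude, victim))
# The guarded rule rejects it: both URLs have one path component.
print(guarded_rule(dude, victim))
# A genuine sub-path (assumed example) still qualifies.
print(guarded_rule(dude, "http://foo.com/dude/foaf"))
```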
If this is technically sound, then http://factoryjoe.com/ will have one node in the graph for his site, rather than the hundreds or more it has today. Likewise, a lot of people with a domain + FOAF file (like me) will have one node on their vanity domain, not two, when doing simple fme=1 queries from it.
Thoughts?
- Brad