| Subject: | [Data-modeling] The Curse of the ISBN | |
|---|---|---|
| From: | Reilly Hayes (rf...@metaweb.com) | |
| Date: | Aug 21, 2009 4:34:03 pm | |
| List: | com.freebase.data-modeling | |
Hello All --
One of the challenges of loading books is dealing with ISBNs. Both the ISO and Wikipedia claim that they are unique identifiers for book editions. Because of this, we'd really like ISBNs to act as keys within Freebase. Ideally, we'd like to have an /isbn/ namespace so that people can externally reference book editions in Freebase with an ISBN-based URI.
However, experience in the field shows that ISBNs aren't guaranteed to be unique. Publishers can and do reuse ISBNs. Sometimes they are reused for a completely different book. More commonly, they are reused for the same book but with differences in format or binding. This is still a small subset of cases, but it is common enough that we can't ignore or skip these cases. But Freebase keys can point to one and only one Freebase topic. Once a value wants to point to two or more topics, it can no longer be used as a key.
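To make the constraint concrete, here is a minimal Python sketch of a strong-key namespace. The `Namespace` class, the `KeyConflict` exception, and the topic ids are hypothetical illustrations, not Freebase's implementation. The only behavior taken from the description above is that a key may resolve to exactly one topic.

```python
class KeyConflict(Exception):
    """Raised when a key is already bound to a different topic."""


class Namespace:
    """Hypothetical strong-key namespace: each key maps to exactly one topic."""

    def __init__(self):
        self._keys = {}

    def bind(self, key, topic_id):
        existing = self._keys.get(key)
        if existing is not None and existing != topic_id:
            raise KeyConflict(f"{key!r} already points to {existing!r}")
        self._keys[key] = topic_id

    def resolve(self, key):
        return self._keys[key]


isbn = Namespace()
isbn.bind("9780670063260", "edition-hardcover")  # made-up topic id
try:
    # A publisher reused the same ISBN for a different binding.
    isbn.bind("9780670063260", "edition-paperback")
    conflict = False
except KeyConflict:
    conflict = True
```

The second `bind` fails, which is exactly the situation a reused ISBN would put us in if ISBNs were ordinary keys.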
So, we're left with a paradox. ISBNs should act like keys, allowing
external users to reference Freebase entities by ISBN -- but ISBNs
can't be keys, since we can't guarantee uniqueness. And note that
ISBNs are not the only identifiers that have this problem: UPC codes are
also notoriously reused. Freebase needs some way to deal with these
"weak keys" that satisfies all of these constraints in a general
way. Specifically, a "weak key" should:
- Provide a consistent pattern that can be used across all weak keys
- Provide a mechanism to pretend the key is strong by returning a single
  "best" item
- Clearly demarcate that the semantics in the keyspace are different
  from "normal" keys
- Allow identification of all entities that share the weak key
We've spent quite a bit of time over the last few months discussing
ways to resolve this conundrum, and we think we've finally come up
with an acceptable solution that we'd like to get your feedback on.
The basic idea is that ISBNs should point to their own dedicated nodes
of type /book/isbn. Then, rather than /book/book_edition/isbn being
a /type/rawstring value, it will be a property link to the
/book/isbn node.
- A root-level namespace ("/weak/") will be created that holds all
  namespaces with the weak key nature.
- Keys in the weak namespace point to weak key containers. For example,
  "/weak/isbn/9780670063260" will point to the "container node" for that
  ISBN.
- Weak key containers for ISBN will be typed as /book/isbn.
- The /book/book_edition/isbn13 property will be created, pointing to
  nodes with an expected type of /book/isbn.
- A property will be added to the key value type reversing the property
  from the target type (/book/isbn/items). (Note that, because of
  permissioning, it is essential that the master property be FROM
  /book/book_edition TO /book/isbn.)
- Containers will be named with the ISBN (for client display purposes).
  For example, container node "/weak/isbn/9780670063260" will be named
  "9780670063260".
- The container node is cotyped as a namespace containing the single key
  "best", which points to the object that is "best". For example,
  "/weak/isbn/9780670063260/best" would resolve to
  http://www.freebase.com/edit/topic/guid/9202a8c04000641f80000000099fe6b6
- Gardening tasks will be created that look for /book/isbn nodes
  that don't fit these rules and create all necessary links so that the
  rules are fulfilled.
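
As a rough illustration of how these pieces fit together, here is a small in-memory Python sketch. It is not Freebase code: the class names, the edition ids, and the rule that the first edition linked becomes "best" are all assumptions made for the example. The proposal itself does not say how "best" is chosen, and the `garden` method here only reports rule violations rather than repairing them.

```python
class IsbnContainer:
    """Stand-in for a /book/isbn node that is also typed as a namespace."""

    def __init__(self, isbn):
        self.name = isbn   # containers are named with the ISBN for display
        self.items = []    # reverse of the edition -> container property
        self.keys = {}     # namespace contents; holds the single key "best"


class WeakIsbnNamespace:
    """Stand-in for /weak/isbn: exactly one container per ISBN."""

    def __init__(self):
        self._containers = {}
        self.edition_isbn13 = {}  # master property: edition id -> container

    def link(self, edition_id, isbn):
        """Attach an edition to the container for `isbn`, creating it if needed."""
        node = self._containers.setdefault(isbn, IsbnContainer(isbn))
        self.edition_isbn13[edition_id] = node
        if edition_id not in node.items:
            node.items.append(edition_id)
        # Assumption: the first edition linked is "best" until someone re-picks.
        node.keys.setdefault("best", edition_id)
        return node

    def resolve(self, path):
        """Resolve "<isbn>" to its container, or "<isbn>/best" to one edition."""
        isbn, _, key = path.partition("/")
        node = self._containers[isbn]
        return node.keys[key] if key else node

    def garden(self):
        """Return ISBNs whose containers have no items or no valid "best" key."""
        return [isbn for isbn, node in self._containers.items()
                if not node.items or node.keys.get("best") not in node.items]


weak = WeakIsbnNamespace()
weak.link("edition-hardcover", "9780670063260")   # made-up edition ids
weak.link("edition-paperback", "9780670063260")

best = weak.resolve("9780670063260/best")          # a single "best" edition
shared = weak.resolve("9780670063260").items       # every edition sharing the ISBN
```

Resolving the bare ISBN returns the container, so all editions sharing it stay discoverable, while the "/best" path lets a client treat the ISBN as if it were a strong key.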
We've thought through the consequences of this proposal, and we're
fairly certain that it gives us the desired behavior
without too many adverse side effects. We can go into the details in
follow-up emails if you're interested.
Please let us know your thoughts. We'd like to implement this proposal (along with ISBN13 normalization, remember that?) before the end of the month.
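
For anyone who missed the earlier thread, here is one way ISBN13 normalization could look. This is a sketch under the assumption that normalization means rewriting valid ISBN-10s into their 978-prefixed ISBN-13 form using the standard check-digit arithmetic. The function names are made up, and the rules we actually apply in the loader may differ.

```python
def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13 (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_is_valid(s):
    """ISBN-10 check: weighted sum (weights 10 down to 1, X = 10) divisible by 11."""
    values = [10 if c == "X" else int(c) for c in s]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def normalize_isbn(raw):
    """Return a 13-digit ISBN string, or raise ValueError for invalid input."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        if isbn10_is_valid(s):
            body = "978" + s[:9]  # drop the old ISBN-10 check digit
            return body + isbn13_check_digit(body)
    elif len(s) == 13 and s.isdigit() and isbn13_check_digit(s[:12]) == s[12]:
        return s
    raise ValueError(f"not a valid ISBN: {raw!r}")


normalized = normalize_isbn("0-670-06326-6")  # → "9780670063260"
```

With this in place, both the 10-digit and 13-digit spellings of an ISBN would land on the same key under /weak/isbn.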
-r
_______________________________________________ Data-modeling mailing list Data...@freebase.com http://lists.freebase.com/mailman/listinfo/data-modeling





