Subject: [Data-modeling] The Curse of the ISBN
From: Reilly Hayes (rf@metaweb.com)
Date: Aug 21, 2009 4:34:03 pm
List: com.freebase.data-modeling

Hello All --

One of the challenges of loading books is dealing with ISBNs. Both the ISO and Wikipedia claim that they are unique identifiers for book editions. Because of this, we'd really like ISBNs to act as keys within Freebase. Ideally, we'd like to have an /isbn/ namespace so that people can externally reference book editions in Freebase with an ISBN-based URI.

However, experience in the field shows that ISBNs aren't guaranteed to be unique. Publishers can and do reuse ISBNs. Sometimes they are reused for a completely different book. More commonly, they are reused for the same book but with differences in format or binding. This is still a small subset of cases, but it is common enough that we can't ignore or skip these cases. But Freebase keys can point to one and only one Freebase topic. Once a value wants to point to two or more topics, it can no longer be used as a key.

So, we're left with a paradox. ISBNs should act like keys, allowing external users to reference Freebase entities by ISBN -- but ISBNs can't be keys, since we can't guarantee uniqueness. And note that ISBNs aren't the only identifiers that have this problem: UPC codes are also notoriously reused. Freebase needs some way to deal with these "weak keys" that satisfies all of these constraints in a general way. Specifically, a "weak key" should:

* Provide a consistent pattern that can be used across all weak keys
* Provide a mechanism to pretend the key is strong by returning a single "best" item
* Clearly demarcate that the semantics in the keyspace are different from "normal" keys
* Allow identification of all entities that share the weak key

We've spent quite a bit of time over the last few months discussing ways to resolve this conundrum, and we think we've finally come up with an acceptable solution that we'd like to get your feedback on.

The basic idea is that ISBNs should point to their own dedicated nodes of type /book/isbn. Then, instead of having /book/book_edition/isbn be a /type/rawstring value, it will be a property link to the /book/isbn node.

* A root-level namespace ("/weak/") will be created that holds all namespaces with the weak key nature.
* Keys in the weak namespace point to weak key containers. For example, "/weak/isbn/9780670063260" will point to the "container node" for that ISBN.
* Weak key containers for ISBN will be typed as /book/isbn.
* /book/book_edition/isbn13 will be created as a property that points to nodes with an expected type of /book/isbn.
* A property will be added to the key value type reversing the property from the target type (/book/isbn/items). (Note that, because of permissioning, it is essential that the master property be FROM /book/book_edition TO /book/isbn.)
* Containers will be named with the ISBN (for client display purposes). For example, container node "/weak/isbn/9780670063260" will be named "9780670063260".
* The container node is cotyped as a namespace, containing the single key "best" that points to the object that is "best". For example, "/weak/isbn/9780670063260/best" would resolve to http://www.freebase.com/edit/topic/guid/9202a8c04000641f80000000099fe6b6
* Gardening tasks will be created that will look for /book/isbn nodes that don't fit these rules, and create all necessary links so that the rules are fulfilled.

We've thought through the consequences of this proposal, and we're fairly certain that it gives us the desired behavior without too many adverse side effects. We can go into the details in follow-up emails if you're interested.
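To make the shape of the proposal a bit more concrete, here is a rough in-memory sketch in Python. It is only an illustration, not Freebase code: the class names (Edition, WeakKeyContainer, WeakNamespace), the link() and resolve() methods, the rule that the first linked edition becomes "best" by default, and the guids and labels in the usage lines are all assumptions made up for the example. It shows a container per ISBN, the reverse "items" list of every edition sharing that ISBN, and the "best" key that lets the weak key pretend to be strong.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Edition:
    """Stand-in for a /book/book_edition topic (hypothetical fields)."""
    guid: str
    label: str


@dataclass
class WeakKeyContainer:
    """Stand-in for a /book/isbn container node."""
    name: str                                   # named with the ISBN itself
    items: List[Edition] = field(default_factory=list)  # reverse of edition -> isbn
    best: Optional[Edition] = None              # target of the "best" key


class WeakNamespace:
    """Stand-in for a namespace such as /weak/isbn."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._containers: Dict[str, WeakKeyContainer] = {}

    def link(self, key: str, edition: Edition, make_best: bool = False) -> WeakKeyContainer:
        """Attach an edition to the container for `key`, creating it if needed."""
        container = self._containers.setdefault(key, WeakKeyContainer(name=key))
        if edition not in container.items:
            container.items.append(edition)
        # Assumption: the first edition linked is "best" until told otherwise.
        if make_best or container.best is None:
            container.best = edition
        return container

    def resolve(self, path: str) -> Union[WeakKeyContainer, Edition, None]:
        """Resolve '<prefix>/<key>' to a container, '<prefix>/<key>/best' to a topic."""
        prefix = self.prefix + "/"
        if not path.startswith(prefix):
            raise KeyError(path)
        key, _, rest = path[len(prefix):].partition("/")
        container = self._containers[key]
        if rest == "":
            return container
        if rest == "best":
            return container.best
        raise KeyError(path)


# Two editions that (hypothetically) share one reused ISBN.
isbns = WeakNamespace("/weak/isbn")
hardcover = Edition(guid="guid-a", label="hardcover")
paperback = Edition(guid="guid-b", label="paperback")
isbns.link("9780670063260", hardcover)
isbns.link("9780670063260", paperback)

container = isbns.resolve("/weak/isbn/9780670063260")
best = isbns.resolve("/weak/isbn/9780670063260/best")
```

Here the container lists both editions in items, while the ".../best" path still returns exactly one topic, so a client that wants strong-key behaviour gets it and a client that cares about the ambiguity can see it.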

Please let us know your thoughts. We'd like to implement this proposal (along with ISBN13 normalization, remember that?) before the end of the month.

-r