Subject:Re: [OPEN-ILS-DEV] Monograph Parts
From:Dan Wells (db@calvin.edu)
Date:Feb 17, 2011 3:53:55 pm
List:org.georgialibraries.list.open-ils-dev

Hello Mike,

Thank you for the detailed response. I'll *try* to keep this reply more brief, and see if I can highlight a few things which still concern me (and places where I was not clear).

It makes good sense, but I think we could ultimately benefit by putting less emphasis on a bib record point of view and tilting things a bit more towards the item point of view.

I don't see why items (particularly, barcoded physical items) should be the focus. On the other hand, records are the core of a bibliographic system -- everything else (items and their attributes, access restrictions for electronic resources, libraries-as-places, etc) is a set of filter axes to apply atop the record when searching for or manipulating it. The record is the nucleus, and everything else enhances/vivifies/subdivides it.

I do not agree with this, at least not entirely. Bibliographic records are very important, but that is in large part due to the current reality and how we got here. I think we can agree that libraries exist to organize and provide access to content (via 'items', whether physical or digital). A monolithic record is a convenient descriptive tool, but not the only one, and in the future may not be the best one. Slightly loosening the link between items and records may be just one small step forward.

From the item perspective these proposals are modeling the same thing, a mapping of items to contents, and the fewer ways we have to do that, the better (as long as we cover all the cases). With a simpler mapping table of item to record(-part), we easily traverse in either direction, and we have ultimate flexibility.

But I contend that we don't traverse in either direction for a given relationship, nor do we have a need for ultimate flexibility at the cost of complexity. (I'm not referring to schema complexity here, but code complexity -- the need for inferences that will certainly come from commingling aggregation and dis-integration.)

The direction of traversal is critical, and should be ensconced in the schema. Not only does this make the code for each function simpler (we don't have to infer a relationship, it's dictated by the fact that we're using part or multi-home) but it models what libraries actually do: barcode parts of a work (volumes, disks, etc); or, collect manifestations of many records into a big binder (or e-reader) with one barcode on the outside.

So, if I have a bib record on my screen, and I ask the question, "which items' contents does this record represent?", we can simply go record->part(s)->item(s).

IMO, the question one would ask is, "what and where are the things (nominally, barcoded physical items) that contain what I describe?" ISTM that it's very important to know, and perhaps even critical for efficient workflow, to list separately subsets of what a record describes (parts) and "bound" items that contain the described work along with others. With a unified map there's no mechanism other than a magic value (or a human hoping to interpret another human's label correctly) to distinguish these concepts. With parts and multi-home separate, it's obviously a natural property.

***EDIT*** I might (finally!) understand your perspective, see paragraph near the end ***/EDIT*** I think this is the point where I am missing something. Why distinguish the concepts? I think all we need to model is the concept of "contains" (copy contains part). If we are dealing with a record/part, we can list the copies which contain it, and if we are dealing with a copy, we can list what it contains. What is the source of the ambiguity? We dis-integrate first (where needed), then aggregate the parts.

On the other hand, if I have an item, and I ask the question "what are the contents of this item?", we can go item->part(s)->record(s). Naturally we can traverse related records (via items) and related items (via records/parts) as well.
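To make the two traversals concrete, here is a minimal Python sketch of a single contents map walked in both directions. The dictionaries, barcodes, and function names are hypothetical stand-ins for the part and contents-map tables, not Evergreen code.

```python
from collections import defaultdict

# Hypothetical stand-in for the part table: part id -> (record, label)
parts = {
    1: ("Record A", "Vol. 1"),
    2: ("Record A", "Vol. 2"),
    3: ("Record B", "Whole work"),
}

# Hypothetical rows of the contents map: (part id, copy barcode)
contents_map = [(1, "12345"), (3, "12345"), (2, "67890")]

# Index the same rows once in each direction.
copies_by_part = defaultdict(set)
parts_by_copy = defaultdict(set)
for part_id, barcode in contents_map:
    copies_by_part[part_id].add(barcode)
    parts_by_copy[barcode].add(part_id)


def copies_for_record(record):
    """record -> part(s) -> item(s)"""
    return {
        barcode
        for part_id, (rec, _label) in parts.items()
        if rec == record
        for barcode in copies_by_part[part_id]
    }


def records_for_copy(barcode):
    """item -> part(s) -> record(s)"""
    return {parts[part_id][0] for part_id in parts_by_copy[barcode]}
```

Here copy 12345 contains Vol. 1 of Record A and all of Record B, so it is reachable from either record, and both records are reachable from it.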

This is directly supported by multi-homed items, with the exception that you do need to look at the call number to get the primary record. I don't see a practical drawback to this, since that's what the code already does, and will still have to do as long as the record field exists on asset.call_number (null-ability or elimination of which is mentioned below).

This also eliminates the primacy of call numbers when managing items, which I see as a benefit.

There are three problems I see here:

* Call numbers will always have a first-class billing, regardless of how they're implemented, since they represent something physical. Two things, actually: the location in a range of other items (shelf order and position), and a tag pasted to the spine of the item.
* I can't see any obvious benefit to eliminating the record<->call_number link (mentioned directly below, and intimated here)
* The mounds and mounds of code that assume and depend on the existence of the record->call number->copy hierarchy that will instantly break

The immediate benefit to breaking this link is that an item (and by association its call number) can now fully exist in the context of any record which describes it, even if only in part. We could transition by using code which builds the current hierarchy dynamically (that is, go record->copy->call_number, then attach the call number to the current record context). So if item 12345 with Call Number ABC123 is linked through the contents map to both Record A and Record B, when viewing A we see:

Record A
--ABC123
----12345

and of course the same with B:

Record B
--ABC123
----12345

The item row might somehow indicate its 'special'-ness (which is going to be needed in some way regardless), but would be otherwise transparent. It is also not strictly necessary to null the call_number.record_id value, as we can just as easily overwrite it temporarily as needed, and it could be a useful fallback.
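As a rough illustration of building that hierarchy on the fly, the Python sketch below goes record->copy->call_number and then hangs the call number under whichever record is being viewed. The data structures and function names are hypothetical, and the contents map is collapsed to (record, copy) pairs for brevity.

```python
# Hypothetical data: copy barcode -> call number label
call_number_of = {"12345": "ABC123"}

# Hypothetical contents map, collapsed to (record, copy barcode) pairs
record_copies = [("Record A", "12345"), ("Record B", "12345")]


def holdings_tree(record):
    """Go record -> copy -> call_number, grouping copies under each call number."""
    tree = {}
    for rec, barcode in record_copies:
        if rec == record:
            tree.setdefault(call_number_of[barcode], []).append(barcode)
    return tree


def render(record):
    """Render the tree in the record / call number / copy layout used above."""
    lines = [record]
    for label, barcodes in sorted(holdings_tree(record).items()):
        lines.append("--" + label)
        lines.extend("----" + b for b in sorted(barcodes))
    return "\n".join(lines)
```

Calling render("Record A") and render("Record B") yields the same call number and copy under each record, because the call number is attached to the record context at display time rather than stored against one record.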

Or stated more simply, I feel our foundational assumptions in relating items to records should be:
1) Records describe contents
2) Items contain contents
3) Item content boundaries can overlap record content boundaries in various ways

I see this as an oversimplification from the conceptual point of view -- it fails to recognize that the arity of the relationship (which I call direction) is important and different. IOW, record<<->item serves a completely different function from record<->>item, and forcing them both through a record<<->>item relational model does both a disservice.

I can't agree with "completely" different, and if you view the record/part as a sort of really expressive tag of some kind, I feel like they are not so different at all.

All that said, I know from experience to trust your judgement (most of the time ;). For my own future benefit, do you have cases already in mind where this flexibility would end up causing 'split-brain' logic? (Or maybe I have a split brain...)

Split-brain is probably a misnomer ... we have to commingle the logic for aggregation and dis-integration (disaggregation?) wherever we use either.

From a practical point of view, here are some more random-ish thoughts that don't seem to fit directly into this response elsewhere ... ;)

When going from records to items (via the Monograph Parts infrastructure as described), we need to be able to label the subdivision that the part represents in relation to the record as a whole -- we need to be able to say "barcode X contains only Volume 1 of the content described by record A". This is not something we need to do for binding in the general case (note, however, that you can indeed use both at the same time -- multi-home and parts -- to get the effect of "volume 1 of record A is bound with some other things").

Correct me if I am wrong, but you can only do this if that "some other thing" is not another part. So if, for instance, I have a set of books, each with a different record, and each including a 'CD supplement', I cannot create a copy which is a binder containing all the CD supplements. Or, if I have a multi-volume work in two languages, I cannot bind the English and French V.1s (etc.) together. Or if I buy a few e-book Bibles, I cannot put all the Old Testaments on reader 1 and all the New Testaments on reader 2. These limits are a direct result of one-part-per-copy, and multi-home doesn't change that, does it?

Also, the only purpose of the record-to-item path is to dis-integrate the record into constituent, separately barcoded items, so there is only one relationship type.

However, going in the other direction, from items to records (via Multi-homed Items as described) we do not need a label -- what we need instead is a /reason/ for the relationship. Bound-with, e-reader, etc. IOW, there are multiple potential causes for the relationship being created.

With the possible exception of bilingual, it seems to me that the records themselves have no special relationship, but rather that the relationship only exists at the item level. As such, we don't actually need a reason. These labels can usefully describe the character of an item, so it makes sense to include them as a copy attribute if one does not wish to make a new item type.

Not surfacing these differences explicitly (in my case, by using separate, though admittedly superficially similar mechanisms) is inviting trouble down the road, IMO.

Now (fastforwarding to your schema outline below), IIUC, what you're attempting to do with the copy_type table is to have a magic value of "Multi-part" inform us that the direction is from record to item, and all others are the other direction. From a bibliographic point of view this is incorrect -- it's not the copy that is Multi-part, it is the record. From a normalization point of view, this is not modeling reality IMO, and because it uses magic rows in a table it's brittle against DML.

That was not my intention. The copy_type does not need to be set at all, other than for convenience of labeling as I noted above. "Multi-part" is not intended as magic, just a generic way to say "this item shows up on more than one record, but the reason why can't be neatly expressed in a label" (and maybe not the best choice of term at that, especially since I used the word 'part' ('multi-record'?)). Probably should have left it out!

Also, I think this quote from Elaine deserves a bit more attention:

I'm particularly interested in how this would function in a consortium like PINES where different libraries might process a multipart set differently. For example, one library might process and circulate a 3 part DVD set as one item, where another might put each in a separate container with a separate barcode.

If we want the complete-set copy from Library A to conclusively fulfill a P-level hold from Library B, we will want to allow multiple parts per copy. Or am I missing something?

You're not ... I interpreted what she was saying differently (that different libraries would be /able/ to split records along different lines), and I see what you're saying. We could allow a copy to belong to multiple parts (it's a trivial change to the schema), but it would be the responsibility of the cataloger with the item in hand to make sure that the copy is in the appropriate parts -- not hard, except that some parts may not exist yet. ;) (And, of course, this existential problem exists no matter the scheme*.)

I was not expecting that libraries in the same system could divvy up the record differently, but rather that the parts would be set globally at a reasonable common denominator and then assembled locally as needs dictated. I am certainly fine with allowing local divvying to happen, but by not even allowing multiple parts per copy, we are effectively forcing an immediate choice between local part-bundling practice and accurate resource sharing.
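A small Python sketch of the hold case, assuming multiple parts per copy are allowed. The barcodes, part labels, and function name are made up for illustration, following Elaine's 3-part DVD set example.

```python
# Hypothetical data for a 3-disc DVD set described by one record.
# Library A circulates the whole set under one barcode;
# Library B barcodes each disc separately.
copy_parts = {
    "A-0001": {"Disc 1", "Disc 2", "Disc 3"},  # complete set, one container
    "B-0001": {"Disc 1"},
    "B-0002": {"Disc 2"},
    "B-0003": {"Disc 3"},
}


def copies_filling_part_hold(wanted_part):
    """Return barcodes of every copy that contains the requested part."""
    return sorted(
        barcode for barcode, parts in copy_parts.items() if wanted_part in parts
    )
```

With this data a part-level hold on Disc 2 can be filled by either B-0002 or the complete set A-0001. Under one part per copy, A-0001 could be tied to at most one of the three discs, so it could not be matched to holds on the other two.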

Converting from one part per copy to multiple is simple at the database level, and would be nearly trivial in higher level code, but until we have use in the field I think it's a solution without a problem, because of the cataloging overhead of trying to keep every copy current across all parts as parts are added to a bib when each library adds their own subdivision scheme for the bib. For that reason I left it out explicitly. (*It also invites the desire for a "collection of parts" concept that is a much bigger, and more importantly, controversial project. That too, though, is not barred from the future with the design as it stands.)

Finally, for those it may help, here is a quick version of a simple item-record schema. The part concerning copy_type is optional, but I wanted it to show a more complete replacement for the proposed tables:

CREATE TABLE biblio.part (
    id            SERIAL PRIMARY KEY,
    record        BIGINT NOT NULL REFERENCES biblio.record_entry (id),
    label         TEXT NOT NULL,
    label_sortkey TEXT NOT NULL,
    CONSTRAINT record_label_unique UNIQUE (record, label)
);

CREATE TABLE asset.copy_contents_map (
    id          SERIAL PRIMARY KEY,
    -- optional path to partless items, or we force records to have at least one part:
    -- record   BIGINT NOT NULL REFERENCES biblio.record_entry (id),
    part        INT NOT NULL REFERENCES biblio.part (id) ON DELETE CASCADE,
    target_copy BIGINT NOT NULL -- points to asset.copy
);

CREATE TABLE asset.copy_type (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE -- i18n
);

INSERT INTO asset.copy_type (name) VALUES
    ('Bound Volume'),
    ('Bilingual'),
    ('Back-to-back'),
    ('Set'),
    ('Multi-part'); -- generic type

-- ALSO:
-- asset.copy grows a nullable reference to asset.copy_type

-- asset.call_number.record is nullable (should be null for new-style copies)
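For anyone who wants to poke at the shape of this, here is a rough Python sketch that loads a cut-down version of the two core tables into an in-memory SQLite database and lists what one copy contains. It is an approximation only: SQLite has no schema prefixes or SERIAL, so the table names are unqualified and the keys are INTEGER PRIMARY KEY, label_sortkey is omitted, and the record ids, copy ids, and e-reader data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite stand-ins: no schema prefixes, INTEGER PRIMARY KEY instead of SERIAL.
    CREATE TABLE part (
        id     INTEGER PRIMARY KEY,
        record INTEGER NOT NULL,
        label  TEXT NOT NULL,
        UNIQUE (record, label)
    );
    CREATE TABLE copy_contents_map (
        id          INTEGER PRIMARY KEY,
        part        INTEGER NOT NULL REFERENCES part (id) ON DELETE CASCADE,
        target_copy INTEGER NOT NULL
    );
    -- Two hypothetical Bible records (100 and 200), each split into two parts.
    INSERT INTO part (id, record, label) VALUES
        (1, 100, 'Old Testament'), (2, 100, 'New Testament'),
        (3, 200, 'Old Testament'), (4, 200, 'New Testament');
    -- Reader 1 (copy 9001) holds both Old Testaments; reader 2 (copy 9002) both New.
    INSERT INTO copy_contents_map (part, target_copy) VALUES
        (1, 9001), (3, 9001), (2, 9002), (4, 9002);
""")


def contents_of(copy_id):
    """item -> part(s) -> record(s): (record, label) pairs held by one copy."""
    rows = conn.execute(
        """
        SELECT p.record, p.label
          FROM copy_contents_map m
          JOIN part p ON p.id = m.part
         WHERE m.target_copy = ?
         ORDER BY p.record, p.label
        """,
        (copy_id,),
    )
    return rows.fetchall()
```

Each reader ends up holding one part from each of the two records, which is the e-book Bible case described earlier.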

Given the codebase, that will be a large and separate project, if ever undertaken, and is not something we can look at now if we want anything discussed here to happen in a near-term release. I won't discount it out of hand for all time, just for this time. ;)

While it took me a (long) while to realize it, I think the source of our disagreement may be what I will call the "is-ness" factor. Does a bib record tell us what an item *contains*, or does it tell us what an item *is*? Well, traditionally it tries to do both, and it has always been a problem. I am unwittingly assuming that describing contents matters more and more, and describing containers matters less and less. Doing so makes it difficult to truly represent a content-less container record (like an e-book reader record), but if we no longer need such things (because the item already appears wherever the contents are described), maybe it is not such a loss.

I understand that my perspective is not always (ever?) the most realistic. My aim is only to try to encourage a little more pain now if it even *might* save us from greater pain in the future. Since I know you are a speedy and tireless worker, it may be best at this point to just wait and see the code, which will probably illuminate for me some of the issues I don't yet see.

Dan

--

Daniel Wells, Library Programmer Analyst
db@calvin.edu
Hekman Library at Calvin College
616.526.7133

On 2/15/2011 at 3:09 PM, Mike Rylander <mryl@gmail.com> wrote:

I'll be starting work on an implementation of Monograph Parts (think: Disks 1-3; Volume A-D; etc), well, right now, in a git branch that I'll push to

http://git.esilibrary.com/?p=evergreen-equinox.git;a=summary

but I wanted to get the basic plan out there for comment. So, attached you'll find a PDF outlining the plan. Comments and feedback are welcome, but time for significant changes is slim.

This is not intended to cover every single possible use of the concept of Monograph Parts, but I believe it is a straightforward design that offers a very good code-to-feature ratio and should be readily used by existing sites after upgrading to a version containing the code.