Subject: Re: [OPEN-ILS-DEV] Monograph Parts
From: Mike Rylander (mryl...@gmail.com)
Date: Feb 16, 2011 5:27:19 pm
First, thanks, Dan, for taking the time to consider the design. I know we've been down this road before, and I'm committed to moving this functionality forward, so I've made some cuts and simplifications to the earlier grand designs we've discussed. Also, having ruminated on this for quite a while in the context of practice, and discussions about day-to-day needs from a relatively broad set of libraries, my thoughts have crystallized fairly well. That's what's below -- a cold, technical, emotionless review of my thinking based on (what I understand to be) working needs and practice, shaped by the need for some expedience and for spec'ing projects that can be undertaken within a single dev/release cycle. Can we do more to generalize the architecture? Indeed we can, and should, over time. And thanks for pushing for that.
Now let the wild rumpus begin! :)
On Wed, Feb 16, 2011 at 4:44 PM, Dan Wells <db...@calvin.edu> wrote:
On 2/16/2011 at 12:37 PM, Mike Rylander <mryl...@gmail.com> wrote:
On Tue, Feb 15, 2011 at 4:35 PM, Dan Wells <db...@calvin.edu> wrote:
At first glance I think this is a very welcome development, and I have just a few comments. First, I would advocate for some kind of 'label_sortkey' on biblio.monograph_part. Even if all it did was pad numbers, it would solve 95% of 'part' sorting problems.
That's a very good idea ... I will make it so. Proposed initial algorithm: strip non-spacing marks, force everything to upper case, remove all spaces, left-pad numeric strings with '0' to 5 characters. Thoughts?
Sounds good to me.
(NOTE: I'm kinda loath to invent something like the call number classification normalizer setup for this, and I don't think that will work directly with these strings. And without some field testing we won't do a good job of covering our bases with anything non-trivial.)
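For concreteness, here is a minimal sketch of that algorithm as a PL/pgSQL function. The function name is hypothetical, and it assumes PostgreSQL's unaccent extension as a stand-in for stripping non-spacing marks:

CREATE OR REPLACE FUNCTION biblio.normalize_part_label (label TEXT)
RETURNS TEXT AS $$
DECLARE
    out_label TEXT := '';
    chunk     TEXT;
BEGIN
    -- Strip diacritics (approximating "strip non-spacing marks"),
    -- remove all spaces, and force upper case.
    label := upper(regexp_replace(unaccent(label), E'\\s+', '', 'g'));

    -- Walk alternating runs of digits and non-digits, left-padding
    -- each digit run with '0' to at least 5 characters.
    FOR chunk IN SELECT (regexp_matches(label, E'(\\d+|\\D+)', 'g'))[1] LOOP
        IF chunk ~ E'^\\d+$' THEN
            out_label := out_label || lpad(chunk, GREATEST(length(chunk), 5), '0');
        ELSE
            out_label := out_label || chunk;
        END IF;
    END LOOP;

    RETURN out_label;
END;
$$ LANGUAGE PLPGSQL;

-- 'Vol. 2' => 'VOL.00002' and 'Vol. 10' => 'VOL.00010', so they sort correctly.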
Second, and perhaps this was already discarded as a simplification measure,
but I think we should consider dropping the primary key on asset.copy_part_map.target_copy to allow for multiple parts per copy. This would not only better reflect reality in certain cases, but I think it could also lay some groundwork for future bound-with functionality (put 'part's on your boundwith records (or let the map point to a part *or* a record), then sever the link from call_number to record).
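For concreteness, the difference is just the keying on the map table. A hypothetical sketch of the relaxed version (column names assumed from the proposal):

CREATE TABLE asset.copy_part_map (
    id          SERIAL PRIMARY KEY,
    part        INT    NOT NULL REFERENCES biblio.monograph_part (id),
    target_copy BIGINT NOT NULL, -- points to asset.copy
    CONSTRAINT part_once_per_copy UNIQUE (part, target_copy) -- instead of keying on target_copy alone
);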
Before spec'ing this out, I'd already begun working up something separate to cover (among several other use-cases) bound-with. I'll post that soon (hopefully today). The short version of why I intentionally kept bound-with and monograph parts separate is that the former is about aggregating multiple bib records (several metadata records involved in one physical thing) and the latter is about dis-integration (one metadata record covering multiple physical things). While we /could/ design a subsystem that goes both ways, the implicit complexity (and split-brain logic) required outweighs both the design and maintenance simplicity of single-function infrastructure. I'm normally in favor of infrastructure re-use, but in this case the concepts being modeled have opposite purposes (from the bib record point of view).
Is that too ramble-y to make sense? ;)
It makes good sense, but I think we could ultimately benefit by putting less emphasis on a bib record point of view and tilting things a bit more towards the item point of view.
I don't see why items (particularly, barcoded physical items) should be the focus. On the other hand, records are the core of a bibliographic system -- everything else (items and their attributes, access restrictions for electronic resources, libraries-as-places, etc) is a set of filter axes to apply atop the record when searching for or manipulating it. The record is the nucleus, and everything else enhances/vivifies/subdivides it.
From the item perspective these proposals are modeling the same thing, a mapping of items to contents, and the fewer ways we have to do that, the better (as long as we cover all the cases). With a simpler mapping table of item to record(-part), we easily traverse in either direction, and we have ultimate flexibility.
But I contend that we don't traverse in either direction for a given relationship, nor do we have a need for ultimate flexibility at the cost of complexity. (I'm not referring to schema complexity here, but code complexity -- the need for inferences that will certainly come from commingling aggregation and dis-integration.)
The direction of traversal is critical, and should be ensconced in the schema. Not only does this make the code for each function simpler (we don't have to infer a relationship, it's dictated by the fact that we're using part or multi-home) but it models what libraries actually do: barcode parts of a work (volumes, disks, etc); or, collect manifestations of many records into a big binder (or e-reader) with one barcode on the outside.
So, if I have a bib record on my screen, and I ask the question, "which items' contents does this record represent?", we can simply go record->part(s)->item(s).
IMO, the question one would ask is, "what and where are the things (nominally, barcoded physical items) that contain what I describe?" ISTM that it's very important to know, and perhaps even critical for efficient workflow, to list separately subsets of what a record describes (parts) and "bound" items that contain the described work along with others. With a unified map there's no mechanism other than a magic value (or a human hoping to interpret another human's label correctly) to distinguish these concepts. With parts and multi-home separate, it's obviously a natural property.
On the other hand, if I have an item, and I ask the question "what are the contents of this item?", we can go item->part(s)->record(s). Naturally we can traverse related records (via items) and related items (via records/parts) as well.
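Under this unified map (using the biblio.part and asset.copy_contents_map tables outlined below), both traversals would be plain joins. A hypothetical sketch, with example ids:

-- record -> part(s) -> item(s): which copies carry content from record 123?
SELECT m.target_copy
  FROM biblio.part p
  JOIN asset.copy_contents_map m ON (m.part = p.id)
 WHERE p.record = 123;

-- item -> part(s) -> record(s): what are the contents of copy 456?
SELECT p.record, p.label
  FROM asset.copy_contents_map m
  JOIN biblio.part p ON (p.id = m.part)
 WHERE m.target_copy = 456;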
This is directly supported by multi-homed items, with the exception that you do need to look at the call number to get the primary record. I don't see a practical drawback to this, since that's what the code already does, and will still have to do as long as the record field exists on asset.call_number (null-ability or elimination of which is mentioned below).
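That call-number hop is a single extra join. A minimal sketch, assuming the stock asset.copy -> asset.call_number linkage:

-- Primary record for a (possibly multi-homed) copy, via its call number:
SELECT cn.record
  FROM asset.copy cp
  JOIN asset.call_number cn ON (cn.id = cp.call_number)
 WHERE cp.id = 456;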
This also eliminates the primacy of call numbers when managing items, which I see as a benefit.
There are three problems I see here:
* Call numbers will always have first-class billing, regardless of how they're implemented, since they represent something physical. Two things, actually: the location in a range of other items (shelf order and position), and a tag pasted to the spine of the item.
* I can't see any obvious benefit to eliminating the record<->call_number link (mentioned directly below, and intimated here).
* The mounds and mounds of code that assume and depend on the existence of the record->call number->copy hierarchy will instantly break.
Or stated more simply, I feel our foundational assumptions in relating items to records should be:

1) Records describe contents
2) Items contain contents
3) Item content boundaries can overlap record content boundaries in various ways
I see this as an oversimplification from the conceptual point of view -- it fails to recognize that the arity of the relationship (which I call direction) is important and different. IOW, record<<->item serves a completely different function from record<->>item, and forcing them both through a record<<->>item relational model does both a disservice.
All that said, I know from experience to trust your judgement (most of the time ;). For my own future benefit, do you have cases already in mind where this flexibility would end up causing 'split-brain' logic? (Or maybe I have a split brain...)
Split-brain is probably a misnomer ... we have to commingle the logic for aggregation and dis-integration (disaggregation?) wherever we use either.
From a practical point of view, here are some more random-ish thoughts that don't seem to fit directly into this response elsewhere ... ;)
When going from records to items (via the Monograph Parts infrastructure as described), we need to be able to label the subdivision that the part represents in relation to the record as a whole -- we need to be able to say "barcode X contains only Volume 1 of the content described by record A". This is not something we need to do for binding in the general case (note, however, that you can indeed use both at the same time -- multi-home and parts -- to get the effect of "volume 1 of record A is bound with some other things"). Also, the only purpose of the record-to-item path is to dis-integrate the record into constituent, separately barcoded items, so there is only one relationship type.
However, going in the other direction, from items to records (via Multi-homed Items as described) we do not need a label -- what we need instead is a /reason/ for the relationship. Bound-with, e-reader, etc. IOW, there are multiple potential causes for the relationship being created.
Not surfacing these differences explicitly (in my case, by using separate, though admittedly superficially similar mechanisms) is inviting trouble down the road, IMO.
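To illustrate the distinction in hypothetical DDL: the record-to-item path carries a label (supplied by the part), while the item-to-record path carries a reason. Table and column names here are illustrative only:

-- Aggregation side: copy -> additional records, with an explicit reason
CREATE TABLE biblio.peer_type (
    id   SERIAL PRIMARY KEY,
    name TEXT   NOT NULL UNIQUE -- e.g. 'Bound-with', 'E-reader'
);

CREATE TABLE asset.multi_home_map (
    id          SERIAL PRIMARY KEY,
    target_copy BIGINT NOT NULL, -- points to asset.copy
    peer_record BIGINT NOT NULL REFERENCES biblio.record_entry (id),
    peer_type   INT    NOT NULL REFERENCES biblio.peer_type (id)
);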
Now (fast-forwarding to your schema outline below), IIUC, what you're attempting to do with the copy_type table is to have a magic value of "Multi-part" inform us that the direction is from record to item, and all others are the other direction. From a bibliographic point of view this is incorrect -- it's not the copy that is Multi-part, it is the record. From a normalization point of view, this is not modeling reality IMO, and because it uses magic rows in a table it's brittle against DML.
Also, I think this quote from Elaine deserves a bit more attention:
I'm particularly interested in how this would function in a consortium like PINES where different libraries might process a multipart set differently. For example, one library might process and circulate a 3 part DVD set as one item, where another might put each in a separate container with a separate barcode.
If we want the complete-set copy from Library A to conclusively fulfill a P-level hold from Library B, we will want to allow multiple parts per copy. Or am I missing something?
You're not ... I interpreted what she was saying differently (that different libraries would be /able/ to split records along different lines), and I see what you're saying. We could allow a copy to belong to multiple parts (it's a trivial change to the schema), but it would be the responsibility of the cataloger with the item in hand to make sure that the copy is in the appropriate parts -- not hard, except that some parts may not exist yet. ;) (And, of course, this existential problem exists no matter the scheme*.)
Converting from one part per copy to multiple is simple at the database level, and would be nearly trivial in higher level code, but until we have use in the field I think it's a solution without a problem: it adds the cataloging overhead of trying to keep every copy current across all parts as parts are added to a bib and each library adds its own subdivision scheme. For that reason I left it out explicitly. (*It also invites the desire for a "collection of parts" concept that is a much bigger and, more importantly, controversial project. That too, though, is not barred from the future with the design as it stands.)
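For reference, that conversion is a two-statement migration. A sketch, assuming the proposed asset.copy_part_map (the constraint name is hypothetical):

ALTER TABLE asset.copy_part_map
    DROP CONSTRAINT copy_part_map_pkey; -- hypothetical name of the one-part-per-copy key
ALTER TABLE asset.copy_part_map
    ADD PRIMARY KEY (part, target_copy); -- each copy may now appear under many parts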
Finally, for those it may help, here is a quick version of a simple item-record schema. The part concerning copy_type is optional, but I wanted it to show a more complete replacement for the proposed tables:
CREATE TABLE biblio.part (
    id            SERIAL PRIMARY KEY,
    record        BIGINT NOT NULL REFERENCES biblio.record_entry (id),
    label         TEXT   NOT NULL,
    label_sortkey TEXT   NOT NULL,
    CONSTRAINT record_label_unique UNIQUE (record, label)
);
CREATE TABLE asset.copy_contents_map (
    id          SERIAL PRIMARY KEY,
    -- record BIGINT NOT NULL REFERENCES biblio.record_entry (id),
    --   (optional path to partless items, or we force records to have at least one part)
    part        INT    NOT NULL REFERENCES biblio.part (id) ON DELETE CASCADE,
    target_copy BIGINT NOT NULL -- points to asset.copy
);
CREATE TABLE asset.copy_type (
    id   SERIAL PRIMARY KEY,
    name TEXT   NOT NULL UNIQUE -- i18n
);
INSERT INTO asset.copy_type (name) VALUES
    ('Bound Volume'),
    ('Bilingual'),
    ('Back-to-back'),
    ('Set'),
    ('Multi-part'); -- generic type
-- ALSO:
-- asset.copy grows a nullable reference to asset.copy_type
-- asset.call_number.record is nullable (should be null for new-style copies)
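Spelled out as DDL, those last two notes might look like this sketch:

ALTER TABLE asset.copy
    ADD COLUMN copy_type INT REFERENCES asset.copy_type (id); -- nullable by default
ALTER TABLE asset.call_number
    ALTER COLUMN record DROP NOT NULL; -- allow NULL for new-style copies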
Given the codebase, that will be a large and separate project, if ever undertaken, and is not something we can look at now if we want anything discussed here to happen in a near-term release. I won't discount it out of hand for all time, just for this time. ;)
Daniel Wells, Library Programmer Analyst
db...@calvin.edu
Hekman Library at Calvin College
616.526.7133
On 2/15/2011 at 3:09 PM, Mike Rylander <mryl...@gmail.com> wrote:
I'll be starting work on an implementation of Monograph Parts (think: Disks 1-3; Volumes A-D; etc.), well, right now, in a git branch that I'll push to http://git.esilibrary.com/?p=evergreen-equinox.git;a=summary
but I wanted to get the basic plan out there for comment. So, attached you'll find a PDF outlining the plan. Comments and feedback are welcome, but time for significant changes is slim.
This is not intended to cover every single possible use of the concept of Monograph Parts, but I believe it is a straightforward design that offers a very good code-to-feature ratio and should be readily usable by existing sites after upgrading to a version containing the code.