Subject: [Koha-devel] Searching and ILL (from: Searching Group Meeting Notes)
From: Joshua Ferraro (jm@liblime.com)
Date: Jul 28, 2005 5:45:21 am
List: org.koha.lists.koha-devel

Hi all,

Sorry to delay my response to MJ's original message. I hope what I have to say is worth the wait ...

I've been thinking about the discussions we've had over the past couple months about Zebra, CQL, SRW/SRU, Google Query Syntax, Opensearch, RSS, RDF, Metasearching, OAI-PMH etc. and I've organized my thoughts a bit in the hopes that we can begin planning implementation directions.

I think we all agree on a few things:

1. Koha needs a query syntax

2. Koha databases should be open to external sources

3. Koha's OPAC should have union catalog / metasearching / federated searching capability and integrated ILL (inter-library loan)

4. Koha should allow queries to be syndicated via some form of RSS

Where we get hung up is the actual implementation details: RDF vs. RSS 2.0 vs. Atom; OpenSearch vs. SRW/U vs. OAI-PMH; CQL vs. the still-unnamed ':' syntax adopted by so many search engines and described by MJ in an earlier email.

So here's what I propose:

1. Koha needs a query syntax A: CQL 'Common Query Language'

"Traditionally, query languages have fallen into two camps: Powerful and expressive languages which are not easily readable nor writable by non-experts (e.g. SQL, PQF, and XQuery), on one hand; on the other hand, simple and intuitive languages not powerful enough to express complex concepts (e.g. CCL or google's query language). CQL's goal is to combine simplicity and intuitiveness of expression with the richness of Z39.50's type-1 query. As any good text based interface, CQL is intended to 'do what you mean' for simple, every day queries, while allowing means to express complex concepts when necessary."

Examples of CQL queries:

cat
title = fish
title exact fish
cat or dog
cat not frog
a or b and c not d
"fish food" prox/unit=sentence

More examples at:

http://www.loc.gov/z3950/agency/zing/cql/sample-queries.html

CQL is a mature, well-defined, and easy-to-use syntax for searching library catalogs and other sources; and support for it comes with Zebra automatically. The downside is it's not very widely implemented (it's new still). I propose that Koha formally adopt CQL as the default syntax for searching: looking forward it's going to be the next Z39.50 standard for library catalogs (and hopefully other search engines as well).

The main problem I have with going with MJ's suggestion is that we've not found a well-defined syntax definition. So we're not really sure how to do things like proximity searching or other more complex search syntax types.

If we can find a well-defined document describing this "googlish" syntax, it would be trivial to translate that syntax into CQL so that Koha can support both syntaxes within the main input box.
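As a rough illustration of that translation idea, here is a minimal Python sketch. Since no formal definition of the ':' syntax has been found, everything about the input grammar is an assumption: the `field:term` shape, the implicit AND between terms, and the hypothetical `FIELD_TO_INDEX` table mapping prefixes to CQL index names.

```python
import shlex

# Hypothetical mapping from ':'-style prefixes to CQL index names.
FIELD_TO_INDEX = {"title": "title", "author": "author", "subject": "subject"}


def quote_term(term):
    """Wrap a term in double quotes when it contains a space."""
    return '"%s"' % term if " " in term else term


def googlish_to_cql(query):
    """Translate a 'field:term' style query into a CQL string.

    Assumption: bare terms are implicitly ANDed together. That is a guess
    about the ':' syntax, not a documented rule.
    """
    clauses = []
    # shlex keeps a quoted phrase such as title:"fish food" as one token.
    for token in shlex.split(query):
        field, sep, term = token.partition(":")
        if sep and term and field in FIELD_TO_INDEX:
            clauses.append("%s = %s" % (FIELD_TO_INDEX[field], quote_term(term)))
        else:
            # Unknown prefix or plain keyword: pass it through unchanged.
            clauses.append(quote_term(token))
    return " and ".join(clauses)


print(googlish_to_cql('title:"fish food" cat'))
```

Proximity, boolean operators and unbalanced quotes are deliberately left out, because those are exactly the parts that need a written definition before they can be translated reliably.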

We should continue to support the 'advanced' search page for allowing patrons to perform complex queries without having to learn the syntax.

Finally, it's important to remember that although some users will use the syntax and advanced search, 99% probably won't. But that doesn't mean that it's not important to have the syntax. There's a (mostly) nicely written article in Library Journal that brings up some good points regarding research and the weaknesses of the keyword method: http://www.libraryjournal.com/article/CA623006.html

I don't agree with everything there but it's certainly worth some consideration.

2. Koha databases should be open to external sources

With Zebra, Koha will automatically be open to SRW/U and Z39.50. I see no harm in also including an OpenSearch gateway (the one I wrote is basically an OpenSearch->Z39.50 proxy). OpenSearch enables Koha's catalog to be searchable by A9's OpenSearch portal as well as other OpenSearch portals out there. So I propose that Koha support all three of the major standards for record sharing to maximize the number of clients that can access the database.
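To give a feel for what an external client would send to the SRU side, here is a minimal Python sketch that builds an SRU 1.1 searchRetrieve URL from a CQL query. The parameter names come from the SRU specification; the base URL is a made-up placeholder and not a real Koha or Zebra endpoint.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, start=1, maximum=10):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = [
        ("operation", "searchRetrieve"),
        ("version", "1.1"),
        ("query", cql_query),            # the CQL string, URL-encoded below
        ("startRecord", str(start)),     # SRU record positions start at 1
        ("maximumRecords", str(maximum)),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
print(build_sru_url("http://koha.example.org/sru", "title = fish"))
```

An OpenSearch gateway would accept a simpler keyword URL and translate it into a query like this one (or into a Z39.50 request) behind the scenes.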

3. Koha's OPAC should have union catalog / metasearching / federated searching capability and integrated ILL (inter-library loan)

The goal here is to allow Koha maximum flexibility when selecting sources for the metasearch interface, so we don't want to limit ourselves to the library world. While ideologically OpenSearch may be flawed (RSS vs. RDF), the fact that it's so easy to implement (when compared to SRW/U for instance) means that lots of sources have appeared almost overnight. On the other hand, Z39.50 and SRW/U do allow more targeted searching. But SRW/U is not widely implemented and Z39.50 is limited to library sources. So I propose a three-layer OPAC (at least conceptually): a front-end for syntax processing and interface design; a proxy to pick the correct protocol to use for searching; and a series of back-end search services that conform to the three major query resolvers (SRW/U, Z39.50, OpenSearch).
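The middle layer of that design can be sketched in a few lines of Python. This is only a conceptual outline: the `Target` record, the protocol labels and the three back-end functions are hypothetical stand-ins, and the back-ends return descriptive strings instead of talking to real servers.

```python
from collections import namedtuple

# A search source and the protocol it speaks (labels are illustrative).
Target = namedtuple("Target", ["name", "protocol"])


def search_z3950(target, cql):
    """Stand-in for a Z39.50 back-end service."""
    return "Z39.50 query to %s: %s" % (target.name, cql)


def search_sru(target, cql):
    """Stand-in for an SRW/U back-end service."""
    return "SRU query to %s: %s" % (target.name, cql)


def search_opensearch(target, cql):
    """Stand-in for an OpenSearch back-end service."""
    return "OpenSearch query to %s: %s" % (target.name, cql)


# The proxy layer: pick the back-end that matches the target's protocol.
BACKENDS = {
    "z3950": search_z3950,
    "sru": search_sru,
    "opensearch": search_opensearch,
}


def dispatch(target, cql):
    """Route one query to the back-end for the target's protocol."""
    try:
        backend = BACKENDS[target.protocol]
    except KeyError:
        raise ValueError("no back-end for protocol %r" % target.protocol)
    return backend(target, cql)


def metasearch(targets, cql):
    """Front-end entry point: send the same query to every target."""
    return [dispatch(target, cql) for target in targets]


targets = [Target("Library A", "z3950"), Target("A9", "opensearch")]
for line in metasearch(targets, "cat or dog"):
    print(line)
```

Keeping the protocol choice in one table means a new kind of source only needs a new back-end function and one more entry, without touching the front-end.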

I also propose that we work together with the PINES project and possibly Amazon.com to extend the OpenSearch standard to include ranking, support for ILL, CQL in the query term, etc. Of course, PINES is open to this (I've been working with Mike Rylander on OpenSearch for a couple of months now) ... and it seems Amazon.com may be as well. Here's an excerpt from an email I received from Amazon.com regarding OpenSearch and SRW/U:

Thanks for your comments! We've been speaking with the people over at NISO that are responsible for the SRW/SRU specifications. There is a lot of value in there -- and we're definitely interested in making OpenSearch a useful tool for as many people as possible.

In fact, the effort to define OpenSearch 2.0 is already under way. We recently launched a blog over at blog.a9.com, and over the next several weeks I will be posting about our plans for future versions of OpenSearch and soliciting community involvement. It would be great if you could add your thoughts on the blog when I post about where we'd like to see version 2.0 go.

In fact, I'll be posting later today about OpenSearch 1.1. This point release won't break back-compatibility, so it won't have most of the new features that you are referring to, but it is a good starting point for discussion.

I really appreciate your work with OpenSearch, and hope that you don't hesitate to contact me directly with any ideas that you have about the project.

4. Koha should allow queries to be syndicated via some form of RSS

The best way I can address the RSS 0.9/1.0(RDF) vs. RSS 2.0 is in the context of MJ's comments ... so here goes:

On Sun, Jun 26, 2005 at 06:46:11PM +0100, MJ Ray wrote:

= Summary =

Resource Description Framework is popular with librarians and RDF Site Summary is RSS 1, which is not the same as Really Simple Syndication (RSS 2). RDF Site Summary versions are 1.x and Really Simple Syndication are 2.x, so many developers go for the higher number and never mind the different words. I'm surprised it's happened in koha-devel, as RDF is popular with librarians and information scientists, who are using it to help build the Semantic Web, which is where this talk of distributed searching seems to be heading.

I think RSS 1 already has solved some of the problems facing us if we use opensearch, I think more RDF use could open interesting applications for koha and I think RSS 2's namespace problem is a pain.

Good summary ... thanks for that.

= The Namespace Problem =

The problem is that the RSS 2 spec says "the elements defined in this document are not themselves members of a namespace" and while that looks like a really smart idea to simplify parsing, it makes a few processes and applications difficult. There are these elements, floating around without a namespace, disconnected and trying to claim to be the root in any file containing RSS 2 elements.

Basically, imagine writing a large perl system without using modules at all, putting it all in the global namespace. Yes, it used to be done and can still be done, but most people don't do it any more. Why don't we do it? Isolation. It helps to keep things in neat little units, making it easier to test and easier to change one with less risk of messing up the others. I know we're still not very good at unit testing koha modules, but can everyone agree with the general idea it's better we use modules than have it all in one big flat namespace?

I grok the Perl analogy and I agree that RSS 2 namespaces aren't ideal. The problem is that OpenSearch is widely adopted and if we want to tap into those sources we'll need an OpenSearch search and retrieval engine.
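To make the namespace point concrete, here is a small Python sketch using the standard library's ElementTree. The sample feed is invented for illustration; it mixes a bare RSS 2 `title` with a Dublin Core `dc:title`, and shows that only the namespaced element can be told apart unambiguously from other elements with the same local name.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Invented sample: an RSS 2 item carrying both kinds of title.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Search results</title>
    <item>
      <title>Fish food</title>
      <dc:title>Fish food: a guide</dc:title>
    </item>
  </channel>
</rss>"""


def item_titles(xml_text):
    """Return (un-namespaced title, Dublin Core title) of the first item."""
    root = ET.fromstring(xml_text)
    item = root.find("channel/item")
    plain = item.find("title").text                 # RSS 2: no namespace
    qualified = item.find("{%s}title" % DC).text    # namespaced lookup
    return plain, qualified


print(item_titles(SAMPLE))
```

The bare `title` works fine inside an RSS 2 file, but once those elements are embedded in another XML vocabulary there is nothing to mark them as RSS, which is the isolation problem described above.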

= Problems Already Solved =

Also interesting for libraries is the availability of the Dublin Core metadata elements in an RSS 1.0 main module. A lot of the things opensearch is trying to do have already been in RDF Site Summary for years, such as returning metadata appropriate to search results. Look at the mod_search module - what do we need to do that isn't already developed by the XML-DEV hackers?

We need to tap into the available sources using other standards instead of just focusing on library-specific search applications.

= Interesting Applications =

Almost certainly, ILL is one thing I've not seen yet. I think the OpenIll namespace is interesting and should be used, especially if we can build bridges to other system developers. I'm not sure what should be in there, as I've not done much with ILL. I hope that it can be used alongside RDF and maybe be more general for it, linking with Dublin Core and other useful namespaces. Is that possible?

Absolutely. I think that Koha's metasearching should definitely support searching of DC and related namespaces.

= Other Parts of OpenSearch =

So, if we avoid having OpenSearch Really Simple Syndication in koha's core (use a translator or something loosely coupled), that leaves the query and description parts of OpenSearch. I wondered whether we can convince other library systems to put a <link rel="index" type="application/rdf+xml" href="..." /> tag or similar in their page's head. Then configuring an "external searches" setting in koha's parameters could be as simple as cut-and-pasting or drag-and-dropping URLs, with koha figuring out the details from that (actually, we could probably do some from a search form... but that's getting far too clever for now).
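The discovery step suggested here could look roughly like the following Python sketch, which scans a page for `link` tags with `rel="index"` and collects their `href` values. The sample page and its URL are invented, and `rel="index"` is only the convention floated in this thread, not an established standard.

```python
from html.parser import HTMLParser


class IndexLinkFinder(HTMLParser):
    """Collect href values from <link rel="index" ...> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also end up here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        if attr.get("rel") == "index" and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_index_links(html_text):
    """Return every rel="index" link target found in the page."""
    finder = IndexLinkFinder()
    finder.feed(html_text)
    return finder.hrefs


# Invented sample page for illustration.
PAGE = """<html><head>
<link rel="stylesheet" href="/opac.css" />
<link rel="index" type="application/rdf+xml" href="http://library.example.org/search.rdf" />
</head><body></body></html>"""

print(find_index_links(PAGE))
```

An "external searches" setting would then only need the page URL; fetching the page and reading the linked description is left out of the sketch.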

I think maybe the COinS project may do something like this. Here's a recent email on the web4lib list:

A group of us have been working to crystallize a spec for putting OpenURL metadata into HTML (following on a paper by Dan Chudnov and friends, http://www.ariadne.ac.uk/issue43/chudnov/).

Ross Singer came up with a catchy name for this: "COinS", short for ContextObject in Span. After a bunch of trials, we've declared it "stable enough for implementation", and put the spec at http://ocoins.info/

Version 2.0 of our OpenURL Referrer Firefox plugin adds support for OpenURL COinS; we hope that soon there will be many other ways that COinS can be put to use, as well as many sites that support COinS. So far there is an open-access journal, the Wikipedia Book sources page, Peter Binkley's Blog and a few static web pages demonstrating how it may be used.

Eric Hellman

The main attraction of opensearch seems to be that it lets your results appear on A9. I've yet to meet many A9 users: do any other search engines use opensearch yet?

Yes ... lots. Peruse the 'columns' section of the opensearch.a9.com site and you'll see many, many search engines that have adopted the standard. So in my view, the main attraction of opensearch is that so many search engines have adopted (and will adopt) it because it's simple to implement and works 'well enough' to get the job done 99% of the time. The fact that Koha catalogs can show up in an A9 search is secondary to me.

Also, given their past, have Amazon said that it is patentless? If it was through some loose part, it wouldn't be too painful if it's unusable by some later.

I'm curious about the patent issues ... I'll contact Amazon and find out.

So ... I know that was long. I hope you made it this far. Please give me some feedback. I'm not trying to polarize the discussion so if you've got points to make please say them and I'll do my best to understand and then respond ...

Cheers,