Subject: Re: [Taxacom] validation of taxon names
From: Armand Turpel (arma@gmail.com)
Date: Feb 16, 2012 5:12:15 am
List: edu.ku.nhm.mailman.taxacom

Dear Roderic,

I have some doubts while reading the architecture description here:

http://www.biomedcentral.com/1471-2105/6/48

Among the problems you described, my experience is that web services are slow and technically difficult to maintain. I don’t think that such a service can handle the demand I described in the initial post of this thread. But there is a tool that may be worth mentioning: document-based databases such as CouchDB can solve a lot of those problems. Here is an excerpt from the CouchDB site:

“CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent “replica copies” of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally...”

Another advantage of such a system is that it also serves as an application server from which applications can be replicated. Setting up and maintaining a network based on this needs much less effort.
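As a rough illustration of the replication idea (a sketch, not something taken from this thread): CouchDB exposes replication through a POST to its `/_replicate` endpoint, and bi-directional replication amounts to two one-way replications. The host names and the database name `taxon_names` below are hypothetical, and the requests are only built, not sent.

```python
import json
from urllib.request import Request


def build_replication_request(local_server, source_db, target_db, continuous=False):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {"source": source_db, "target": target_db, "continuous": continuous}
    return Request(
        url=f"{local_server.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server and database names, for illustration only.
LOCAL = "http://localhost:5984"
REMOTE_DB = "http://names.example.org:5984/taxon_names"

# Pull remote changes into the local copy, then push local changes back.
pull = build_replication_request(LOCAL, REMOTE_DB, "taxon_names")
push = build_replication_request(LOCAL, "taxon_names", REMOTE_DB)

# To actually run them against a live server:
#   from urllib.request import urlopen
#   urlopen(pull); urlopen(push)
```

Whether this would cope with the full volume of names is a separate question; the sketch only shows how little client code the replication step itself needs.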

What do you think? As far as I know you have worked with couchdb:

http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html

a+

arm

2012/2/16 Roderic Page <r.p@bio.gla.ac.uk>

Dear Tony,

Yes, in many ways we are close, we just haven't put the bits together in one highly visible place that people can use.

I used to run the "Taxonomic Search Engine" described in http://dx.doi.org/10.1186/1471-2105-6-48 , which was an example of a federated search engine that queried multiple sources on the fly (a bit like http://botany.si.edu/ing/ but I used web services and extracted and reformatted the data rather than show results as web pages inside frames).

The two biggest problems were the source services going offline, or changing their web services (breaking my code). There were also performance issues when doing "live" searching as opposed to searching a local list. You've covered all these issues nicely. There's also the problem of redundancy. Because some lists are aggregations of other lists, we can end up with the same source being represented several times in the results, but without the user being aware of this.

I just think there's clearly scope for bringing these sorts of services together in one place and providing people with some tools to address the problems they face, rather than the chaotic landscape we present at the moment.

Regards

Rod

On 16 Feb 2012, at 03:19, <Tony@csiro.au> wrote:

Dear all,

My take on the issue/s below...

Basically what is sought is a taxonomic name reconciliation service (TNRS) - now, where have I heard that before...

In my mind this comprises three coupled components:

(1) a web (human and machine) interface to search a "master list" or global taxon register
(2) a fuzzy match component to cope with misspelled queries
(3) the master list itself.
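To make the coupling of these components concrete, here is a minimal sketch of components (2) and (3) only. It is an illustration under stated assumptions: the four-name `MASTER_LIST` is a toy stand-in for a real register, and Python's generic `difflib` similarity is used in place of any purpose-built taxonomic matching algorithm.

```python
from difflib import get_close_matches

# Toy stand-in for the master list; a real register would hold millions of names.
MASTER_LIST = ["Homo sapiens", "Quercus robur", "Quercus rubra", "Apis mellifera"]


def reconcile(query, master_list=MASTER_LIST, cutoff=0.8):
    """Return (status, candidates) for a submitted name.

    status is "exact", "fuzzy" or "none"; candidates are master-list names.
    """
    normalised = " ".join(query.split())  # collapse stray whitespace
    by_lower = {name.lower(): name for name in master_list}

    exact = by_lower.get(normalised.lower())
    if exact:
        return "exact", [exact]

    # Generic string similarity, used here only as a placeholder matcher.
    close = get_close_matches(normalised.lower(), list(by_lower), n=3, cutoff=cutoff)
    if close:
        return "fuzzy", [by_lower[c] for c in close]
    return "none", []


print(reconcile("Homo sapiens"))
print(reconcile("Apis melifera"))  # misspelled query
```

Component (1), the human and machine interface, would simply wrap a function like `reconcile` behind a web form and an HTTP endpoint.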

Prototypes of just such a system have been in existence for several years, notably:

- the GRIN Taxonomic Nomenclature Checker http://pgrdoc.bioversity.cgiar.org/taxcheck/ (the granddaddy of such systems), operating over the GRIN database for higher plants

over many/most genus names plus species names from Catalogue of Life in the main, and more recently

- the iPlant TNRS http://tnrs.iplantcollaborative.org/TNRSapp.html, operating over the TROPICOS plant database
- the World Register of Marine Species portal http://www.marinespecies.org/aphia.php operating over its own data, and
- the Australian "National Species Lists" search service http://biodiversity.org.au/service/taxamatch

(working over different master lists as their reference database) as mentioned by Greg; plus no doubt others I have not mentioned.

So we could say that the technical aspects of (1) and (2) are basically not a problem. The residual problem is the construction of the (actual or virtual) "master list". An example of a virtual "master list" (for plants, at genus level only) is provided by the Index Nominum Genericorum (ING) portal at http://botany.si.edu/ing/ : entering a query in the first text box does a real time distributed search of designated resources which between them provide a (partly overlapping) coverage of most/all plant genus names, comprising ING itself, IPNI, Index Nominum Algarum, TROPICOS, and Index Fungorum in this instance (however without any fuzzy match function). This is one approach; benefits are the removal of the need for one site to hold all the names to search over, plus the removal of any synchronization issues between a remote "point of truth" for the records and a central cache of the data. Downsides are the fact that deduplication/data harmonization of potential duplicates from multiple sources is not done; multiple taxonomic concepts/hierarchies may be in use at the various providers; it is difficult to provide fuzzy search over remotely hosted data; plus if any provider is off line at time of query, their data are not searched.
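The distributed approach and two of its downsides (no deduplication, offline providers silently missing) can be sketched as follows. This is a hedged illustration only: the providers are in-memory stubs with made-up labels (`source_a`, and so on), not clients for any real service, and `ProviderOffline` is a hypothetical exception type.

```python
class ProviderOffline(Exception):
    """Raised by a stub provider to simulate a remote service being down."""


def make_provider(names, online=True):
    """Return a search function over a fixed list of genus names (a stub)."""
    def search(genus):
        if not online:
            raise ProviderOffline()
        return [n for n in names if n.lower() == genus.lower()]
    return search


# Hypothetical providers with partly overlapping coverage.
PROVIDERS = {
    "source_a": make_provider(["Quercus", "Fagus"]),
    "source_b": make_provider(["Quercus", "Agaricus"]),
    "source_c": make_provider(["Quercus"], online=False),
}


def federated_search(query, providers):
    """Query every provider at request time; report hits and skipped providers."""
    hits, skipped = [], []
    for label, search in providers.items():
        try:
            hits.extend((label, name) for name in search(query))
        except ProviderOffline:
            skipped.append(label)  # this provider's data are simply not searched
    return hits, skipped


hits, skipped = federated_search("Quercus", PROVIDERS)
print(hits)     # the same name is returned once per provider, not deduplicated
print(skipped)  # providers that could not be reached
```

Reporting the skipped providers back to the user at least makes the gap visible, which a frames-based portal cannot easily do.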

The alternative is for remote provider content to be regularly crawled or exported, then cached centrally for the search process. This also provides the option for additional QA / data deduplication and harmonization and also notionally improved performance (provided that sufficient resources can be thrown at the one machine where the searches execute in real time). Disadvantages then include the need for continuous assembly and reassembly of the aggregate dataset, and the possibility of the central "view" of the data being out-of-synch with the latest changes at the provider; but something which can be managed in the main (as is already done by many similar "aggregators" of species distribution records).
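A minimal sketch of the central-cache alternative, assuming provider exports arrive as plain lists of name strings (the labels `list_a` and `list_b` and the normalisation rule are illustrative assumptions, not any project's actual pipeline). Duplicates are merged under a normalised key while the contributing sources are retained.

```python
def normalise(name):
    """Very light harmonisation: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def build_cache(exports):
    """Merge provider exports into one deduplicated lookup table.

    exports maps a provider label to its list of names. Each cache entry
    keeps the first spelling seen plus the set of providers that supplied it.
    """
    cache = {}
    for label, names in exports.items():
        for name in names:
            entry = cache.setdefault(
                normalise(name),
                {"name": " ".join(name.split()), "sources": set()},
            )
            entry["sources"].add(label)
    return cache


# Hypothetical exports from two providers with one overlapping record.
exports = {
    "list_a": ["Quercus robur", "Apis mellifera"],
    "list_b": ["quercus  robur"],
}
cache = build_cache(exports)
print(len(cache))                         # two distinct names after merging
print(cache["quercus robur"]["sources"])  # both providers are recorded
```

The cache has to be rebuilt whenever a provider publishes a new export, which is the reassembly cost mentioned above.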

So the residual question is, where are all the data - also in some cases, whose content is the most up-to-date / complete / authoritative / accepted in case of potential multiple sources; plus of course, filling gaps where currently there is no obvious source of content for a particular group or region. These are questions which concern key projects at the present time, exemplified by the Catalogue of Life partnership (Sp2000 plus ITIS) for extant data, other sources for data on fossils, and the "Global Names" partnership, to name a few. Others more qualified than myself can answer this question and look at associated issues of resourcing, persistence, data completeness, data sharing culture and the like but at least what we have is a start...

So the upshot of the above is:

- For plants, GRIN and iPlant already provide most of the desired functionality (also ING distributed search for genera)
- for marine species, try WoRMS
- for (e.g.) Australian species, Greg's "NSLs" project resources as above, for European species, PESI http://www.eu-nomen.eu/portal/, and so on
- for Cat. of Life species, plus genera from ING (plants) and Nomenclator Zoologicus (animals) plus elsewhere, my own IRMNG.

None of the above resources are as yet complete or completely populated (in my case, definitely not...) however they are not only pointers along the road but useable resources today.

How do we get to where we want to be? Improve the master list, keep it up-to-date, continuously improve the quality and completeness of the accessible data... But it does require a "client focus" which provides strong directions for the types of services to be provided, and their actual useability when accessed. (Who has the mandate / who pays are different issues of course).

Probably little of this is news to Taxacomers, but I thought I would just show that it is not all gloom and doom. And this is not to mention the myriad (and often excellent) taxon-specific database projects out there, of which Paul Kirk and Chris Thompson have already mentioned shining examples, to name but two... - most of which are already engaged in either Sp2000, Global Names Architecture, or both.

Regards - Tony

Tony Rees
Manager, Divisional Data Centre,
CSIRO Marine and Atmospheric Research,
GPO Box 1538, Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318)
Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony@csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36

-----Original Message-----
From: taxa@mailman.nhm.ku.edu [mailto:taxacom- boun@mailman.nhm.ku.edu] On Behalf Of Roderic Page
Sent: Thursday, 16 February 2012 6:24 AM
To: taxacom
Subject: Re: [Taxacom] validation of taxon names

Dear Doug,

Regarding

#4 is something that cannot be objectively determined, because synonymy is almost invariably subjective.

presumably the fact that person x asserted that two names are synonyms can be determined objectively, and that's all we need to know.
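One way to read this point is that a database need not decide whether two names are "really" synonyms; it only has to record who asserted it and where. A minimal sketch, with hypothetical field names and placeholder names and authors ("Aus bus", "Person X"):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymyAssertion:
    """A recorded claim that two names are synonyms; not a statement of truth."""
    name_a: str
    name_b: str
    asserted_by: str
    source: str

    def involves(self, name):
        return name in (self.name_a, self.name_b)


# Placeholder records for illustration only.
ASSERTIONS = [
    SynonymyAssertion("Aus bus", "Aus cus", "Person X", "hypothetical revision"),
    SynonymyAssertion("Aus bus", "Aus dus", "Person Y", "hypothetical checklist"),
]


def synonyms_according_to(name, assertions):
    """Map each name asserted to be a synonym of `name` to who asserted it."""
    result = {}
    for a in assertions:
        if a.involves(name):
            other = a.name_b if a.name_a == name else a.name_a
            result.setdefault(other, []).append(a.asserted_by)
    return result


print(synonyms_according_to("Aus bus", ASSERTIONS))
```

Conflicting opinions then coexist as separate records, and the choice of which to follow is left to the user or to a later, explicitly subjective layer.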

Regards

On 15 Feb 2012, at 18:40, Doug Yanega wrote:

I would observe that, for zoological names, of the following list:

1. Is this a name?
2. Is this the correct way to write it?
3. Is this name currently in use?
4. What other names are related to this name (e.g., synonyms, lexical variants)?
5. Where was this name published? Can I see that publication?

at least 1 and 5 are questions for which an objective and definitive answer (via application of the ICZN for #1) can be arrived at, and that the answer will not change. Thus, these are things which could be made part of a permanent public archive (hopefully, something like ZooBank).

#2 and 3 are things that can, in essence, be objectively determined under the Code, but are subject to the nuance of "prevailing usage" - that is, a sudden change in how taxonomists treat a name can shift the answer from "no" to "yes" (in both cases) or from "yes" to "no" (for #2). One hope that I have is that a mechanism for Registration can be implemented in the future which will prevent such fluctuation, and thus make the answers to 2 and 3 immutable, as well.

#4 is something that cannot be objectively determined, because synonymy is almost invariably subjective.

Realistically, then, this list represents a mixed bag of the immediately attainable, the potentially attainable, and the unattainable. It might be more productive to focus on the former categories, in terms of a community-wide goal. I'll further note that if taxonomists want a system of Registration that will result in permanently stable names, then they are probably going to have to insist upon it, *and* be willing to participate in the process (because such a process is likely to require public review). I'm not 100% sure whether botanical names would work exactly the same way, but I expect that the situation would be pretty much the same.

Peace,

Doug Yanega
Dept. of Entomology
Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314
skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness is the true method" - Herman Melville, Moby Dick, Chap. 82

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.p@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodp@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

_______________________________________________

Taxacom Mailing List Taxa@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) by visiting http://taxacom.markmail.org

(2) a Google search specified as: site: mailman.nhm.ku.edu/pipermail/taxacom your search terms here
