| Subject: | Re: [Taxacom] validation of taxon names | |
|---|---|---|
| From: | Armand Turpel (arma...@gmail.com) | |
| Date: | Feb 16, 2012 5:12:15 am | |
| List: | edu.ku.nhm.mailman.taxacom | |
Dear Roderic,
I have some doubts while reading the architecture description here:
http://www.biomedcentral.com/1471-2105/6/48
Amongst the problems you described, my experience is that web services are slow and technically difficult to maintain. I don’t think that such a service can handle the demand I described in the initial post of this thread. But there is a tool that may be worth mentioning. Document-based databases such as CouchDB can solve a lot of those problems. Here is an excerpt from the CouchDB site:
“CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent “replica copies” of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally...”
Another advantage of such a system is that it also serves as an application server from which applications can be replicated. Setting up and maintaining a network based on this takes much less effort.
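To make the replication idea concrete, here is a minimal sketch of how a bi-directional sync could be requested through CouchDB's `/_replicate` HTTP endpoint: one request per direction. The server address, the database name `names`, and the remote URL are assumptions for illustration only; the requests are built but not sent, and a real deployment would also need authentication.

```python
import json
import urllib.request


def replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bidirectional(server, local_db, remote_db):
    """Bi-directional sync is two one-way replications, one in each direction."""
    return [
        replication_request(server, local_db, remote_db),
        replication_request(server, remote_db, local_db),
    ]


# Hypothetical hosts and database names, for illustration only.
requests_to_send = bidirectional(
    "http://localhost:5984/", "names", "http://example.org/names"
)
```

Each request could then be passed to `urllib.request.urlopen` once the hosts actually exist.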
What do you think? As far as I know you have worked with couchdb:
http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
a+
arm
2012/2/16 Roderic Page <r.p...@bio.gla.ac.uk>
Dear Tony,
Yes, in many ways we are close, we just haven't put the bits together in one highly visible place that people can use.
I used to run the "Taxonomic Search Engine" described in http://dx.doi.org/10.1186/1471-2105-6-48 , which was an example of a federated search engine that queried multiple sources on the fly (a bit like http://botany.si.edu/ing/ but I used web services and extracted and reformatted the data rather than show results as web pages inside frames).
The two biggest problems were the source services going offline, or changing their web services (breaking my code). There were also performance issues when doing "live" searching as opposed to searching a local list. You've covered all these issues nicely. There's also the problem of redundancy. Because some lists are aggregations of other lists, we can end up with the same source being represented several times in the results, but without the user being aware of this.
I just think there's clearly scope for bringing these sorts of services together in one place and providing people with some tools to address the problems they face, rather than the chaotic landscape we present at the moment.
Regards
Rod
On 16 Feb 2012, at 03:19, <Tony...@csiro.au> <Tony...@csiro.au> wrote:
Dear all,
My take on the issue/s below...
Basically what is sought is a taxonomic name reconciliation service (TNRS) - now, where have I heard that before...
In my mind this comprises three coupled components:
(1) a web (human and machine) interface to search a "master list" or global taxon register
(2) a fuzzy match component to cope with misspelled queries
(3) the master list itself.
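As a rough illustration of how components (2) and (3) fit together, the sketch below looks a query up in a tiny in-memory list and falls back to approximate matching. It uses Python's standard `difflib` as a stand-in; it is not the matching algorithm used by any of the services named in this thread, and the list contents, the function name `reconcile`, and the 0.8 cutoff are assumptions.

```python
import difflib

# Hypothetical, tiny "master list"; a real register would hold millions of names.
MASTER_LIST = [
    "Quercus robur",
    "Quercus rubra",
    "Homo sapiens",
    "Drosophila melanogaster",
]


def reconcile(query, names=MASTER_LIST, cutoff=0.8):
    """Return an exact match if there is one, otherwise close spellings."""
    # Collapse runs of whitespace and compare case-insensitively.
    normalized = " ".join(query.split()).lower()
    by_lower = {name.lower(): name for name in names}
    if normalized in by_lower:
        return {"query": query, "exact": by_lower[normalized], "fuzzy": []}
    # Fall back to approximate string matching, best candidates first.
    close = difflib.get_close_matches(normalized, list(by_lower), n=3, cutoff=cutoff)
    return {"query": query, "exact": None, "fuzzy": [by_lower[c] for c in close]}
```

Calling `reconcile("Homo sapeins")` returns no exact match but offers "Homo sapiens" as a fuzzy candidate, which is the behaviour a misspelled query needs.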
Prototypes of just such a system have been in existence for several years, notably:
- the GRIN Taxonomic Nomenclature Checker http://pgrdoc.bioversity.cgiar.org/taxcheck/ (the granddaddy of such systems), operating over the GRIN database for higher plants
- my own IRMNG http://www.cmar.csiro.au/datacentre/irmng/, operating over many/most genus names plus species names from Catalogue of Life in the main, and more recently
- the iPlant TNRS http://tnrs.iplantcollaborative.org/TNRSapp.html, operating over the TROPICOS plant database
- the World Register of Marine Species portal http://www.marinespecies.org/aphia.php operating over its own data, and
- the Australian "National Species Lists" search service http://biodiversity.org.au/service/taxamatch (working over different master lists as their reference database) as mentioned by Greg; plus no doubt others I have not mentioned.
So we could say that the technical aspects of (1) and (2) are basically not a problem. The residual problem is the construction of the (actual or virtual) "master list". An example of a virtual "master list" (for plants, at genus level only) is provided by the Index Nominum Genericorum (ING) portal at http://botany.si.edu/ing/ : entering a query in the first text box does a real time distributed search of designated resources which between them provide a (partly overlapping) coverage of most/all plant genus names, comprising ING itself, IPNI, Index Nominum Algarum, TROPICOS, and Index Fungorum in this instance (however without any fuzzy match function).
This is one approach; benefits are the removal of the need for one site to hold all the names to search over, plus the removal of any synchronization issues between a remote "point of truth" for the records and a central cache of the data. Downsides are the fact that deduplication/data harmonization of potential duplicates from multiple sources is not done; multiple taxonomic concepts/hierarchies may be in use at the various providers; it is difficult to provide fuzzy search over remotely hosted data; plus if any provider is off line at time of query, their data are not searched.
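The distributed approach can be sketched as below, under the assumption that each provider is wrapped in a function that takes a query and returns a list of names. The provider wrappers and the function name `federated_search` are hypothetical; no real service API is called. The point of the sketch is the last downside above: a provider that fails is simply missing from the results, so the caller should at least be told which sources were not searched.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, providers):
    """Query every provider in parallel; record failures instead of aborting.

    `providers` maps a source label to a callable taking the query string.
    """
    hits, unavailable = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                for record in future.result():
                    hits.append({"source": name, "name": record})
            except Exception:
                # An offline or changed service is reported, not hidden.
                unavailable.append(name)
    return hits, unavailable
```

Note that the results are returned per source with no deduplication, which mirrors the first downside listed above. A production client would also want per-request timeouts.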
The alternative is for remote provider content to be regularly crawled or exported, then cached centrally for the search process. This also provides the option for additional QA / data deduplication and harmonization and also notionally improved performance (provided that sufficient resources can be thrown at the one machine where the searches execute in real time). Disadvantages then include the need for continuous assembly and reassembly of the aggregate dataset, and the possibility of the central "view" of the data being out-of-synch with the latest changes at the provider; but this is something which can be managed in the main (as is already done by many similar "aggregators" of species distribution records).
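A minimal sketch of the central-cache idea, assuming each provider export is just a list of name strings: names are merged under a normalized key and each entry remembers which sources supplied it, so a name held by several overlapping lists appears once. The function name `build_cache` and the normalization rule (whitespace and case only) are assumptions; real harmonization would also have to deal with authorship, rank, and differing taxonomic concepts.

```python
def build_cache(exports):
    """Merge provider exports into one index keyed by a normalized name.

    `exports` maps a source label to the list of names that source supplied.
    """
    cache = {}
    for source, names in exports.items():
        for name in names:
            cleaned = " ".join(name.split())
            key = cleaned.lower()
            # The first spelling seen is kept as the display form.
            entry = cache.setdefault(key, {"name": cleaned, "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return cache
```

Keeping the list of sources on each entry also makes the overlap between aggregated lists visible to the user rather than silently repeating the same record.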
So the residual question is, where are all the data - also in some cases, whose content is the most up-to-date / complete / authoritative / accepted in case of potential multiple sources; plus of course, filling gaps where currently there is no obvious source of content for a particular group or region. These are questions which concern key projects at the present time, exemplified by the Catalogue of Life partnership (Sp2000 plus ITIS) for extant data, other sources for data on fossils, and the "Global Names" partnership, to name a few. Others more qualified than myself can answer this question and look at associated issues of resourcing, persistence, data completeness, data sharing culture and the like but at least what we have is a start...
So the upshot of the above is:
- For plants, GRIN and iPlant already provide most of the desired functionality (also ING distributed search for genera)
- for marine species, try WoRMS
- for (e.g.) Australian species, Greg's "NSLs" project resources as above, for European species, PESI http://www.eu-nomen.eu/portal/, and so on
- for Cat. of Life species, plus genera from ING (plants) and Nomenclator Zoologicus (animals) plus elsewhere, my own IRMNG.
None of the above resources are as yet complete or completely populated (in my case, definitely not...) however they are not only pointers along the road but useable resources today.
How do we get to where we want to be? Improve the master list, keep it up-to-date, continuously improve the quality and completeness of the accessible data... But it does require a "client focus" which provides strong directions for the types of services to be provided, and their actual useability when accessed. (Who has the mandate / who pays are different issues of course).
Probably little of this is news to Taxacomers, but I thought I would just show that it is not all gloom and doom. And this is not to mention the myriad (and often excellent) taxon-specific database projects out there, of which Paul Kirk and Chris Thompson have already mentioned shining examples, to name but two... - most of which are already engaged in either Sp2000, Global Names Architecture, or both.
Regards - Tony
Tony Rees Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000) e-mail: Tony...@csiro.au Manager, OBIS Australia regional node, http://www.obis.org.au/ Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566 LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
-----Original Message-----
From: taxa...@mailman.nhm.ku.edu [mailto:taxacom-boun...@mailman.nhm.ku.edu] On Behalf Of Roderic Page
Sent: Thursday, 16 February 2012 6:24 AM
To: taxacom
Subject: Re: [Taxacom] validation of taxon names
Dear Doug,
Regarding
#4 is something that cannot be objectively determined, because synonymy is almost invariably subjective.
presumably the fact that person x asserted that two names are synonyms can be determined objectively, and that's all we need to know.
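One way to picture this is to store the assertion itself as the record, rather than trying to store the synonymy as a settled fact. The sketch below is an assumption about how that might look, not a description of any existing database: the class, its field names, and the placeholder names "Aus bus" and "Aus cus" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymyAssertion:
    """The objective fact: who said two names are synonyms, and where."""
    name_a: str
    name_b: str
    asserted_by: str
    source: str


def assertions_about(name, assertions):
    """Return every recorded assertion that mentions the given name."""
    return [a for a in assertions if name in (a.name_a, a.name_b)]


# Placeholder names and people, for illustration only.
records = [
    SynonymyAssertion("Aus bus", "Aus cus", "Person X", "hypothetical revision"),
    SynonymyAssertion("Aus dus", "Aus eus", "Person Y", "hypothetical checklist"),
]
```

Conflicting opinions then become additional records rather than edits to a single "correct" answer.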
Regards
Rod
On 15 Feb 2012, at 18:40, Doug Yanega wrote:
I would observe that, for zoological names, of the following list:
1. Is this a name?
2. Is this the correct way to write it?
3. Is this name currently in use?
4. What other names are related to this name (e.g., synonyms, lexical variants)?
5. Where was this name published? Can I see that publication?
at least 1 and 5 are questions for which an objective and definitive answer (via application of the ICZN for #1) can be arrived at, and that the answer will not change. Thus, these are things which could be made part of a permanent public archive (hopefully, something like ZooBank).
#2 and 3 are things that can, in essence, be objectively determined under the Code, but are subject to the nuance of "prevailing usage" - that is, a sudden change in how taxonomists treat a name can shift the answer from "no" to "yes" (in both cases) or from "yes" to "no" (for #2). One hope that I have is that a mechanism for Registration can be implemented in the future which will prevent such fluctuation, and thus make the answers to 2 and 3 immutable, as well.
#4 is something that cannot be objectively determined, because synonymy is almost invariably subjective.
Realistically, then, this list represents a mixed bag of the immediately attainable, the potentially attainable, and the unattainable. It might be more productive to focus on the former categories, in terms of a community-wide goal. I'll further note that if taxonomists want a system of Registration that will result in permanently stable names, then they are probably going to have to insist upon it, *and* be willing to participate in the process (because such a process is likely to require public review). I'm not 100% sure whether botanical names would work exactly the same way, but I expect that the situation would be pretty much the same.
Peace,
--
Doug Yanega Dept. of Entomology Entomology Research Museum Univ. of California, Riverside, CA 92521-0314 skype: dyanega phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's) http://cache.ucr.edu/~heraty/yanega.html "There are some enterprises in which a careful disorderliness is the true method" - Herman Melville, Moby Dick, Chap. 82
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.p...@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodp...@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
_______________________________________________
Taxacom Mailing List Taxa...@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
The Taxacom archive going back to 1992 may be searched with either of these methods:
(1) by visiting http://taxacom.markmail.org
(2) a Google search specified as: site: mailman.nhm.ku.edu/pipermail/taxacom your search terms here