| Subject: | Re: [Taxacom] validation of taxon names | |
|---|---|---|
| From: | David Patterson (dpat...@mbl.edu) | |
| Date: | Feb 15, 2012 8:07:43 am | |
| List: | edu.ku.nhm.mailman.taxacom | |
Rod
Bizarre is not the word I would use, having (because of my association with the Global Names project) some appreciation of the extent of the problem. But I am certainly embarrassed that we have made so little effective progress towards what is such an obvious goal.
Progress, I think, requires a more analytical perspective, and a willingness to work collaboratively.
Our sub-discipline contrasts massively with, for example, the molecular domain, where open sharing of content is the norm. As a community we have shown much less readiness to share names and taxonomic content. So, a 'social' change is needed there. We can provide the tools, but attitudes will need to change. And change does not just relate to people and content, but also to the development of services and software. New tools need to be designed so that they can be interconnected, creating a much better open toolkit. Money, and especially money not tied to short-duration projects, will also be needed.
Then we require an infrastructure that allows those who are willing to share content, annotate names as being valid now according to someone, declare synonymies, flag chresonyms, disambiguate homonyms, interconnect lexical variants, offer alternative taxonomic perspective, and provide means for integrating vernacular names and names for surrogates. But that's not too hard, is it?
That structure also needs a means of capturing ALL new changes that occur out there.
Rod, you have been significant in showing us that it is feasible to make massive progress, but then you are equally aware of the spectrum of outlets and players. That diversity defies simple and quick solutions, and frustrates the universal fix. The result should be seen as a process that will improve with time. That process will need some kind of overview, but an overview that is aware that resources for developments that go well beyond 'proof of concept' are lacking.
Given the diversity of players and the wealth of expertise, the solution needs to enable crowd sourcing. Such crowd sourcing would include the capacity for anyone to comment on any element of information, for some players to have authority to make changes, for all initiatives to be interlinked, and for an alert system that keeps all interested parties aware of all changes as they happen. The expectation is that this evolves towards synchrony of all parts. Crowd sourcing needs to accept that there is more than one point of view as to what constitutes an entity, and how the entities should be arranged.
With such a structure, what progress might we get?
1. Is this a name? This answer needs access to all names and all variants. Global Names has about 22,000,000 strings that are purported to be names, but the contents of this are extremely dirty (intentionally so). To find a string in GNI certainly does not mean that it will be a name. So, that needs to be fixed by flagging entries as 'names' and 'not names'. Some of the other weaknesses in Global Names are those taxonomic territories without good coverage or where content sharing does not happen. A lot of namestrings from the older literature have yet to be added. Taxonomic sources tend not to be comprehensive with synonyms, certainly not with lexical variants, are often contaminated with chresonyms, and vary in terms of taxonomic currency. There is often too much distance between expert and point of contact with the product - making it difficult to correct errors. So, these are a few of the issues on this front. It would be useful if we could run some exercises to assess how close to the asymptote we are. It would also be of interest to get input on priorities, given that there are many tasks.
2. Is this the correct way to write it? I would suggest this question needs to be rephrased, given that there are many correct ways to write a scientific name - mostly with variations in the authority department. If we limit ourselves to thinking about the correct spelling of the scientific elements, then this remains the responsibility of the nomenclators. Making nomenclator content open and placing nomenclators within crowd sourcing will help to overcome issues of scale. Historically, nomenclators have defined their own taxonomic context, but this is no longer a feasible stance. Homonyms, as Tony Rees has pointed out, abound; some more taxa become ambiregnal, some less so. Openness of nomenclatoral content is an issue.
3. Is this name currently in use? Google can probably provide an answer of sorts to that question, but again I suspect this is a question to be refined. Is the question: is this name currently considered to be nomenclaturally valid for a taxon? Given that species concepts are rarely universally accepted, and that understanding of relationships is improving such that binomials change, I am also assuming that we accept that there will be more than one list of valid names for the same taxonomic area. Building that polytheism into the infrastructure is not too challenging, but will the experts accept that, and will the consumer accept it?
4. What other names are related to this name (e.g., synonyms, lexical variants)? Yes, this is what we called, in our TREE paper, 'reconciliation'. Lexical variants are probably the simplest to deal with, but algorithms that seek to scale to all name strings run into problems where the fuzziness of lexical variants of similar names overlaps, and that leads to massive aggregates of names - many of which are not lexical variants of all others. There will never be a perfect algorithm for this, so it also needs a human interface that allows results to be refined. Homotypic synonyms (objective synonyms) can probably be found with fair success by algorithms. They are, like heterotypic synonyms, embedded in taxonomic treatments. Which leads us to the issue of access to taxonomic treatments. Can we make them more openly available, and in a form in which the content will flow to all of us, and can we create an infrastructure through which any complementary data can flow back to reward the participating taxonomists? Taxonomic treatments differ in completeness and may take differing and equally valid perspectives on the same taxa, so we still have a long way to go to merge the compatible thoughts and separate the incompatible ones.
5. Where was this name published? Can I see that publication? The nomenclators should be our primary point of reference for the first point. Nomenclators for prokaryotes, fungi, plants, and viruses are reasonably good. The protists (especially the heterotrophs) and animals present us with massive problems. ZooBank is setting up the infrastructure to help make progress on animals, and with Index Animalium and (let's cross our fingers) Nomenclator Zoologicus included, there will be a reasonably good generic framework. What incentives will entice Zoological Record to make their content openly available? After that, we are back to the taxonomists, and helping them to make their content openly available and ensuring they are rewarded for doing so. A proximate task, one that you demonstrated very nicely, is to reconcile the alternative ways of pointing to a reference. You are probably in a better position to estimate what proportion of the relevant literature is in a digital format, is indexed, and is not behind a paywall or closed off by copyright issues.
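[Editorial illustration] The aggregation problem described under point 4 can be sketched in a few lines of Python. This is only a minimal sketch, not the Global Names algorithm: the edit-distance threshold, the function names, and the placeholder name strings ("Aus bus" and so on) are all assumptions made for illustration. It links any two strings within one edit of each other and then merges linked strings transitively, which is how a chain of near-identical strings can end up in one aggregate even though its ends are not variants of each other.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cluster_names(names, max_distance=1):
    """Group strings by transitive closure of 'within max_distance edits'."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if edit_distance(names[i], names[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


# Hypothetical placeholder strings, not real names.
names = ["Aus bus", "Aus bas", "Aus das", "Aus dax", "Xus yus"]
for cluster in cluster_names(names):
    print(cluster)
```

In this toy input each neighbouring pair differs by a single character, so the first four strings fall into one aggregate, yet "Aus bus" and "Aus dax" are three edits apart. That is the reason a human interface for refining the results is still needed.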
Sorry for the length, but it is my way of saying that the task is not a small one, and I am sure I have missed out many, many issues. This is a task that GBIF, TDWG and others have in mind, but if anything the most proximate problem is that the focus is still too diffuse.
David Patterson
On Wed, Feb 15, 2012 at 4:50 AM, Roderic Page <r.p...@bio.gla.ac.uk> wrote:
But isn't it bizarre that our field can't offer the kind of service Armand is looking for?
Very simple questions are being asked:
1. Is this a name?
2. Is this the correct way to write it?
3. Is this name currently in use?
4. What other names are related to this name (e.g., synonyms, lexical variants)?
5. Where was this name published? Can I see that publication?
Yes, there will always be edge cases, but in general these are straightforward questions and yet we have failed to provide a simple, global tool to answer them.
Regards
Rod
On 14 Feb 2012, at 21:02, Stephen Thorpe wrote:
Hi Armand,
Your question opens a familiar "can of worms", as we say! Currently, there is no comprehensive source of validated scientific names, and certain vagueness in some ICZN Code articles makes it unlikely that there could ever be a robust notion of name availability.
I work on Wikispecies to create something akin to what you want, except that I try to make the names verifiable by the user, rather than just saying "you can trust me". Hence, it involves work on the part of the user. Most users want someone else to do the work, and to be "spoon fed" with validated names, but this just isn't realistic ...
Cheers, Stephen
________________________________
From: Armand Turpel <arma...@gmail.com>
To: taxa...@mailman.nhm.ku.edu
Sent: Tuesday, 14 February 2012 9:42 PM
Subject: [Taxacom] validation of taxon names
Hi,
We have a database with over 80000 species taxon names which we want to compare and validate against other databases. Doing this job isn’t very easy:
1. The majority of organizations only provide web interfaces to search for single taxon names.
2. Copyrights of data are sometimes not very clear.
3. Quality of data is doubtful.
   - Lamia amputator Guérin-Méneville, 1844
   - Lamia amputator Guerin-Meneville, 1844
   - Lamia amputator Guérin-Méneville
   - …
The only organization we know of that provides its whole database for download is Species 2000 (Catalogue of Life, COL). We created a PostgreSQL version of the COL data with which it is possible to compare a large number of taxon names in one run. PostgreSQL provides good fuzzy string algorithms. But the COL data isn't error free and it isn't complete for our region.
The question is: Which organization provides trustworthy, complete (as far as possible) and fully accessible data?
a+
arm
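[Editorial illustration] The kind of batch comparison described in this message can be sketched as follows. This is a rough sketch under stated assumptions, not the poster's PostgreSQL setup: Python's standard `difflib` stands in for the PostgreSQL fuzzy string functions, and the helper names `normalize` and `match_names`, the 0.85 cutoff, and the two-entry reference list are all hypothetical. Each name is normalized (diacritics stripped, case folded, whitespace collapsed) and then matched to the closest reference entry, so that the three "Lamia amputator" variants listed above resolve to the same record.

```python
import unicodedata
from difflib import get_close_matches


def normalize(name):
    """Strip diacritics, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def match_names(queries, reference, cutoff=0.85):
    """Map each query string to its closest reference string, or None."""
    index = {normalize(r): r for r in reference}
    keys = list(index)
    results = {}
    for query in queries:
        hits = get_close_matches(normalize(query), keys, n=1, cutoff=cutoff)
        results[query] = index[hits[0]] if hits else None
    return results


# Illustrative reference list; a real run would load a full checklist.
reference = [
    "Lamia amputator Guérin-Méneville, 1844",
    "Lamia textor (Linnaeus, 1758)",
]
queries = [
    "Lamia amputator Guerin-Meneville, 1844",
    "Lamia amputator Guérin-Méneville",
    "Quercus robur L.",
]
for query, hit in match_names(queries, reference).items():
    print(f"{query!r} -> {hit!r}")
```

The cutoff trades missed matches against false ones, so any value chosen here would need checking against a sample of hand-verified names before being trusted on 80,000 records.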
_______________________________________________
Taxacom Mailing List Taxa...@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
The Taxacom archive going back to 1992 may be searched with either of these methods:
(1) by visiting http://taxacom.markmail.org
(2) a Google search specified as: site: mailman.nhm.ku.edu/pipermail/taxacom your search terms here
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.p...@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodp...@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
--
___________________________________ David J Patterson
Senior Scientist, Marine Biological Laboratory Life Sciences Lead, Data Conservancy globalnames.org
7 MBL Street, Woods Hole, MASS 02543, USA.