|Subject:||Re: [xliff] Attributes for translation candidates|
|From:||Jung Nicholas Ryoo (jung...@oracle.com)|
|Date:||Feb 27, 2012 8:36:22 am|
I have a couple of questions and comments on the proposal.
1) Data type of score (similarity) and quality
* Is there any reason why the score should be an integer? In our case, it has always been a real number ranging from 0 to 100.00. You may ask what the benefit of real numbers is: our scoring logic is very sophisticated, and we want to sort suggestions correctly (99.9 is definitely preferred to 99). Real numbers may also be better for interoperability, since they are a superset of the integers.
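To make the sorting point concrete, here is a minimal Python sketch. The score values are invented for illustration and do not come from any real tool; the sketch only shows that truncating to integers can erase the difference between two candidates.

```python
# Hypothetical match scores on a 0 to 100.00 scale (invented values).
scores = [99.0, 99.9, 98.6]

# With real numbers the ranking is unambiguous.
ranked_real = sorted(scores, reverse=True)

# Truncated to integers, 99.9 and 99.0 collapse to the same value,
# so the two candidates can no longer be told apart.
as_int = [int(s) for s in scores]
ranked_int = sorted(as_int, reverse=True)

print(ranked_real)  # [99.9, 99.0, 98.6]
print(ranked_int)   # [99, 99, 98]
```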
2) Score and quality?
* I understand the point of having two attributes. However, our scoring logic considers many factors, including similarity, quality, content domain and content type. The score in our case is a combined score, so we can list the suggestions clearly in the order of our preference.
* Therefore, "similarity" is not appropriate for our case. I suggest having "match-score" as the main attribute, and allowing two more attributes (similarity, quality) if a tool wants them. All of these together may increase confusion rather than help, though. If two attributes are the right number and three are too many, then my suggestion is to name the first attribute "score".
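As a rough sketch of that layout (one mandatory combined score, with similarity and quality as optional extras), the following Python model may help. The class name, the `text` field and all values are hypothetical; only the names "score", "similarity" and "quality" come from this discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One translation candidate (hypothetical model, not part of XLIFF)."""
    text: str
    score: float                        # combined score, always present
    similarity: Optional[float] = None  # optional, tool-specific
    quality: Optional[float] = None     # optional, tool-specific


# Invented candidates; some tools fill in the optional fields, some do not.
candidates = [
    Candidate("candidate A", score=87.5, similarity=90.0),
    Candidate("candidate B", score=99.9),
    Candidate("candidate C", score=99.0, similarity=100.0, quality=70.0),
]

# Only "score" is needed to list suggestions in order of preference.
ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
print([c.text for c in ordered])
```

A consumer that only understands "score" can still order the list; the optional fields are extra detail for tools that want it.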
3) content-type, content-domain, match-type
* Due to cross-file and cross-type leverage, we need to deliver the content type (xml, html, properties, etc.) and the content domain. Do you think "origin" can be used for that purpose?
* "type" requires a clearly defined list of values. For MT suggestions, translators should post-edit rather than translate, and CAT tools may have specific features for MT suggestions. Therefore, XLIFF documents should all use the same value in the type attribute for MT suggestions.
On 27/02/2012 12:45, Yves Savourel wrote:
Hi Rodolfo, all,
1) Change the name of "score" to "similarity". That would be clearer.
2) Define an optional module for storing the metadata associated with a match.
Yes, I think such metadata could be re-used for other features. For example QA
Perhaps we would need to provide some directions for handling the combination of "score/similarity" with "quality". It may be hard for a user to select the best match from two matches that have these properties:
a) similarity="60" quality="90"
b) similarity="80" quality="60"
That would be something useful. But, based on some discussions I've seen in use
cases like Microsoft Translator's MatchDegree (similarity) and Rating (quality)
I'm not sure there would be a single answer. Often it ends up being a user
preference that needs to be decided at usage time.
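A small Python sketch of why this ends up as a usage-time preference, using the two matches a) and b) quoted above. The weighted blend and the `similarity_weight` parameter are assumptions made for illustration only; nothing like them is defined in the proposal.

```python
# The two matches from the quoted example.
matches = [
    {"id": "a", "similarity": 60, "quality": 90},
    {"id": "b", "similarity": 80, "quality": 60},
]


def rank(matches, similarity_weight):
    """Order matches by a weighted blend of similarity and quality.

    similarity_weight is a hypothetical user preference between 0 and 1.
    """
    def combined(m):
        return (similarity_weight * m["similarity"]
                + (1 - similarity_weight) * m["quality"])
    return [m["id"] for m in sorted(matches, key=combined, reverse=True)]


print(rank(matches, 0.8))  # user favours similarity: ['b', 'a']
print(rank(matches, 0.2))  # user favours quality:    ['a', 'b']
```

The same two matches come out in opposite orders depending on the weight, so no single fixed ordering suits every user.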
This also brings the question: should we have a processing expectation that user
agents should preserve the order of the matches? Also should we have specific
processing expectations about how new matches should be added?
My guess is that we probably want to keep this simple: XLIFF provides the
structure to hold the information, but let tools do what they want with it. For
example a processing expectation that the matches must be re-written in the same
order wouldn't work with a tool whose task is precisely to apply some ranking
to the matches.
--
Jung Nicholas Ryoo | Principal Software Engineer
Phone: +35318031918 | Fax: +35318031918 | Oracle WPTG Infrastructure
ORACLE Ireland | Block P5, Eastpoint Business Park, Dublin 3
Oracle is committed to developing practices and products that help protect the environment