| From | Sent On | Attachments |
|---|---|---|
| Geert Josten | May 12, 2012 8:52 am | |
| Danny Sokolsky | May 12, 2012 10:38 am | |
| Danny Sokolsky | May 12, 2012 3:41 pm | |
| Geert Josten | May 13, 2012 1:31 am | |
| Geert Josten | May 13, 2012 1:32 am | |
| Subject: | [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem) | |
|---|---|---|
| From: | Geert Josten (geer...@dayon.nl) | |
| Date: | May 12, 2012 8:52:18 am | |
| List: | com.marklogic.developer.general | |
Curious how well Danny's idea would perform, I tried applying it to one of my test databases, which holds a fair number of tweets (roughly 400K last time I checked). I had to change cts:words to cts:element-words since I have no word lexicon. But it breaks for me. Did I hit a bug?
let $map := map:map()
let $all :=
  for $x in cts:element-words(fn:QName("http://grtjn.nl/twitter/utils", "text"), "collation=http://marklogic.com/collation/nl/S1/AS/T00BB")
  return map:put($map, cts:stem($x), $x)
return (
  fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
  fn:concat(fn:count(cts:words()), " unique words in the database
"),
  map:keys($map)
)
Note that I specify a collation explicitly, but it seems to get ignored. Can anyone confirm this behavior?
Kind regards,
Geert
*From:* gene...@developer.marklogic.com [mailto:gene...@developer.marklogic.com] *On Behalf Of *Danny Sokolsky *Sent:* Saturday, May 12, 2012 12:13 AM *To:* MarkLogic Developer Discussion *Subject:* Re: [MarkLogic Dev General] Term with same stem
If you have a word lexicon you can do something like this to get information about your words and stems:
let $map := map:map()
let $all :=
  for $x in cts:words()
  return map:put($map, cts:stem($x), $x)
return (
  fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
  fn:concat(fn:count(cts:words()), " unique words in the database
"),
  map:keys($map)
)
-Danny
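One caveat with the snippet above: map:put replaces any value already stored under a key, so each stem ends up holding only the last word seen, and cts:stem can return more than one stem for a word. A minimal sketch of a variant that keeps every word per stem might look like the following. It assumes a word lexicon is enabled, and the variable names are illustrative rather than anything from the thread.

```xquery
xquery version "1.0-ml";

(: Sketch only: group every lexicon word under each of its stems.
   Assumes a word lexicon is enabled on the database. :)
let $map := map:map()
let $_ :=
  for $word in cts:words()
  (: cts:stem returns xs:string*, so iterate over every stem it yields :)
  for $stem in cts:stem($word)
  (: append to the words already stored for this stem instead of overwriting :)
  return map:put($map, $stem, (map:get($map, $stem), $word))
for $stem in map:keys($map)
order by $stem
return fn:concat($stem, ": ", fn:string-join(map:get($map, $stem), ", "))
```

Each result line lists one stem followed by the words that share it. On a large lexicon the map can grow considerably, so this is probably best tried on a test database first.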
*From:* gene...@developer.marklogic.com [mailto:gene...@developer.marklogic.com] *On Behalf Of *Michael Blakeley *Sent:* Friday, May 11, 2012 2:02 PM *To:* MarkLogic Developer Discussion *Cc:* MarkLogic Developer Discussion *Subject:* Re: [MarkLogic Dev General] Term with same stem
If stemming=advanced I think cts:stem will do that. With basic the best you can do is to pass terms to cts:stem and see if they have the same stem.
-- Mike
On May 11, 2012, at 13:39, Abhishek53 S <abhi...@tcs.com> wrote:
Hi Folks,
Is it possible to get all the terms that have the same stem from a MarkLogic database? I want to get all terms that belong to the same stem.
Thanks & Regards Abhishek Srivastav Systems Engineer Tata Consultancy Services Cell:- +91-9883389968 Mailto: abhi...@tcs.com Website: http://www.tcs.com
_______________________________________________ General mailing list Gene...@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general