Subject: [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem)
From: Geert Josten (geer@dayon.nl)
Date: May 12, 2012 8:52:18 am
List: com.marklogic.developer.general

Curious how well Danny's idea would perform, I tried it on one of my test databases, which holds a fair number of tweets (roughly 400K last time I checked). I had to change cts:words to cts:element-words since I have no word lexicon. But it breaks for me. Did I hit a bug?

let $map := map:map()
let $all :=
  for $x in cts:element-words(
    fn:QName("http://grtjn.nl/twitter/utils", "text"),
    "collation=http://marklogic.com/collation/nl/S1/AS/T00BB")
  return map:put($map, cts:stem($x), $x)
return (
  fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
  fn:concat(fn:count(cts:words()), " unique words in the database
"),
  map:keys($map) )

Note that I specify an explicit collation, but it seems to be ignored. Can anyone confirm this behavior?

Kind regards,

Geert
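
A possible explanation for the ignored collation, offered as a sketch rather than a confirmed diagnosis: as far as I recall the documented signature, the second parameter of cts:element-words is $start and the options come third, so the collation string in the call above would be read as a starting word. Assuming that signature, passing an empty sequence for $start moves the option into place:

```xquery
xquery version "1.0-ml";

(: Assumption: cts:element-words($element-names, $start, $options, ...).
   The empty sequence fills $start, so the collation string lands in
   $options instead of being treated as a start word. :)
cts:element-words(
  fn:QName("http://grtjn.nl/twitter/utils", "text"),
  (),
  "collation=http://marklogic.com/collation/nl/S1/AS/T00BB"
)
```

This still requires an element word lexicon on that element with the same collation; whether it resolves the failure reported above is untested here.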

*From:* gene@developer.marklogic.com [mailto:gene@developer.marklogic.com] *On Behalf Of* Danny Sokolsky *Sent:* Saturday, May 12, 2012 0:13 *To:* MarkLogic Developer Discussion *Subject:* Re: [MarkLogic Dev General] Term with same stem

If you have a word lexicon you can do something like this to get information about your words and stems:

let $map := map:map()
let $all :=
  for $x in cts:words()
  return map:put($map, cts:stem($x), $x)
return (
  fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
  fn:concat(fn:count(cts:words()), " unique words in the database
"),
  map:keys($map) )

-Danny
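
The query above keeps only one word per stem, because each map:put replaces the previous value for that key. A minimal variant that collects every word under its stem, which is closer to the original question, might look like the sketch below. It assumes a word lexicon is enabled and that cts:stem may return more than one stem for a word; the variable names are illustrative.

```xquery
xquery version "1.0-ml";

let $map := map:map()
let $_ :=
  for $word in cts:words()
  (: Iterate over stems so each map key is a single string. :)
  for $stem in cts:stem($word)
  (: Append the word to whatever is already stored for this stem. :)
  return map:put($map, $stem, (map:get($map, $stem), $word))
for $stem in map:keys($map)
return fn:concat($stem, ": ", fn:string-join(map:get($map, $stem), ", "))
```

Each result line lists one stem followed by the lexicon words that share it. On a large lexicon the map holds every word in memory, so this is meant for exploration rather than production use.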

*From:* gene@developer.marklogic.com [mailto:gene@developer.marklogic.com] *On Behalf Of* Michael Blakeley *Sent:* Friday, May 11, 2012 2:02 PM *To:* MarkLogic Developer Discussion *Cc:* MarkLogic Developer Discussion *Subject:* Re: [MarkLogic Dev General] Term with same stem

If stemming=advanced, I think cts:stem will do that. With basic stemming, the best you can do is to pass terms to cts:stem and see whether they have the same stem.

-- Mike
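
The comparison described above can be sketched as a small helper. The function name local:same-stem and the sample words are hypothetical; the language argument is passed explicitly here as an illustration.

```xquery
xquery version "1.0-ml";

(: True when the two words have at least one stem in common.
   cts:stem can return several stems, so the general comparison "="
   is used: it succeeds if any pair of values matches. :)
declare function local:same-stem(
  $a as xs:string,
  $b as xs:string,
  $lang as xs:string
) as xs:boolean
{
  cts:stem($a, $lang) = cts:stem($b, $lang)
};

local:same-stem("running", "runs", "en")
```

The result depends on the stemming dictionaries available for the given language, so no expected output is shown.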

On May 11, 2012, at 13:39, Abhishek53 S <abhi@tcs.com> wrote:

Hi Folks,

Is it possible to get all the terms that have the same stem from a MarkLogic database? I want to get all terms that belong to the same stem.

Thanks & Regards Abhishek Srivastav Systems Engineer Tata Consultancy Services Cell:- +91-9883389968 Mailto: abhi@tcs.com Website: http://www.tcs.com
