7 messages in com.marklogic.developer.general

From               Sent On
Abhishek53 S       Feb 1, 2012 1:28 am
Geert Josten       Feb 1, 2012 1:55 am
Abhishek53 S       Feb 1, 2012 2:46 am
Abhishek53 S       Feb 1, 2012 2:54 am
Will Thompson      Feb 1, 2012 9:20 am
Michael Blakeley   Feb 1, 2012 10:55 am
Will Thompson      Feb 1, 2012 11:18 am
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text
From: Will Thompson (wtho@jonesmcclure.com)
Date: Feb 1, 2012 11:18:29 am
List: com.marklogic.developer.general

Mike - This is also what I have found. Search:parse has to actually return this
"empty" query for the <empty> option to have any effect:

<cts:and-query qtextempty="1" xmlns:cts="http://marklogic.com/cts"/>

When it is passed punctuation-only text, with "punctuation-insensitive" in the
options, it returns:

<cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts"> <cts:text>,</cts:text> <cts:option>punctuation-insensitive</cts:option> </cts:word-query>

The same problem occurs with "whitespace-insensitive" in options and
search:parse("&nbsp;",$options):

<cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts"> <cts:text> </cts:text> <cts:option>whitespace-insensitive</cts:option> </cts:word-query>

Neither query is affected by <empty apply="all-results"/>; both return no
results. I don't think this is desirable for any application. Ideally, the
Search API would provide an option to behave like your parser, or search:parse
would return empty queries in these scenarios.
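Until the Search API offers such an option, one way to approximate it is to inspect the query text before parsing. The following is a minimal sketch, assuming that cts:tokenize labels each token as cts:word, cts:punctuation or cts:space, and that search:parse on an empty string returns the "empty" cts:and-query shown above. local:parse-or-empty is a hypothetical helper, not part of the Search API, and it only covers input that is entirely punctuation or whitespace, not a stray comma between real terms.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper. If the query text contains no word tokens
   (only punctuation and/or whitespace, e.g. "," or a no-break space),
   parse an empty string instead, so that search:parse returns its
   "empty" query and the <empty apply="..."/> option can take effect.
   Assumption: search:parse("") yields <cts:and-query qtextempty="1"/>. :)
declare function local:parse-or-empty(
  $qtext as xs:string,
  $options as element(search:options)
) as element()?
{
  if (fn:exists(cts:tokenize($qtext)[. instance of cts:word]))
  then search:parse($qtext, $options)
  else search:parse("", $options)
};

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <empty apply="all-results"/>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>
return local:parse-or-empty(",", $options)
```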

Stripping punctuation out of the input query is a decent workaround, but we
have to be careful not to strip out characters that could be part of a
constraint, phrase, custom grammar, etc., so the regex gets uglier.
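The kind of regex described here might look like the sketch below. It assumes a grammar in which only double quotes, parentheses, colons and hyphens are significant (roughly the default grammar's phrase, grouping, constraint and negation characters; a custom grammar needs its own list). local:strip-punctuation is a hypothetical name. It uses only fn:replace and fn:normalize-space, so it should also run in a non-MarkLogic XQuery processor.

```xquery
xquery version "1.0";

(: Hypothetical helper. Replaces every punctuation or separator
   character except the ones this assumed grammar uses (double quote,
   parentheses, colon, hyphen) with a space, then collapses runs of
   whitespace. In XML Schema regex syntax, \w excludes Unicode
   punctuation (P), separators (Z) and other (C) characters, so a
   no-break space is replaced too, while symbols such as $ or + count
   as \w and are left alone. :)
declare function local:strip-punctuation($qtext as xs:string) as xs:string
{
  fn:normalize-space(fn:replace($qtext, '[^\w\s"():\-]', ' '))
};

local:strip-punctuation('"metal" , "locker"')
```

On Abhishek's input this keeps the phrase quotes and drops only the comma. Characters in the preserved set (for example a lone hyphen) are still passed through for search:parse to deal with, so the preserved list is a trade-off rather than a complete fix.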

-Will

-----Original Message-----
From: gene@developer.marklogic.com
[mailto:gene@developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Wednesday, February 01, 2012 10:56 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text

In cases like this it's worth looking at the query output. The search:parse
function produces this:

<cts:and-query strength="20" qtextjoin="" qtextgroup="( )"
    xmlns:cts="http://marklogic.com/cts">
  <cts:word-query qtextpre="&quot;" qtextref="cts:text" qtextpost="&quot;">
    <cts:text>metal</cts:text>
    <cts:option>case-insensitive</cts:option>
    <cts:option>unstemmed</cts:option>
    <cts:option>punctuation-insensitive</cts:option>
  </cts:word-query>
  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )">
    <cts:word-query qtextref="cts:text">
      <cts:text>,</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>unstemmed</cts:option>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>
    <cts:word-query qtextpre="&quot;" qtextref="cts:text" qtextpost="&quot;">
      <cts:text>locker</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>unstemmed</cts:option>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>
  </cts:and-query>
</cts:and-query>

See the cts:text entry for ','? After some testing with 5.0-2, my guess is that
since ',' is the only character in that punctuation-insensitive word-query, that
word-query term ends up not matching anything. I think it should match
*everything*, which would also cause problems if search:parse created that
query. But whether the existing behavior is a bug or not, the workaround should
be simple: rewrite the input query so that it does not contain any punctuation.
This might be suitable:

replace($query, '[^\w\s]', ' ')

Or you might look into using https://github.com/mblakele/xqysp with
search:resolve(). XQYSP ignores unexpected punctuation unless it is part of a
quoted term.
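Applied to Abhishek's query, the replace() workaround might look like the following sketch. The element name data and the search options are copied from Abhishek's message. Note that the pattern also removes the double quotes, so "metal" and "locker" are parsed as plain terms rather than quoted phrases.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Every character that is neither a word character nor whitespace
   becomes a space, so the lone comma disappears before parsing. :)
let $qtext := fn:replace('"metal" , "locker"', '[^\w\s]', ' ')
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <term>
      <empty apply="all-results"/>
      <term-option>case-insensitive</term-option>
      <term-option>unstemmed</term-option>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>
let $query := cts:element-query(xs:QName("data"),
                cts:query(search:parse($qtext, $options)))
return xdmp:estimate(cts:search(fn:doc(), $query))
```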

On 1 Feb 2012, at 09:21, Will Thompson wrote:

Abhishek - I recently had a very similar issue with empty searches and
punctuation, and the solution appeared to be adding <empty apply="all-results"/>
to the search options. However, after further testing, I am also getting empty
results. For example:

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <empty apply="all-results"/>
      <term-option>punctuation-insensitive</term-option>
    </term>
    <searchable-expression>//doc</searchable-expression>
  </options>
let $empty :=
  <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
    <cts:text>;</cts:text>
    <cts:option>punctuation-insensitive</cts:option>
  </cts:word-query>
return search:resolve($empty, $options)

This returns no results, and the value of @apply does not seem to have any
effect. I think this is probably a bug.

-Will

From: gene@developer.marklogic.com
[mailto:gene@developer.marklogic.com] On Behalf Of Abhishek53 S
Sent: Wednesday, February 01, 2012 2:55 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text

Hi Geert,

Here is the sample query I used:

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $parsed-query := search:parse('"metal" , "locker"',
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <term>
      <empty apply="all-results"/>
      <term-option>case-insensitive</term-option>
      <term-option>unstemmed</term-option>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>)
let $query := cts:element-query(xs:QName("data"), cts:query($parsed-query))
return xdmp:estimate(cts:search(fn:doc(), $query))

Thanks,
Abhishek Srivastav
Tata Consultancy Services
Cell: +91-9883389968
Mailto: abhi@tcs.com
Website: http://www.tcs.com

____________________________________________ Experience certainty. IT Services Business Solutions Outsourcing

____________________________________________

From: Abhishek53 S <abhi@tcs.com>
To: General MarkLogic Developer Discussion <gene@developer.marklogic.com>
Date: 02/01/2012 04:17 PM
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text
Sent by: gene@developer.marklogic.com

Hi Geert,

Thanks for your response. For now I would rather not remove the word-query
with punctuation marks from the main query, unless that turns out to be the
last option. I am using the search:parse function to parse the search term.

I tried your third option but am still unable to get the expected result:
because the search is punctuation-insensitive, the count with the comma should
equal the count without it. If I recall correctly, this term option controls
whether results are returned when the search term is empty, so how would it
help in this case?

Thanks for your help!


From: Geert Josten <geer@dayon.nl>
To: General MarkLogic Developer Discussion <gene@developer.marklogic.com>
Date: 02/01/2012 03:26 PM
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text
Sent by: gene@developer.marklogic.com

Hi Abhishek,

What is happening here is that you pass ',' as the search term to a word-query
with the 'punctuation-insensitive' option. That option effectively strips the
comma out of the search term, leaving an empty search term, and a
cts:word-query with an empty search term returns nothing.
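A quick way to observe this is to compare the estimates for the comma alone, the bare word, and the word with a trailing comma. This sketch reuses the element name data and the term options from Abhishek's query. If the explanation above holds, the first estimate should be 0 and the other two should be equal, though the actual numbers depend on your database and server version.

```xquery
xquery version "1.0-ml";

(: Element name and term options copied from Abhishek's query. :)
let $opts := ("case-insensitive", "punctuation-insensitive", "unstemmed")
for $term in (",", "metal", "metal,")
let $query := cts:element-query(xs:QName("data"),
                cts:word-query($term, $opts))
(: Report each search term next to its estimated match count. :)
return fn:concat($term, " => ", xdmp:estimate(cts:search(fn:doc(), $query)))
```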

I think you have a few options:

1. Don't tokenize the search string yourself (at least, if that is what you
   are doing), and pass in 'metal,' or ', metal' as the search term with
   punctuation-insensitive. That is effectively the same as searching for
   'metal'.
2. Strip punctuation yourself before parsing the string into the <cts:query>
   element structure (or post-process the query element structure to filter
   out punctuation-only queries).
3. Add <empty apply="all-results" /> to your search options (I'm guessing you
   are using search:parse, so to the options you pass in there).
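The post-processing variant of option 2 could be sketched as below. local:prune and local:has-words are hypothetical helpers, and the sketch assumes that the presence of a cts:word token from cts:tokenize is a reliable test for "contains at least one word". One consequence to keep in mind: if every child of a cts:and-query is dropped, the remaining empty and-query should match all documents, while an emptied cts:or-query should match none.

```xquery
xquery version "1.0-ml";

(: The cts prefix is predeclared in MarkLogic, so no namespace
   declaration is needed for cts:word-query or cts:word. :)

(: True when $text contains at least one word token. cts:tokenize
   labels each token as cts:word, cts:punctuation or cts:space. :)
declare function local:has-words($text as xs:string) as xs:boolean
{
  fn:exists(cts:tokenize($text)[. instance of cts:word])
};

(: Hypothetical post-processor for a serialized cts:query (for example
   the output of search:parse): copies the tree, dropping any
   cts:word-query whose cts:text children contain no word tokens. :)
declare function local:prune($node as node()) as node()*
{
  typeswitch ($node)
    case element(cts:word-query) return
      if (local:has-words(fn:string-join($node/cts:text, " ")))
      then $node
      else ()
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        for $child in $node/node() return local:prune($child)
      }
    default return $node
};

local:prune(
  <cts:and-query>
    <cts:word-query>
      <cts:text>,</cts:text>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>
    <cts:word-query>
      <cts:text>metal</cts:text>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>
  </cts:and-query>
)
```

In Abhishek's query, local:prune($parsed-query) would then be passed to cts:query() in place of $parsed-query.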

Kind regards, Geert

From: gene@developer.marklogic.com
[mailto:gene@developer.marklogic.com] On Behalf Of Abhishek53 S
Sent: Wednesday, 1 February 2012 10:30
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] element-query with punctuation insensitive and punctuation marks as cts:text

Hi Folks,

I am not sure whether I am going wrong somewhere, so let me explain this issue
with a punctuation-insensitive search (element-query) in which punctuation
marks appear as cts:text. When I execute the query below, I get a count of
zero, because the punctuation mark is not ignored during the search even though
the word-query is punctuation-insensitive. Our application always expects
punctuation-insensitive behavior. If I remove the word-query containing the
punctuation mark, the query starts returning a count based on the remaining
search criteria. On the other hand, a word-query with the
punctuation-sensitive option behaves as though it had been removed from the
search criteria.

Please let me know how to make this element-query punctuation-insensitive even
when punctuation marks are present in the cts:text node of a word-query:

xdmp:estimate(cts:search(fn:doc(), cts:query(
  <cts:element-query>
    <cts:element xmlns="">data</cts:element>
    <cts:and-query>
      <cts:word-query>
        <cts:text xml:lang="en">,</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>unstemmed</cts:option>
      </cts:word-query>
      <cts:word-query>
        <cts:text xml:lang="en">metal</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>unstemmed</cts:option>
      </cts:word-query>
    </cts:and-query>
  </cts:element-query>
)))
