Subject: RE: another search question ...
From: Ard Schrijvers (a.sc@hippo.nl)
Date: Oct 22, 2007 5:44:10 am
List: org.apache.jackrabbit.users

Hello,

On 10/19/07, KÖLL Claus <C.KO@tirol.gv.at> wrote:

is there anybody who can give me an answer?

I looked at org.apache.jackrabbit.core.query.lucene.NodeIndexer and found the following snippet (starting at line 312 in current trunk):

    // never fulltext index jcr:uuid String
    if (name.equals(QName.JCR_UUID)) {
        addStringValue(doc, fieldName, value.getString(), false, false, DEFAULT_BOOST);
    } else {
        addStringValue(doc, fieldName, value.getString(), true,
                isIncludedInNodeIndex(name), getPropertyBoost(name));
    }

So jcr:uuid is never fulltext indexed. I'm not sure why that is, Marcel?

Although I am not Marcel, I might be able to give a reason not to (ever)
fulltext index the uuid: fulltext is indexed according to the analyzer you have
defined in your <SearchIndex> element, for example

<param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>

(this is also the default).

Now your uuid would get indexed according to this analyzer. Typically,
'4778158b-4de1-4ab9-9feb-a1f8987a830d' for example would be tokenized into
'4778158', 'b', '4', 'de', '1', etc. (the "-" characters are dropped, and the
text is split wherever letters and numbers meet).

When using jcr:contains(jcr:uuid, '4778158b-4de1-4ab9-9feb-a1f8987a830d') in
xpath, the '4778158b-4de1-4ab9-9feb-a1f8987a830d' will be tokenized (parsed)
by the same fulltext analyzer into separate tokens, which will be "AND"-ed
in the search (see public Object visit(TextsearchQueryNode node, Object data) in
LuceneQueryBuilder).

So, if we did fulltext index uuids, you would get a hit when you search for
jcr:contains(jcr:uuid, '4778158b-4de1-4ab9-9feb-a1f8987a830d'), but you would
also get a hit for

    jcr:contains(jcr:uuid, '4778158b-4de1-4ab9-9feb')
    jcr:contains(jcr:uuid, '4778158b-4de1')
    jcr:contains(jcr:uuid, '4778158b-a1f8987')

etc.
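The effect described above can be reproduced with a minimal plain-Java sketch. Note that this is not Lucene or Jackrabbit code: the class UuidFulltextSketch and its tokenize/matches methods are hypothetical, and the tokenizer is only an assumption that mimics the splitting described in this mail (drop "-", split where letters and digits meet). The real StandardAnalyzer may tokenize differently, depending on the Lucene version.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UuidFulltextSketch {

    // Hypothetical stand-in for an analyzer (NOT the real StandardAnalyzer):
    // drops every character that is not a letter or digit and starts a new
    // token whenever the text switches between letters and digits.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toLowerCase().toCharArray()) {
            boolean alnum = Character.isLetterOrDigit(c);
            boolean kindChanged = alnum && current.length() > 0
                    && Character.isDigit(c)
                       != Character.isDigit(current.charAt(current.length() - 1));
            // Close the running token on a separator or a letter/digit switch.
            if ((!alnum || kindChanged) && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (alnum) {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Simplified model of an AND-ed fulltext match: every query token must
    // occur somewhere among the indexed tokens; order and position are ignored.
    public static boolean matches(String indexedValue, String queryText) {
        List<String> queryTokens = tokenize(queryText);
        if (queryTokens.isEmpty()) {
            return false; // nothing to search for
        }
        Set<String> indexed = new HashSet<>(tokenize(indexedValue));
        return indexed.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String uuid = "4778158b-4de1-4ab9-9feb-a1f8987a830d";
        System.out.println(matches(uuid, uuid));                      // true
        System.out.println(matches(uuid, "4778158b-4de1-4ab9-9feb")); // true
        System.out.println(matches(uuid, "4778158b-4de1"));           // true
        System.out.println(matches(uuid, "4778158b-a1f8987"));        // true
        System.out.println(matches(uuid, "ffffffff"));                // false
    }
}
```

Under these assumptions, the partial and even reordered fragments all match, because the query only checks that each of its tokens is present in the indexed value.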

So, fulltext indexing of a uuid really doesn't make sense. If you are
interested in knowing more about indexing and searching, the book Lucene in
Action might be a good starting point [1].

Regards Ard

[1] http://www.lucenebook.com/
