13 messages in org.apache.lenya.user: Re: making search a real usecase
From / Sent On
solp...@gmail.com, Apr 14, 2005 4:28 pm
Gregor J. Rothfuss, Apr 15, 2005 11:22 am
solp...@gmail.com, Apr 15, 2005 10:54 pm
solp...@gmail.com, Apr 18, 2005 2:31 am
solp...@gmail.com, Apr 18, 2005 9:41 pm
solp...@gmail.com, Apr 18, 2005 11:06 pm
solp...@gmail.com, Apr 19, 2005 1:38 am
Gregor J. Rothfuss, Apr 19, 2005 4:45 am
solp...@gmail.com, Apr 20, 2005 1:23 am
Andreas Hartmann, Apr 20, 2005 1:49 am
solp...@gmail.com, Apr 20, 2005 12:52 pm
Thorsten Scherler, Apr 21, 2005 11:15 am
Michael Wechner, Apr 23, 2005 4:26 am
Subject: Re: making search a real usecase
Date: Apr 20, 2005 12:52:42 pm

On 4/20/05, Andreas Hartmann wrote:
> [Bugzilla is] actually quite easy to use. Feel free to give it a try.

I should have time on Saturday. I need to review all my posts for mentions of various bugs.

>> How should this be handled? The instructions include many design changes,

> Design changes should generally be discussed on the dev list before they are applied. Many things have a purpose, which might not always be obvious.

Can this discussion be moved to the dev list?

>> - Indexes the XML files, rather than spidering the website.

> How is this managed? For instance, imagine the following case:
> - a page is marked as "disabled", e.g., using metadata
> - the XSLT removes the content and adds a "This page is removed" message
> - the page doesn't appear in the navigation
> In this case, would the content of the page be indexed? (I have no idea how it works ATM if your changes are not considered.)

I could not find a "Disable" action. The "Deactivate" function removes the language version of the document from "live", so that case is already handled. ("Deactivate" should ask whether all language versions should be deactivated. Remember our discussion about creating all language versions.) The document is not removed from the index until the indexer is run again. (The indexer needs to be scheduled. Windows has difficulty recreating the index while Lenya is running because of file-locking issues. My instructions recreate the index as "new", so perhaps using "incremental update" would solve this.)

If there were a <dc:status>disabled</dc:status> element, adding a check to this process would be easy; it already checks by URL and language. (Although I would rewrite the code so the IFs are easier to follow; the END IFs are currently very far from their IFs.)
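A minimal sketch of such a check, assuming the status is stored in a hypothetical Dublin Core `<dc:status>` element (Lenya 1.2.2 defines no such field) and using plain JAXP DOM parsing rather than Lenya's own indexer classes. `IndexGuard` and its parameters are illustrative, and the URL and language tests are only placeholders for the checks the indexer already performs. Guard clauses that return early keep each condition next to its outcome, which avoids the distant END IFs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical guard for an XML-based indexer. <dc:status> is an assumed
// metadata element; Lenya 1.2.2 does not define it.
class IndexGuard {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns true if the document should be added to the search index. */
    static boolean shouldIndex(String xml, String url, String language) {
        // Placeholders for the URL and language checks the indexer already does.
        // Each guard returns immediately instead of opening a nested IF.
        if (url == null || url.isEmpty()) return false;
        if (language == null || language.isEmpty()) return false;

        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // design choice: skip documents that cannot be parsed
        }

        NodeList status = doc.getElementsByTagNameNS(DC_NS, "status");
        for (int i = 0; i < status.getLength(); i++) {
            if ("disabled".equals(status.item(i).getTextContent().trim())) return false;
        }
        return true;
    }
}
```

Skipping unparseable documents is a conservative choice: a broken page drops out of the index instead of showing up with a garbled excerpt.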

>> - Creates the excerpt from the HTML BODY, rather than using Lucene's excerpt (this is configurable; more work would allow a fallback that makes certain the excerpt includes the search terms.)

> What's the difference between these approaches?

Lucene starts with the entire document with all tags removed. When indexing HTML, this is mostly fine, since the META tags are removed by the XML2HTML process; however, the navigation elements, including the breadcrumbs, are included, so every document below a document with a matching title will also match (hopefully with a low score). When indexing XML, all the metadata is added to the excerpt, producing a confusing excerpt such as:
Lenya default publicationThe welcome page for the Lenya default publicationmyreviewer|My Reviewer|myreviewer@solprovider.com2005-01-31...

My code builds the excerpt from the HTML BODY. It looks much cleaner.
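A rough sketch of the BODY-based approach, including the fallback mentioned above that moves the excerpt so it contains the first search term. It assumes the XML2HTML step produces well-formed XHTML and uses only JAXP DOM; `BodyExcerpt`, its list of block elements, and the `maxLen` parameter are hypothetical and are neither Lenya's nor Lucene's API.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: builds a search excerpt from the XHTML <body> only,
// so <head> metadata never leaks into the excerpt.
class BodyExcerpt {
    // Block elements separate words; inline ones (b, i, span) do not.
    private static final Set<String> BLOCKS = new HashSet<>(Arrays.asList(
            "p", "div", "br", "td", "th", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"));

    static String bodyText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList bodies = doc.getElementsByTagName("body");
            if (bodies.getLength() == 0) return "";
            StringBuilder sb = new StringBuilder();
            collect(bodies.item(0), sb);
            return sb.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            return ""; // not well-formed: no excerpt rather than a wrong one
        }
    }

    private static void collect(Node node, StringBuilder sb) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                boolean block = BLOCKS.contains(child.getNodeName().toLowerCase(Locale.ROOT));
                if (block) sb.append(' ');
                collect(child, sb);
                if (block) sb.append(' ');
            }
        }
    }

    /** Up to maxLen characters, shifted so the first hit of term sits near the middle. */
    static String excerpt(String text, String term, int maxLen) {
        if (text.length() <= maxLen) return text;
        int hit = term == null ? -1 : indexOfIgnoreCase(text, term);
        int start = 0;
        if (hit > maxLen / 2) start = Math.min(hit - maxLen / 2, text.length() - maxLen);
        int end = start + maxLen;
        return (start > 0 ? "..." : "") + text.substring(start, end) + (end < text.length() ? "..." : "");
    }

    private static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) return i;
        }
        return -1;
    }
}
```

For a page whose `<head>` holds the title and whose body is `<h1>Hello</h1><p>This page we<b>lcom</b>es you.</p>`, `bodyText` yields `Hello This page welcomes you.`: the title is excluded and the inline `<b>` does not split the word.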

BUG: There is a bug in Lucene's tag-removal process. All tags are removed, but a single space should be added when the tag is a <BR> or <TD>, or when there is whitespace between tags. Example:
<H1>Hello</H1><TABLE><TR><TD>This page we<B>lcom</B>es you here.<BR>Now go away.</TD></TR></TABLE>
becomes:
HelloThis page welcomes you here.Now go away.
A search for "this" fails because it is not a separate word in the index; it was merged into "HelloThis".
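One way to implement the fix, sketched with `java.util.regex` rather than Lucene's actual tag-removal code: word-separating tags become a space, inline tags such as <B> become nothing, and whitespace already present between tags is kept. Only <BR> and <TD> come from the report; the other separator tags in the list are assumptions. A regex is adequate for illustration but is not a full HTML parser (comments and `>` inside attribute values are not handled).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of tag removal that keeps word boundaries; not Lucene's implementation.
class TagStripper {
    // Any opening, closing, or self-closing tag; group 1 is the tag name.
    private static final Pattern TAG = Pattern.compile("<\\s*/?\\s*([A-Za-z][A-Za-z0-9]*)[^>]*>");
    // Tags that separate words. BR and TD are from the bug report; the rest are assumed.
    private static final Pattern SEPARATOR = Pattern.compile("(?i)br|td|th|tr|p|div|li|h[1-6]");

    static String stripTags(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Separator tags turn into a space; inline tags vanish so "we<B>lcom</B>es" stays one word.
            String replacement = SEPARATOR.matcher(m.group(1)).matches() ? " " : "";
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        // Collapse runs of whitespace, including whitespace that was already between tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Applied to the example above, this returns `Hello This page welcomes you here. Now go away.`, so "this" becomes a separate word in the index.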

>> Lenya 1.2.2's security system is useless.

> Of course you're free to state your opinion, but these are quite harsh and demotivating words, considering that people spend their leisure time on this project. Maybe it matches the requirements of the people who developed it?

I apologize. I should have phrased it: "Lenya 1.2.2's access control policies cannot be used as a security system."

Access control has three goals:
1. Deny access if not authorized (and hide the existence of unauthorized data).
2. Grant access if authorized.
3. Allow functions if authorized.

Lenya's current access policies are only concerned with #3. They are designed to add "roles" so that additional functionality can be granted to specific Groups. With the Inheriting Policy Manager and the default <ac:world><ac:role id="visit"/></ac:world>, it is not possible to deny access through the access policies. Calling it an "Access Control" system when it does not perform the basic functions of access control is confusing; it is like finding a "Publishing" system that can only delete documents.
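A simplified model of the problem, under stated assumptions: this is not the code of Lenya's Inheriting Policy Manager, and the `deniedToWorld` set is a hypothetical extension. With union-only inheritance, a `visit` role granted to the world at the root can never be withdrawn further down the tree; letting the nearest policy that mentions the role decide, with deny as the fallback, covers goal 1.

```java
import java.util.*;

// Simplified model, not Lenya's Inheriting Policy Manager: each path may carry
// a policy that grants (or, hypothetically, denies) roles to the world.
class PolicyCheck {
    static final class Policy {
        final Set<String> grantedToWorld = new HashSet<>();
        final Set<String> deniedToWorld = new HashSet<>(); // hypothetical extension
    }

    /** Union-only inheritance: a role granted at any ancestor can never be withdrawn. */
    static boolean unionAllows(Map<String, Policy> policies, String path, String role) {
        for (String p : ancestors(path)) {
            Policy policy = policies.get(p);
            if (policy != null && policy.grantedToWorld.contains(role)) return true;
        }
        return false;
    }

    /** The nearest policy that mentions the role decides; if none does, deny (goal 1). */
    static boolean denyAwareAllows(Map<String, Policy> policies, String path, String role) {
        List<String> chain = ancestors(path);
        for (int i = chain.size() - 1; i >= 0; i--) {
            Policy policy = policies.get(chain.get(i));
            if (policy == null) continue;
            if (policy.deniedToWorld.contains(role)) return false;
            if (policy.grantedToWorld.contains(role)) return true;
        }
        return false;
    }

    /** "/members/news" -> ["/", "/members", "/members/news"] */
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        result.add("/");
        StringBuilder current = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue;
            current.append('/').append(segment);
            result.add(current.toString());
        }
        return result;
    }
}
```

With `visit` granted to the world at `/` and denied at `/members`, `unionAllows` still admits anonymous visitors to `/members/news`, while `denyAwareAllows` refuses them and keeps `/about` public.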

>> Worst of all, this was done in Lenya 1.2.2. I have not looked at 1.4, and have no idea how many of these changes are obsolete.

> AFAIK the search hasn't undergone major refactorings in 1.4, so your contributions won't be obsolete there.

Search was not the biggest concern. The code checks the Identity's Groups, and the Identity API could use some work to add functions like:
- Identity.getGroups()
- Identity.getRoles(context)
If the current getIdentifiables() and getAccreditables() are changed, my code will need updating.
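A hypothetical sketch of what such convenience methods could look like. `GroupAwareIdentity`, `SimpleIdentity`, and the context-to-group-to-roles map are invented for illustration and are not part of Lenya's `Identity` class; with something like this, search code could ask for the groups directly instead of walking `getAccreditables()`.

```java
import java.util.*;

/** Hypothetical convenience API proposed in the message; not part of Lenya 1.2.2. */
interface GroupAwareIdentity {
    Set<String> getGroups();
    Set<String> getRoles(String context);
}

class SimpleIdentity implements GroupAwareIdentity {
    private final Set<String> groups;
    // Assumed data model: context (e.g. an area such as "/live") -> group -> roles.
    private final Map<String, Map<String, Set<String>>> rolesByContext;

    SimpleIdentity(Set<String> groups, Map<String, Map<String, Set<String>>> rolesByContext) {
        this.groups = Collections.unmodifiableSet(new HashSet<>(groups));
        this.rolesByContext = rolesByContext;
    }

    @Override
    public Set<String> getGroups() {
        return groups;
    }

    /** Union of the roles of every group this identity belongs to, in one context. */
    @Override
    public Set<String> getRoles(String context) {
        Set<String> roles = new TreeSet<>();
        Map<String, Set<String>> byGroup = rolesByContext.getOrDefault(context, Collections.emptyMap());
        for (String group : groups) {
            roles.addAll(byGroup.getOrDefault(group, Collections.emptySet()));
        }
        return roles;
    }
}
```

Keeping the lookup behind an interface means the search code would not need updating if the underlying accreditable structure changes.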

> Your contributions are much appreciated, especially as they are well documented.
> -- Andreas

Document as you develop. I get calls to maintain code I wrote a decade ago; a million lines of code later, I forget what the code is doing. Documentation is even more important in a collaborative environment. In this case, I wrote the instructions so that a newbie user could follow them, because I thought the functions were useful, I did not know whether the developers would accept any of my ideas, the Wiki page "ClosedUserGroups" is extremely incomplete, and the search architecture could not support ClosedUserGroups.

I will post more instructions when the functions are complete, probably next week:
- Login/New Registration/Visitor Information page (I need to finish maintaining dynamic data.)
- Deny access to unauthorized areas (Lenya does not make this easy. I will change the search instructions to use the config for this when it is complete.)
- Contact Us form (I broke it when making Login into a usecase. I need to fix that and add code to handle mailing.)
- Newsletter builder (Much later. I need to design how the CMS GUI decides whether a document should be included.)

The instructions will be for extending a Lenya 1.2.2 publication. Could/should these functions be added to the Default publication?