Subject: Re: [MarkLogic Dev General] XDMP-EXPNTREECACHEFULL on ML 3.2
From: Zegarek, Arthur (azeg@audible.com)
Date: Sep 24, 2010 8:54:12 am
List: com.marklogic.developer.general

That makes sense.

XQuery is deceptive in that it feels like a controlling program (because you
manually code loops, etc.), even though it really isn't. I will definitely implement this another way.

-----Original Message-----
From: Michael Blakeley [mailto:mich@marklogic.com]
Sent: Friday, September 24, 2010 11:43 AM
To: Zegarek, Arthur
Cc: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] XDMP-EXPNTREECACHEFULL on ML 3.2

The locks themselves do not take up expanded tree-cache space. Instead, the query's working set has to fit in the expanded tree cache. It's true that the number of locks is generally proportional to the working set. However, if you were to disable the update function calls in your query, then it would no longer take any locks. But it would still need the same amount of expanded-tree cache space.

I'm glad to hear that you are familiar with PL/SQL batched updates. That PL/SQL processing model implies an external process controller: the PL/SQL program manages SQL statements, which perform the updates, and manages the commit interval around them. With XQuery, a similar model would also require a controlling program. You can do this by writing XQuery to manage XQuery (for example, xdmp:spawn a module that updates one batch), or you can use XCC/Java to manage XQuery (e.g., the previously mentioned Corb), or HTTP requests, or XCC/.NET, etc.
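As a rough sketch of the controlling-program idea, here is what an external controller could look like in Python. The `run_batch` callable is hypothetical: it stands in for whatever submits one batch to the server as its own request (over HTTP or XCC, for instance), and nothing here is a MarkLogic API.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_all(ids: Sequence[str],
            run_batch: Callable[[List[str]], None],
            size: int = 500) -> int:
    """Call `run_batch` once per batch and return the number of batches.

    `run_batch` is a hypothetical stand-in for the code that sends one
    batch to the server as a separate request, so that each batch runs
    as its own transaction with its own, smaller working set.
    """
    count = 0
    for batch in batches(ids, size):
        run_batch(batch)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for a real request: just record the size of each batch.
    seen: List[int] = []
    product_ids = [f"prod-{n}" for n in range(2000)]
    total = run_all(product_ids, lambda b: seen.append(len(b)), size=500)
    print(total, seen)
```

The controller owns the loop and the batch size; the server-side query only ever sees one batch, which is the part that keeps the working set small.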

On 2010-09-23 23:07, Zegarek, Arthur wrote:

Michael-

Breaking up the updates into smaller batches is definitely possible, but....

Are you saying that holding the locks is causing the expanded tree cache to fill
up? Regarding "I like to use batches of 500 or 1000": my natural inclination is to
draw a parallel to SQL processing. I often have to do this type of update
against Oracle databases, and my normal practice would be to write a similar PL/SQL
routine that commits every 500-1000 rows or so to release locks,
especially when doing this type of update activity on an OLTP-type system.

Is there a way to interject a "commit" within the XQuery which would have the same
effect (hopefully allowing the processing to continue to completion without
the need to break it up into multiple calls)? What would that look like in code?
I've always heard that ML has transactional
commit/rollback capability, but in practice, my observations have been that
the updates seem to commit immediately.

Thanks in advance-

-----Original Message-----
From: Michael Blakeley [mailto:mich@marklogic.com]
Sent: Friday, September 24, 2010 12:59 AM
To: General Mark Logic Developer Discussion
Cc: Zegarek, Arthur
Subject: Re: [MarkLogic Dev General] XDMP-EXPNTREECACHEFULL on ML 3.2

Arthur, I'm sorry to see that you wrote that much code while looking for a solution. Sometimes it's helpful to search for an error message if it's new to you: http://www.google.com/search?q=XDMP-EXPNTREECACHEFULL and http://marklogic.markmail.org/search/?q=XDMP-EXPNTREECACHEFULL are good places to start. The short story is that the query's working set has to fit in the expanded tree cache.

Moving on to remedies, tuning the in-memory tree size will not affect XDMP-EXPNTREECACHEFULL. If you want to try tuning the server, then tune the expanded tree cache size. However, it's usually better to tune the query. XDMP-EXPNTREECACHEFULL usually means that your query is over-ambitious. The query might not be using indexes efficiently, or it might simply be a query with a gigantic working set.

I see that this is an update query. ACID properties require a read-lock on every document read by the query, and a write-lock on every document that is updated. If you expect to have 300k of these documents for the live system, then I would recommend breaking the work up into smaller transactions. While it is possible to modify 300k (or more) documents in one transaction, it is usually more efficient to modify a batch of documents at a time. I like to use batches of 500 or 1000. Besides performance concerns, this technique is helpful when you encounter an error in the 299,999th document: you only have to reprocess that batch.
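To illustrate the point about reprocessing only the failed batch, here is a minimal Python sketch of a controller that records which batches failed instead of aborting the whole run. The `update_batch` callable is hypothetical and stands in for whatever sends one batch to the server as a single transaction; it is assumed to raise an exception when that transaction fails.

```python
from typing import Callable, List, Sequence, Tuple


def process_in_batches(ids: Sequence[int],
                       update_batch: Callable[[List[int]], None],
                       size: int = 1000) -> List[Tuple[int, List[int]]]:
    """Run `update_batch` on each batch; return the batches that failed.

    Each entry in the result is (start index, batch), so the caller can
    retry just those batches. `update_batch` is a hypothetical stand-in
    for one server-side transaction.
    """
    failed: List[Tuple[int, List[int]]] = []
    for start in range(0, len(ids), size):
        batch = list(ids[start:start + size])
        try:
            update_batch(batch)
        except Exception:
            # Broad catch is deliberate in this sketch: any failure
            # marks the batch for retry and the run continues.
            failed.append((start, batch))
    return failed


if __name__ == "__main__":
    def flaky_update(batch: List[int]) -> None:
        # Simulate a bad document with ID 299999.
        if 299999 in batch:
            raise RuntimeError("update failed")

    doc_ids = list(range(1, 300001))
    failures = process_in_batches(doc_ids, flaky_update, size=1000)
    for start, batch in failures:
        print(f"retry needed: start index {start}, {len(batch)} documents")
```

With a batch size of 1000, a single bad document costs at most 1000 documents of rework rather than the full 300k.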

Finally, you might be interested in http://marklogic.github.com/corb/ which is intended to help automate this sort of bulk-update.

On 2010-09-23 20:14, Zegarek, Arthur wrote:

I am getting an XDMP-EXPNTREECACHEFULL error – not sure how to get around it.

Trying to write an XQuery that reads through a control list to obtain a list of
catalog elements that require updating, along with a start_date/end_date value to
update in the main catalog.

When I load the control XML (DevWithoutExclProduct.xml) with more than 2000
items or so, I get XDMP-EXPNTREECACHEFULL. The in-memory tree size is set to 1 GB.

I have 3 versions of the code – details below. I would think the 2nd or 3rd
versions would not incur the problem, since in these versions I isolate the logic
in a function that is called with just a single node each time. I understand
the issue is too many nodes being kept in scope, but how can you get around this
in a single XQuery call, without breaking up the data to make multiple calls to
ML Server? If I limit DevWithoutExclProduct.xml to 1500 products or so, it
runs through without the exception.

Currently running this in our dev environment, where we have approx. 54,000
products in the Internal collection. In Prod it is more like 300,000.

Version 1:

declare namespace RSUITE="http://www.reallysi.com"
declare namespace adbl="http://www.audible.com/publisherToRepository"

for $excl in doc("DevWithoutExclProduct.xml")/prods/product,
    $rsuite in collection("Internal")/product
let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
let $prod_id := $excl/prods/product/prod_id
let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
let $exi := exists($excl_product )
let $start := $excl/start
let $end := $excl/end
let $repl_node :=
  <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
return
  <a> {
    if( $exi = true() )
    then xdmp:node-replace($excl_product, $repl_node)
    else xdmp:node-insert-after($excl_content, $repl_node)
  } </a>

In this version, the control list, DevWithoutExclProduct.xml, is joined in the
same for loop as the main catalog. The error returned is:

XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in
collection("Internal")/child::product -- Expanded tree cache full on host
rsuite.ofc.dev.ewr.audible.com line 4 = /use-cases/eval2.xqy line 2

Version 2 – Here I tried isolating the functionality in a function, and calling the
function in a separate for loop that reads through the control list. So I am unclear
why I am still getting the tree cache full error, given that the function is called
with just a single node each time. Note the difference in the error reported –
here there are 2 lines mentioned.

declare namespace RSUITE="http://www.reallysi.com"
declare namespace adbl="http://www.audible.com/publisherToRepository"

define function update_excl_prod( $excl as node() ) as element()
(: Call with just 1 node ! :)
{
  for $rsuite in collection("Internal")/product
  let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
  let $prod_id := $excl/prods/product/prod_id
  let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
  let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
  let $exi := exists($excl_product )
  let $start := $excl/start
  let $end_dt := $excl/end
  let $repl_node :=
    <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
  where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
  return
    if( $exi = true() )
    then xdmp:node-replace($excl_product, $repl_node)
    else xdmp:node-insert-after($excl_content, $repl_node)
}

<result>{
  for $excl in doc("DevWithoutExclProduct.xml")/prods/product
  return update_excl_prod( $excl )
}
</result>

Error returned here is: XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in
collection("Internal")/child::product -- Expanded tree cache full on host
rsuite.ofc.dev.ewr.audible.com line 7 = line 34 = /use-cases/eval2.xqy line 2

Version 3 – Here I tried limiting the for expression to search for the item, and
removed the where clause. Same exception.

declare namespace RSUITE="http://www.reallysi.com"
declare namespace adbl="http://www.audible.com/publisherToRepository"

define function update_excl_prod( $excl as node() ) as element()
(: Call with just 1 node ! :)
{
  for $rs in fn:collection('Internal')/product/adbl:METADATA/adbl:CORE/adbl:ID[.= $excl/prod_id/text() ]
  let $rsuite := doc(xdmp:node-uri($rs ))/product
  let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
  let $prod_id := $excl/prods/product/prod_id
  let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
  let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
  let $exi := exists($excl_product )
  let $start := $excl/start
  let $end_dt := $excl/end
  let $repl_node :=
    <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
  (: where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text() :)
  return
    if( $exi = true() )
    then xdmp:node-replace($excl_product, $repl_node)
    else xdmp:node-insert-after($excl_content, $repl_node)
}

<result>{
  for $excl in doc("DevWithoutExclProduct.xml")/prods/product
  return update_excl_prod( $excl )
}
</result>

Art

Art Zegarek | Director of Data Architecture T: 973.820.0396 F: 973.820.0505 C: 732-735-2592

audible.com 1 Washington Park, 16th Floor, Newark, NJ 07102