| From | Sent On | Attachments |
|---|---|---|
| Todd Gochenour | Feb 19, 2012 4:59 pm | |
| Damon Feldman | Feb 19, 2012 6:00 pm | |
| Todd Gochenour | Feb 19, 2012 10:56 pm | |
| Geert Josten | Feb 19, 2012 11:08 pm | |
| Geert Josten | Feb 19, 2012 11:12 pm | |
| Todd Gochenour | Feb 19, 2012 11:46 pm | |
| Geert Josten | Feb 20, 2012 2:42 am | |
| Damon Feldman | Feb 20, 2012 7:25 am | |
| Todd Gochenour | Feb 20, 2012 7:53 am | |
| Todd Gochenour | Feb 20, 2012 7:57 am | |
| Michael Blakeley | Feb 20, 2012 9:14 am | |
| Todd Gochenour | Feb 20, 2012 9:22 am | |
| Todd Gochenour | Feb 20, 2012 9:39 am | |
| Tim Meagher | Feb 20, 2012 9:56 am | |
| Michael Blakeley | Feb 20, 2012 9:59 am | |
| Michael Blakeley | Feb 20, 2012 10:10 am | |
| Todd Gochenour | Feb 20, 2012 10:48 am | |
| Todd Gochenour | Feb 20, 2012 12:16 pm | |
| Todd Gochenour | Feb 21, 2012 6:59 am | |
| David Lee | Feb 21, 2012 7:01 am | |
| Todd Gochenour | Feb 21, 2012 7:51 am | |
| David Lee | Feb 21, 2012 8:02 am | |
| mcun...@comcast.net | Feb 21, 2012 8:09 am | |
| Colleen Whitney | Feb 21, 2012 9:16 am | |
| Michael Blakeley | Feb 21, 2012 10:06 am | |
| Todd Gochenour | Feb 21, 2012 10:15 am | |
| Todd Gochenour | Feb 24, 2012 10:09 pm | |
| Geert Josten | Feb 24, 2012 11:57 pm | |
| Todd Gochenour | Feb 25, 2012 9:53 am | |
| Geert Josten | Feb 25, 2012 9:59 am | |
| Todd Gochenour | Feb 25, 2012 10:05 am | |
| Geert Josten | Feb 25, 2012 12:01 pm | |
| Todd Gochenour | Feb 25, 2012 4:04 pm | |
| Geert Josten | Feb 26, 2012 2:16 am | |
| Todd Gochenour | Feb 26, 2012 3:59 pm | |
| Todd Gochenour | Feb 26, 2012 10:09 pm | |

| Subject: | Re: [MarkLogic Dev General] Processing Large Documents? | |
|---|---|---|
| From: | Michael Blakeley (mi...@blakeley.com) | |
| Date: | Feb 20, 2012 9:59:48 am | |
| List: | com.marklogic.developer.general | |
You can raise the time limit:
Default Time Limit specifies the default value for any request's time limit,
when otherwise unspecified. A request can change its time limit using
xdmp:set-request-time-limit. The time limit, in turn, is the maximum number of
seconds allowed for servicing a query request. The App Server gives up on
queries which take longer, and returns an error.
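As a minimal sketch of the per-request override, the call below raises the limit for the current request only. The value 3600 is an arbitrary illustration, and the App Server's max time limit setting still caps whatever a request asks for.

```xquery
xquery version "1.0-ml";

(: Raise the time limit for this request only.
   3600 seconds is an illustrative value, not a recommendation. :)
xdmp:set-request-time-limit(3600),

(: The long-running work would follow here; this string is a placeholder. :)
"time limit raised for this request"
```

This leaves the App Server default untouched, so other requests keep the shorter limit.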
Turning to your query, I see some repeated work that could probably be factored
out.
for $row in /*/*/table_data/row
let $record := element {$row/../@name} {
Let's remove that duplicate name lookup: the result will be constant for every
$row in a given table_data element, and I presume there are many of those.
for $table at $index in /*/*/table_data
let $table-name := $table/@name/string()
for $row in $table/row
let $record := element { $table-name } {
  $row/field[text()]/element { @name } { text() } }
...
This part is especially troubling and probably adds a lot of duplicated work:
aren't you going back to the entire database again?
xdmp:document-insert(concat(/*/*/@name,'/',name($record),'/',name($record),'_',local:generate-uuid-v4(),'.xml'),
$record)
The semicolon at the end is superfluous. I think this might do what you want:
...
let $uri := concat(
  replace(xdmp:path($row), '(\[[0-9]+\])', ''),
  '/', $index)
return xdmp:document-insert($uri, $record)
That removes the uuid functions too. But if you do want a uuid implementation
that should be slightly faster, take a look at
http://markmail.org/message/mql6teskkwb574na
Given that all the work is in-memory except the document-insert, you might
actually be able to do this faster by not ingesting the table data first. It's
all one large document, right? You can read that from the filesystem. I don't
know if that will make the transform query faster or slower, but it will avoid
the need to insert all that table_data first.
for $table at $index in xdmp:document-get('/tmp/export.xml')/*/*/table_data
let $table-name := $table/@name/string()
for $row in $table/row
let $record := element { $table-name } {
  $row/field[text()]/element { @name } { text() } }
let $uri := concat(
  replace(xdmp:path($row), '(\[[0-9]+\])', ''),
  '/', $index)
return xdmp:document-insert($uri, $record)
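One thing worth checking in the query above: $index is bound in the table loop, and stripping the positional predicates from xdmp:path($row) leaves the same path for every row, so rows from the same table appear to end up with the same URI. The sketch below shows one possible alternative that builds the URI from the database name, the table name, and the row position. It uses only standard XQuery; the inline sample data, the variable names, and the URI layout are assumptions made for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample standing in for the export file; the element and
   attribute names follow the queries in this thread. :)
let $export :=
  <mysqldump>
    <database name="demo">
      <table_data name="customer">
        <row><field name="id">1</field></row>
        <row><field name="id">2</field></row>
      </table_data>
    </database>
  </mysqldump>
for $table in $export/*/table_data
let $table-name := $table/@name/string()
(: Bind the position in the row loop so each row gets its own number. :)
for $row at $row-index in $table/row
return concat('/', $export/*/@name, '/', $table-name, '/', $row-index, '.xml')
```

For the sample data this yields /demo/customer/1.xml and /demo/customer/2.xml. In the real query the resulting string would take the place of $uri in the xdmp:document-insert call.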
Finally, (as Tim just proposed in his reply) I would probably move the actual
insert into a spawned task. This requires a little more setup, but allows you to
run the XML processing in timestamped, lock-free mode. Each doc-insert would
then run asynchronously on the task server threads. You might have to increase
the task server queue size, which defaults to 100,000 I think. Otherwise you are
likely to see a MAXTASKS error. You might also want to increase the number of
task server threads. For this workload I would try one thread per CPU,
initially.
http://docs.marklogic.com/5.0doc/docapp.xqy#search.xqy?query=xdmp:spawn
(: task.xqy, a module on the filesystem in the app-server module root :)
xquery version "1.0-ml";
declare variable $URI external;
declare variable $NEW external;
xdmp:document-insert($URI, $NEW)
(: query console :)
for $table at $index in xdmp:document-get('/tmp/export.xml')/*/*/table_data
let $table-name := $table/@name/string()
for $row in $table/row
let $record := element { $table-name } {
  $row/field[text()]/element { @name } { text() } }
let $uri := concat(
  replace(xdmp:path($row), '(\[[0-9]+\])', ''),
  '/', $index)
return xdmp:spawn('task.xqy',
  (xs:QName('URI'), $uri, xs:QName('NEW'), $record))
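The task server settings mentioned above can be changed in the Admin UI, or scripted through the Admin API. The following is a rough sketch of the scripted route, not a tested recipe. The group name "Default", the queue size of 200000, and the thread count of 8 are assumptions. Check the function names against the Admin API documentation for your MarkLogic version before relying on them.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumed group name; substitute the group your host belongs to. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
(: Illustrative values: a larger queue, and one thread per CPU on an 8-core host. :)
let $config := admin:taskserver-set-queue-size($config, $group, 200000)
let $config := admin:taskserver-set-threads($config, $group, 8)
return admin:save-configuration($config)
```

Run this as a user with admin privileges, separately from the ingest query.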
-- Mike
On 20 Feb 2012, at 09:23 , Todd Gochenour wrote:
The XQuery I have for performing the chunking is timing out after 9 minutes
(running in the query console). There are 156000 'rows' total in this extract.
I'm now reading the Developer's guide for Understanding Transactions to figure
out how I might optimize this query. My query reads:
declare function local:random-hex($length as xs:integer) as xs:string {
  string-join(
    for $n in 1 to $length
    return xdmp:integer-to-hex(xdmp:random(15)),
    "" )
};
declare function local:generate-uuid-v4() as xs:string {
  string-join(
    (local:random-hex(8), local:random-hex(4), local:random-hex(4),
     local:random-hex(4), local:random-hex(12)),
    "-" )
};
for $row in /*/*/table_data/row
let $record := element {$row/../@name} {
for $field in $row/field[text()]
return element {$field/@name} {$field/text()}
}
return
xdmp:document-insert(concat(/*/*/@name,'/',name($record),'/',name($record),'_',local:generate-uuid-v4(),'.xml'),
$record);
_______________________________________________ General mailing list Gene...@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general