9 messages in org.apache.marmotta.users: Re: Scaling Marmotta's LDP interface

From                   Sent On
Mark Breedlove         Sep 18, 2015 12:37 pm
Raffaele Palmieri      Sep 19, 2015 10:27 am
Mark A. Matienzo       Sep 19, 2015 12:02 pm
Sergio Fernández       Sep 22, 2015 12:22 am
Mark Breedlove         Sep 23, 2015 3:53 pm
Sergio Fernández       Sep 24, 2015 8:47 am
Mark Breedlove         Sep 25, 2015 12:30 pm
Mark Breedlove         Oct 2, 2015 12:46 pm
Sergio Fernández       Oct 8, 2015 4:04 am
Subject: Re: Scaling Marmotta's LDP interface
From: Mark Breedlove
Date: Sep 23, 2015 3:53:09 pm

On Tue, Sep 22, 2015 at 2:22 AM, Sergio Fernández <> wrote:

Take into account that by default PostgreSQL is very, very conservative. Have you customized it?

Thanks for your help with this!

Yes, we've done quite a bit of tuning, as described in [1], [3], and [4].

* max_connections: default is 100, 1000 is at least recommended, but goes together with the available memory (see next)

Our Marmotta tends to use just 10 connections, and we have max_connections set to 150 for the server, which hosts a few other very lightly used databases in addition to Marmotta's. The only time we've hit the max_connections limit was during a period when locks were not being released. During a normal mapping or enrichment activity, even with the problems I've mentioned, we seldom go above 10 connections for the Marmotta database, so I'm hesitant to raise max_connections, and with it the potential memory usage of the system, which would push us into paging. I'm focused at the moment on stopping the locks from being held for so long.
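For what it's worth, a quick way to check the connection count during a run is something like this (the database name is just a placeholder for ours):

    -- sketch: count active connections for the Marmotta database
    SELECT count(*)
    FROM pg_stat_activity
    WHERE datname = 'marmotta';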

* shared_buffers: default is just 32MB; the more you give it, the less Postgres needs to touch disk for any operation (we have installations with 16GB behaving really well)

We have this set to 40% of our real memory -- 3072MB on a 7.5GB server. The 40% ratio is based on a statement in the wiki[2], which states "given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount." Would you agree with the basic ratio, or would you suggest raising it?

The table for triples alone is 32GB, so I could see adding more memory and increasing shared_buffers, but I'm not sure if I'd ever be able to add enough to cover the expanding size of the table.
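In case it helps the discussion, a query along these lines would tell us whether the current shared_buffers is keeping up (again, 'marmotta' stands in for our actual database name):

    -- sketch: buffer cache hit ratio for one database
    SELECT blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
    FROM pg_stat_database
    WHERE datname = 'marmotta';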

* work_mem: default is 4MB, you can dramatically increase it (128 or 256 MB) to improve each transaction's handling

* maintenance_work_mem: is less critical, but useful for the maintenance tasks you should run periodically (see below)

work_mem is 38MB at the moment. I'll try increasing it and will let you know what I encounter. maintenance_work_mem is 100MB.
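Concretely, what I plan to try first in postgresql.conf is something like the following; the new numbers are just my starting guesses based on your suggestion, and both settings only need a reload, not a restart:

    # current values
    #   work_mem = 38MB
    #   maintenance_work_mem = 100MB
    # values I plan to experiment with
    work_mem = 128MB
    maintenance_work_mem = 256MB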

* checkpoint_segments: the default is just 3, but a much larger value improves transaction handling

checkpoint_segments is 128, and checkpoint_timeout is 20min[3].

* think about your vacuum strategy

We've adjusted our autovacuuming per [1].
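Whether it makes more sense to do that globally or per table is something we're still weighing; per table it would look roughly like this (the scale factors are illustrative placeholders, not our exact values):

    -- sketch: more aggressive autovacuum on the large triples table
    ALTER TABLE triples SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_analyze_scale_factor = 0.005
    );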

* If you have resources, create all the indexes (cspo, cop, cp, literals, etc) you may need to improve performance

Could you point me in the right direction for the indices `cop`, `cp`, and `literals`? We have `idx_triples_cspo` on `triples` from the Marmotta installation and created one that we called `idx_triples_c` [4]. Are there optional indices that Marmotta doesn't create when it automatically creates the tables the first time you run it?
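To make sure I understand, my guess at what a `cp` index would look like is something along these lines; the column names are my assumption from the triples table we see, so please correct me if the intended definition differs:

    -- sketch of a possible context+predicate index, built CONCURRENTLY so the
    -- large triples table isn't locked for writes during creation
    CREATE INDEX CONCURRENTLY idx_triples_cp
      ON triples (context, predicate)
      WHERE deleted = false;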

* If you do not use versioning, clean up deleted triples periodically (nightly): DELETE FROM triples where deleted = true;

Thanks for clarifying that those rows can be deleted. We've been assuming that we want to use versioning in the future, and will need to talk among ourselves about the implications of forgoing it. I believe that the main reason for choosing the KiWi backend, and thus PostgreSQL, was to have the possibility of versioning.
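If we do decide we can live without versioning, I imagine the nightly cleanup would be as simple as a cron entry along these lines (database name and schedule are placeholders):

    # hypothetical crontab entry: purge soft-deleted triples nightly at 03:15
    15 3 * * * psql -d marmotta -c "DELETE FROM triples WHERE deleted = true;"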

Hope that helps. Try to implement some of those suggestions in your system and tell us how they behave and where you still need more help, and maybe code patching.

Thanks again. Does the information referenced above and elsewhere on our wiki clarify how we have things tuned? Our biggest problem at the moment is the slowness of UPDATE statements when we run concurrent LDP requests, detailed in [3]. We see locks piling up, but need to ascertain whether that's a cause or a symptom of something else. What's most notable is that a single batch of LDP requests (in the course of one of our mapping or enrichment runs) works very efficiently, while two concurrent batches show severely degraded performance. I just want to make sure that nothing in our database tuning jumps out as a culprit before suggesting changes to the code, yours or ours.
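For context, one way we can watch the pile-up while two batches run is with a query like the following, assuming a pre-9.6 pg_stat_activity that still has the boolean waiting column ('marmotta' again stands in for our database name):

    -- sketch: long-running and lock-waiting statements in the Marmotta database
    SELECT pid, waiting, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE datname = 'marmotta'
    ORDER BY xact_age DESC;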


[1] [2] [3] [4]