atom feed9 messages in org.apache.marmotta.usersRe: Scaling Marmotta's LDP interface
FromSent OnAttachments
Mark BreedloveSep 18, 2015 12:37 pm 
Raffaele PalmieriSep 19, 2015 10:27 am 
Mark A. MatienzoSep 19, 2015 12:02 pm 
Sergio FernándezSep 22, 2015 12:22 am 
Mark BreedloveSep 23, 2015 3:53 pm 
Sergio FernándezSep 24, 2015 8:47 am 
Mark BreedloveSep 25, 2015 12:30 pm 
Mark BreedloveOct 2, 2015 12:46 pm 
Sergio FernándezOct 8, 2015 4:04 am 
Subject:Re: Scaling Marmotta's LDP interface
From:Sergio Fernández (
Date:Sep 24, 2015 8:47:54 am

Hi Mark,

On Thu, Sep 24, 2015 at 12:53 AM, Mark Breedlove <> wrote:

* max_connections: default is 100, 1000 is at least recommended, but goes

together with the available memory (see next)

Our Marmotta tends to use just 10 connections, and we have max_connections set to 150 for the server, which contains a few other databases in addition to Marmotta's that have very light usage. The only time we've hit the max_connections limit was during a period when locks were not being released. During a normal mapping or enrichment activity, even with the problems I've mentioned, we seldom go above 10 connections for the Marmotta database, so I'm hesitant to raise the potential memory usage of the system, which would result in paging. I'm focused at the moment on stopping the problem of the locks being held for so long.

I do agree with that. We should find the actual source of those locks.

* shared_buffers: default is just 32MB, as much as you give the more

postres would not need to touch disk for any operation (we have installations with 16GB behaving really well)

We have this set to 40% of our real memory -- 3072MB on a 7.5GB server. The 40% ratio is based on a statement in the wiki[2], which states "given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount." Would you agree with the basic ratio, or would you suggest raising it?

The table for triples alone is 32GB, so I could see adding more memory and increasing shared_buffers, but I'm not sure if I'd ever be able to add enough to cover the expanding size of the table.

From my experience Postgres takes as much as it can, but never risking the system. We normally also use 40% as orientative figure. But with such small server I'd try to give it more memory (6GB?) to see how it behaves.

Because that, another key aspect that dramatically improves the performance on updates is I/O. Are you using SSD? Actual SSD, I mean, virtual SSDs are a completely different story.

* work_mem: default is 4MB, you can dramatically increase it (128 or 256

MB) to improve each each transaction handling

* maintenance_work_mem: is less critical, but useful on maintenance tasks

you periodically should run (see below)

work_mem is 38MB at the moment. I'll try increasing it and will let you know what I encounter.

That should have an impact on handling concurrent transaction. Let's see...

* If you have resources, create all the indexes (cspo, cop, cp, literals,

etc) you may need to improve performance

Could you point me in the right direction for the indices `cop`, `cp`, and `literals`? We have `idx_triples_cspo` on `triples` from the Marmotta installation and created one that we called `idx_triples_c` [4]. Are there optional indices that Marmotta doesn't create when it automatically creates the tables the first time you run it?

Well, internally each LDP-R uses its own context. Therefore all indexes that could improve the lookup would help. Try to create also at least cop. In the end you can tune your indexes based on your needs and available resources. Marmotta does not create all possible indexes by default, but you can do it if you consider it.

* If you do not use versioning, cleanup periodically (nightly) deleted

triples: DELETE FROM triples where deleted = true;

Thanks for clarifying that those rows can be deleted. We've been assuming that we want to use versioning in the future, and will need to talk among ourselves about the implications of foregoing versioning. I believe that the main reason for using the KiWi, and thus PostgreSQL, was to have the possibility of versioning.

Then discuss it. If you do not need it, cleaning delete triples also helps.

Thanks again. Does the information referenced above and elsewhere on our

wiki clarify how we have things tuned? Our biggest problem at the moment is the slowness of UPDATE statements when we run concurrent LDP requests, detailed in [3]. We see locks piling up, but need to ascertain whether that's a cause or a symptom of something else. The thing that's most notable is how a single batch of LDP requests (in the course of one of our mapping or enrichment runs) works very efficiently, but two concurrent batches of LDP requests have severely degraded performance. I just want to make sure first that nothing in our database tuning jumps out as a culprit before suggesting changes to the code, yours our ours.

Thanks, having more hints about your setup help quite a lot. I'd say that, based on my experience with virtual disks, the I/O could be the source of the issues. So either you move to a more performance disk or we find strategies to improve the in-memory work.

Sorry for not being so helpful. Performance tuning is very tricky.