Proof of concept: using genetic algorithm to come up with most optimal PostgreSQL.conf - Mailing list pgsql-hackers

From Greg Jaskiewicz
Subject Proof of concept: using genetic algorithm to come up with most optimal PostgreSQL.conf
Msg-id F475EDB6-670E-4634-A00D-BC92ABB35C69@pointblue.com.pl
List pgsql-hackers
(Resending; I think Google Mail failed to deliver it the first time.)

Hi folks, 

I've always been fascinated by genetic algorithms. Having had a chance to implement one once before, to solve a real-life problem, I knew they can be brilliant at searching for good solutions in a multidimensional space.

Thinking about postgresql.conf alone, and the number of options it offers for tuning performance, I thought this sounded like a good example of a problem that a genetic algorithm could solve.

So I sat down after work for a few days and came up with a simple proof of concept.
It generates a random population of PostgreSQL configuration files and runs a simple pgbench test on each one of them. The average TPS over 3 consecutive runs is the score assigned to each 'guy'.
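A minimal sketch of that step, not the project's actual code. The settings and ranges in SEARCH_SPACE are illustrative guesses, and run_pgbench is a hypothetical callable that applies a configuration, runs pgbench, and returns its text output. The parser relies only on the "tps = ..." line that pgbench prints.

```python
import random
import re
import statistics

# Hypothetical search space: (low, high, unit) per setting. The ranges are
# illustrative guesses, not the ones used in the project.
SEARCH_SPACE = {
    "shared_buffers": (16, 1024, "MB"),
    "work_mem": (1, 64, "MB"),
    "effective_cache_size": (64, 4096, "MB"),
    "random_page_cost": (1, 8, ""),
}

TPS_RE = re.compile(r"tps = ([0-9.]+)")


def random_individual(rng):
    """Pick a random integer value for every setting in the search space."""
    return {name: rng.randint(lo, hi) for name, (lo, hi, _unit) in SEARCH_SPACE.items()}


def render_conf(individual):
    """Render an individual as postgresql.conf lines, e.g. 'work_mem = 8MB'."""
    lines = [f"{name} = {value}{SEARCH_SPACE[name][2]}" for name, value in individual.items()]
    return "\n".join(lines) + "\n"


def parse_tps(pgbench_output):
    """Return the first TPS figure reported in pgbench's output."""
    match = TPS_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    return float(match.group(1))


def score(individual, run_pgbench, runs=3):
    """Average TPS over `runs` consecutive benchmark runs of one individual."""
    return statistics.mean(parse_tps(run_pgbench(individual)) for _ in range(runs))


def random_population(size, seed=None):
    """Build the initial population of random configurations."""
    rng = random.Random(seed)
    return [random_individual(rng) for _ in range(size)]
```

Passing the benchmark runner in as a function keeps the scoring logic testable without a running server.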

Then I run a typical (I suppose) crossover operation, plus a slight mutation of each new chromosome, to come up with a new population, and so on and so forth.
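As a rough illustration of what such a step can look like, here is a sketch that assumes single-point crossover, tournament selection, and keeping the best individual unchanged. Those choices, the gene names, the bounds, and the mutation rate are all assumptions made for the example; the project may do this differently.

```python
import random

# Hypothetical genes and bounds (values in MB), for illustration only.
BOUNDS = {
    "shared_buffers_mb": (16, 1024),
    "work_mem_mb": (1, 64),
    "effective_cache_size_mb": (64, 4096),
}
GENES = list(BOUNDS)


def crossover(a, b, rng):
    """Single-point crossover: genes before the cut come from a, the rest from b."""
    point = rng.randint(1, len(GENES) - 1)
    return {g: (a if i < point else b)[g] for i, g in enumerate(GENES)}


def mutate(individual, rng, rate=0.1, step=0.1):
    """With probability `rate` per gene, nudge it by up to `step` of its range."""
    child = dict(individual)
    for gene, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            delta = rng.uniform(-step, step) * (hi - lo)
            child[gene] = min(hi, max(lo, round(child[gene] + delta)))
    return child


def select_parent(scored, rng, k=2):
    """Tournament selection: the best-scoring of k randomly chosen individuals."""
    return max(rng.sample(scored, k), key=lambda pair: pair[1])[0]


def next_generation(scored, rng, elite=1):
    """Build a new population from (individual, score) pairs of the same size."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    new_population = [dict(ind) for ind, _score in ranked[:elite]]
    while len(new_population) < len(scored):
        child = crossover(select_parent(scored, rng), select_parent(scored, rng), rng)
        new_population.append(mutate(child, rng))
    return new_population
```

Clamping mutated values to the bounds keeps every child a valid configuration, so no generation is wasted on settings the server would reject.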

After running this for 2 days, I came to the conclusion that it does indeed seem to work, although the default pgbench 'test cases' don't really stress the database enough to generate sufficiently diverse populations each time.

Also, ideally this sort of thing should be run on two or more different hosts: one (master) that just generates new configurations, saves, restores, and manages the whole operation, and 'slave' host(s) that run the actual tests.

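One benefit of that setup is that it would exploit the fact that genetic algorithms are highly parallelizable: each configuration in a population can be scored independently of the others.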
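A sketch of how the coordinating host could hand out work, assuming a hypothetical evaluate_on_host(host, individual) function that applies the configuration on a test host, runs the benchmark there, and returns the score. Each host gets only one configuration at a time so that runs do not disturb each other's timings.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, hosts, evaluate_on_host):
    """Score every individual, using each test host for one run at a time.

    `evaluate_on_host` is a hypothetical callable supplied by the caller.
    Scores are returned in the same order as `population`.
    """
    idle_hosts = queue.Queue()
    for host in hosts:
        idle_hosts.put(host)

    def task(individual):
        host = idle_hosts.get()  # blocks until some host is free
        try:
            return evaluate_on_host(host, individual)
        finally:
            idle_hosts.put(host)  # hand the host back, even on failure

    # One thread per host: the threads only wait on remote work.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return list(pool.map(task, population))
```

Threads are enough here because the coordinating side spends its time waiting on the test hosts, not computing.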

I did reboot my machines a couple of times after the tests, to check the configuration files and to see whether the results were in fact repeatable (as much as they can be), and I have to say, to my surprise, they were. I.e. the configuration files with poor results were still obviously slower than the best ones.

I have included my sample results for everyone to see, including a nice spreadsheet with graphs (everyone loves graphs) showing the scores across all populations.
The tests were run on my Mac laptops (I don’t have access to a bunch of servers that I could test things like that on for a couple of days, sorry).

The project, including a readme file, is available to look at: https://github.com/waniek/genpostgresql.conf


Things I know so far:
* I probably need to take more configuration options into account;
* pgbench with its default test case is not the right characterization suite for this exercise; I need something more lifelike. I suppose we all have some sort of characterization suite that could be used here;
* The code needs a lot of work if it is to be used professionally;
* Just restarting PostgreSQL with a different configuration file isn't really a fully proper way to test new configuration files, but it seems to work.
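For the last point, one way to restart the server against a candidate file is pg_ctl with the config_file setting passed on the server command line. This is only a sketch of how the command could be built; the paths are placeholders, and I am not claiming this is how the project does it.

```python
import shlex


def restart_command(data_dir, conf_path, pg_ctl="pg_ctl"):
    """Build a pg_ctl command that restarts the server with another config file.

    -w waits for the restart to finish, -m fast disconnects clients instead of
    waiting for them, and -o passes options through to the postgres process.
    """
    server_options = f"-c config_file={shlex.quote(conf_path)}"
    return [pg_ctl, "restart", "-D", data_dir, "-w", "-m", "fast", "-o", server_options]
```

The resulting list can be handed to subprocess.run(cmd, check=True). A full restart rather than a reload is needed anyway, since settings such as shared_buffers only take effect at server start.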


I don't expect much out of this; after all, this is just a proof of concept. But if there are people out there who think this could be useful in any way, please give us a shout.
Also, if you know more about genetic algorithms than I do and can suggest improvements, let me know.

Lastly, I'm looking for some more sophisticated pgbench test cases that I could throw at it. I think pgbench in general could use some more sophisticated benchmarks, included with the project for everyone to see, perhaps even to run some automated regression tests against git head.



