Proof of concept: Evolving postgresql.conf using genetic algorithm - Mailing list pgsql-performance

From Greg Jaskiewicz
Subject Proof of concept: Evolving postgresql.conf using genetic algorithm
Date
Msg-id 5624E065-0C4B-4971-AD4E-4ECBF9B313E6@gmail.com
List pgsql-performance
(following the interest from -hackers, I'm posting this here).

Hi folks,

I've always been fascinated with genetic algorithms. Having had a chance to implement one once before, to solve a real-life issue, I knew they can be brilliant at searching for the right solutions in a multi-dimensional space.

Thinking about just postgresql.conf and the number of options there that can be tuned to satisfy performance needs, I thought this sounds like a good example of a problem that can be solved using a genetic algorithm.

So I sat down after work for a few days and came up with a simple proof of concept.
It generates a random population of PostgreSQL configuration files and runs a simple pgbench test on each one of them. It takes the average TPS over 3 consecutive runs as the score, which is then assigned to each 'guy'.
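A minimal sketch of the population and scoring steps, assuming each 'guy' is a dict mapping a postgresql.conf parameter to a value. The parameter names are real PostgreSQL settings, but the chosen set, the candidate values, and the helper names (`random_population`, `to_conf`, `score`) are hypothetical and not taken from the project. Parsing assumes pgbench's summary line that starts with `tps = `.

```python
import random
import re
from statistics import mean

# Hypothetical search space: parameter -> candidate values.
# The real project may tune different parameters and ranges.
SEARCH_SPACE = {
    "shared_buffers": ["32MB", "128MB", "512MB", "1GB"],
    "work_mem": ["1MB", "4MB", "16MB", "64MB"],
    "effective_cache_size": ["128MB", "1GB", "4GB"],
    "checkpoint_completion_target": ["0.5", "0.7", "0.9"],
    "random_page_cost": ["1.5", "2.0", "4.0"],
}

# Assumes pgbench prints a summary line such as "tps = 593.15 (...)".
TPS_RE = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)


def random_config(rng):
    """One 'guy': a random value for every parameter in the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_population(size, seed=None):
    """Initial generation of `size` random configurations."""
    rng = random.Random(seed)
    return [random_config(rng) for _ in range(size)]


def to_conf(config):
    """Render a chromosome as postgresql.conf lines (sorted for stable diffs)."""
    return "".join(f"{name} = {value}\n" for name, value in sorted(config.items()))


def parse_tps(pgbench_output):
    """Extract the first 'tps = N' figure from pgbench's summary output."""
    match = TPS_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no tps line found in pgbench output")
    return float(match.group(1))


def score(pgbench_outputs):
    """Fitness: average TPS over consecutive runs (three in the post)."""
    return mean(parse_tps(out) for out in pgbench_outputs)


if __name__ == "__main__":
    population = random_population(4, seed=1)
    print(to_conf(population[0]))
```

Keeping the candidate values as discrete lists makes every gene trivially valid; continuous ranges would also work but need clamping after mutation.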

Then I run a typical (I suppose) crossover operation and a slight mutation of each new chromosome to come up with a new population, and so on and so forth.
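The post does not say which selection, crossover, or mutation operators are used. As an assumed example, the sketch below uses tournament selection, uniform crossover, and per-gene random-reset mutation on dict chromosomes; the operator choices, the mutation rate, and the parameter ranges are all hypothetical.

```python
import random

# Hypothetical search space; values are discrete candidates per parameter.
SEARCH_SPACE = {
    "shared_buffers": ["32MB", "128MB", "512MB", "1GB"],
    "work_mem": ["1MB", "4MB", "16MB", "64MB"],
    "random_page_cost": ["1.5", "2.0", "4.0"],
}


def tournament_select(scored, rng, k=3):
    """Assumed selection scheme: the fittest of k randomly drawn individuals.
    `scored` is a list of (config, fitness) pairs."""
    contenders = rng.sample(scored, min(k, len(scored)))
    return max(contenders, key=lambda pair: pair[1])[0]


def crossover(mother, father, rng):
    """Uniform crossover: each gene is copied from either parent with p = 0.5."""
    return {name: rng.choice((mother[name], father[name])) for name in mother}


def mutate(config, rng, rate=0.1):
    """With probability `rate`, reset a gene to a random value from its range."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def next_generation(scored, rng, rate=0.1):
    """Build a new population of the same size as the scored one."""
    return [
        mutate(
            crossover(tournament_select(scored, rng), tournament_select(scored, rng), rng),
            rng,
            rate,
        )
        for _ in range(len(scored))
    ]
```

Uniform crossover is a natural fit here because the settings have no meaningful order in the file, so there is no reason to keep neighbouring genes together as one-point crossover would.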

Running this for 2 days, I came to the conclusion that it does indeed seem to work, although the default pgbench 'test cases' do not really stress the database enough for it to generate sufficiently diverse populations each time.

Also, ideally this sort of thing should be run on two or more different hosts: one (master) that just generates new configurations, saves, restores, and manages the whole operation, and 'slave' host(s) that run the actual tests.

One benefit of that setup is that genetic algorithms are highly parallelizable: each configuration in a population can be scored independently of the others.
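Since each score is independent, a master can hand configurations to several benchmark hosts at once. A sketch assuming hypothetical host names and a caller-supplied `evaluate(host, conf_text)` function; `fake_evaluate` below is a stand-in, because actually shipping the file, restarting PostgreSQL and running pgbench is outside the sketch. A queue of idle hosts makes sure no host runs two benchmarks at the same time, which would skew its TPS.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical benchmark ('slave') hosts; the names are placeholders.
HOSTS = ["bench1", "bench2", "bench3"]


def fake_evaluate(host, conf_text):
    """Stand-in for: copy conf_text to `host`, restart PostgreSQL there,
    run pgbench a few times and return the average TPS. It only returns a
    deterministic number so the sketch runs without any servers."""
    return float(len(conf_text))


def evaluate_population(conf_texts, hosts, evaluate):
    """Score every configuration in parallel, at most one job per host.
    Scores come back in the same order as conf_texts."""
    idle_hosts = queue.Queue()
    for host in hosts:
        idle_hosts.put(host)

    def run(conf_text):
        host = idle_hosts.get()  # wait until some host is free
        try:
            return evaluate(host, conf_text)
        finally:
            idle_hosts.put(host)  # give the host back for the next job

    # One thread per host, so a running thread can always obtain a free host.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return list(pool.map(run, conf_texts))


if __name__ == "__main__":
    population = ["shared_buffers = 128MB\n", "work_mem = 16MB\n"]
    print(evaluate_population(population, HOSTS, fake_evaluate))
```

Threads are enough on the master because it mostly waits on remote benchmarks; the heavy work happens on the test hosts.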

I did reboot my machines after the tests a couple of times, to re-test configuration files and see whether the results were in fact repeatable (as much as they can be), and I have to say, to my surprise, they were. I.e. the configuration files with poor results were still obviously slower than the best ones.
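One way to put a number on "repeatable" is to compare how the same configurations rank across two rounds of tests. A sketch assuming each round's average TPS is kept in a dict keyed by a hypothetical configuration name; it returns the fraction of configuration pairs that keep the same order (a simple pairwise concordance, related to Kendall's tau), with ties counted as disagreements.

```python
from itertools import combinations


def ranking_agreement(first, second):
    """Fraction of configuration pairs ranked in the same order by two
    rounds of benchmarks. first/second map a config name to its average
    TPS. 1.0 means identical ranking; ties count as disagreements."""
    if set(first) != set(second):
        raise ValueError("both rounds must score the same configurations")
    pairs = list(combinations(sorted(first), 2))
    if not pairs:
        raise ValueError("need at least two configurations")
    # A pair agrees when both rounds put the same config ahead.
    concordant = sum(
        (first[a] - first[b]) * (second[a] - second[b]) > 0 for a, b in pairs
    )
    return concordant / len(pairs)


if __name__ == "__main__":
    before = {"best": 900.0, "mid": 600.0, "worst": 300.0}
    after_reboot = {"best": 880.0, "mid": 620.0, "worst": 310.0}
    print(ranking_agreement(before, after_reboot))
```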

I have included my sample results for everyone to see, including a nice spreadsheet with graphs (everyone loves graphs) showing the scores across all populations.
The tests were run on my Mac laptops (I don't have access to a bunch of servers that I could test things like that on for a couple of days, sorry).

The project, including a readme file, is available at: https://github.com/waniek/genpostgresql.conf


Things I know so far:
* I probably need to take more configuration options into account;
* pgbench with its default test case is not the right characterization suite for this exercise; I need something more lifelike. I suppose we all have some sort of characterization suite that could be used here;
* The code needs a lot of work if this were to be used professionally;
* Just restarting PostgreSQL with a different configuration file doesn't really constitute a fully proper way to test new configuration files, but it seems to work;


I don't expect much out of this; after all, this is just a proof of concept. But if there are people out there who think this can be useful in any way, please give us a shout.
Also, if you know more about genetic algorithms than I do and can suggest improvements, let me know.

Lastly, I'm looking for some more sophisticated pgbench test cases that I could throw at it. In general, I think pgbench as a project could use some more sophisticated benchmarks included with it, for everyone to see. Perhaps they could even be used to run automated regression tests against git HEAD.

