Proof of concept: Evolving postgresql.conf using genetic algorithm - Mailing list pgsql-performance
From | Greg Jaskiewicz |
---|---|
Subject | Proof of concept: Evolving postgresql.conf using genetic algorithm |
Date | |
Msg-id | 5624E065-0C4B-4971-AD4E-4ECBF9B313E6@gmail.com |
List | pgsql-performance |
(following the interest from -hackers, I'm posting this here).

Hi folks,

I've always been fascinated by genetic algorithms. Having had a chance to implement one once before, to solve a real-life issue, I knew they can be brilliant at searching for the right solutions in a multi-dimensional space.

Thinking about just postgresql.conf and the number of possible options there to satisfy performance needs, I thought this does sound like a good example of a problem that can be solved using a genetic algorithm.

So I sat down after work for a few days and came up with a simple proof of concept. It generates a random population of postgresql configuration files and runs a simple pgbench test on each one of them. It takes the average TPS over 3 consecutive runs as the score that is then applied to each 'guy'. Then I run a typical (I suppose) crossover operation and a slight mutation of each new chromosome to come up with a new population, and so on and so forth.

After running this for 2 days, I came to the conclusion that it does indeed seem to work, although the default pgbench 'test cases' are not really stressing the database enough for it to generate diverse enough populations each time.

Also, ideally this sort of thing should be run on two or more different hosts: one (master) that just generates new configurations, saves, restores, and manages the whole operation, and 'slave' host(s) that run the actual tests. One benefit of that would be the fact that genetic algorithms are highly parallelizable.

I did reboot my machines after the tests a couple of times, to test the configuration files and to see if the results were in fact repeatable (as much as they can be), and I have to say, to my surprise, they were. I.e. the configuration files with poor results were still obviously slower than the best ones.

I did include my sample results for everyone to see, including a nice spreadsheet with graphs (everyone loves graphs) showing the scores across all populations.
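For readers who want to see the shape of the loop described above (random population, score averaged over three runs, crossover, slight mutation), here is a minimal sketch in Python. It is not the code from the repository: the parameter names, ranges, rates, and the `fake_benchmark` stand-in are all assumptions made for illustration. A real run would replace `fake_benchmark` with something that writes the configuration, restarts the server, and runs pgbench.

```python
import random

# Hypothetical search space: parameter -> (low, high). The names and ranges
# are illustrative assumptions, not tuning recommendations.
SEARCH_SPACE = {
    "shared_buffers_mb": (16, 4096),
    "work_mem_kb": (64, 65536),
    "checkpoint_timeout_s": (30, 3600),
    "effective_cache_size_mb": (16, 8192),
}
PARAMS = list(SEARCH_SPACE)


def random_individual(rng):
    """One random configuration ('guy') drawn uniformly from the search space."""
    return {p: rng.randint(*SEARCH_SPACE[p]) for p in PARAMS}


def crossover(a, b, rng):
    """Single-point crossover: the first genes come from a, the rest from b."""
    point = rng.randint(1, len(PARAMS) - 1)
    return {p: (a if i < point else b)[p] for i, p in enumerate(PARAMS)}


def mutate(ind, rng, rate=0.1, scale=0.1):
    """With probability `rate` per gene, nudge it by up to `scale` of its range."""
    child = dict(ind)
    for p in PARAMS:
        if rng.random() < rate:
            lo, hi = SEARCH_SPACE[p]
            step = int((hi - lo) * scale) or 1
            child[p] = min(hi, max(lo, child[p] + rng.randint(-step, step)))
    return child


def score(ind, benchmark, runs=3):
    """Average the benchmark result over `runs` consecutive runs."""
    return sum(benchmark(ind) for _ in range(runs)) / runs


def evolve(benchmark, generations=10, pop_size=12, seed=0):
    """Return (best_score, best_individual) seen across all scored generations."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(
            ((score(ind, benchmark), ind) for ind in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        # Keep the better half as parents, then breed a full new population.
        parents = [ind for _, ind in scored[: pop_size // 2]]
        population = []
        while len(population) < pop_size:
            mum, dad = rng.sample(parents, 2)
            population.append(mutate(crossover(mum, dad, rng), rng))
    return best


def fake_benchmark(ind):
    """Synthetic stand-in for a pgbench run; the peak values are made up."""
    return (1000.0
            - abs(ind["shared_buffers_mb"] - 1024) / 10
            - abs(ind["work_mem_kb"] - 8192) / 100)


if __name__ == "__main__":
    best_score, best_conf = evolve(fake_benchmark)
    print(best_score, best_conf)
```

Since each individual is scored independently, the `score` calls inside one generation are the part that could be farmed out to separate test hosts.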
The tests were run on my Mac laptops (I don't have access to a bunch of servers that I can test things like that on for a couple of days, sorry).

The project, including a readme file, is available to look at: https://github.com/waniek/genpostgresql.conf

Things I know so far:

* I probably need to take into account more configuration options;
* pgbench with its default test case is not the right characterization suite for this exercise, I need something more lifelike. I suppose we all have some sort of a characterization suite that could be used here;
* The code needs a lot of work if this was to be used professionally;
* Just restarting postgresql with a different configuration file doesn't really constitute a fully proper way to test new configuration files, but it seems to work.

I don't expect much out of this; after all, this is just a proof of concept. But if there are people out there thinking this can be in any way useful, please give us a shout.

Also, if you know something more about genetic algorithms than I do and can suggest improvements, let me know.

Lastly, I'm looking for some more sophisticated pgbench test cases that I could throw at it. I think in general pgbench as a project could use some more sophisticated benchmarks that should be included with the project, for everyone to see. Perhaps even to run some automated regression tests against git head.
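For the scoring step (average TPS over three pgbench runs after restarting with a candidate configuration), a rough sketch of the two text-handling pieces might look like the following. The helper names `render_conf`, `parse_tps`, and `average_tps` are hypothetical and not taken from the repository. The sketch assumes pgbench's summary contains a line starting with `tps = `, and it takes the first such line; the exact wording after the number differs between pgbench versions, so check it against the version in use.

```python
import re
from statistics import mean

# Matches the start of a pgbench summary line such as "tps = 85.184871 (...)".
TPS_LINE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def render_conf(settings):
    """Render a dict like {'shared_buffers': '128MB'} as postgresql.conf lines.

    Values are written as given; the caller is responsible for quoting
    any value that needs single quotes.
    """
    return "".join(f"{name} = {value}\n" for name, value in sorted(settings.items()))


def parse_tps(pgbench_output):
    """Return the value from the first 'tps = N' line in pgbench's output."""
    match = TPS_LINE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    return float(match.group(1))


def average_tps(outputs):
    """Average the parsed TPS over several captured pgbench outputs."""
    return mean(parse_tps(text) for text in outputs)
```

Running the server restart and pgbench itself (for example through `subprocess`) is left out on purpose, because that part depends on how the local installation is laid out.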