Thread: 8.3RC2 vs 8.2.6 testing results
Hello, I wanted to share performance-related test results for Postgresql 8.3RC2 and 8.2.6. In both cases we used a freshly imported database followed by analyze verbose command. Same server was used for testing (2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both cases default configuration was used with increased shared buffers to 1Gb (total server ram is 32Gb), increased work and maintenance mem, enabled autovacuum, increased default_statistics_target to 100, increased effective_cache_size to 20Gb, disabled fsync and increased checkpoint_segments. Total size (on disk) of the tables involved in the query was around 300Mb. 1. Freshly imported DB size on disk was about 3% smaller for 8.3 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. We took special measures to make sure that no third factors involved (no other apps running, all data was cached from disks, etc). Below is one of the queries that we used for testing (I anonymized table names) along with query plan for both 8.3 and 8.2. The query execution plans are the same for both versions, but what we found quite interesting is that if we add all the times from each line of 8.2's query plan, it roughly adds-up to the total execution time. For 8.3's plan each line shows a shorter time, yet resulting in longer total runtime. Also, summing 8.3's plan lines doesn't come close to the total execution time: SELECT _."a_id", SUM(_."counter") FROM ts.t_c AS _ LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id" LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id" LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id" WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <= '2008-01-27 23:59:59') AND __1."status" IS TRUE AND __2."status" IS TRUE GROUP BY _."a_id" 8.2.6 QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual time=21517.837..21517.890 rows=47 loops=1) -> Hash Join (cost=2450.09..111489.75 rows=550289 width=52) (actual time=76.083..7691.579 rows=2593557 loops=1) Hash Cond: (_.i_id = __3.id) -> Hash Join (cost=19.20..95377.74 rows=934651 width=56) (actual time=0.119..4933.928 rows=2596942 loops=1) Hash Cond: (_.a_id = __1.id) -> Append (cost=0.00..76276.09 rows=2596252 width=56) (actual time=0.014..2988.950 rows=2596942 loops=1) -> Seq Scan on t_c _ (cost=0.00..21.10 rows=4 width=56) (actual time=0.001..0.001 rows=0 loops=1) Filter: ((date_day >= '2008-01-01'::date) AND (date_day <= '2008-01-27'::date)) -> Seq Scan on t_c_2008_01 _ (cost=0.00..76254.99 rows=2596248 width=56) (actual time=0.011..1979.606 rows=2596942 loops=1) Filter: ((date_day >= '2008-01-01'::date) AND (date_day <= '2008-01-27'::date)) -> Hash (cost=18.30..18.30 rows=72 width=4) (actual time=0.094..0.094 rows=72 loops=1) -> Seq Scan on t_a __1 (cost=0.00..18.30 rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1) Filter: (status IS TRUE) -> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual time=75.931..75.931 rows=59934 loops=1) -> Hash Join (cost=57.45..1950.44 rows=38436 width=4) (actual time=0.829..54.760 rows=59934 loops=1) Hash Cond: (__3.b_id = __2.id) -> Seq Scan on t_i __3 (cost=0.00..1263.82 rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1) -> Hash (cost=50.90..50.90 rows=524 width=4) (actual time=0.499..0.499 rows=524 loops=1) -> Seq Scan on t_b __2 (cost=0.00..50.90 rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1) Filter: (status IS TRUE) Total runtime: 21518.097 ms 8.3RC2: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual time=24354.972..24355.019 rows=47 loops=1) -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) (actual time=76.188..8177.510 rows=2593557 loops=1) Hash Cond: (_.i_id = __3.id) -> Hash Join (cost=16.20..92904.25 rows=935090 width=56) (actual time=0.140..5304.968 rows=2596942 loops=1) Hash Cond: (_.a_id = __1.id) -> Append (cost=0.00..73796.62 rows=2597473 width=56) (actual time=0.043..3272.024 rows=2596942 loops=1) -> Seq Scan on t_c _ (cost=0.00..21.55 rows=4 width=56) (actual time=0.001..0.001 rows=0 loops=1) Filter: ((date_day >= '2008-01-01'::date) AND (date_day <= '2008-01-27'::date)) -> Seq Scan on t_c_2008_01 _ (cost=0.00..73775.07 rows=2597469 width=56) (actual time=0.040..2245.209 rows=2596942 loops=1) Filter: ((date_day >= '2008-01-01'::date) AND (date_day <= '2008-01-27'::date)) -> Hash (cost=15.30..15.30 rows=72 width=4) (actual time=0.091..0.091 rows=72 loops=1) -> Seq Scan on t_a __1 (cost=0.00..15.30 rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1) Filter: (status IS TRUE) -> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual time=76.027..76.027 rows=59934 loops=1) -> Hash Join (cost=55.45..1882.44 rows=38436 width=4) (actual time=0.835..54.576 rows=59934 loops=1) Hash Cond: (__3.b_id = __2.id) -> Seq Scan on t_i __3 (cost=0.00..1197.82 rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1) -> Hash (cost=48.90..48.90 rows=524 width=4) (actual time=0.513..0.513 rows=524 loops=1) -> Seq Scan on t_b __2 (cost=0.00..48.90 rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1) Filter: (status IS TRUE) Total runtime: 24355.179 ms Any ideas on what-we-were-doing-wrong are welcomed -- Vlad
Hello 8.3 plan is not optimal. > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) > (actual time=76.188..8177.510 rows=2593557 loops=1) please, try to increase statistics default_statistics_target (in postgresql.conf) to 100 and repeat import and your test. Regards Pavel Stehule On 28/01/2008, Vlad <marchenko@gmail.com> wrote: > Hello, > > I wanted to share performance-related test results for Postgresql > 8.3RC2 and 8.2.6. In both cases we used a freshly imported database > followed by analyze verbose command. Same server was used for testing > (2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were > compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both > cases default configuration was used with increased shared buffers to > 1Gb (total server ram is 32Gb), increased work and maintenance mem, > enabled autovacuum, increased default_statistics_target to 100, > increased effective_cache_size to 20Gb, disabled fsync and increased > checkpoint_segments. Total size (on disk) of the tables involved in > the query was around 300Mb. > > 1. Freshly imported DB size on disk was about 3% smaller for 8.3 > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > We took special measures to make sure that no third factors involved > (no other apps running, all data was cached from disks, etc). Below > is one of the queries that we used for testing (I anonymized table > names) along with query plan for both 8.3 and 8.2. The query execution > plans are the same for both versions, but what we found quite > interesting is that if we add all the times from each line of 8.2's > query plan, it roughly adds-up to the total execution time. For 8.3's > plan each line shows a shorter time, yet resulting in longer total > runtime. Also, summing 8.3's plan lines doesn't come close to the > total execution time: > > SELECT _."a_id", SUM(_."counter") > FROM ts.t_c AS _ > LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id" > LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id" > LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id" > WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <= > '2008-01-27 23:59:59') > AND __1."status" IS TRUE > AND __2."status" IS TRUE > GROUP BY _."a_id" > > > 8.2.6 QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual > time=21517.837..21517.890 rows=47 loops=1) > -> Hash Join (cost=2450.09..111489.75 rows=550289 width=52) > (actual time=76.083..7691.579 rows=2593557 loops=1) > Hash Cond: (_.i_id = __3.id) > -> Hash Join (cost=19.20..95377.74 rows=934651 width=56) > (actual time=0.119..4933.928 rows=2596942 loops=1) > Hash Cond: (_.a_id = __1.id) > -> Append (cost=0.00..76276.09 rows=2596252 width=56) > (actual time=0.014..2988.950 rows=2596942 loops=1) > -> Seq Scan on t_c _ (cost=0.00..21.10 rows=4 > width=56) (actual time=0.001..0.001 rows=0 loops=1) > Filter: ((date_day >= '2008-01-01'::date) > AND (date_day <= '2008-01-27'::date)) > -> Seq Scan on t_c_2008_01 _ > (cost=0.00..76254.99 rows=2596248 width=56) (actual > time=0.011..1979.606 rows=2596942 loops=1) > Filter: ((date_day >= '2008-01-01'::date) > AND (date_day <= '2008-01-27'::date)) > -> Hash (cost=18.30..18.30 rows=72 width=4) (actual > time=0.094..0.094 rows=72 loops=1) > -> Seq Scan on t_a __1 (cost=0.00..18.30 > rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1) > Filter: (status IS TRUE) > -> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual > time=75.931..75.931 rows=59934 loops=1) > -> Hash Join (cost=57.45..1950.44 rows=38436 width=4) > (actual time=0.829..54.760 rows=59934 loops=1) > Hash Cond: (__3.b_id = __2.id) > -> Seq Scan on t_i __3 (cost=0.00..1263.82 > rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1) > -> Hash (cost=50.90..50.90 rows=524 width=4) > (actual time=0.499..0.499 rows=524 loops=1) > -> Seq Scan on t_b __2 (cost=0.00..50.90 > rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1) > Filter: (status IS TRUE) > Total runtime: 21518.097 ms > > > > 8.3RC2: QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual > time=24354.972..24355.019 rows=47 loops=1) > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) > (actual time=76.188..8177.510 rows=2593557 loops=1) > Hash Cond: (_.i_id = __3.id) > -> Hash Join (cost=16.20..92904.25 rows=935090 width=56) > (actual time=0.140..5304.968 rows=2596942 loops=1) > Hash Cond: (_.a_id = __1.id) > -> Append (cost=0.00..73796.62 rows=2597473 width=56) > (actual time=0.043..3272.024 rows=2596942 loops=1) > -> Seq Scan on t_c _ (cost=0.00..21.55 rows=4 > width=56) (actual time=0.001..0.001 rows=0 loops=1) > Filter: ((date_day >= '2008-01-01'::date) > AND (date_day <= '2008-01-27'::date)) > -> Seq Scan on t_c_2008_01 _ > (cost=0.00..73775.07 rows=2597469 width=56) (actual > time=0.040..2245.209 rows=2596942 loops=1) > Filter: ((date_day >= '2008-01-01'::date) > AND (date_day <= '2008-01-27'::date)) > -> Hash (cost=15.30..15.30 rows=72 width=4) (actual > time=0.091..0.091 rows=72 loops=1) > -> Seq Scan on t_a __1 (cost=0.00..15.30 > rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1) > Filter: (status IS TRUE) > -> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual > time=76.027..76.027 rows=59934 loops=1) > -> Hash Join (cost=55.45..1882.44 rows=38436 width=4) > (actual time=0.835..54.576 rows=59934 loops=1) > Hash Cond: (__3.b_id = __2.id) > -> Seq Scan on t_i __3 (cost=0.00..1197.82 > rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1) > -> Hash (cost=48.90..48.90 rows=524 width=4) > (actual time=0.513..0.513 rows=524 loops=1) > -> Seq Scan on t_b __2 (cost=0.00..48.90 > rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1) > Filter: (status IS TRUE) > Total runtime: 24355.179 ms > > > Any ideas on what-we-were-doing-wrong are welcomed > > > -- > Vlad > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
On 28/01/2008, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > 8.3 plan is not optimal. > > > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) > > (actual time=76.188..8177.510 rows=2593557 loops=1) > > please, try to increase statistics I am blind, I am sorry, It's noise, you did it. > > default_statistics_target (in postgresql.conf) to 100 and repeat > import and your test. > > Regards > Pavel Stehule > > On 28/01/2008, Vlad <marchenko@gmail.com> wrote: > > Hello, > > > > I wanted to share performance-related test results for Postgresql > > 8.3RC2 and 8.2.6. In both cases we used a freshly imported database > > followed by analyze verbose command. Same server was used for testing > > (2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were > > compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both > > cases default configuration was used with increased shared buffers to > > 1Gb (total server ram is 32Gb), increased work and maintenance mem, > > enabled autovacuum, increased default_statistics_target to 100, > > increased effective_cache_size to 20Gb, disabled fsync and increased > > checkpoint_segments. Total size (on disk) of the tables involved in > > the query was around 300Mb. > > > > 1. Freshly imported DB size on disk was about 3% smaller for 8.3 > > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > > We took special measures to make sure that no third factors involved > > (no other apps running, all data was cached from disks, etc). Below > > is one of the queries that we used for testing (I anonymized table > > names) along with query plan for both 8.3 and 8.2. The query execution > > plans are the same for both versions, but what we found quite > > interesting is that if we add all the times from each line of 8.2's > > query plan, it roughly adds-up to the total execution time. For 8.3's > > plan each line shows a shorter time, yet resulting in longer total > > runtime. Also, summing 8.3's plan lines doesn't come close to the > > total execution time: > > > > SELECT _."a_id", SUM(_."counter") > > FROM ts.t_c AS _ > > LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id" > > LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id" > > LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id" > > WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <= > > '2008-01-27 23:59:59') > > AND __1."status" IS TRUE > > AND __2."status" IS TRUE > > GROUP BY _."a_id" > > > > > > 8.2.6 QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------------------------------------- > > HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual > > time=21517.837..21517.890 rows=47 loops=1) > > -> Hash Join (cost=2450.09..111489.75 rows=550289 width=52) > > (actual time=76.083..7691.579 rows=2593557 loops=1) > > Hash Cond: (_.i_id = __3.id) > > -> Hash Join (cost=19.20..95377.74 rows=934651 width=56) > > (actual time=0.119..4933.928 rows=2596942 loops=1) > > Hash Cond: (_.a_id = __1.id) > > -> Append (cost=0.00..76276.09 rows=2596252 width=56) > > (actual time=0.014..2988.950 rows=2596942 loops=1) > > -> Seq Scan on t_c _ (cost=0.00..21.10 rows=4 > > width=56) (actual time=0.001..0.001 rows=0 loops=1) > > Filter: ((date_day >= '2008-01-01'::date) > > AND (date_day <= '2008-01-27'::date)) > > -> Seq Scan on t_c_2008_01 _ > > (cost=0.00..76254.99 rows=2596248 width=56) (actual > > time=0.011..1979.606 rows=2596942 loops=1) > > Filter: ((date_day >= '2008-01-01'::date) > > AND (date_day <= '2008-01-27'::date)) > > -> Hash (cost=18.30..18.30 rows=72 width=4) (actual > > time=0.094..0.094 rows=72 loops=1) > > -> Seq Scan on t_a __1 (cost=0.00..18.30 > > rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1) > > Filter: (status IS TRUE) > > -> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual > > time=75.931..75.931 rows=59934 loops=1) > > -> Hash Join (cost=57.45..1950.44 rows=38436 width=4) > > (actual time=0.829..54.760 rows=59934 loops=1) > > Hash Cond: (__3.b_id = __2.id) > > -> Seq Scan on t_i __3 (cost=0.00..1263.82 > > rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1) > > -> Hash (cost=50.90..50.90 rows=524 width=4) > > (actual time=0.499..0.499 rows=524 loops=1) > > -> Seq Scan on t_b __2 (cost=0.00..50.90 > > rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1) > > Filter: (status IS TRUE) > > Total runtime: 21518.097 ms > > > > > > > > 8.3RC2: QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------------------------------------- > > HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual > > time=24354.972..24355.019 rows=47 loops=1) > > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) > > (actual time=76.188..8177.510 rows=2593557 loops=1) > > Hash Cond: (_.i_id = __3.id) > > -> Hash Join (cost=16.20..92904.25 rows=935090 width=56) > > (actual time=0.140..5304.968 rows=2596942 loops=1) > > Hash Cond: (_.a_id = __1.id) > > -> Append (cost=0.00..73796.62 rows=2597473 width=56) > > (actual time=0.043..3272.024 rows=2596942 loops=1) > > -> Seq Scan on t_c _ (cost=0.00..21.55 rows=4 > > width=56) (actual time=0.001..0.001 rows=0 loops=1) > > Filter: ((date_day >= '2008-01-01'::date) > > AND (date_day <= '2008-01-27'::date)) > > -> Seq Scan on t_c_2008_01 _ > > (cost=0.00..73775.07 rows=2597469 width=56) (actual > > time=0.040..2245.209 rows=2596942 loops=1) > > Filter: ((date_day >= '2008-01-01'::date) > > AND (date_day <= '2008-01-27'::date)) > > -> Hash (cost=15.30..15.30 rows=72 width=4) (actual > > time=0.091..0.091 rows=72 loops=1) > > -> Seq Scan on t_a __1 (cost=0.00..15.30 > > rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1) > > Filter: (status IS TRUE) > > -> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual > > time=76.027..76.027 rows=59934 loops=1) > > -> Hash Join (cost=55.45..1882.44 rows=38436 width=4) > > (actual time=0.835..54.576 rows=59934 loops=1) > > Hash Cond: (__3.b_id = __2.id) > > -> Seq Scan on t_i __3 (cost=0.00..1197.82 > > rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1) > > -> Hash (cost=48.90..48.90 rows=524 width=4) > > (actual time=0.513..0.513 rows=524 loops=1) > > -> Seq Scan on t_b __2 (cost=0.00..48.90 > > rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1) > > Filter: (status IS TRUE) > > Total runtime: 24355.179 ms > > > > > > Any ideas on what-we-were-doing-wrong are welcomed > > > > > > -- > > Vlad > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > >
Pavel: thanks for your feedback. To me plans generated by 8.2 and 8.3 are equal and only differ by execution times. (I don't know, maybe email wrap'ed lines, so I've attached plans to my message). Also, I confirm that that parameter was increased (to 100) before the ran tests. On Jan 28, 2008 4:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > 8.3 plan is not optimal. > > > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52) > > (actual time=76.188..8177.510 rows=2593557 loops=1) > > please, try to increase statistics > > default_statistics_target (in postgresql.conf) to 100 and repeat > import and your test. > > Regards > Pavel Stehule -- Vlad
Attachment
On Jan 28, 2008 3:56 PM, Vlad <marchenko@gmail.com> wrote: > Hello, > > 1. Freshly imported DB size on disk was about 3% smaller for 8.3 > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > We took special measures to make sure that no third factors involved > (no other apps running, all data was cached from disks, etc). Below > is one of the queries that we used for testing (I anonymized table > names) along with query plan for both 8.3 and 8.2. The query execution > plans are the same for both versions, but what we found quite > interesting is that if we add all the times from each line of 8.2's > query plan, it roughly adds-up to the total execution time. For 8.3's > plan each line shows a shorter time, yet resulting in longer total > runtime. Also, summing 8.3's plan lines doesn't come close to the > total execution time: This last bit often means there's some overhead in the systems timeofday() function calls. If you just use \timing from psql, and run the script without explain analyze, what speeds do you get on each?
> This last bit often means there's some overhead in the systems > timeofday() function calls. > > If you just use \timing from psql, and run the script without explain > analyze, what speeds do you get on each? > 17480ms (8.2.6) 20342ms (8.3RC2) -- Vlad
Vlad <marchenko@gmail.com> writes: > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. The particular case you are showing here seems to be all about the speed of hash aggregation --- at least the time differential is mostly in the HashAggregate step. What is the data type of a_id? I speculate that you're noticing the slightly slower/more complicated hash function that 8.3 uses for integers. On a case where the data was well distributed you'd not see any countervailing efficiency gain from those extra cycles. regards, tom lane
On Mon, 28 Jan 2008, Tom Lane wrote: > I speculate that you're noticing the slightly slower/more complicated > hash function that 8.3 uses for integers. There was a similar slowdown in the Clodaldo case you tracked down recently. Is it worth considering an addition to the release notes warning about this class of problem? If there have been two of them so far just in the beta I wonder how many people are going to run into some variant of this in the future. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On 1/29/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vlad <marchenko@gmail.com> writes: > > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > > The particular case you are showing here seems to be all about the speed > of hash aggregation --- at least the time differential is mostly in the > HashAggregate step. What is the data type of a_id? I speculate that > you're noticing the slightly slower/more complicated hash function that > 8.3 uses for integers. On a case where the data was well distributed > you'd not see any countervailing efficiency gain from those extra > cycles. AFAIK we have a plan to update string hash in 8.4 to fastest available (Jenkins lookup3). Maybe we should update integer hash too then to the best: http://www.cris.com/~Ttwang/tech/inthash.htm ("32 bit Mix Functions" is the one). -- marko
Tom: Yes, they are ints. To (somewhat) check your guess on the role of the hash aggregation speed, I just ran oltp test from sysbench (http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln of records. That test uses pretty simple queries (that do not use aggregation) and 8.3 showed about the same performance as 8.2 (strictly speaking 8.3 was about 1-2% slower, but not 10-15% like on my query). I'm curious if that new hash aggregation algorithm was put in 8.3 with the performance increase as a goal or it was simply a required change to support some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't seems a favorable step for the type of application that we have... -- vlad ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Vlad" <marchenko@gmail.com> Cc: "PG-General" <pgsql-general@postgresql.org> Sent: Monday, January 28, 2008 9:13 PM Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results > Vlad <marchenko@gmail.com> writes: >> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > > The particular case you are showing here seems to be all about the speed > of hash aggregation --- at least the time differential is mostly in the > HashAggregate step. What is the data type of a_id? I speculate that > you're noticing the slightly slower/more complicated hash function that > 8.3 uses for integers. On a case where the data was well distributed > you'd not see any countervailing efficiency gain from those extra > cycles. > > regards, tom lane >
2008/1/29, Vlad Marchenko <marchenko@gmail.com>: > Tom: > > Yes, they are ints. To (somewhat) check your guess on the role of the hash > aggregation speed, I just ran oltp test from sysbench > (http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln > of records. That test uses pretty simple queries (that do not use > aggregation) and 8.3 showed about the same performance as 8.2 (strictly > speaking 8.3 was about 1-2% slower, but not 10-15% like on my query). > > I'm curious if that new hash aggregation algorithm was put in 8.3 with the > performance increase as a goal or it was simply a required change to support > some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't > seems a favorable step for the type of application that we have... Vlad, What happens if you run the 8.3 test with enable_hashagg set to off? Saudações, Clodoaldo Pinto Neto > ----- Original Message ----- > From: "Tom Lane" <tgl@sss.pgh.pa.us> > To: "Vlad" <marchenko@gmail.com> > Cc: "PG-General" <pgsql-general@postgresql.org> > Sent: Monday, January 28, 2008 9:13 PM > Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results > > > > Vlad <marchenko@gmail.com> writes: > >> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6. > > > > The particular case you are showing here seems to be all about the speed > > of hash aggregation --- at least the time differential is mostly in the > > HashAggregate step. What is the data type of a_id? I speculate that > > you're noticing the slightly slower/more complicated hash function that > > 8.3 uses for integers. On a case where the data was well distributed > > you'd not see any countervailing efficiency gain from those extra > > cycles. > > > > regards, tom lane > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
I just tried adjusting two parameters: enable_hashagg = off both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %) enable_hashjoin = off both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %) On Jan 29, 2008 10:34 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote: > Vlad, > > What happens if you run the 8.3 test with enable_hashagg set to off? > > Saudações, Clodoaldo Pinto Neto > -- Vlad
More test results for public. We ran my original query and found out that on 4 cores CPU: 1 thread test 8.2 bits 8.3 by 10..15% 4 threads 8.2. wins 8.3 by ~2% 8 threads 8.3 finally wins 8.2 by ~2% the same data set was affected during multi-threaded runs, but it's 100% cached from the disk. I guess we see the result of concurrent scans optimization in 8.3... -- Vlad
"Vlad Marchenko" <marchenko@gmail.com> writes: > I'm curious if that new hash aggregation algorithm was put in 8.3 with the > performance increase as a goal Yes. http://archives.postgresql.org/pgsql-hackers/2007-05/msg01179.php http://archives.postgresql.org/pgsql-committers/2007-06/msg00007.php (and nearby messages) > or it was simply a required change to support > some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't > seems a favorable step for the type of application that we have... You are condemning a demonstrably better algorithm on the basis of one test case. If you poke around a bit harder you will find other cases where it is faster than 8.2 because of having better distribution. regards, tom lane