Re: LIMIT/SORT optimization - Mailing list pgsql-patches
From:           Bruce Momjian
Subject:        Re: LIMIT/SORT optimization
Msg-id:         200704080116.l381Gf926989@momjian.us
In response to: Re: LIMIT/SORT optimization (Bruce Momjian <bruce@momjian.us>)
Responses:      Re: LIMIT/SORT optimization
List:           pgsql-patches
I reran the test using:

	test=> CREATE TABLE test (x INTEGER);
	test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
	test=> SET log_min_duration_statement = 0;

and got on an unpatched system:

	1751.320 ms  select * from (select * from test order by x limit 3) as x limit 1;
	1725.092 ms  select * from (select * from test order by x limit 3) as x limit 1;
	1709.463 ms  select * from (select * from test order by x limit 3) as x limit 1;
	1702.917 ms  select * from (select * from test order by x limit 10) as x limit 1;
	1705.793 ms  select * from (select * from test order by x limit 10) as x limit 1;
	1704.046 ms  select * from (select * from test order by x limit 10) as x limit 1;
	1699.730 ms  select * from (select * from test order by x limit 100) as x limit 1;
	1712.628 ms  select * from (select * from test order by x limit 100) as x limit 1;
	1699.454 ms  select * from (select * from test order by x limit 100) as x limit 1;
	1720.207 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	1725.519 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	1728.933 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	1699.609 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	1698.386 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	1698.985 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	1700.740 ms  select * from (select * from test order by x limit 100000) as x limit 1;
	1700.989 ms  select * from (select * from test order by x limit 100000) as x limit 1;
	1695.771 ms  select * from (select * from test order by x limit 100000) as x limit 1;

which is expected because the sort work is constant.
With the patch I see:

	 433.892 ms  select * from (select * from test order by x limit 3) as x limit 1;
	 496.016 ms  select * from (select * from test order by x limit 3) as x limit 1;
	 434.604 ms  select * from (select * from test order by x limit 3) as x limit 1;
	 433.265 ms  select * from (select * from test order by x limit 10) as x limit 1;
	 432.058 ms  select * from (select * from test order by x limit 10) as x limit 1;
	 431.329 ms  select * from (select * from test order by x limit 10) as x limit 1;
	 429.722 ms  select * from (select * from test order by x limit 100) as x limit 1;
	 434.754 ms  select * from (select * from test order by x limit 100) as x limit 1;
	 429.758 ms  select * from (select * from test order by x limit 100) as x limit 1;
	 432.060 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	 432.523 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	 433.917 ms  select * from (select * from test order by x limit 1000) as x limit 1;
	 449.885 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	 450.182 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	 450.536 ms  select * from (select * from test order by x limit 10000) as x limit 1;
	1771.807 ms  select * from (select * from test order by x limit 100000) as x limit 1;
	1746.628 ms  select * from (select * from test order by x limit 100000) as x limit 1;
	1795.600 ms  select * from (select * from test order by x limit 100000) as x limit 1;

The patch is faster until we hit 100k, or 10% of the table, at which point it
is the same speed.  What is interesting is that 1M is also the same speed:

	1756.401 ms  select * from (select * from test order by x limit 1000000) as x limit 1;
	1744.104 ms  select * from (select * from test order by x limit 1000000) as x limit 1;
	1734.198 ms  select * from (select * from test order by x limit 1000000) as x limit 1;

This is with the default work_mem of '1MB'.
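The shape of these numbers matches a bounded "top-N" sort: instead of sorting
all one million rows, the patched tuplesort keeps only the N smallest rows
seen so far, so the sort work grows with N rather than with the input size
until N stops fitting comfortably in work_mem.  Here is a rough sketch of the
idea in Python (heapq stands in for the tuplesort heap; this is illustrative,
not the patch's actual code):

```python
import heapq
import random

def top_n_full_sort(rows, n):
    # Unpatched behavior: sort the entire input, then take the first n.
    return sorted(rows)[:n]

def top_n_bounded(rows, n):
    # Patched behavior (sketch): keep a bounded heap of the n smallest
    # rows seen so far; each input row costs O(log n) heap work instead
    # of participating in an O(N log N) full sort.
    return heapq.nsmallest(n, rows)

rows = [random.randrange(1_000_000) for _ in range(100_000)]
assert top_n_bounded(rows, 3) == top_n_full_sort(rows, 3)
```

Timing the two approaches for small n versus n close to len(rows) shows the
same pattern as the measurements above: a large win for small limits,
converging to full-sort cost as the limit approaches the table size.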
I used LIMIT 1 so the times were not affected by the size of the data
transfer to the client.

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> I did some performance testing of the patch, and the results were good.
> I did this:
>
> 	test=> CREATE TABLE test (x INTEGER);
> 	test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
> 	test=> SET log_min_duration_statement = 0;
> 	test=> SELECT * FROM test ORDER BY x LIMIT 3;
>
> and the results were, before the patch, for three runs:
>
> 	LOG:  duration: 1753.518 ms  statement: select * from test order by x limit 3;
> 	LOG:  duration: 1766.019 ms  statement: select * from test order by x limit 3;
> 	LOG:  duration: 1777.520 ms  statement: select * from test order by x limit 3;
>
> and after the patch:
>
> 	LOG:  duration: 449.649 ms  statement: select * from test order by x limit 3;
> 	LOG:  duration: 443.450 ms  statement: select * from test order by x limit 3;
> 	LOG:  duration: 443.086 ms  statement: select * from test order by x limit 3;
>
> ---------------------------------------------------------------------------
>
> Gregory Stark wrote:
> >
> > Updated patch attached:
> >
> > 1) Removes #if 0 optimizations
> >
> > 2) Changes #if 0 to #if NOT_USED for code that's there for completeness
> >    and to keep the code self-documenting, but isn't needed by anything
> >    currently
> >
> > 3) Fixed cost model to represent bounded sorts
> >
> > [ Attachment, skipping... ]
> >
> > "Gregory Stark" <stark@enterprisedb.com> writes:
> >
> > > "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> > >
> > >> There's a few blocks of code surrounded with "#if 0 - #endif". Are those just
> > >> leftovers that should be removed, or things that still need to be finished and
> > >> enabled?
> > >
> > > Uhm, I don't remember, will go look, thanks.
> >
> > Ok, they were left over code from an optimization that I decided wasn't very
> > important to pursue.
> > The code that was ifdef'd out detected when disk sorts could abort a
> > disk-sort merge because it had already generated enough tuples to satisfy
> > the limit.
> >
> > But I never wrote the code to actually abort the run, and it looks a bit
> > tricky. In any case the disk-sort use case is extremely narrow; you would
> > need something like "LIMIT 50000" or more to hit it, and it would have to
> > be an input table huge enough to cause multiple rounds of merges.
> >
> > I think I've figured out how to adjust the cost model. It turns out that it
> > doesn't usually matter whether the cost model is correct, since any case where
> > the optimization kicks in is a case where you're reading a small portion of
> > the input, so it's a case where an index would be *much* better if available.
> > So the only times the optimization is used is when there's no index available.
> > Nonetheless it's nice to get the estimates right so that higher levels in the
> > plan get reasonable values.
> >
> > I think I figured out how to do the cost model. At least the results are
> > reasonable. I'm not sure if I've done it the "right" way though.
> >
> > --
> > Gregory Stark
> > EnterpriseDB          http://www.enterprisedb.com
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
>
> --
> Bruce Momjian  <bruce@momjian.us>  http://momjian.us
> EnterpriseDB   http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
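Greg's cost-model point can be made concrete with a back-of-the-envelope
comparison count (my own sketch, not the formula the patch actually adds to
the planner): a full sort does roughly N*log2(N) comparisons, while a bounded
top-K heap sort does roughly N*log2(K), since each input tuple is compared
against a K-element heap.

```python
import math

def full_sort_cmps(n):
    # Conventional sort cost model: ~n * log2(n) comparisons.
    return n * math.log2(n)

def bounded_sort_cmps(n, k):
    # Bounded top-k sort: each of the n input tuples costs ~log2(k)
    # comparisons against a k-element heap (clamp k to 2 so the log
    # stays well-defined for tiny limits).
    return n * math.log2(max(k, 2))

n = 1_000_000
for k in (3, 100, 10_000, 100_000):
    ratio = full_sort_cmps(n) / bounded_sort_cmps(n, k)
    print(f"LIMIT {k:>6}: full sort does ~{ratio:.1f}x more comparisons")
```

The model predicts larger wins than the roughly 4x measured above, because
per-comparison constants and other costs dominate differently in practice,
but it has the right shape: the advantage shrinks as K approaches N, which
matches the crossover observed near LIMIT 100000.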