Re: BUG #14107: Major query planner bug regarding subqueries and indices - Mailing list pgsql-bugs
From | David Rowley |
---|---|
Subject | Re: BUG #14107: Major query planner bug regarding subqueries and indices |
Date | |
Msg-id | CAKJS1f_798-+aewz9GS=OSYqk2mxQTuT7_9pWedSJbDoRzZnVg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #14107: Major query planner bug regarding subqueries and indices (Mathias Kunter <mathiaskunter@gmail.com>) |
Responses |
Re: BUG #14107: Major query planner bug regarding subqueries
and indices
|
List | pgsql-bugs |
On 11 May 2016 at 23:55, Mathias Kunter <mathiaskunter@gmail.com> wrote: > Sorry for bumping this one more time, but I'd like to share some more > real-life performance test results of using ANY(ARRAY(...)) instead of > IN(...), hoping that you'd maybe still consider implementing such an > optimization into the query planner. Since the test results indicate that > the performance boost can really be massive on certain query types (factor > 1000), I think that it'd really be worth the work. > > > ===== Test setup ===== > > The tables "mb.release" and "mb.release_group" both contain about 1.5 > million rows of real data, taken from the MusicBrainz database, and are of > course properly indexed. All performance tests have been repeated a few > times to be comparable. > > The test covers subqueries which return just a few rows and also subqueries > which return more than 100000 rows. The queries test the performance of IN > vs. ANY(ARRAY()) when used in different scenarios. > > For reference, the full query plans of all used queries are linked below. > > > ===== Tested queries ===== > > 1) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM > mb.release_group WHERE name = 'Bear'); > > 2) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM > mb.release_group WHERE name < 'Bear'); > > 3) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN > (SELECT id FROM mb.release_group WHERE name = 'Bear'); > > 4) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN > (SELECT id FROM mb.release_group WHERE name < 'Bear'); > > 5) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN > (SELECT id FROM mb.release_group WHERE name = 'Bear'); > > 6) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN > (SELECT id FROM mb.release_group WHERE name < 'Bear'); > > > ===== Test results ===== > > All numbers are given in milliseconds and show the total query time > (planning + execution). > > ------------------------------------------- > | Query | IN (...) | = ANY(ARRAY(...)) | > ------------------------------------------- > | 1 | 0.7 | 0.4 | > | 2 | 6001.1 | 2517.8 | > | 3 | 711.3 | 0.5 | > | 4 | > 1000000.0 | 1962.6 | > | 5 | 0.8 | 0.5 | > | 6 | 0.9 | 492.7 | > ------------------------------------------- > > Note: Query 4 using the IN operator has been canceled after running for more > than 15 minutes. How do you find the ANY(ARRAY(...)) version performs with say 10 million records in the array? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-bugs by date: