Thread: Query plan choice issue
Hi all, I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can be executed using either of two different query plans, one taking a few milliseconds and the other, tens of seconds. The work_mem setting doesn't seem to affect it -- tried to increase or decrease it by 2 or 4 times, but it didn't seem to favour the fast plan choice. Honestly, I have no idea what affects the plan choice, but I saw Postgresql change it at random. The query in question looks like this -- sorry, it's rather complex: SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 12345 AND status IN (0,1) AND headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255) ILIKE '%<...@mail.gmail.com>%' ORDER BY message_idnr; It comes from DBMail. That said, I don't think DBMail can be blamed here because at least sometimes Postgresql is able to do the right thing with respect to that query. Here is the slow plan: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual time=20933.166..20933.168 rows=1 loops=1) Sort Key: m.message_idnr Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual time=3327.658..20933.122 rows=1 loops=1) -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) (actual time=3327.599..20933.056 rows=1 loops=1) -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) (actual time=3327.503..20932.944 rows=1 loops=1) Join Filter: (v.headername_id = n.id) -> Seq Scan on dbmail_headername n (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 loops=1) Filter: ((headername)::text ~~* 'MESSAGE- ID'::text) -> Seq Scan on dbmail_headervalue v (cost=0.00..1234407.96 rows=2666 width=16) (actual time=3327.465..20931.942 rows=1 loops=1) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%'::text) -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual time=0.088..0.100 rows=1 loops=1) Index Cond: (m.physmessage_id = v.physmessage_id) Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345)) -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.048..0.050 rows=1 loops=1) Index Cond: (p.id = m.physmessage_id) Total runtime: 20933.241 ms (17 rows) And here is the fast plan: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual time=14.163..14.165 rows=1 loops=1) Sort Key: m.message_idnr Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual time=4.272..14.152 rows=1 loops=1) -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) (actual time=4.261..14.135 rows=1 loops=1) -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32) (actual time=4.249..14.117 rows=1 loops=1) -> Index Scan using dbmail_messages_mailbox_idx on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual time=0.043..2.810 rows=358 loops=1) Index Cond: (mailbox_idnr = 12345) Filter: (status = ANY ('{0,1}'::integer[])) -> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358) Index Cond: (v.physmessage_id = m.physmessage_id) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%'::text) -> Index Scan using dbmail_headername_pkey on dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1) Index Cond: (n.id = v.headername_id) Filter: ((n.headername)::text ~~* 'MESSAGE- ID'::text) -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1) Index Cond: (p.id = m.physmessage_id) Total runtime: 14.231 ms (18 rows) Do you think this query plan choice can be affected by postgresql.conf settings? Or shall I file a bug report? I'll be happy to provide the table and index structure details if needed. Thanks! Yar
a cursory look of the plan details a FTS on dbmail_headername invoked by the JOIN clause
JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both participating columns
v.headername_id
n.id
I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id
the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses indexed for both participants
Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
> From: yar@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
>
> Hi all,
>
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
> be executed using either of two different query plans, one taking a
> few milliseconds and the other, tens of seconds. The work_mem setting
> doesn't seem to affect it -- tried to increase or decrease it by 2 or
> 4 times, but it didn't seem to favour the fast plan choice. Honestly,
> I have no idea what affects the plan choice, but I saw Postgresql
> change it at random.
>
> The query in question looks like this -- sorry, it's rather complex:
>
> SELECT message_idnr
> FROM dbmail_messages m
> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> JOIN dbmail_headername n ON v.headername_id=n.id
> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
> ILIKE '%<...@mail.gmail.com>%'
> ORDER BY message_idnr;
>
> It comes from DBMail. That said, I don't think DBMail can be blamed
> here because at least sometimes Postgresql is able to do the right
> thing with respect to that query.
>
> Here is the slow plan:
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual
> time=20933.166..20933.168 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual
> time=3327.658..20933.122 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24)
> (actual time=3327.599..20933.056 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8)
> (actual time=3327.503..20932.944 rows=1 loops=1)
> Join Filter: (v.headername_id = n.id)
> -> Seq Scan on dbmail_headername n
> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1
> loops=1)
> Filter: ((headername)::text ~~* 'MESSAGE-
> ID'::text)
> -> Seq Scan on dbmail_headervalue v
> (cost=0.00..1234407.96 rows=2666 width=16) (actual
> time=3327.465..20931.942 rows=1 loops=1)
> Filter: ("substring"(v.headervalue, 0,
> 255) ~~* '%<...@mail.gmail.com>%'::text)
> -> Index Scan using dbmail_messages_physmessage_idx on
> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual
> time=0.088..0.100 rows=1 loops=1)
> Index Cond: (m.physmessage_id = v.physmessage_id)
> Filter: ((m.status = ANY ('{0,1}'::integer[]))
> AND (m.mailbox_idnr = 12345))
> -> Index Scan using dbmail_physmessage_pkey on
> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
> time=0.048..0.050 rows=1 loops=1)
> Index Cond: (p.id = m.physmessage_id)
> Total runtime: 20933.241 ms
> (17 rows)
>
> And here is the fast plan:
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual
> time=14.163..14.165 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual
> time=4.272..14.152 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24)
> (actual time=4.261..14.135 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32)
> (actual time=4.249..14.117 rows=1 loops=1)
> -> Index Scan using dbmail_messages_mailbox_idx
> on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual
> time=0.043..2.810 rows=358 loops=1)
> Index Cond: (mailbox_idnr = 12345)
> Filter: (status = ANY ('{0,1}'::integer[]))
> -> Index Scan using dbmail_headervalue_2 on
> dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual
> time=0.028..0.029 rows=0 loops=358)
> Index Cond: (v.physmessage_id =
> m.physmessage_id)
> Filter: ("substring"(v.headervalue, 0,
> 255) ~~* '%<...@mail.gmail.com>%'::text)
> -> Index Scan using dbmail_headername_pkey on
> dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual
> time=0.007..0.008 rows=1 loops=1)
> Index Cond: (n.id = v.headername_id)
> Filter: ((n.headername)::text ~~* 'MESSAGE-
> ID'::text)
> -> Index Scan using dbmail_physmessage_pkey on
> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
> time=0.006..0.007 rows=1 loops=1)
> Index Cond: (p.id = m.physmessage_id)
> Total runtime: 14.231 ms
> (18 rows)
>
> Do you think this query plan choice can be affected by postgresql.conf
> settings? Or shall I file a bug report?
>
> I'll be happy to provide the table and index structure details if
> needed.
>
> Thanks!
>
> Yar
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both participating columns
v.headername_id
n.id
I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id
the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses indexed for both participants
Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
> From: yar@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
>
> Hi all,
>
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
> be executed using either of two different query plans, one taking a
> few milliseconds and the other, tens of seconds. The work_mem setting
> doesn't seem to affect it -- tried to increase or decrease it by 2 or
> 4 times, but it didn't seem to favour the fast plan choice. Honestly,
> I have no idea what affects the plan choice, but I saw Postgresql
> change it at random.
>
> The query in question looks like this -- sorry, it's rather complex:
>
> SELECT message_idnr
> FROM dbmail_messages m
> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> JOIN dbmail_headername n ON v.headername_id=n.id
> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
> ILIKE '%<...@mail.gmail.com>%'
> ORDER BY message_idnr;
>
> It comes from DBMail. That said, I don't think DBMail can be blamed
> here because at least sometimes Postgresql is able to do the right
> thing with respect to that query.
>
> Here is the slow plan:
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual
> time=20933.166..20933.168 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual
> time=3327.658..20933.122 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24)
> (actual time=3327.599..20933.056 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8)
> (actual time=3327.503..20932.944 rows=1 loops=1)
> Join Filter: (v.headername_id = n.id)
> -> Seq Scan on dbmail_headername n
> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1
> loops=1)
> Filter: ((headername)::text ~~* 'MESSAGE-
> ID'::text)
> -> Seq Scan on dbmail_headervalue v
> (cost=0.00..1234407.96 rows=2666 width=16) (actual
> time=3327.465..20931.942 rows=1 loops=1)
> Filter: ("substring"(v.headervalue, 0,
> 255) ~~* '%<...@mail.gmail.com>%'::text)
> -> Index Scan using dbmail_messages_physmessage_idx on
> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual
> time=0.088..0.100 rows=1 loops=1)
> Index Cond: (m.physmessage_id = v.physmessage_id)
> Filter: ((m.status = ANY ('{0,1}'::integer[]))
> AND (m.mailbox_idnr = 12345))
> -> Index Scan using dbmail_physmessage_pkey on
> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
> time=0.048..0.050 rows=1 loops=1)
> Index Cond: (p.id = m.physmessage_id)
> Total runtime: 20933.241 ms
> (17 rows)
>
> And here is the fast plan:
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual
> time=14.163..14.165 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual
> time=4.272..14.152 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24)
> (actual time=4.261..14.135 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32)
> (actual time=4.249..14.117 rows=1 loops=1)
> -> Index Scan using dbmail_messages_mailbox_idx
> on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual
> time=0.043..2.810 rows=358 loops=1)
> Index Cond: (mailbox_idnr = 12345)
> Filter: (status = ANY ('{0,1}'::integer[]))
> -> Index Scan using dbmail_headervalue_2 on
> dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual
> time=0.028..0.029 rows=0 loops=358)
> Index Cond: (v.physmessage_id =
> m.physmessage_id)
> Filter: ("substring"(v.headervalue, 0,
> 255) ~~* '%<...@mail.gmail.com>%'::text)
> -> Index Scan using dbmail_headername_pkey on
> dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual
> time=0.007..0.008 rows=1 loops=1)
> Index Cond: (n.id = v.headername_id)
> Filter: ((n.headername)::text ~~* 'MESSAGE-
> ID'::text)
> -> Index Scan using dbmail_physmessage_pkey on
> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
> time=0.006..0.007 rows=1 loops=1)
> Index Cond: (p.id = m.physmessage_id)
> Total runtime: 14.231 ms
> (18 rows)
>
> Do you think this query plan choice can be affected by postgresql.conf
> settings? Or shall I file a bug report?
>
> I'll be happy to provide the table and index structure details if
> needed.
>
> Thanks!
>
> Yar
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Hi Martin, Thank you for your response! On 13/09/2010, at 10:49 AM, Martin Gainty wrote: > a cursory look of the plan details a FTS on dbmail_headername > invoked by the JOIN clause > JOIN dbmail_headername n ON v.headername_id=n.id > you would accelerate the seek appreciably by placing indexes on both > participating columns > v.headername_id > n.id Granted, there was no index on v.headername_id but creating one just slowed the query down, with a different plan: CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue (headername_id); EXPLAIN ANALYSE ... QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=222020.81..222020.81 rows=1 width=8) (actual time=28636.426..28636.426 rows=0 loops=1) Sort Key: m.message_idnr Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=1409.82..222020.80 rows=1 width=8) (actual time=28636.409..28636.409 rows=0 loops=1) -> Nested Loop (cost=1409.82..222012.27 rows=1 width=24) (actual time=28636.405..28636.405 rows=0 loops=1) -> Nested Loop (cost=1409.82..221959.94 rows=6 width=8) (actual time=28543.441..28624.750 rows=1 loops=1) -> Seq Scan on dbmail_headername n (cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1 loops=1) Filter: ((headername)::text ~~* 'MESSAGE- ID'::text) -> Bitmap Heap Scan on dbmail_headervalue v (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1) Recheck Cond: (v.headername_id = n.id) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%'::text) -> Bitmap Index Scan on dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1) Index Cond: (v.headername_id = n.id) -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.71 rows=1 width=16) (actual time=11.646..11.646 rows=0 loops=1) Index Cond: (m.physmessage_id = v.physmessage_id) Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345)) -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.52 rows=1 width=8) (never executed) Index Cond: (p.id = m.physmessage_id) Total runtime: 28636.517 ms (19 rows) > I also see a FTS on domain_headervalue invoked by the JOIN cluase > JOIN dbmail_headervalue v ON v.physmessage_id=p.id > place indexes on both columns > v.physmessage_id > p.id Both columns already indexed here: On public.dbmail_headervalue (alias v): "dbmail_headervalue_2" btree (physmessage_id) On public.dbmail_physmessage (alias p): "dbmail_physmessage_pkey" PRIMARY KEY, btree (id) Perhaps I should provide some data on the table sizes. dbmail_headervalue is the largest table with respect to its record count: 36 million records. dbmail_headername is small: 5640 records. dbmail_physmessage and dbmail_messages are of an average size: ~2 million records each. Sorry for my cluelessness on this issue. But Postgresql's ability to build a fast query plan for this query type at least occasionally is encouraging. :-) Yar > the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id > uses indexed for both participants > > Martin > ______________________________________________ > Verzicht und Vertraulichkeitanmerkung > > Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene > Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede > unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. > Diese Nachricht dient lediglich dem Austausch von Informationen und > entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten > Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den > Inhalt uebernehmen. > > >> From: yar@barnet.com.au >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] Query plan choice issue >> Date: Mon, 13 Sep 2010 09:36:35 +1000 >> >> Hi all, >> >> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can >> be executed using either of two different query plans, one taking a >> few milliseconds and the other, tens of seconds. The work_mem setting >> doesn't seem to affect it -- tried to increase or decrease it by 2 or >> 4 times, but it didn't seem to favour the fast plan choice. Honestly, >> I have no idea what affects the plan choice, but I saw Postgresql >> change it at random. >> >> The query in question looks like this -- sorry, it's rather complex: >> >> SELECT message_idnr >> FROM dbmail_messages m >> JOIN dbmail_physmessage p ON m.physmessage_id=p.id >> JOIN dbmail_headervalue v ON v.physmessage_id=p.id >> JOIN dbmail_headername n ON v.headername_id=n.id >> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND >> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255) >> ILIKE '%<...@mail.gmail.com>%' >> ORDER BY message_idnr; >> >> It comes from DBMail. That said, I don't think DBMail can be blamed >> here because at least sometimes Postgresql is able to do the right >> thing with respect to that query. >> >> Here is the slow plan: >> QUERY >> PLAN >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual >> time=20933.166..20933.168 rows=1 loops=1) >> Sort Key: m.message_idnr >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual >> time=3327.658..20933.122 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) >> (actual time=3327.599..20933.056 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) >> (actual time=3327.503..20932.944 rows=1 loops=1) >> Join Filter: (v.headername_id = n.id) >> -> Seq Scan on dbmail_headername n >> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 >> loops=1) >> Filter: ((headername)::text ~~* 'MESSAGE- >> ID'::text) >> -> Seq Scan on dbmail_headervalue v >> (cost=0.00..1234407.96 rows=2666 width=16) (actual >> time=3327.465..20931.942 rows=1 loops=1) >> Filter: ("substring"(v.headervalue, 0, >> 255) ~~* '%<...@mail.gmail.com>%'::text) >> -> Index Scan using dbmail_messages_physmessage_idx on >> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual >> time=0.088..0.100 rows=1 loops=1) >> Index Cond: (m.physmessage_id = v.physmessage_id) >> Filter: ((m.status = ANY ('{0,1}'::integer[])) >> AND (m.mailbox_idnr = 12345)) >> -> Index Scan using dbmail_physmessage_pkey on >> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual >> time=0.048..0.050 rows=1 loops=1) >> Index Cond: (p.id = m.physmessage_id) >> Total runtime: 20933.241 ms >> (17 rows) >> >> And here is the fast plan: >> QUERY >> PLAN >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual >> time=14.163..14.165 rows=1 loops=1) >> Sort Key: m.message_idnr >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual >> time=4.272..14.152 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) >> (actual time=4.261..14.135 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32) >> (actual time=4.249..14.117 rows=1 loops=1) >> -> Index Scan using dbmail_messages_mailbox_idx >> on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual >> time=0.043..2.810 rows=358 loops=1) >> Index Cond: (mailbox_idnr = 12345) >> Filter: (status = ANY ('{0,1}'::integer[])) >> -> Index Scan using dbmail_headervalue_2 on >> dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual >> time=0.028..0.029 rows=0 loops=358) >> Index Cond: (v.physmessage_id = >> m.physmessage_id) >> Filter: ("substring"(v.headervalue, 0, >> 255) ~~* '%<...@mail.gmail.com>%'::text) >> -> Index Scan using dbmail_headername_pkey on >> dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual >> time=0.007..0.008 rows=1 loops=1) >> Index Cond: (n.id = v.headername_id) >> Filter: ((n.headername)::text ~~* 'MESSAGE- >> ID'::text) >> -> Index Scan using dbmail_physmessage_pkey on >> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual >> time=0.006..0.007 rows=1 loops=1) >> Index Cond: (p.id = m.physmessage_id) >> Total runtime: 14.231 ms >> (18 rows) >> >> Do you think this query plan choice can be affected by >> postgresql.conf >> settings? Or shall I file a bug report? >> >> I'll be happy to provide the table and index structure details if >> needed. >> >> Thanks! >> >> Yar >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >
Yaroslav Tykhiy <yar@barnet.com.au> writes: > -> Bitmap Heap Scan on dbmail_headervalue v > (cost=1409.82..221813.70 rows=2805 width=16) (actual > time=28543.411..28623.623 rows=1 loops=1) > Recheck Cond: (v.headername_id = n.id) > Filter: ("substring"(v.headervalue, 0, > 255) ~~* '%<...@mail.gmail.com>%'::text) > -> Bitmap Index Scan on > dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) > (actual time=17555.572..17555.572 rows=1877009 loops=1) > Index Cond: (v.headername_id = n.id) I think the major problem you're having is that the planner is completely clueless about the selectivity of the condition "substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%' If it knew that that would match only one row, instead of several thousand, it would likely pick a different plan. In recent versions of PG you could probably make a noticeable improvement in this if you just dropped the substring() restriction ... do you actually need that? Alternatively, if you don't want to change the query logic at all, I'd try making an index on substring(v.headervalue, 0, 255). I'm not expecting the query to actually *use* the index, mind you. But its existence will prompt ANALYZE to collect stats on the expression's value, and that will help the planner with estimating the ~~* condition. regards, tom lane
Hi Tom, On 14/09/2010, at 12:41 AM, Tom Lane wrote: > Yaroslav Tykhiy <yar@barnet.com.au> writes: >> -> Bitmap Heap Scan on dbmail_headervalue v >> (cost=1409.82..221813.70 rows=2805 width=16) (actual >> time=28543.411..28623.623 rows=1 loops=1) >> Recheck Cond: (v.headername_id = n.id) >> Filter: ("substring"(v.headervalue, 0, >> 255) ~~* '%<...@mail.gmail.com>%'::text) >> -> Bitmap Index Scan on >> dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) >> (actual time=17555.572..17555.572 rows=1877009 loops=1) >> Index Cond: (v.headername_id = n.id) > > I think the major problem you're having is that the planner is > completely clueless about the selectivity of the condition > "substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%' > If it knew that that would match only one row, instead of several > thousand, it would likely pick a different plan. > > In recent versions of PG you could probably make a noticeable > improvement in this if you just dropped the substring() restriction > ... do you actually need that? Alternatively, if you don't want to > change the query logic at all, I'd try making an index on > substring(v.headervalue, 0, 255). I'm not expecting the query > to actually *use* the index, mind you. But its existence will prompt > ANALYZE to collect stats on the expression's value, and that will > help the planner with estimating the ~~* condition. Well, that substring() and ILIKE combo looked suspicious to me, too. However, there already was an index on substring(v.headervalue, 0, 255) but the fast query plan didn't seem to use it, it used a different index instead: mail=# \d dbmail_headervalue Table "public.dbmail_headervalue" Column | Type | Modifiers ----------------+-------- +------------------------------------------------------------------- headername_id | bigint | not null physmessage_id | bigint | not null id | bigint | not null default nextval('dbmail_headervalue_idnr_seq'::regclass) headervalue | text | not null default ''::text Indexes: "dbmail_headervalue_pkey" PRIMARY KEY, btree (id) "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id) "dbmail_headervalue_2" btree (physmessage_id) "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255)) ... EXPLAIN ANALYZE... -> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358) Index Cond: (v.physmessage_id = m.physmessage_id) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%'::text) ... Meanwhile, a mate of mine lurking on this list pointed out that reducing random_page_cost might help here and it did: random_page_cost of 2 made the fast query favourable. Can it mean that the default planner configuration slightly overfavours seq scans? Thank you all guys! Yar
On 14/09/2010, at 10:37 AM, Yaroslav Tykhiy wrote: > > On 14/09/2010, at 12:41 AM, Tom Lane wrote: > >> Yaroslav Tykhiy <yar@barnet.com.au> writes: [...] >> >> I think the major problem you're having is that the planner is >> completely clueless about the selectivity of the condition >> "substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%' >> If it knew that that would match only one row, instead of several >> thousand, it would likely pick a different plan. >> >> In recent versions of PG you could probably make a noticeable >> improvement in this if you just dropped the substring() restriction >> ... do you actually need that? Alternatively, if you don't want to >> change the query logic at all, I'd try making an index on >> substring(v.headervalue, 0, 255). I'm not expecting the query >> to actually *use* the index, mind you. But its existence will prompt >> ANALYZE to collect stats on the expression's value, and that will >> help the planner with estimating the ~~* condition. > > Well, that substring() and ILIKE combo looked suspicious to me, > too. However, there already was an index on > substring(v.headervalue, 0, 255) but the fast query plan didn't seem > to use it, it used a different index instead: > [...] > Meanwhile, a mate of mine lurking on this list pointed out that > reducing random_page_cost might help here and it did: > random_page_cost of 2 made the fast query favourable. > > Can it mean that the default planner configuration slightly > overfavours seq scans? > Funnily, after a few days of running with random_page_cost=2, exactly the same query became slow again and I had to reduce random_page_cost further to 1.5 to make it fast. Can it be a sign of a problem in the planner? Thanks! Yar