Thread: Force a merge join?

Force a merge join?

From

Doug Fields

Date:

15 May 2002, 20:11:49

Hello,

[7.2.1, Debian/woody kernel 2.2.20]

I have an extremely slow running query which uses a nested loop. I would
like to force it to use a merge join instead, but I have been unable to
figure out how to fool the optimizer to do that. Alternately, I would
welcome suggestions on how to speed up the query.

Yes, I have tried turning off enable_nestloop and playing with the tuple
costs somewhat.

When I say extremely slowly, I mean it. We're talking 280 seconds for a
query which returns 3000 rows out of a possible ~800,000 using two parts of
the same table (joined together) of sizes 500 and 15000 or so.

Please find below some details on my query, the explain, the table and the
indices. Some columns not referenced have been omitted.

Note that I just increased the RAM on the database server to 2 gigs and
allocated almost 1.4 gigs to shared buffers. All my tables and indices
total size (for now) is about 600 megs, so everything in theory could be
running from memory. I have a continuous vmstat running and show no
swapping, and no "blocks in" activity indicating it is indeed running fully
from memory.

The INTENT of the query is to take a set of e-mail addresses (list A) and
another set (list B) and find the intersection - that is, all entries in A
which have a corresponding entry in B based upon having the same e-mail
address. This is the last one part of a larger query (which is otherwise
very fast) which selects parts of list A and then removes (via combining
with EXCEPT) the remaining ones in the query below. These are then inserted
into yet another table so the whole query looks like INSERT ... SELECT ...
EXCEPT SELECT ... where the last SELECT is the slow query I show below.

By breaking the query into a bunch (about 40) of queries without any ORs
and UNIONing them, I can get about a 3-4 times improvement, but that still
means this is an 80 second query, and is not scalable to the case where it
can't be broken into multiple queries.

Query:

-- Note the odd 745 is because this is used in an INSERT ... SELECT statement
-- Also note that there is no difference if I use IN (1,2,3,4) syntax
SELECT 745, a.list_entry_id
    FROM list_entries a, list_entries b
    WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR
a.list_id=144)
        AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR
b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418)
        AND LOWER(a.email) = LOWER(b.email);

psql's \d on the table name and indices: (use fixed-width font)

Table "list_entries"
     Column     |           Type           |
Modifiers
---------------+--------------------------+--------------------------------------------------------------------
  list_id       | integer                  | not null
  list_entry_id | integer                  | not null default
nextval('"list_entries_list_entry_id_seq"'::text)
  email         | character varying(64)    |
(about a dozen more mostly varchar fields not shown)
Indexes: list_id_idx,
          lower_email_idx
Primary key: list_entries_pkey
Triggers: RI_ConstraintTrigger_1572852

Index "list_id_idx"
  Column  |  Type
---------+---------
  list_id | integer
btree

Index "lower_email_idx"
  Column | Type
--------+------
  lower  | text
btree

Query analysis:

         ->  Nested Loop  (cost=0.00..219086859.00 rows=1585343 width=52)
(actual time=0.27..294568.88 rows=2970 loops=1)
               ->  Index Scan using list_id_idx, list_id_idx, list_id_idx,
list_id_idx, list_id_idx on list_entries a  (cost=0.00..29025.99 rows=14115
width=28) (actual time=0.05..176.79 rows=15859 loops=1)
               ->  Index Scan using lower_email_idx on list_entries
b  (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0
loops=15859)

Database optimizer settings:

NOTICE:  cpu_operator_cost is 0.0025
NOTICE:  cpu_tuple_cost is 0.05
NOTICE:  cpu_index_tuple_cost is 0.001

All enable_X options are ON

Many thanks,

Doug

Re: Force a merge join?

From

Martijn van Oosterhout

Date:

15 May 2002, 21:03:01

On Wed, May 15, 2002 at 03:31:30PM -0400, Doug Fields wrote:

[Much snipped about mergejoins]

>         AND LOWER(a.email) = LOWER(b.email);

There's your problem. You're not comparing the two columns, you're comparing
the two columns after running through a function, so it can't use the index.

Try creating an index on LOWER(email) instead of just email.

HTH,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America

Re: Force a merge join?

From

Tom Lane

Date:

16 May 2002, 00:24:39

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Wed, May 15, 2002 at 03:31:30PM -0400, Doug Fields wrote:
> [Much snipped about mergejoins]
>> AND LOWER(a.email) = LOWER(b.email);

> There's your problem. You're not comparing the two columns, you're comparing
> the two columns after running through a function, so it can't use the index.
> Try creating an index on LOWER(email) instead of just email.

I don't think that will help :-( ... (in fact, it kinda looked like he'd
done that already, though surely the version he's using isn't saying so
explicitly).

The current version of the optimizer is not bright enough to do either
merge or hash joins on join expressions more complex than var1 = var2.
Improving this is on the TODO list ...but in the meantime I wonder why
you couldn't force an email-address column to lower case when you store
it, so as to simplify the join problem.  The RFCs nominally allow the
local-part of an email address to be case sensitive, but in practice
there is no one who really expects a case-sensitive email address to
work.

            regards, tom lane

Re: Force a merge join?

From

Lincoln Yeoh

Date:

16 May 2002, 00:49:51

At 12:24 AM 5/16/02 -0400, Tom Lane wrote:
>Improving this is on the TODO list ...but in the meantime I wonder why
>you couldn't force an email-address column to lower case when you store
>it, so as to simplify the join problem.  The RFCs nominally allow the
>local-part of an email address to be case sensitive, but in practice
>there is no one who really expects a case-sensitive email address to
>work.

Could use an extra column - so you can keep one column for display and one
for matching/searches.

So it would be easier to treat the following addresses as identical without
having to do it all in an SQL function:
Doug Fields <dfields-pg-general@pexicom.com>
Doug Fields <dfields-pg-general@pexicom.com.>
(D Fields ) dfields-pg-general@PEXICOM.COM

Cheerio,
Link.

Re: Force a merge join?

From

Doug Fields

Date:

16 May 2002, 01:52:35

At 09:00 PM 5/15/2002, Martijn van Oosterhout wrote:
>[Much snipped about mergejoins]
>
> >               AND LOWER(a.email) = LOWER(b.email);
>
>There's your problem. You're not comparing the two columns, you're comparing
>the two columns after running through a function, so it can't use the index.
>
>Try creating an index on LOWER(email) instead of just email.

Thanks. It actually is, already. I noticed that the \d lower_email_idx
displays:

pexitest=# \d lower_email_idx
Index "lower_email_idx"
  Column | Type
--------+------
  lower  | text
btree

But if I look at a pg_dump, you can see I covered this base already (I
indexed every column used in the majority of searches by turning on query
debug and then sort | uniq them, a few months ago):

-- Name: "lower_email_idx" Type: INDEX Owner: dfields
--
CREATE INDEX lower_email_idx ON list_entries USING btree (lower(email));

So - that's not the problem.

Although I did run a test on a new table, where I created an additional
column called lower_email and set it accordingly - and it does do the merge
join if you set enable_nestloop=off (but not if it is on).

However, I don't want to store the same data twice...

Other ideas, please?

Cheers,

Doug

Re: Force a merge join?

From

"Ian Harding"

Date:

17 May 2002, 13:38:28

I am having the exact same problem, although setting enable-nestloop = off fixes my problem.  Does it not affect yours?


Anyway, I occasionally recreate and reload my entire database.  When I do this, the planner is flying blind and chooses
mergejoin.  As soon as I vaccum analyze, it chooses nested loop and certain queries take 1.5 days to complete.  If I
setenable_nestloop off, they take seconds. 

Does yours act like this?  That is, if you can reload the data in the affected tables so the planner uses default
values,does it choose merge join?  Tom had indicated he would be interested in why this happens.  I can forward my
schemaand another example to the group if anyone wants. 

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: iharding@tpchd.org

>>> Doug Fields <dfields-pg-general@pexicom.com> 05/15/02 12:31PM >>>
Hello,

[7.2.1, Debian/woody kernel 2.2.20]

I have an extremely slow running query which uses a nested loop. I would
like to force it to use a merge join instead, but I have been unable to
figure out how to fool the optimizer to do that. Alternately, I would
welcome suggestions on how to speed up the query.

Yes, I have tried turning off enable_nestloop and playing with the tuple
costs somewhat.

When I say extremely slowly, I mean it. We're talking 280 seconds for a
query which returns 3000 rows out of a possible ~800,000 using two parts of
the same table (joined together) of sizes 500 and 15000 or so.

Please find below some details on my query, the explain, the table and the
indices. Some columns not referenced have been omitted.

Note that I just increased the RAM on the database server to 2 gigs and
allocated almost 1.4 gigs to shared buffers. All my tables and indices
total size (for now) is about 600 megs, so everything in theory could be
running from memory. I have a continuous vmstat running and show no
swapping, and no "blocks in" activity indicating it is indeed running fully
from memory.

The INTENT of the query is to take a set of e-mail addresses (list A) and
another set (list B) and find the intersection - that is, all entries in A
which have a corresponding entry in B based upon having the same e-mail
address. This is the last one part of a larger query (which is otherwise
very fast) which selects parts of list A and then removes (via combining
with EXCEPT) the remaining ones in the query below. These are then inserted
into yet another table so the whole query looks like INSERT ... SELECT ...
EXCEPT SELECT ... where the last SELECT is the slow query I show below.

By breaking the query into a bunch (about 40) of queries without any ORs
and UNIONing them, I can get about a 3-4 times improvement, but that still
means this is an 80 second query, and is not scalable to the case where it
can't be broken into multiple queries.

Query:

-- Note the odd 745 is because this is used in an INSERT ... SELECT statement
-- Also note that there is no difference if I use IN (1,2,3,4) syntax
SELECT 745, a.list_entry_id
    FROM list_entries a, list_entries b
    WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR
a.list_id=144)
        AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR
b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418)
        AND LOWER(a.email) = LOWER(b.email);

psql's \d on the table name and indices: (use fixed-width font)

Table "list_entries"
     Column     |           Type           |
Modifiers
---------------+--------------------------+--------------------------------------------------------------------
  list_id       | integer                  | not null
  list_entry_id | integer                  | not null default
nextval('"list_entries_list_entry_id_seq"'::text)
  email         | character varying(64)    |
(about a dozen more mostly varchar fields not shown)
Indexes: list_id_idx,
          lower_email_idx
Primary key: list_entries_pkey
Triggers: RI_ConstraintTrigger_1572852

Index "list_id_idx"
  Column  |  Type
---------+---------
  list_id | integer
btree

Index "lower_email_idx"
  Column | Type
--------+------
  lower  | text
btree

Query analysis:

         ->  Nested Loop  (cost=0.00..219086859.00 rows=1585343 width=52)
(actual time=0.27..294568.88 rows=2970 loops=1)
               ->  Index Scan using list_id_idx, list_id_idx, list_id_idx,
list_id_idx, list_id_idx on list_entries a  (cost=0.00..29025.99 rows=14115
width=28) (actual time=0.05..176.79 rows=15859 loops=1)
               ->  Index Scan using lower_email_idx on list_entries
b  (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0
loops=15859)

Database optimizer settings:

NOTICE:  cpu_operator_cost is 0.0025
NOTICE:  cpu_tuple_cost is 0.05
NOTICE:  cpu_index_tuple_cost is 0.001

All enable_X options are ON

Many thanks,

Doug


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Force a merge join?

From

Doug Fields

Date:

17 May 2002, 15:55:56

Hi Ian,

At 01:06 PM 5/17/2002, you wrote:
>I am having the exact same problem, although setting enable-nestloop = off
>fixes my problem.  Does it not affect yours?

No, in my case, I'm using LOWER(x) = LOWER(y) which precludes merge joins -
however, when I refactor the DB in a test situation to do x = y, as another
person has mentioned, it can be forced into merge joins.

>Anyway, I occasionally recreate and reload my entire database.  When I do
>this, the planner is flying blind and chooses merge join.  As soon as I
>vaccum analyze, it chooses nested loop and certain queries take 1.5 days
>to complete.  If I set enable_nestloop off, they take seconds.
>
>Does yours act like this?  That is, if you can reload the data in the
>affected tables so the planner uses default values, does it choose merge
>join?  Tom had indicated he would be interested in why this happens.  I
>can forward my schema and another example to the group if anyone wants.

In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO ...
to copy my real table to a testing table so I could refactor it. Then I did
the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I
did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) and
voila - nested loops and half second queries turning into five minute
nightmares. Then enable_nestloop would fix the problem again after that.

I played with some of the CPU TUPLE parameters but couldn't get it to force
merge joins without giving really ridiculous values which would doubtlessly
screw other things up.

Cheers,

Doug

Re: Force a merge join?

From

Tom Lane

Date:

18 May 2002, 16:24:19

Doug Fields <dfields-pg-general@pexicom.com> writes:
> In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO ...
> to copy my real table to a testing table so I could refactor it. Then I did
> the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I
> did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM) and
> voila - nested loops and half second queries turning into five minute
> nightmares. Then enable_nestloop would fix the problem again after that.

Could we see the usual details here?  Before and after EXPLAIN ANALYZE,
and the schemas and pg_stats rows for the tables involved.

BTW, you don't really have to reload a table to get back to the
"unanalyzed" condition for testing purposes.  You can just manually
delete the rows from pg_statistic:

delete from pg_statistic where
  starelid = (select oid from pg_class where relname = 'mytable');

            regards, tom lane

Re: Force a merge join?

From

Doug Fields

Date:

18 May 2002, 18:30:08

At 04:19 PM 5/18/2002, Tom Lane wrote:
>Doug Fields <dfields-pg-general@pexicom.com> writes:
> > In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO
> ...
> > to copy my real table to a testing table so I could refactor it. Then I
> did
> > the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I
> > did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM)
> and
> > voila - nested loops and half second queries turning into five minute
> > nightmares. Then enable_nestloop would fix the problem again after that.
>
>Could we see the usual details here?  Before and after EXPLAIN ANALYZE,
>and the schemas and pg_stats rows for the tables involved.

Hi Tom,

I gave some of this information in my first post, but I will repeat it here
with the pg_stats and the exact before and after information. Thanks.

delete from pg_statistic where
         starelid = (select oid from pg_class where relname =
'test_list_entries');

EXPLAIN ANALYZE SELECT 745, a.list_entry_id
         FROM test_list_entries a, test_list_entries b
         WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
a.list_id=147 OR a.list_id=144)
                 AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
b.list_id=418)
                 AND a.lower_email = b.lower_email;

NOTICE:  QUERY PLAN:

Merge Join  (cost=1926.12..1940.21 rows=247 width=140) (actual
time=448.32..521.99 rows=3674 loops=1)
   ->  Sort  (cost=739.66..739.66 rows=176 width=72) (actual
time=437.18..447.71 rows=15859 loops=1)
         ->  Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a  (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32
rows=15859 loops=1)
   ->  Sort  (cost=1186.45..1186.45 rows=280 width=68) (actual
time=11.12..13.67 rows=3783 loops=1)
         ->  Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx on test_list_entries
b  (cost=0.00..1175.08 rows=280 width=68) (actual time=0.06..4.75 rows=573
loops=1)
Total runtime: 528.83 msec

ANALYZE;

EXPLAIN SELECT 745, a.list_entry_id
         FROM test_list_entries a, test_list_entries b
         WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
a.list_id=147 OR a.list_id=144)
                 AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
b.list_id=418)
                 AND a.lower_email = b.lower_email;

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1176.19 rows=4 width=140)
   ->  Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a  (cost=0.00..454.33 rows=84 width=72)
   ->  Index Scan using test_lower_email_idx on test_list_entries
b  (cost=0.00..8.52 rows=1 width=68)

EXPLAIN ANALYZE same thing...

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1176.19 rows=4 width=140) (actual
time=11.52..388320.86 rows=3674 loops=1)
   ->  Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a  (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30
rows=15859 loops=1)
   ->  Index Scan using test_lower_email_idx on test_list_entries
b  (cost=0.00..8.52 rows=1 width=68) (actual time=14.86..24.47 rows=0
loops=15859)
Total runtime: 388330.10 msec

select attname, null_frac, avg_width, n_distinct, most_common_freqs,
correlation from pg_stats where tablename = 'test_list_entries';
-- Note, I removed some irrelevant rows

     attname    | null_frac | avg_width | n_distinct
|                                                most_common_freqs
                                       | correlation

---------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+-------------
  list_id       |         0 |         4 |        184 |
{0.383667,0.115667,0.0163333,0.0126667,0.0113333,0.01,0.00966667,0.00933333,0.009,0.009}
|    0.842899
  list_entry_id |         0 |         4 |         -1
|
|    0.719433
  lower_email   |  0.387667 |        68 |   -0.38239 |
{0.0156667,0.000666667,0.000666667,0.000666667,0.000666667}
|  0.00150877
(27 rows)

\d test_list_entries
-- Note I removed some irrelevant columns
               Table "test_list_entries"
     Column     |           Type           | Modifiers
---------------+--------------------------+-----------
  list_id       | integer                  |
  list_entry_id | integer                  |
  lower_email   | character(64)            |
Indexes: test_list_entries_pkey,
          test_list_id_idx,
          test_lower_email_idx

Cheers,

Doug

Re: Force a merge join?

From

Tom Lane

Date:

18 May 2002, 18:34:21

Doug Fields <dfields-pg-general@pexicom.com> writes:
> [ unanalyzed ]

>          ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
> a  (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32
> rows=15859 loops=1)

> [ after analyze ]

>    ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
> a  (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30
> rows=15859 loops=1)

The major problem clearly is the horribly bad estimate on the
selectivity of the clause
    WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
    a.list_id=147 OR a.list_id=144)
This is showing that the planner estimated 84 matching rows (vs. 176
with no stats!) whereas it was really 15859.

> select attname, null_frac, avg_width, n_distinct, most_common_freqs,
> correlation from pg_stats where tablename = 'test_list_entries';

Could we see the whole pg_stats row for list_id?  In particular I was
wondering if any of the list_id values being selected for appear in
most_common_vals.

            regards, tom lane

Re: Force a merge join?

From

Doug Fields

Date:

18 May 2002, 18:54:17

At 06:31 PM 5/18/2002, Tom Lane wrote:
[Analysis omitted]
>The major problem clearly is the horribly bad estimate on the
>selectivity of the clause
>         WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
>         a.list_id=147 OR a.list_id=144)
>This is showing that the planner estimated 84 matching rows (vs. 176
>with no stats!) whereas it was really 15859.

I find that this is actually fairly typical, where the row estimates and
actual rows are off by orders of magnitudes. Some info on this table:

pexitest=# select count(distinct list_id) from test_list_entries;
select count(*) from test_list_ count
-------
    308
(1 row)

pexitest=# select count(*) from test_list_entries;
  count
--------
  800576
(1 row)

Indicating that the safest assumption based upon no information is that
each list_id has about 2600 records associated with it.

>Could we see the whole pg_stats row for list_id?  In particular I was
>wondering if any of the list_id values being selected for appear in
>most_common_vals.

Absolutely:

select * from pg_stats where tablename = 'test_list_entries' and attname =
'list_id';
tablename     | attname | null_frac | avg_width | n_distinct
|             most_common_vals             |
         most_common_freqs                                         |
       histogram_bounds               | correlation

-------------------+---------+-----------+-----------+------------+------------------------------------------+--------------------------------------------------------------------------------------------------+---------------------------------------------+-------------
  test_list_entries | list_id |         0 |         4 |        189 |
{38,192,369,330,332,501,229,493,319,424} |
{0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333}
| {5,138,154,224,296,315,342,371,439,460,505} |    0.839262
(1 row)

Many thanks,

Doug

Re: Force a merge join?

From

Tom Lane

Date:

18 May 2002, 19:25:24

Doug Fields <dfields-pg-general@pexicom.com> writes:
> At 04:19 PM 5/18/2002, Tom Lane wrote:
>> Doug Fields <dfields-pg-general@pexicom.com> writes:
> In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO
>> ...
> to copy my real table to a testing table so I could refactor it. Then I
>> did
> the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I
> did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM)
>> and
> voila - nested loops and half second queries turning into five minute
> nightmares. Then enable_nestloop would fix the problem again after that.
>>
>> Could we see the usual details here?  Before and after EXPLAIN ANALYZE,
>> and the schemas and pg_stats rows for the tables involved.

> Hi Tom,

> I gave some of this information in my first post, but I will repeat it here
> with the pg_stats and the exact before and after information. Thanks.

> delete from pg_statistic where
>          starelid = (select oid from pg_class where relname =
> 'test_list_entries');

> EXPLAIN ANALYZE SELECT 745, a.list_entry_id
>          FROM test_list_entries a, test_list_entries b
>          WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
> a.list_id=147 OR a.list_id=144)
>                  AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
> b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
> b.list_id=418)
>                  AND a.lower_email = b.lower_email;

> NOTICE:  QUERY PLAN:

> Merge Join  (cost=1926.12..1940.21 rows=247 width=140) (actual
> time=448.32..521.99 rows=3674 loops=1)
>    ->  Sort  (cost=739.66..739.66 rows=176 width=72) (actual
> time=437.18..447.71 rows=15859 loops=1)
>          ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
> a  (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32
> rows=15859 loops=1)
>    ->  Sort  (cost=1186.45..1186.45 rows=280 width=68) (actual
> time=11.12..13.67 rows=3783 loops=1)
>          ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx on test_list_entries
> b  (cost=0.00..1175.08 rows=280 width=68) (actual time=0.06..4.75 rows=573
> loops=1)
> Total runtime: 528.83 msec

> ANALYZE;

> EXPLAIN SELECT 745, a.list_entry_id
>          FROM test_list_entries a, test_list_entries b
>          WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
> a.list_id=147 OR a.list_id=144)
>                  AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
> b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
> b.list_id=418)
>                  AND a.lower_email = b.lower_email;

> NOTICE:  QUERY PLAN:

> Nested Loop  (cost=0.00..1176.19 rows=4 width=140)
>    ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
> a  (cost=0.00..454.33 rows=84 width=72)
>    ->  Index Scan using test_lower_email_idx on test_list_entries
> b  (cost=0.00..8.52 rows=1 width=68)

> EXPLAIN ANALYZE same thing...

> NOTICE:  QUERY PLAN:

> Nested Loop  (cost=0.00..1176.19 rows=4 width=140) (actual
> time=11.52..388320.86 rows=3674 loops=1)
>    ->  Index Scan using test_list_id_idx, test_list_id_idx,
> test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
> a  (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30
> rows=15859 loops=1)
>    ->  Index Scan using test_lower_email_idx on test_list_entries
> b  (cost=0.00..8.52 rows=1 width=68) (actual time=14.86..24.47 rows=0
> loops=15859)
> Total runtime: 388330.10 msec

> select attname, null_frac, avg_width, n_distinct, most_common_freqs,
> correlation from pg_stats where tablename = 'test_list_entries';
> -- Note, I removed some irrelevant rows

>      attname    | null_frac | avg_width | n_distinct
> |                                                most_common_freqs
>                                        | correlation
>
---------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+-------------
>   list_id       |         0 |         4 |        184 |
> {0.383667,0.115667,0.0163333,0.0126667,0.0113333,0.01,0.00966667,0.00933333,0.009,0.009}
> |    0.842899
>   list_entry_id |         0 |         4 |         -1
> |
> |    0.719433
>   lower_email   |  0.387667 |        68 |   -0.38239 |
> {0.0156667,0.000666667,0.000666667,0.000666667,0.000666667}
> |  0.00150877
> (27 rows)

> \d test_list_entries
> -- Note I removed some irrelevant columns
>                Table "test_list_entries"
>      Column     |           Type           | Modifiers
> ---------------+--------------------------+-----------
>   list_id       | integer                  |
>   list_entry_id | integer                  |
>   lower_email   | character(64)            |
> Indexes: test_list_entries_pkey,
>           test_list_id_idx,
>           test_lower_email_idx

> Cheers,

> Doug

Re: Force a merge join?

From

Tom Lane

Date:

18 May 2002, 19:36:04

Doug Fields <dfields-pg-general@pexicom.com> writes:
> I find that this is actually fairly typical, where the row estimates and
> actual rows are off by orders of magnitudes.

One would like to think that 7.2 is better than previous releases,
especially for relatively simple queries such as this.

> pexitest=# select count(distinct list_id) from test_list_entries;
>  count
> -------
>     308
> (1 row)

As opposed to the pg_stats estimate of 189 ... not too bad, really.
Is the most-common-values distribution shown in the pg_stats output
reasonably correct?  (Specifically, 38 has nearly 40% of the entries,
192 another 12%, and everything else 1.5% or less)

Another question is exactly what version you are running.  I tried
plugging the stats values you gave into pg_statistic by hand, and
got this plan from current sources:

 Merge Join  (cost=29720.79..29843.05 rows=423 width=140)
   Merge Cond: ("outer".lower_email = "inner".lower_email)
   ->  Sort  (cost=11371.23..11393.77 rows=9016 width=72)
         Sort Key: a.lower_email
         ->  Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx
ontest_list_entries a  (cost=0.00..10565.78 rows=9016 width=72) 
               Index Cond: ((list_id = 148) OR (list_id = 146) OR (list_id = 145) OR (list_id = 147) OR (list_id =
144))
   ->  Sort  (cost=18349.56..18385.50 rows=14377 width=68)
         Sort Key: b.lower_email
         ->  Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx,
test_list_id_idx,test_list_id_idx, test_list_id_idx on test_list_entries b  (cost=0.00..17013.92 rows=14377 width=68) 
               Index Cond: ((list_id = 247) OR (list_id = 433) OR (list_id = 249) OR (list_id = 434) OR (list_id = 238)
OR(list_id = 340) OR (list_id = 339) OR (list_id = 418)) 
(10 rows)

ie, it's estimating about 1800 matches per list_id value, which seems
pretty reasonable given that it knows none of these values are in the
most_common list.  Now I don't see anything in the CVS logs to suggest
that the estimation of this query would've changed since late in 7.2
beta cycle, so I'm confused why you don't get similar results.

            regards, tom lane

Re: Force a merge join?

From

Doug Fields

Date:

18 May 2002, 19:54:52

> >     308
>
>As opposed to the pg_stats estimate of 189 ... not too bad, really.
>Is the most-common-values distribution shown in the pg_stats output
>reasonably correct?  (Specifically, 38 has nearly 40% of the entries,
>192 another 12%, and everything else 1.5% or less)

The output is actually pretty in terms of which should be in the top 10,
but not which exactly:

pexitest=# select list_id, count(*) from test_list_entries group by list_id
order by count(*) DESC;
  list_id | count
---------+--------
      192 | 173290
      330 |  16174
      501 |  14054
      369 |  12659
      229 |  12654
      332 |  11429
      342 |  10982
      454 |  10404
      493 |   9835
      424 |   9778
      460 |   9707
       38 |   9454
      331 |   9355
      323 |   9232
      319 |   9164

So that correlates pretty well with it's guesses as to the top 10:
{38,192,369,330,332,501,229,493,319,424}

But not so well as to their relative distributions:
{0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333}

Some other stats:

select avg(count) from (select list_id, count(*) from test_list_entries
group by list_id order by count(*) DESC) as a;
        avg
-----------------
  2599.2727272727
(1 row)

pexitest=# select stddev(count) from (select list_id, count(*) from
test_list_entries group by list_id order by count(*) DESC) as a;
       stddev
------------------
  10160.4314402693
(1 row)

>Another question is exactly what version you are running.  I tried
>plugging the stats values you gave into pg_statistic by hand, and
>got this plan from current sources:

I'm running 7.2.1 as packaged and distributed in the Debian/woody 7.2.1-2
distribution.

>ie, it's estimating about 1800 matches per list_id value, which seems
>pretty reasonable given that it knows none of these values are in the
>most_common list.  Now I don't see anything in the CVS logs to suggest
>that the estimation of this query would've changed since late in 7.2
>beta cycle, so I'm confused why you don't get similar results.

It would be nice if I could get it to use a query such as the ones I gave
above to put exact values into the analysis. Or, if I could tell it to do a
more detailed sampling during ANALYZE. I could also tell it to keep more
than the top 10 in the statistics table (SET STATISTICS), but I'm not sure
what it would buy me, other than forcing a larger sample (but not knowing
how much larger).

How much would I slow the ANALYZE statement, and more importantly, the
query optimizer, if I told it to keep statistics on the top 200 instead of
the default 10 values?

In the mean time, I've surrounded my code with SET ENABLE_NESTLOOP=OFF and
ON blocks, which should force merge joins.

I appreciate your insight on this matter and others Tom. Thanks!

Cheers,

Doug

Re: Force a merge join?

From

Tom Lane

Date:

18 May 2002, 20:08:54

Doug Fields <dfields-pg-general@pexicom.com> writes:
> So that correlates pretty well with it's guesses as to the top 10:
> {38,192,369,330,332,501,229,493,319,424}

> But not so well as to their relative distributions:
> {0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333}

Very curious.  I'd have expected it to recognize 192 as the most common
value, given that actual distribution.  Is the analyze result repeatable?

> Or, if I could tell it to do a
> more detailed sampling during ANALYZE. I could also tell it to keep more
> than the top 10 in the statistics table (SET STATISTICS), but I'm not sure
> what it would buy me, other than forcing a larger sample

The sample size scales linearly with the SET STATISTICS target (more
specifically, with the largest target among the columns being analyzed).
I was just about to suggest that you try setting a larger target and see
if the stats get better.

> How much would I slow the ANALYZE statement, and more importantly, the
> query optimizer, if I told it to keep statistics on the top 200 instead of
> the default 10 values?

200 seems like overkill... 20 or 30 might well be enough.

FWIW, I went back and loaded the test case into 7.2.1, and I still get
the same estimates I showed from current sources.  So there's something
very fishy about your results.  Don't know where to look for the cause
of the discrepancy at the moment.

            regards, tom lane

PS: sorry about the blank post before...

Re: Force a merge join?

From

"Ian Harding"

Date:

20 May 2002, 14:18:31

OK, I posted the current schema and before and after EXPLAIN output.

The problem query is "select * from vtb_allocpayroll".  It is messy, but works fine until I analyze things...

http://www.pakrat.com/pgtrouble

I will look into the pg_stats thing I saw mentioned, but am not sure if it applies since I am way back at 7.1.3.

planning=# select version();
                           version
--------------------------------------------------------------
 PostgreSQL 7.1.3 on i386--netbsd, compiled by GCC egcs-1.1.2
(1 row)

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: iharding@tpchd.org

>>> Neil Conway <nconway@klamath.dyndns.org> 05/17/02 04:08PM >>>
On Fri, 17 May 2002 10:06:32 -0700
"Ian Harding" <ianh@tpchd.org> wrote:
> I can forward my schema and another example to the group if anyone wants.

Yes, I'd be interested in seeing that, as well as the output of
EXPLAIN ANALYZE for both the good and bad query plans.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC