Thread:

From
Pailloncy Jean-Gérard
Date:
I run the following command three times to prevent cache/disk results.

[...]
dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=764518963 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1;
                                                          QUERY PLAN
------------------------------------------------------------------------
-----------------------------------------------------
  Limit  (cost=173.14..173.14 rows=1 width=4) (actual time=0.357..0.358
rows=1 loops=1)
    ->  Sort  (cost=173.14..173.22 rows=32 width=4) (actual
time=0.354..0.354 rows=1 loops=1)
          Sort Key: rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..172.34
rows=32 width=4) (actual time=0.039..0.271 rows=50 loops=1)
                Index Cond: (crc32 = 764518963)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 0.410 ms
(7 rows)

dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=764518963 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT
1;
                                                          QUERY PLAN
------------------------------------------------------------------------
-----------------------------------------------------
  Limit  (cost=173.14..173.14 rows=1 width=8) (actual time=0.378..0.378
rows=1 loops=1)
    ->  Sort  (cost=173.14..173.22 rows=32 width=8) (actual
time=0.375..0.375 rows=1 loops=1)
          Sort Key: crc32, rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..172.34
rows=32 width=8) (actual time=0.038..0.278 rows=50 loops=1)
                Index Cond: (crc32 = 764518963)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 0.432 ms
(7 rows)
dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1;
                                                           QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
  Limit  (cost=0.00..37.03 rows=1 width=4) (actual time=156.712..156.713
rows=1 loops=1)
    ->  Index Scan using url_pkey on url  (cost=0.00..14996.82 rows=405
width=4) (actual time=156.707..156.707 rows=1 loops=1)
          Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status =
200) OR (status = 304) OR (status = 206)))
  Total runtime: 156.769 ms
(4 rows)

dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=419903683 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT
1;
                                                            QUERY PLAN
------------------------------------------------------------------------
--------------------------------------------------------
  Limit  (cost=1910.14..1910.14 rows=1 width=8) (actual
time=4.558..4.559 rows=1 loops=1)
    ->  Sort  (cost=1910.14..1911.15 rows=405 width=8) (actual
time=4.555..4.555 rows=1 loops=1)
          Sort Key: crc32, rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..1892.60
rows=405 width=8) (actual time=0.042..2.935 rows=719 loops=1)
                Index Cond: (crc32 = 419903683)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 4.636 ms
(7 rows)

The value 764518963 is not common, it appears 50 times in the table.
The value 419903683 is the third most common value of the table url.

dps=> select u.crc32, count(*)  from url u group by u.crc32 order by
count(*) desc;
     crc32    | count
-------------+------
            0 | 82202
   -946427862 | 10545
    419903683 |   719
    945866756 |   670
[...]

How to setup pgsql to correctly select the good index for index scan ?

I run Pgsql 7.4.x
The database runs under pg_autovacuum daemon.
And a VACUUM FULL VERBOSE ANALYZE was done 10 hours before.

Cordialement,
Jean-Gérard Pailloncy

Re:

From
Tom Lane
Date:
=?ISO-8859-1?Q?Pailloncy_Jean-G=E9rard?= <pailloncy@ifrance.com> writes:
> I run the following command three times to prevent cache/disk results.

Do you think that's actually representative of how your database will
behave under load?

If the DB is small enough to be completely cached in RAM, and you
expect it to remain so, then it's sensible to optimize on the basis
of fully-cached test cases.  Otherwise I think you are optimizing
the wrong thing.

If you do want to plan on this basis, you want to set random_page_cost
to 1, make sure effective_cache_size is large, and perhaps increase
the cpu_xxx cost numbers.  (What you're essentially doing here is
reducing the estimated cost of a page fetch relative to CPU effort.)

            regards, tom lane

Re: 225 times slower

From
Pailloncy Jean-Gérard
Date:
Hi,

I apologize for the mistake.
So, I dump the database, I reload it then VACUUM ANALYZE.
For each statement: I then quit postgres, start it, execute one
command, then quit.

Le 14 avr. 04, à 14:39, Pailloncy Jean-Gérard a écrit :

dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=764518963 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1;
                                                            QUERY PLAN
------------------------------------------------------------------------
--------------------------------------------------------
  Limit  (cost=169.79..169.79 rows=1 width=4) (actual
time=502.397..502.398 rows=1 loops=1)
    ->  Sort  (cost=169.79..169.86 rows=30 width=4) (actual
time=502.393..502.393 rows=1 loops=1)
          Sort Key: rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..169.05
rows=30 width=4) (actual time=43.545..490.895 rows=56 loops=1)
                Index Cond: (crc32 = 764518963)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 502.520 ms
(7 rows)
dps=# \q

dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=764518963 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT
1;
                                                          QUERY PLAN
------------------------------------------------------------------------
-----------------------------------------------------
  Limit  (cost=169.79..169.79 rows=1 width=8) (actual time=5.893..5.894
rows=1 loops=1)
    ->  Sort  (cost=169.79..169.86 rows=30 width=8) (actual
time=5.889..5.889 rows=1 loops=1)
          Sort Key: crc32, rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..169.05
rows=30 width=8) (actual time=0.445..5.430 rows=56 loops=1)
                Index Cond: (crc32 = 764518963)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 6.020 ms
(7 rows)
dps=# \q

dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1;
                                                             QUERY PLAN
------------------------------------------------------------------------
----------------------------------------------------------
  Limit  (cost=0.00..27.95 rows=1 width=4) (actual
time=11021.875..11021.876 rows=1 loops=1)
    ->  Index Scan using url_pkey on url  (cost=0.00..11625.49 rows=416
width=4) (actual time=11021.868..11021.868 rows=1 loops=1)
          Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status =
200) OR (status = 304) OR (status = 206)))
  Total runtime: 11021.986 ms
(4 rows)
dps=# \q

dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
crc32=419903683 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT
1;
                                                            QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------------------
  Limit  (cost=2000.41..2000.41 rows=1 width=8) (actual
time=48.503..48.504 rows=1 loops=1)
    ->  Sort  (cost=2000.41..2001.45 rows=416 width=8) (actual
time=48.499..48.499 rows=1 loops=1)
          Sort Key: crc32, rec_id
          ->  Index Scan using url_crc on url  (cost=0.00..1982.31
rows=416 width=8) (actual time=4.848..45.452 rows=796 loops=1)
                Index Cond: (crc32 = 419903683)
                Filter: ((crc32 <> 0) AND ((status = 200) OR (status =
304) OR (status = 206)))
  Total runtime: 48.656 ms
(7 rows)
dps=# \q

So, with all fresh data, everything rebuild from scratch, on a backend
that will done one and only one query, the results is strange.
Why adding an ORDER BY clause on a column with one value speed up the
stuff 502ms to 6ms ?
Why when crc32=419903683, which is one of the most often used value in
the table, the query planner chose a plan so bad (225 times slower) ?

Cordialement,
Jean-Gérard Pailloncy


Re: 225 times slower

From
Tom Lane
Date:
=?ISO-8859-1?Q?Pailloncy_Jean-G=E9rard?= <pailloncy@ifrance.com> writes:
> dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND
> crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1;
>                                                              QUERY PLAN
> ------------------------------------------------------------------------
> ----------------------------------------------------------
>   Limit  (cost=0.00..27.95 rows=1 width=4) (actual
> time=11021.875..11021.876 rows=1 loops=1)
>     ->  Index Scan using url_pkey on url  (cost=0.00..11625.49 rows=416
> width=4) (actual time=11021.868..11021.868 rows=1 loops=1)
>           Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status =
> 200) OR (status = 304) OR (status = 206)))
>   Total runtime: 11021.986 ms
> (4 rows)
> dps=# \q

The planner is guessing that scanning in rec_id order will produce a
matching row fairly quickly (sooner than selecting all the matching rows
and sorting them would do).  It's wrong in this case, but I'm not sure
it could do better without very detailed cross-column statistics.  Am I
right to guess that the rows that match the WHERE clause are not evenly
distributed in the rec_id order, but rather there are no such rows till
you get well up in the ordering?

            regards, tom lane