[BUGS] Query planner skipping index depending on DISTINCT parameter order(2) - Mailing list pgsql-bugs

From Дилян Палаузов
Subject [BUGS] Query planner skipping index depending on DISTINCT parameter order(2)
Date
Msg-id 57ccc03b-279a-63f7-19c6-6fe3b2d0e1be@aegee.org
Whole thread Raw
In response to [BUGS] Query planner skipping index depending on DISTINCT parameter order  (Дилян Палаузов<dilyan.palauzov@aegee.org>)
Responses Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
List pgsql-bugs
Post Scriptum to my email from yesterday.

I deleted the_index, so that I have now

spamassassin=> \d bayes_token              Table "public.bayes_token"
┌────────────┬─────────┬──────────────────────────────┐
│   Column   │  Type   │          Modifiers           │
├────────────┼─────────┼──────────────────────────────┤
│ id         │ integer │ not null default 0           │
│ token      │ bytea   │ not null default '\x'::bytea │
│ spam_count │ integer │ not null default 0           │
│ ham_count  │ integer │ not null default 0           │
│ atime      │ integer │ not null default 0           │
└────────────┴─────────┴──────────────────────────────┘
Indexes:    "bayes_token_pkey" PRIMARY KEY, btree (id, token)    "bayes_token_idx1" btree (token)

and as you can see, there is still a btree index on (id, token).  I expect the same query plans, as before deleting
the_index,as bayes_token_pkey has the same information as the_index had.  This does not happen on my system and I have
nottweaked the default configuration files.
 

spamassassin=> EXPLAIN ANALYZE SELECT DISTINCT(id, token) token FROM bayes_token;

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                           QUERY PLAN
        │
 

├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Unique  (cost=21737.73..22555.11 rows=163476 width=32) (actual time=914.275..1206.239 rows=165523 loops=1)
        │
 
│   ->  Sort  (cost=21737.73..22146.42 rows=163476 width=32) (actual time=914.274..1125.674 rows=165523 loops=1)
        │
 
│         Sort Key: (ROW(id, token))
        │
 
│         Sort Method: external merge  Disk: 6616kB
        │
 
│         ->  Seq Scan on bayes_token  (cost=0.00..3668.76 rows=163476 width=32) (actual time=0.015..50.849 rows=165523
loops=1)│
 
│ Planning time: 0.079 ms
        │
 
│ Execution time: 1216.047 ms
        │
 

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

What is the difference makes ROW between the one case here  Sort Key: (ROW(id, token))
and in the other below:  Sort Key: token, id

Regards  Dilyan

On 09/14/17 17:38, Дилян Палаузов wrote:
> Hello,
> 
> I have this database in Pg 9.6.5, the schema comes from spamassassin:
> 
> ┌────────────┬─────────┬──────────────────────────────┐
> │   Column   │  Type   │          Modifiers           │
> ├────────────┼─────────┼──────────────────────────────┤
> │ id         │ integer │ not null default 0           │
> │ token      │ bytea   │ not null default '\x'::bytea │
> │ spam_count │ integer │ not null default 0           │
> │ ham_count  │ integer │ not null default 0           │
> │ atime      │ integer │ not null default 0           │
> └────────────┴─────────┴──────────────────────────────┘
> Indexes:
>      "bayes_token_pkey" PRIMARY KEY, btree (id, token)
>      "bayes_token_idx1" btree (token)
>      "the_index" btree (id, token)
> 
> with 261000 rows. According to my understanding, "SELECT DISTINCT ON(token, id) token FROM bayes_token;" and "SELECT
DISTINCTON(id, token) token FROM bayes_token;" are supposed to deliver the same results, as only the "token" and "id"
columnsare exchanged in DISTINCT, so the query optimizer is supposed to generate the same query plan.  But it does
not:
> 
> EXPLAIN ANALYZE SELECT DISTINCT ON(token, id) token  FROM bayes_token;
>
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN                                                           │
>
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Unique  (cost=32624.56..34588.73 rows=261889 width=10) (actual time=154.947..264.439 rows=261935 loops=1) │
> │   ->  Sort  (cost=32624.56..33279.28 rows=261889 width=10) (actual time=154.945..215.378 rows=261935 loops=1)
          │
 
> │         Sort Key: token, id │
> │         Sort Method: external merge  Disk: 5624kB │
> │         ->  Seq Scan on bayes_token  (cost=0.00..4579.89 rows=261889 width=10) (actual time=0.015..35.282
rows=261935loops=1) │
 
> │ Planning time: 0.085 ms │
> │ Execution time: 285.303 ms │
>
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> 
> EXPLAIN ANALYZE SELECT DISTINCT ON(id, token) token  FROM bayes_token;
>
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN                 │
>
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Unique  (cost=0.42..13819.07 rows=262023 width=10) (actual time=0.028..207.695 rows=262018 loops=1)
                             │
 
> │   ->  Index Only Scan using the_index on bayes_token (cost=0.42..12508.95 rows=262023 width=10) (actual
time=0.026..147.882rows=262018 loops=1) │
 
> │         Heap Fetches: 261729             │
> │ Planning time: 0.086 ms             │
> │ Execution time: 232.598 ms             │
>
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> 
> The_index and the SELECTs are invented for the sake of this demonstration.
> 
> Regards
>    Diluan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

Previous
From: "joe.liu"
Date:
Subject: Re: [BUGS] Urgent! PGWatch issue
Next
From: alex@yuscott.co.uk
Date:
Subject: [BUGS] BUG #14817: pg_dumpall is not generating create database statements