Thread: Slow SELECT by primary key? Postgres 9.1.2

Slow SELECT by primary key? Postgres 9.1.2

From: John Mudd

Postgres 9.1.2 on Ubuntu 12.04

Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost. 

I stopped the server and cleared the O/S cache using "sync; echo 3 > /proc/sys/vm/drop_caches" between the runs.



test=# VACUUM ANALYZE test_select;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
         Index Cond: (key1 >= 500000)
 Total runtime: 12.678 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
   Index Cond: (id = 500000)
 Total runtime: 31.504 ms



Schema:

test=# \d test_select
                            Table "public.test_select"
 Column |     Type     |                        Modifiers                         
--------+--------------+----------------------------------------------------------
 id     | integer      | not null default nextval('test_select_id_seq'::regclass)
 key1   | integer      | 
 key2   | integer      | 
 key3   | integer      | 
 data   | character(4) | 
Indexes:
    "test_select_pkey" PRIMARY KEY, btree (id)
    "my_key" btree (key1, key2, key3, id)

test=# 



Sample data:

test=# SELECT * FROM test_select LIMIT 10;
 id |  key1  |  key2  |  key3  | data 
----+--------+--------+--------+------
  1 | 984966 | 283954 | 772063 | x   
  2 | 817668 | 393533 | 924888 | x   
  3 | 751039 | 798753 | 454309 | x   
  4 | 128505 | 329643 | 280553 | x   
  5 | 105600 | 257225 | 710015 | x   
  6 | 323891 | 615614 |  83206 | x   
  7 | 194054 |  63506 | 353171 | x   
  8 | 212068 | 881225 | 271804 | x   
  9 | 644180 |  26693 | 200738 | x   
 10 | 136586 | 498699 | 554417 | x   
(10 rows)




Here's how I populated the table:

import psycopg2
import random

conn = psycopg2.connect('dbname=test')

cur = conn.cursor()

def random_int():
    n = 1000000
    return random.randint(0, n)

def random_key():
    return random_int(), random_int(), random_int()

def create_table():
    cur.execute('''
            DROP TABLE IF EXISTS test_select;

            CREATE TABLE test_select (
                id                      SERIAL PRIMARY KEY,
                key1                    INTEGER,
                key2                    INTEGER,
                key3                    INTEGER,
                data                    char(4)
            );
        ''')
    conn.commit()

    # Insert one million rows with random keys; the PK index grows
    # incrementally as the rows are loaded.
    n = 1000000
    for i in range(n):
        cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
    conn.commit()

    # my_key is built in a single pass after the load.
    cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
    conn.commit()

create_table()

Re: Slow SELECT by primary key? Postgres 9.1.2

From: Evgeny Shishkin

On May 27, 2013, at 6:02 PM, John Mudd <johnbmudd@gmail.com> wrote:

> Postgres 9.1.2 on Ubuntu 12.04
>
> Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost.
>

You created my_key after loading the data, while the PK was there the whole time.
If you REINDEX the PK, I bet it will be just as fast.




Re: Slow SELECT by primary key? Postgres 9.1.2

From: John Mudd

Thanks, that's easy enough to test. Didn't seem to help though.


test=# REINDEX index test_select_pkey;
REINDEX
test=# VACUUM ANALYZE test_select ;
VACUUM


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
         Index Cond: (key1 >= 500000)
 Total runtime: 16.444 ms


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
   Index Cond: (id = 500000)
 Total runtime: 23.192 ms




On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin <itparanoia@gmail.com> wrote:

> You created my_key after loading the data, while the PK was there the whole time.
> If you REINDEX the PK, I bet it will be just as fast.


Re: Slow SELECT by primary key? Postgres 9.1.2

From: Evgeny Shishkin

On May 27, 2013, at 6:35 PM, John Mudd <johnbmudd@gmail.com> wrote:

> Thanks, that's easy enough to test. Didn't seem to help though.

OK. And what if you CLUSTER the table USING the PK?




Re: Slow SELECT by primary key? Postgres 9.1.2

From: John Mudd

Thanks again.

Well, I have two problems with using the CLUSTER option. It's only temporary: any updates, depending on how much free space is reserved per page, require re-running the CLUSTER. And my primary concern is that it arbitrarily gives an unfair advantage to the primary key SELECT. Still, it's easy to test, so here are the results. The primary key still loses even with the CLUSTER. Granted, it's close, but considering this is now an unfair comparison, it still doesn't make sense to me. How can a search for a specific row, which should be fairly straightforward, take longer than a search that includes an ORDER BY clause?


test=# CLUSTER test_select USING test_select_pkey ;
CLUSTER
test=# VACUUM ANALYZE test_select ;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 rows=499992 width=21) (actual time=19.428..19.428 rows=1 loops=1)
         Index Cond: (key1 >= 500000)
 Total runtime: 19.526 ms


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1)
   Index Cond: (id = 500000)
 Total runtime: 21.178 ms




On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin <itparanoia@gmail.com> wrote:

> OK. And what if you CLUSTER the table USING the PK?






Re: Slow SELECT by primary key? Postgres 9.1.2

From: Evgeniy Shishkin




On 28.05.2013, at 2:17, John Mudd <johnbmudd@gmail.com> wrote:

> Thanks again.
>
> Well, I have two problems with using the CLUSTER option. [...] How can a search for a specific row, which should be fairly straightforward, take longer than a search that includes an ORDER BY clause?


Well, you get just a regular index scan because of the LIMIT 1.

At that point it is just a matter of index size and table organization.
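
One way to look at the size side of that (a sketch using the standard size functions; object names as in the schema shown earlier):

-- compare the on-disk sizes of the table and the two indexes
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('test_select', 'test_select_pkey', 'my_key');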

I also don't understand why you consider CLUSTER unfair: the way you populated the table was a natural clustering over my_key.

But it bothers me that my_key is always better. Can you please test it on different values but the same rows? Right now the two queries return two different tuples, and you are counting every I/O. For instance:
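
(A sketch of such a same-row test; the constants come from the first sample row shown earlier, where id 1 has key1=984966, key2=283954, key3=772063:)

-- both statements fetch the same physical tuple, one via each index
EXPLAIN ANALYZE SELECT * FROM test_select WHERE id = 1;

EXPLAIN ANALYZE SELECT * FROM test_select
WHERE key1 = 984966 AND key2 = 283954 AND key3 = 772063 AND id = 1;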





Re: Slow SELECT by primary key? Postgres 9.1.2

From: Roman Konoval

John,

And can you please include BUFFERS in the EXPLAIN ANALYZE?
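
(That is, something along these lines; the BUFFERS option is available since 9.0 and reports shared-buffer hits versus disk reads per plan node:)

-- BUFFERS shows how many pages each run found in cache vs. read from disk
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select WHERE id = 500000;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select
WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;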

Regards,
Roman Konoval




Re: Slow SELECT by primary key? Postgres 9.1.2

From: Matheus de Oliveira



On Mon, May 27, 2013 at 11:02 AM, John Mudd <johnbmudd@gmail.com> wrote:

> Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost.
To me it looks like the "my_key" index should beat the PK in this case, for a couple of reasons:

1. Your ORDER BY uses the same fields, in the same order, as the index, so PG only needs to walk the index.
2. You are using LIMIT 1, so PG only has to fetch the first entry with key1 >= 500000 and can stop the scan right there.

With your PK, PG likewise navigates the index and returns a single value, but the number of entries it has to look at is bigger, because "id" has more distinct values than "key1".
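
(If one wanted to check how deep each btree actually is, the contrib pageinspect module offers bt_metap; a sketch, assuming the extension is installed:)

-- the "level" column is the height of the btree above the leaf pages
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT level FROM bt_metap('test_select_pkey');
SELECT level FROM bt_metap('my_key');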

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: Slow SELECT by primary key? Postgres 9.1.2

From: Merlin Moncure

On Mon, May 27, 2013 at 9:02 AM, John Mudd <johnbmudd@gmail.com> wrote:
> Postgres 9.1.2 on Ubuntu 12.04
>
> Any reason why a select by primary key would be slower than a select that
> includes an ORDER BY? I was really hoping using the primary key would give
> me a boost.
>
> I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.


Why are you flushing the postgres/OS cache? When you do that, you are
measuring raw read time from the disks. Typical disk seek time is
measured in milliseconds, so the timings are entirely appropriate once
you remove caching effects. Hard drives (at least, the spinning kind)
are slow, and one of the major challenges of database and hardware
engineering is working around their limitations. Fortunately, it looks
like faster storage will soon be commonplace at reasonable prices.

merlin


Re: Slow SELECT by primary key? Postgres 9.1.2

From: Igor Neyman
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org On Behalf Of Merlin Moncure
> Sent: Thursday, May 30, 2013 11:14 AM
>
> Why are you flushing the postgres/OS cache? When you do that, you are
> measuring raw read time from the disks. Typical disk seek time is
> measured in milliseconds, so the timings are entirely appropriate once
> you remove caching effects.

True.
But, on the other hand (back to the original question), the execution plans
that John got before and after the suggested changes are exactly the same;
the timing is different, but only due to buffer cache effects.

Igor Neyman


Re: Slow SELECT by primary key? Postgres 9.1.2

From: John Mudd

I flushed the caches in an attempt to get meaningful results. I've seen complaints about previous posts that don't include clearing the caches.

I agree this tends to be artificial in the other direction. I will strive to come up with a more realistic test environment next time, maybe performing many random reads up front to fill the caches with random blocks. That would allow minimal assistance from the cache and be more realistic; a sketch of the idea is below.
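
(A rough warm-up sketch; it assumes ids are dense in 1..1000000, as in the population script:)

-- probe random primary-key values so the buffer/OS caches end up holding
-- a random mix of index and heap pages before the timed queries
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        PERFORM 1 FROM test_select WHERE id = 1 + (random() * 999999)::int;
    END LOOP;
END;
$$;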



On Thu, May 30, 2013 at 11:13 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

> Why are you flushing the postgres/OS cache? When you do that, you are
> measuring raw read time from the disks.

Re: Slow SELECT by primary key? Postgres 9.1.2

From: Merlin Moncure

On Thu, May 30, 2013 at 10:22 AM, Igor Neyman <ineyman@perceptron.com> wrote:
> True.
> But, on the other hand (back to the original question), the execution plans
> that John got before and after the suggested changes are exactly the same;
> the timing is different, but only due to buffer cache effects.

Right. Well, I think Matheus's answer is the right one. But my
point was that what's going on here is that we are measuring the number
of raw uncached seeks needed to satisfy the query on index A vs. index B.
Pure luck in how the index data is organized could throw it off one way
or the other. But the test methodology is bogus, because the root index
pages will stay hot, so the more compact pkey will likely be slightly
faster in real-world usage. (That said, I prefer the composite-key style
of design, especially for range searching.)

merlin