Thread: Bad performance of SELECT ... where id IN (...)

Bad performance of SELECT ... where id IN (...)

From
Xia Qingran
Date:
Hi,
I have a big performance problem in my SQL select query:

========================================
select * from event where user_id in

(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,

400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,

300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,

200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,

100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
========================================

The above SELECT always spends 1200ms.

The EXPLAIN ANLYSE result of it is :
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on event  (cost=73685.08..5983063.49 rows=662018
width=36) (actual time=24.857..242.826 rows=134289 loops=1)
   Recheck Cond: (user_id = ANY

('{499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451

,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,400,399,398,397,396,395,394,3

93,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336

,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,2

78,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221

,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,1

63,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106

,105,104,103,102,101,100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,3
1,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0}'::integer[]))
   ->  Bitmap Index Scan on event_user_id_idx  (cost=0.00..71699.03
rows=662018 width=0) (actual time=24.610..24.610 rows=134289 loops=1)
         Index Cond: (user_id = ANY

('{499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452

,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,400,399,398,397,396,395,3

94,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337

,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,2

79,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222

,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,1

64,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107

,106,105,104,103,102,101,100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,
32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0}'::integer[]))

--------------------------------------------------------------------------------------------------------



My table's structure is :
=====================
CREATE TABLE event (
    id integer NOT NULL,
    user_id integer NOT NULL,
    action_type integer NOT NULL,
    resource_type integer NOT NULL,
    resource_sn integer NOT NULL,
    result_type integer,
    result_sn integer,
    created_date timestamp with time zone NOT NULL
);
=====================

And the table event has more than 100,000,000 rows, and I have a btree
index, event_user_id_idx, on user_id, the index size is 2171MB.

Do anyone have good ideas to optimize this query?

Thanks very much.
--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
Charles de Gaulle  - "The better I get to know men, the more I find
myself loving dogs." -
http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html

Re: Bad performance of SELECT ... where id IN (...)

From
Claus Guttesen
Date:
> I have a big performance problem in my SQL select query:
>
> ========================================
> select * from event where user_id in
>
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
>
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
>
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
>
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
>
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
> ========================================

What happens if you change the query to

select * from event where user_id >= 0 and user_id <= 500;

? :-)

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

Re: Bad performance of SELECT ... where id IN (...)

From
Craig James
Date:
Xia Qingran wrote:
> Hi,
> I have a big performance problem in my SQL select query:
>
> ========================================
> select * from event where user_id in
>
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
>
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
>
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
>
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
>
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
> ========================================
>
> The above SELECT always spends 1200ms.

If your user_id is always in a narrow range like this, or even in any range that is a small fraction of the total, then
adda range condition, like this: 

select * from event where user_id <= 500 and user_id >= 0 and user_id in (...)

I did this exact same thing in my application and it worked well.

Craig

Re: Bad performance of SELECT ... where id IN (...)

From
Paul Ooi
Date:
On 26-Sep-2009, at 10:16 PM, Claus Guttesen wrote:

>> I have a big performance problem in my SQL select query:
>>
>> ========================================
>> select * from event where user_id in
>>
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401
>> ,
>>
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301
>> ,
>>
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201
>> ,
>>
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101
>> ,
>>
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
>> );
>> ========================================
>
> What happens if you change the query to
>
> select * from event where user_id >= 0 and user_id <= 500;

or select * from event where user_id <= 500; :)

Besides, your index seem quite huge >2G, and it usually takes some
time to process the result, even though it's already indexed with btree.


>
> ? :-)
>
> --
> regards
> Claus
>
> When lenity and cruelty play for a kingdom,
> the gentler gamester is the soonest winner.
>
> Shakespeare
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Bad performance of SELECT ... where id IN (...)

From
Tom Lane
Date:
Xia Qingran <qingran.xia@gmail.com> writes:
> I have a big performance problem in my SQL select query:
> select * from event where user_id in
> (500,499,498, ... ,1,0);
> The above SELECT always spends 1200ms.

Your EXPLAIN ANALYZE shows that the actual runtime is only about 240ms.
So either the planning time is about 1000ms, or transmitting and
displaying the 134K rows produced by the query takes that long, or some
combination of the two.  I wouldn't be too surprised if it's the data
display that's slow; but if it's the planning time that you're unhappy
about, updating to a more recent PG release might possibly help.  What
version is this anyway?

            regards, tom lane

Re: Bad performance of SELECT ... where id IN (...)

From
Grzegorz Jaśkiewicz
Date:
if you reuse that set a lot, how about storing it in a table , and doing the join on db side ? 
if it is large, it sometimes makes sense to create temp table just for single query (I use that sort of stuff for comparing with few M records). 
But temp tables in that case have to be short lived, as they can't reuse space (no FSM in temporary table world I'm afraid, I hope it will be fixed at some stage tho).

Re: Bad performance of SELECT ... where id IN (...)

From
Xia Qingran
Date:
On Sun, Sep 27, 2009 at 1:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Xia Qingran <qingran.xia@gmail.com> writes:
>> I have a big performance problem in my SQL select query:
>> select * from event where user_id in
>> (500,499,498, ... ,1,0);
>> The above SELECT always spends 1200ms.
>
> Your EXPLAIN ANALYZE shows that the actual runtime is only about 240ms.
> So either the planning time is about 1000ms, or transmitting and
> displaying the 134K rows produced by the query takes that long, or some
> combination of the two.  I wouldn't be too surprised if it's the data
> display that's slow; but if it's the planning time that you're unhappy
> about, updating to a more recent PG release might possibly help.  What
> version is this anyway?
>
>                        regards, tom lane

Oh, It is a problem.

Forgot to talk about my platform. I am running PostgreSQL 8.4.0 on
FreeBSD 7.2-amd64 box, which has dual Xeon 5410 CPUs, 8GB memory and 2
SATA disks.

And my postgresql.conf is listed as follow:
---------------------------------------------------------------------------------------

listen_addresses = '*'        # what IP address(es) to listen on;
port = 5432                # (change requires restart)
max_connections = 88            # (change requires restart)
superuser_reserved_connections = 3
ssl = off                # (change requires restart)
tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 0        # TCP_KEEPCNT;
shared_buffers = 2048MB            # min 128kB or max_connections*16kB
temp_buffers = 32MB            # min 800kB
max_prepared_transactions = 150        # can be 0 or more, 0 to shutdown the
prepared transactions.
work_mem = 8MB                # min 64kB
maintenance_work_mem = 1024MB        # min 1MB
max_stack_depth = 8MB            # min 100kB
max_files_per_process = 16384        # min 25
vacuum_cost_delay = 100            # 0-1000 milliseconds
vacuum_cost_page_hit = 1        # 0-10000 credits
vacuum_cost_page_miss = 10        # 0-10000 credits
vacuum_cost_page_dirty = 20        # 0-10000 credits
vacuum_cost_limit = 500        # 1-10000 credits
bgwriter_delay = 500ms            # 10-10000ms between rounds
bgwriter_lru_maxpages = 100        # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 2.0        # 0-10.0 multipler on buffers scanned/round
fsync = off                # turns forced synchronization on or off
synchronous_commit = off        # immediate fsync at commit
wal_sync_method = fsync        # the default is the first option
full_page_writes = off            # recover from partial page writes
wal_buffers = 2MB            # min 32kB
wal_writer_delay = 200ms        # 1-10000 milliseconds
commit_delay = 50            # range 0-100000, in microseconds
commit_siblings = 5            # range 1-1000
checkpoint_segments = 32        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 6min        # range 30s-1h
checkpoint_completion_target = 0.5    # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s        # 0 is off
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
seq_page_cost = 1.8            # measured on an arbitrary scale
random_page_cost = 2            # same scale as above
cpu_tuple_cost = 0.15            # same scale as above
cpu_index_tuple_cost = 0.07        # same scale as above
cpu_operator_cost = 0.03        # same scale as above
effective_cache_size = 3072MB
geqo = on
geqo_threshold = 20
geqo_effort =  7                 # range 1-10
geqo_pool_size = 0            # selects default based on effort
geqo_generations = 0            # selects default based on effort
geqo_selection_bias = 2.0        # range 1.5-2.0
default_statistics_target = 500        # range 1-1000
constraint_exclusion = partition
from_collapse_limit = 20
join_collapse_limit = 20        # 1 disables collapsing of explicit
log_destination = 'syslog'
syslog_facility = 'LOCAL2'
syslog_ident = 'postgres'
client_min_messages = notice        # values in order of decreasing detail:
log_min_messages = error        # values in order of decreasing detail:
log_error_verbosity = terse        # terse, default, or verbose messages
log_min_error_statement = panic    # values in order of decreasing detail:
log_min_duration_statement = -1    # -1 is disabled, 0 logs all statements
silent_mode = on
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = off
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = on
log_hostname = off
log_line_prefix = ''            # special values:
log_lock_waits = off            # log lock waits >= deadlock_timeout
log_statement = 'none'            # none, ddl, mod, all
log_temp_files = -1            # log temporary files equal or larger
track_activities = on
track_counts = on
update_process_title = off
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on            # Enable autovacuum subprocess?  'on'
log_autovacuum_min_duration = 10    # -1 disables, 0 logs all actions and
autovacuum_max_workers = 3        # max number of autovacuum subprocesses
autovacuum_naptime = 10min        # time between autovacuum runs
autovacuum_vacuum_threshold = 100    # min number of row updates before
autovacuum_analyze_threshold = 50    # min number of row updates before
autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1    # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
autovacuum_vacuum_cost_delay = 30    # default vacuum cost delay for
autovacuum_vacuum_cost_limit = 200    # default vacuum cost limit for
datestyle = 'iso, mdy'
client_encoding = utf-8        # actually, defaults to database
lc_messages = 'C'            # locale for system error message
lc_monetary = 'C'            # locale for monetary formatting
lc_numeric = 'C'            # locale for number formatting
lc_time = 'C'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 60s
max_locks_per_transaction = 32        # min 10
regex_flavor = basic        # advanced, extended, or basic
---------------------------------------------------------------------------------------

Thanks a lot.
--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
Joan Crawford  - "I, Joan Crawford, I believe in the dollar.
Everything I earn, I spend." -
http://www.brainyquote.com/quotes/authors/j/joan_crawford.html

Re: Bad performance of SELECT ... where id IN (...)

From
Xia Qingran
Date:
On Sat, Sep 26, 2009 at 10:59 PM, Craig James
<craig_james@emolecules.com> wrote:
>
> If your user_id is always in a narrow range like this, or even in any range
> that is a small fraction of the total, then add a range condition, like
> this:
>
> select * from event where user_id <= 500 and user_id >= 0 and user_id in
> (...)
>
> I did this exact same thing in my application and it worked well.
>
> Craig
>

It is a good idea. But In my application, most of the queries' user_id
are random and difficult to range.
Thanks anyway.



--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
Charles de Gaulle  - "The better I get to know men, the more I find
myself loving dogs." -
http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html

Re: Bad performance of SELECT ... where id IN (...)

From
Ivan Voras
Date:
Xia Qingran wrote:
> On Sun, Sep 27, 2009 at 1:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Xia Qingran <qingran.xia@gmail.com> writes:
>>> I have a big performance problem in my SQL select query:
>>> select * from event where user_id in
>>> (500,499,498, ... ,1,0);
>>> The above SELECT always spends 1200ms.
>> Your EXPLAIN ANALYZE shows that the actual runtime is only about 240ms.
>> So either the planning time is about 1000ms, or transmitting and
>> displaying the 134K rows produced by the query takes that long, or some
>> combination of the two.  I wouldn't be too surprised if it's the data
>> display that's slow; but if it's the planning time that you're unhappy
>> about, updating to a more recent PG release might possibly help.  What
>> version is this anyway?
>>
>>                        regards, tom lane
>
> Oh, It is a problem.

I don't see where the "Total runtime" information is in your first message.

Also, did you run VACUUM FULL ANALYZE lately?

> Forgot to talk about my platform. I am running PostgreSQL 8.4.0 on
> FreeBSD 7.2-amd64 box, which has dual Xeon 5410 CPUs, 8GB memory and 2
> SATA disks.
>
> And my postgresql.conf is listed as follow:
> ---------------------------------------------------------------------------------------
>
> listen_addresses = '*'        # what IP address(es) to listen on;
> port = 5432                # (change requires restart)
> max_connections = 88            # (change requires restart)
> superuser_reserved_connections = 3
> ssl = off                # (change requires restart)
> tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
> tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
> tcp_keepalives_count = 0        # TCP_KEEPCNT;
> shared_buffers = 2048MB            # min 128kB or max_connections*16kB

For start I think you will need to make shared_buffers larger than your
index to get decent performance - try setting it to 4096 MB and see if
it helps.

> temp_buffers = 32MB            # min 800kB
> max_prepared_transactions = 150        # can be 0 or more, 0 to shutdown the
> prepared transactions.
> work_mem = 8MB                # min 64kB

Depending on the type of your workload (how many clients are connected
and how complex are the queries) you might want to increase work_mem
also. Try 16 MB - 32 MB or more and see if it helps.

> fsync = off                # turns forced synchronization on or off
> synchronous_commit = off        # immediate fsync at commit

Offtopic - you probably know what you are doing by disabling these, right?

Re: Bad performance of SELECT ... where id IN (...)

From
Omar Kilani
Date:
Hi Xia,

Try this patch:

http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch

It's a hack, but it works for us. I think you're probably spending
most of your query time planning, and this patch helps speed things up
10x over here.

Regards,
Omar

On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran.xia@gmail.com> wrote:
> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
> <craig_james@emolecules.com> wrote:
>>
>> If your user_id is always in a narrow range like this, or even in any range
>> that is a small fraction of the total, then add a range condition, like
>> this:
>>
>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
>> (...)
>>
>> I did this exact same thing in my application and it worked well.
>>
>> Craig
>>
>
> It is a good idea. But In my application, most of the queries' user_id
> are random and difficult to range.
> Thanks anyway.
>
>
>
> --
> 夏清然
> Xia Qingran
> qingran.xia@gmail.com
> Sent from Beijing, 11, China
> Charles de Gaulle  - "The better I get to know men, the more I find
> myself loving dogs." -
> http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Bad performance of SELECT ... where id IN (...)

From
Robert Haas
Date:
On Sun, Oct 4, 2009 at 9:58 PM, Omar Kilani <omar.kilani@gmail.com> wrote:
> Hi Xia,
>
> Try this patch:
>
> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>
> It's a hack, but it works for us. I think you're probably spending
> most of your query time planning, and this patch helps speed things up
> 10x over here.

Woof.  I can see that helping in some situations, but what a foot-gun!

...Robert

Re: Bad performance of SELECT ... where id IN (...)

From
Omar Kilani
Date:
Robert,

On Mon, Oct 5, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 4, 2009 at 9:58 PM, Omar Kilani <omar.kilani@gmail.com> wrote:
>> Hi Xia,
>>
>> Try this patch:
>>
>> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>>
>> It's a hack, but it works for us. I think you're probably spending
>> most of your query time planning, and this patch helps speed things up
>> 10x over here.
>
> Woof.  I can see that helping in some situations, but what a foot-gun!

We've run that patch for about 4 years (originally coded for us by
Neil Conway for 8.2, I think), and have never seen any negatives from
it.

I'm not really sure what the alternatives are -- it never really makes
sense to get the selectivity for thousands of items in the IN clause.
I've never seen a different plan for the same query against a DB with
that patch vs without -- it just takes a huge amount of time longer to
run without it. :)

But yeah, definitely a hack, and should only be used if needed --
hopefully there's some sort of official solution on the horizon. :)

> ...Robert

Regards,
Omar

Re: Bad performance of SELECT ... where id IN (...)

From
Grzegorz Jaśkiewicz
Date:


On Mon, Oct 5, 2009 at 1:24 PM, Omar Kilani <omar.kilani@gmail.com> wrote:


I'm not really sure what the alternatives are -- it never really makes
sense to get the selectivity for thousands of items in the IN clause.
I've never seen a different plan for the same query against a DB with
that patch vs without -- it just takes a huge amount of time longer to
run without it. :)

But yeah, definitely a hack, and should only be used if needed --
hopefully there's some sort of official solution on the horizon. :)

start using temporary tables, transactions, and joins.
Depending on source of the data (if the source is another query, than just combine it in one query with join), otherwise create temp table, fill out with data, and run query with join.
If you do all that in transaction, it will be very fast.
 
--
GJ

Re: Bad performance of SELECT ... where id IN (...)

From
Xia Qingran
Date:
On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar.kilani@gmail.com> wrote:
> Hi Xia,
>
> Try this patch:
>
> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>
> It's a hack, but it works for us. I think you're probably spending
> most of your query time planning, and this patch helps speed things up
> 10x over here.

Thanks!
I am trying it.

Regards,

Xia Qingran

>
> Regards,
> Omar
>
> On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran.xia@gmail.com> wrote:
>> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
>> <craig_james@emolecules.com> wrote:
>>>
>>> If your user_id is always in a narrow range like this, or even in any range
>>> that is a small fraction of the total, then add a range condition, like
>>> this:
>>>
>>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
>>> (...)
>>>
>>> I did this exact same thing in my application and it worked well.
>>>
>>> Craig
>>>
>>
>> It is a good idea. But In my application, most of the queries' user_id
>> are random and difficult to range.
>> Thanks anyway.
>>
>>
>>
>> --
>> 夏清然
>> Xia Qingran
>> qingran.xia@gmail.com
>> Sent from Beijing, 11, China
>> Charles de Gaulle  - "The better I get to know men, the more I find
>> myself loving dogs." -
>> http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>



--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
Stephen Leacock  - "I detest life-insurance agents: they always argue
that I shall some day die, which is not so." -
http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html

Re: Bad performance of SELECT ... where id IN (...)

From
Kenneth Marshall
Date:
On Fri, Oct 09, 2009 at 08:31:54PM +0800, Xia Qingran wrote:
> On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar.kilani@gmail.com> wrote:
> > Hi Xia,
> >
> > Try this patch:
> >
> > http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
> >
> > It's a hack, but it works for us. I think you're probably spending
> > most of your query time planning, and this patch helps speed things up
> > 10x over here.
>
> Thanks!
> I am trying it.
>
> Regards,
>
> Xia Qingran
>

We have a similar situation when using DSPAM with a PostgreSQL
backend. In that case we used a function like the following to
speed up the lookups. I do not know if it would be useful in
your situation, but I thought I would post it for the group:

The original query was of the form:

SELECT uid, token, spam_hits, innocent_hits FROM dspam_token_data
WHERE uid = 'xxx' AND token IN (...);

The faster version of the query in the current code is:

SELECT * FROM lookup_tokens(%d, '{...});

where lookup_tokens is defined as follows:

create function lookup_tokens(integer,bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
                where uid=$1
                  and token in (select $2[i]
                                  from generate_series(array_lower($2,1),
                                                       array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

Anyway, you may want to try a similar approach instead of the
posted code change.

Regards,
Ken

> >
> > Regards,
> > Omar
> >
> > On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran.xia@gmail.com> wrote:
> >> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
> >> <craig_james@emolecules.com> wrote:
> >>>
> >>> If your user_id is always in a narrow range like this, or even in any range
> >>> that is a small fraction of the total, then add a range condition, like
> >>> this:
> >>>
> >>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
> >>> (...)
> >>>
> >>> I did this exact same thing in my application and it worked well.
> >>>
> >>> Craig
> >>>
> >>
> >> It is a good idea. But In my application, most of the queries' user_id
> >> are random and difficult to range.
> >> Thanks anyway.
> >>
> >>
> >>
> >> --
> >> ?????????
> >> Xia Qingran
> >> qingran.xia@gmail.com
> >> Sent from Beijing, 11, China
> >> Charles de Gaulle ??- "The better I get to know men, the more I find
> >> myself loving dogs." -
> >> http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >>
> >
>
>
>
> --
> ?????????
> Xia Qingran
> qingran.xia@gmail.com
> Sent from Beijing, 11, China
> Stephen Leacock  - "I detest life-insurance agents: they always argue
> that I shall some day die, which is not so." -
> http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>