Re: query performance - Mailing list pgsql-general

From           | Alex Turner
Subject        | Re: query performance
Msg-id         | 33c6269f0801132154u6324049fsc21f0029e7798ee2@mail.gmail.com
In response to | Re: query performance ("Alex Turner" <armtuk@gmail.com>)
List           | pgsql-general
Oh - if you do this then make sure that you have the primary key index on overview too.
Alex
On Jan 14, 2008 12:53 AM, Alex Turner <armtuk@gmail.com> wrote:
If you combine it with Tom Lane's suggestion - it will go even better, something like:
select * from t_documentcontent where _id in (select _id from overview where _id>x order by _id limit 50);
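As a rough sketch of how this combined query behaves, here it is run against a toy schema in Python's sqlite3 (table names follow the thread; the schema, data, and the value of x are invented for illustration):

```python
import sqlite3

# Toy stand-in schema: overview holds just the ids,
# t_documentcontent holds the wide rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_documentcontent (_id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE overview (_id TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO t_documentcontent VALUES (?, ?)",
                 [("doc%04d" % i, "content %d" % i) for i in range(200)])
conn.execute("INSERT INTO overview SELECT _id FROM t_documentcontent")

# Page past the last id already shown (x): the subquery walks the narrow
# overview table in id order, and the outer query fetches the wide rows.
x = "doc0099"
rows = conn.execute("""
    SELECT * FROM t_documentcontent
    WHERE _id IN (SELECT _id FROM overview
                  WHERE _id > ? ORDER BY _id LIMIT 50)
""", (x,)).fetchall()
```

Note the outer query has no ORDER BY, just like the query in the thread, so the 50 rows may come back in any order; add ORDER BY _id to the outer query if the page must be sorted.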
Alex

On Jan 13, 2008 11:59 PM, pepone.onrez <pepone.onrez@gmail.com> wrote:
Thanks Alex
I tested your solution and it is really much faster.
Nested Loop (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
-> HashAggregate (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
-> Subquery Scan "IN_subquery" (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
-> Limit (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
-> Seq Scan on overview (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
-> Index Scan using i_documentcontent_id on t_documentcontent (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
Total runtime: 106.323 ms
I now need to see what triggers I need to add, and test the insertions. Thanks again.

On Jan 14, 2008 5:54 AM, Alex Turner <armtuk@gmail.com> wrote:

Here is a table I threw together to demonstrate the approximate speed of a materialized view in this case:
trend=# explain analyze select property_id from overview order by property_id limit 50 offset 50000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
-> Sort (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
Sort Key: property_id
-> Seq Scan on overview (cost=0.00..2501.94 rows=165094 width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
Total runtime: 447.578 ms
(5 rows)
trend=# select count(*) from overview;
count
--------
173409
(1 row)
trend=#
It's not great - but it's better than 47 seconds (The machine I'm running it on is far from big iron, so these results should be fairly typical for any modern x86 box - also this materialized view is almost certainly in RAM, and therefore IO speed is irrelevant).
Tom Lane has already suggested another approach, whereby you order your results, then select the next page from the set where the id is greater than the greatest id on the previous page:
select id from overview order by id limit 50;
x = get row['id'] // for row 1
do something
x = get row['id'] // for row 2
do something
...
x = get row['id'] // for row 50
select id from overview where id > x order by id limit 50;
The ORDER BY is relevant and in fact imperative: you must order your results somehow, otherwise your pagination will produce different results each time you try it, because database updates affect the order rows come back in when there is no ORDER BY clause.
Let me say that again to be clear: the order rows come back in when you don't specify an ORDER BY can change! Pulling rows without an ORDER BY is a REALLY bad idea; it will break your pagination if a database update happens between someone viewing a page and hitting next to view the next page.
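The loop above can be sketched like this (Python with sqlite3 as a stand-in database, and a plain integer id rather than the uuid used later in the thread; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE overview (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO overview VALUES (?)",
                 [(i,) for i in range(1, 501)])

def fetch_page(conn, last_id=None, page_size=50):
    """Keyset pagination: order by id, start after the last id seen."""
    if last_id is None:
        sql = "SELECT id FROM overview ORDER BY id LIMIT ?"
        args = (page_size,)
    else:
        sql = "SELECT id FROM overview WHERE id > ? ORDER BY id LIMIT ?"
        args = (last_id, page_size)
    return [row[0] for row in conn.execute(sql, args)]

page1 = fetch_page(conn)             # first 50 ids
page2 = fetch_page(conn, page1[-1])  # next 50, starting after the last seen
```

Because every query orders by id, concurrent inserts and deletes shift which rows fall on a page, but never reshuffle the pages themselves.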
Alex

On Jan 13, 2008 11:43 PM, Alex Turner <armtuk@gmail.com> wrote:

If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well. If you create what is essentially a materialized view of just the id field, the sequence scan will return many fewer pages than when you do it on the main table. Then you join it to the indexed main table, and page in just the rows you need. Voila - much faster result. Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger them up, but you haven't really talked about any requirements there.
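A minimal sketch of this narrow-id-table-plus-triggers idea, again in Python's sqlite3 (the trigger names, schema, and data are invented; in PostgreSQL you would write the equivalent triggers in PL/pgSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_documentcontent (_id TEXT PRIMARY KEY, body TEXT);
    -- Narrow "materialized view" of just the id column.
    CREATE TABLE overview (_id TEXT PRIMARY KEY);
    -- Triggers keep overview in sync on insert and delete.
    CREATE TRIGGER doc_ins AFTER INSERT ON t_documentcontent
        BEGIN INSERT INTO overview VALUES (NEW._id); END;
    CREATE TRIGGER doc_del AFTER DELETE ON t_documentcontent
        BEGIN DELETE FROM overview WHERE _id = OLD._id; END;
""")
conn.executemany("INSERT INTO t_documentcontent VALUES (?, ?)",
                 [("doc%04d" % i, "x" * 1000) for i in range(300)])

# Paginate the narrow table, then join back for the wide rows.
page = conn.execute("""
    SELECT d._id FROM t_documentcontent d
    JOIN (SELECT _id FROM overview ORDER BY _id LIMIT 50 OFFSET 100) o
      ON d._id = o._id
    ORDER BY d._id
""").fetchall()
```

The scan for the page now touches only the narrow overview table; the wide t_documentcontent rows are fetched by primary key for just the 50 ids on the page.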
Alex

On Jan 13, 2008 11:27 PM, pepone.onrez <pepone.onrez@gmail.com> wrote:

Sorry Alex, I forgot to mention that I had seqscan off in my last test.
Now I have set seqscan on and indexscan on, and added ORDER BY _id.
The table has an index on the _id field.
CREATE INDEX i_documentcontent_document
ON t_documentcontent
USING btree
(_document);
The database was recently VACUUM ANALYZEd, but not VACUUM FULLed.
Here are the explains of 2 different queries, when I use a large OFFSET:
EXPLAIN ANALYZE SELECT
    t_documentcontent._id AS _id
FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000
"Limit (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
" -> Sort (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
" Sort Key: _id"
" -> Seq Scan on t_documentcontent (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"
EXPLAIN ANALYZE SELECT
    t_documentcontent._id AS _id
FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000
"Limit (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
" -> Index Scan using i_documentcontent_id on t_documentcontent (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"
Tom,

I'm using a uuid for the _id field, which is the primary key, so adding a WHERE id > ? doesn't apply.
The cursor approach is also not suitable for some of my queries.
I use this query to paginate the contents of a filesystem with lots of documents; avoiding OFFSET is not always possible.