Thread: how to make this database / query faster

how to make this database / query faster

From
mark
Date:
Hi
I use postgres v 8.3 on a dual quad core, intel xeon e5405@2.00GHz, fedora core 8 x86_64, and 32GB RAM

settings i changed on postgresql.conf:

shared_buffers = 1000MB                 # min 128kB or max_connections*16kB
effective_cache_size = 4000MB


I have a user table structure is attached and I have around 2 million rows and adding like 10k-30k rows everyday..
id is the primary key, and i have an index session_key

i iterate through the users table like this

select * from users where session_key is not Null order by id offset OFFSET limit 300


i want to go through the whole table... it gets really slow like greater than 5 minutes when the OFFSET is over 500,000..

what is the best way to iterate through the whole table? should i increase the  limit?
thanks a lot!
Attachment

Re: how to make this database / query faster

From
"Richard Broersma"
Date:
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
 
select * from users where session_key is not Null order by id offset OFFSET limit 300

OFFSET actually scans past all of the records that specify in the "OFFSET".  So the bigger your offset the longer the scan will be.
 
One solution is to retain the last ID from the previous scan:
 
SELECT *
  FROM Users
 WHERE session_key IS NOT NULL
    AND id > your_last_id
  LIMIT 300;

Re: how to make this database / query faster

From
mark
Date:
On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
 
select * from users where session_key is not Null order by id offset OFFSET limit 300

One solution is to retain the last ID from the previous scan:
 
SELECT *
  FROM Users
 WHERE session_key IS NOT NULL
    AND id > your_last_id
  LIMIT 300;

will this ensure that no row is repeated when i itereate over the table? what are the rows ordered by?
thanks

Re: how to make this database / query faster

From
"Richard Broersma"
Date:
On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
 
select * from users where session_key is not Null order by id offset OFFSET limit 300

One solution is to retain the last ID from the previous scan:
 
SELECT *
  FROM Users
 WHERE session_key IS NOT NULL
    AND id > your_last_id
  LIMIT 300;

will this ensure that no row is repeated when i itereate over the table? what are the rows ordered by?
thanks

Ya, sorry I forgot to include the order by.
 
SELECT *
  FROM Users
 WHERE session_key IS NOT NULL
    AND id > your_last_id
 ORDER BY id
  LIMIT 300;
 
Yes there will not be any repeated rows sence you are using a order set that who's ID are greated than the last set.

Re: how to make this database / query faster

From
brian
Date:
Richard Broersma wrote:
> On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
>
>> On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
>> richard.broersma@gmail.com> wrote:
>>
>>>  On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
>>>
>>>
>>>> select * from users where session_key is not Null order by id offset
>>>> OFFSET limit 300
>>>>
>>>>  One solution is to retain the last ID from the previous scan:
>>> SELECT *
>>>   FROM Users
>>>  WHERE session_key IS NOT NULL
>>>     AND id > your_last_id
>>>   LIMIT 300;
>>>
>> will this ensure that no row is repeated when i itereate over the table?
>> what are the rows ordered by?
>> thanks
>>
>
> Ya, sorry I forgot to include the order by.
>
>  SELECT *
>   FROM Users
>  WHERE session_key IS NOT NULL
>     AND id > your_last_id
>  ORDER BY id
>   LIMIT 300;
>
> Yes there will not be any repeated rows sence you are using a order set
> that who's ID are greated than the last set.
>

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

Re: how to make this database / query faster

From
mark
Date:

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
Richard Broersma wrote:
> On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
>
>> On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
>> richard.broersma@gmail.com> wrote:
>>
>>>  On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
>>>
>>>
>>>> select * from users where session_key is not Null order by id offset
>>>> OFFSET limit 300
>>>>
>>>>  One solution is to retain the last ID from the previous scan:
>>> SELECT *
>>>   FROM Users
>>>  WHERE session_key IS NOT NULL
>>>     AND id > your_last_id
>>>   LIMIT 300;
>>>
>> will this ensure that no row is repeated when i itereate over the table?
>> what are the rows ordered by?
>> thanks
>>
>
> Ya, sorry I forgot to include the order by.
>
>  SELECT *
>   FROM Users
>  WHERE session_key IS NOT NULL
>     AND id > your_last_id
>  ORDER BY id
>   LIMIT 300;
>
> Yes there will not be any repeated rows sence you are using a order set
> that who's ID are greated than the last set.
>

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.
brian,
i have an index on session_key also
but i dont have a session key combined on both. should i do that?

Re: how to make this database / query faster

From
brian
Date:
mark wrote:
> On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
>>
>> As there's an index on id would it be faster to transpose the WHERE
>> conditions?
>>
>> WHERE id > your_last_id
>> AND session_key IS NOT NULL
>>
>> I can't remember if the order of WHERE is significant.
>
> brian,
> i have an index on session_key also
>

Yeah, I should have asked, "If there's just the index on id ..."

 > but i dont have a session key combined on both. should i do that?

I'll defer to someone with more insight.

Re: how to make this database / query faster

From
Volkan YAZICI
Date:
On Sat, 15 Mar 2008, mark <markkicks@gmail.com> writes:
> select * from users where session_key is not Null order by id offset OFFSET limit 300
>
> i want to go through the whole table... it gets really slow like
> greater than 5 minutes when the OFFSET is over 500,000..

Did you try your chance with CURSORs? (See manual page for DECLARE.)


Regards.

Re: how to make this database / query faster

From
"Richard Broersma"
Date:
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner.  One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.

Another way to make this query faster would be to create a partial index on id;

CREATE UNIQUE INDEX Partial_id
        ON Users
 WHERE session_key IS NOT NULL;


This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.

Re: how to make this database / query faster

From
"Martin Gainty"
Date:
Hi Richard-

My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
 
Would it matter the selectivity is balanced?
thus 1 null record and 1 trillion null records would not apply

?
Martin-
----- Original Message -----
To: brian
Sent: Sunday, March 16, 2008 10:25 AM
Subject: Re: [GENERAL] how to make this database / query faster

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner.  One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.

Another way to make this query faster would be to create a partial index on id;

CREATE UNIQUE INDEX Partial_id
        ON Users
 WHERE session_key IS NOT NULL;


This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.

Re: how to make this database / query faster

From
"Richard Broersma"
Date:
On Sun, Mar 16, 2008 at 9:47 AM, Martin Gainty <mgainty@hotmail.com> wrote:
My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?

Low cardinality can apply for more than just boolean or null/not null?

Let say I wanted to run the majority of my select queries on users with the name 'Richard'.   Now of the billion users that I have, only 500 have the name 'Richard'.  Since 'Richard' only makes up an insignificant part of the users table, have a partial index on 'Richard' would greatly improve select query performance for these kinds of queries.

If your boolean fields T and F were about 50% even throughout your entire trillion record table, a partial index wouldn't do much to help since 50% isn't selective enough. The same thing applies for records that have an even distribution of nulls and not nulls.
 
 
Would it matter the selectivity is balanced?
thus 1 null record

In this case, a partial index would be a really good idea if you were mostly interested in records that *were* null.  However, if you were most interested in records that were not null in a table distribution like this, then a partial index would not do much for you in this case.
 
and 1 trillion null records would not apply

once again, assuming that you are mostly interested in querying the NOT NULL records in a mostly null record table, then a partial index would be a really good idea for query speed improvement.

I hope I am making sense.

Regards,
Richard Broersma Jr.