Thread: how to make this database / query faster
Hi
I use postgres v 8.3 on a dual quad core, intel xeon e5405@2.00GHz, fedora core 8 x86_64, and 32GB RAM
settings i changed on postgresql.conf:
shared_buffers = 1000MB # min 128kB or max_connections*16kB
effective_cache_size = 4000MB
I have a user table structure is attached and I have around 2 million rows and adding like 10k-30k rows everyday..
id is the primary key, and i have an index session_key
i iterate through the users table like this
select * from users where session_key is not Null order by id offset OFFSET limit 300
i want to go through the whole table... it gets really slow like greater than 5 minutes when the OFFSET is over 500,000..
what is the best way to iterate through the whole table? should i increase the limit?
thanks a lot!
I use postgres v 8.3 on a dual quad core, intel xeon e5405@2.00GHz, fedora core 8 x86_64, and 32GB RAM
settings i changed on postgresql.conf:
shared_buffers = 1000MB # min 128kB or max_connections*16kB
effective_cache_size = 4000MB
I have a user table structure is attached and I have around 2 million rows and adding like 10k-30k rows everyday..
id is the primary key, and i have an index session_key
i iterate through the users table like this
select * from users where session_key is not Null order by id offset OFFSET limit 300
i want to go through the whole table... it gets really slow like greater than 5 minutes when the OFFSET is over 500,000..
what is the best way to iterate through the whole table? should i increase the limit?
thanks a lot!
Attachment
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
select * from users where session_key is not Null order by id offset OFFSET limit 300
OFFSET actually scans past all of the records that specify in the "OFFSET". So the bigger your offset the longer the scan will be.
One solution is to retain the last ID from the previous scan:
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;
On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
will this ensure that no row is repeated when i itereate over the table? what are the rows ordered by?
thanks
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:select * from users where session_key is not Null order by id offset OFFSET limit 300One solution is to retain the last ID from the previous scan:SELECT *FROM UsersWHERE session_key IS NOT NULLAND id > your_last_idLIMIT 300;
will this ensure that no row is repeated when i itereate over the table? what are the rows ordered by?
thanks
On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <richard.broersma@gmail.com> wrote:On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:select * from users where session_key is not Null order by id offset OFFSET limit 300One solution is to retain the last ID from the previous scan:SELECT *FROM UsersWHERE session_key IS NOT NULLAND id > your_last_idLIMIT 300;
will this ensure that no row is repeated when i itereate over the table? what are the rows ordered by?
thanks
Ya, sorry I forgot to include the order by.
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;
Yes there will not be any repeated rows sence you are using a order set that who's ID are greated than the last set.
Richard Broersma wrote: > On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote: > >> On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma < >> richard.broersma@gmail.com> wrote: >> >>> On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote: >>> >>> >>>> select * from users where session_key is not Null order by id offset >>>> OFFSET limit 300 >>>> >>>> One solution is to retain the last ID from the previous scan: >>> SELECT * >>> FROM Users >>> WHERE session_key IS NOT NULL >>> AND id > your_last_id >>> LIMIT 300; >>> >> will this ensure that no row is repeated when i itereate over the table? >> what are the rows ordered by? >> thanks >> > > Ya, sorry I forgot to include the order by. > > SELECT * > FROM Users > WHERE session_key IS NOT NULL > AND id > your_last_id > ORDER BY id > LIMIT 300; > > Yes there will not be any repeated rows sence you are using a order set > that who's ID are greated than the last set. > As there's an index on id would it be faster to transpose the WHERE conditions? WHERE id > your_last_id AND session_key IS NOT NULL I can't remember if the order of WHERE is significant.
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id would it be faster to transpose the WHERERichard Broersma wrote:
> On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
>
>> On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
>> richard.broersma@gmail.com> wrote:
>>
>>> On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
>>>
>>>
>>>> select * from users where session_key is not Null order by id offset
>>>> OFFSET limit 300
>>>>
>>>> One solution is to retain the last ID from the previous scan:
>>> SELECT *
>>> FROM Users
>>> WHERE session_key IS NOT NULL
>>> AND id > your_last_id
>>> LIMIT 300;
>>>
>> will this ensure that no row is repeated when i itereate over the table?
>> what are the rows ordered by?
>> thanks
>>
>
> Ya, sorry I forgot to include the order by.
>
> SELECT *
> FROM Users
> WHERE session_key IS NOT NULL
> AND id > your_last_id
> ORDER BY id
> LIMIT 300;
>
> Yes there will not be any repeated rows sence you are using a order set
> that who's ID are greated than the last set.
>
conditions?
WHERE id > your_last_id
AND session_key IS NOT NULL
I can't remember if the order of WHERE is significant.
brian,
i have an index on session_key also
but i dont have a session key combined on both. should i do that?
i have an index on session_key also
but i dont have a session key combined on both. should i do that?
mark wrote: > On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote: >> >> As there's an index on id would it be faster to transpose the WHERE >> conditions? >> >> WHERE id > your_last_id >> AND session_key IS NOT NULL >> >> I can't remember if the order of WHERE is significant. > > brian, > i have an index on session_key also > Yeah, I should have asked, "If there's just the index on id ..." > but i dont have a session key combined on both. should i do that? I'll defer to someone with more insight.
On Sat, 15 Mar 2008, mark <markkicks@gmail.com> writes: > select * from users where session_key is not Null order by id offset OFFSET limit 300 > > i want to go through the whole table... it gets really slow like > greater than 5 minutes when the OFFSET is over 500,000.. Did you try your chance with CURSORs? (See manual page for DECLARE.) Regards.
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner. One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.
Another way to make this query faster would be to create a partial index on id;
CREATE UNIQUE INDEX Partial_id
ON Users
WHERE session_key IS NOT NULL;
This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.
As there's an index on id would it be faster to transpose the WHERE
conditions?
WHERE id > your_last_id
AND session_key IS NOT NULL
I can't remember if the order of WHERE is significant.
I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner. One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.
Another way to make this query faster would be to create a partial index on id;
CREATE UNIQUE INDEX Partial_id
ON Users
WHERE session_key IS NOT NULL;
This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.
Hi Richard-
My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
Would it matter the selectivity is balanced?
thus 1 null record and 1 trillion null records would not apply
?
Martin-
----- Original Message -----From: Richard BroersmaTo: brianSent: Sunday, March 16, 2008 10:25 AMSubject: Re: [GENERAL] how to make this database / query fasterOn Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:As there's an index on id would it be faster to transpose the WHERE
conditions?
WHERE id > your_last_id
AND session_key IS NOT NULL
I can't remember if the order of WHERE is significant.
I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner. One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.
Another way to make this query faster would be to create a partial index on id;
CREATE UNIQUE INDEX Partial_id
ON Users
WHERE session_key IS NOT NULL;
This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.
On Sun, Mar 16, 2008 at 9:47 AM, Martin Gainty <mgainty@hotmail.com> wrote:
Low cardinality can apply for more than just boolean or null/not null?
Let say I wanted to run the majority of my select queries on users with the name 'Richard'. Now of the billion users that I have, only 500 have the name 'Richard'. Since 'Richard' only makes up an insignificant part of the users table, have a partial index on 'Richard' would greatly improve select query performance for these kinds of queries.
If your boolean fields T and F were about 50% even throughout your entire trillion record table, a partial index wouldn't do much to help since 50% isn't selective enough. The same thing applies for records that have an even distribution of nulls and not nulls.
In this case, a partial index would be a really good idea if you were mostly interested in records that *were* null. However, if you were most interested in records that were not null in a table distribution like this, then a partial index would not do much for you in this case.
once again, assuming that you are mostly interested in querying the NOT NULL records in a mostly null record table, then a partial index would be a really good idea for query speed improvement.
I hope I am making sense.
Regards,
Richard Broersma Jr.
My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
Low cardinality can apply for more than just boolean or null/not null?
Let say I wanted to run the majority of my select queries on users with the name 'Richard'. Now of the billion users that I have, only 500 have the name 'Richard'. Since 'Richard' only makes up an insignificant part of the users table, have a partial index on 'Richard' would greatly improve select query performance for these kinds of queries.
If your boolean fields T and F were about 50% even throughout your entire trillion record table, a partial index wouldn't do much to help since 50% isn't selective enough. The same thing applies for records that have an even distribution of nulls and not nulls.
Would it matter the selectivity is balanced?thus 1 null record
In this case, a partial index would be a really good idea if you were mostly interested in records that *were* null. However, if you were most interested in records that were not null in a table distribution like this, then a partial index would not do much for you in this case.
and 1 trillion null records would not apply
once again, assuming that you are mostly interested in querying the NOT NULL records in a mostly null record table, then a partial index would be a really good idea for query speed improvement.
I hope I am making sense.
Regards,
Richard Broersma Jr.