Home > mailing lists

Thread: Performance with very large tables

Performance with very large tables

From

"Jan van der Weijde"

Date:

15 January 2007, 06:52:57

Hello all,

one of our customers is using PostgreSQL with tables containing millions of records. A simple 'SELECT * FROM <table>' takes way too much time in that case, so we have advised him to use the LIMIT and OFFSET clauses. However now he has a concurrency problem. Records deleted, added or updated in one process have an influence on the OFFSET value of another process such that records are either skipped of read again.

The solution to that problem is to use transactions with isolation level serializable. But to use transactions around a loop that reads millions of records is far from ideal I think.

Does anyone have a suggestion for this problem ? Is there for instance an alternative to LIMIT/OFFSET so that SELECT on large tables has a good performance ?

Thank you for your help

Jan van der Weijde

Re: Performance with very large tables

From

Richard Huxton

Date:

15 January 2007, 07:01:41

Jan van der Weijde wrote:
> Hello all,
>
> one of our customers is using PostgreSQL with tables containing millions
> of records. A simple 'SELECT * FROM <table>'  takes way too much time in
> that case, so we have advised him to use the LIMIT and OFFSET clauses.

That won't reduce the time to fetch millions of rows.

It sounds like your customer doesn't want millions of rows at once, but
rather a few rows quickly and then to fetch more as required. For this
you want to use a cursor. You can do this via SQL, or perhaps via your
database library.

In SQL:
http://www.postgresql.org/docs/8.2/static/sql-declare.html
http://www.postgresql.org/docs/8.2/static/sql-fetch.html
In pl/pgsql:
http://www.postgresql.org/docs/8.2/static/plpgsql-cursors.html

HTH
--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

"Shoaib Mir"

Date:

15 January 2007, 07:24:41

You can also opt for partitioning the tables and this way select will only get the data from the required partition.

--------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com )

On 1/15/07, Richard Huxton <dev@archonet.com> wrote:

Jan van der Weijde wrote:
> Hello all,
>
> one of our customers is using PostgreSQL with tables containing millions
> of records. A simple 'SELECT * FROM <table>'  takes way too much time in
> that case, so we have advised him to use the LIMIT and OFFSET clauses.

That won't reduce the time to fetch millions of rows.

It sounds like your customer doesn't want millions of rows at once, but
rather a few rows quickly and then to fetch more as required. For this
you want to use a cursor. You can do this via SQL, or perhaps via your
database library.

In SQL:
http://www.postgresql.org/docs/8.2/static/sql-declare.html
http://www.postgresql.org/docs/8.2/static/sql-fetch.html
In pl/pgsql:
http://www.postgresql.org/docs/8.2/static/plpgsql-cursors.html

HTH
--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: Performance with very large tables

From

"Jan van der Weijde"

Date:

15 January 2007, 07:27:08

Thank you.
It is true he want to have the first few record quickly and then
continue with the next records. However without LIMIT it already takes a
very long time before the first record is returned.
I reproduced this with a table with 1.1 million records on an XP machine
and in my case it took about 25 seconds before the select returned the
first record. I tried it both interactively with pgAdmin and with a
C-application using a cursor (with hold). Both took about the same time.

Thanks,
Jan van der Weijde

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Monday, January 15, 2007 12:01
To: Jan van der Weijde
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Hello all,
>
> one of our customers is using PostgreSQL with tables containing
> millions of records. A simple 'SELECT * FROM <table>'  takes way too
> much time in that case, so we have advised him to use the LIMIT and
OFFSET clauses.

That won't reduce the time to fetch millions of rows.

It sounds like your customer doesn't want millions of rows at once, but
rather a few rows quickly and then to fetch more as required. For this
you want to use a cursor. You can do this via SQL, or perhaps via your
database library.

In SQL:
http://www.postgresql.org/docs/8.2/static/sql-declare.html
http://www.postgresql.org/docs/8.2/static/sql-fetch.html
In pl/pgsql:
http://www.postgresql.org/docs/8.2/static/plpgsql-cursors.html

HTH
--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

Richard Huxton

Date:

15 January 2007, 07:31:35

Shoaib Mir wrote:
> You can also opt for partitioning the tables and this way select will only
> get the data from the required partition.

Not in the case of SELECT * FROM <table> though. Unless you access the
specific partitioned table.

> On 1/15/07, Richard Huxton <dev@archonet.com> wrote:
>>
>> Jan van der Weijde wrote:
>> > Hello all,
>> >
>> > one of our customers is using PostgreSQL with tables containing
>> millions
>> > of records. A simple 'SELECT * FROM <table>'  takes way too much
>> time in
>> > that case, so we have advised him to use the LIMIT and OFFSET clauses.

--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

Richard Huxton

Date:

15 January 2007, 07:31:47

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes a
> very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP machine
> and in my case it took about 25 seconds before the select returned the
> first record. I tried it both interactively with pgAdmin and with a
> C-application using a cursor (with hold). Both took about the same time.

That doesn't sound right to me, not for a straigforward SELECT * FROM
where there's no sorting etc.

--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

"Shoaib Mir"

Date:

15 January 2007, 07:37:37

Oh yes, need to have a condition first for which you have partitioned tables. Only in that case it will work with partitions.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com )

On 1/15/07, Richard Huxton <dev@archonet.com> wrote:

Shoaib Mir wrote:
> You can also opt for partitioning the tables and this way select will only
> get the data from the required partition.

Not in the case of SELECT * FROM <table> though. Unless you access the
specific partitioned table.

> On 1/15/07, Richard Huxton <dev@archonet.com> wrote:
>>
>> Jan van der Weijde wrote:
>> > Hello all,
>> >
>> > one of our customers is using PostgreSQL with tables containing
>> millions
>> > of records. A simple 'SELECT * FROM <table>'  takes way too much
>> time in
>> > that case, so we have advised him to use the LIMIT and OFFSET clauses.

--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

Alban Hertroys

Date:

15 January 2007, 07:49:22

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes a
> very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP machine
> and in my case it took about 25 seconds before the select returned the
> first record. I tried it both interactively with pgAdmin and with a
> C-application using a cursor (with hold). Both took about the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Performance with very large tables

From

"Jan van der Weijde"

Date:

15 January 2007, 08:44:42

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes

> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about
the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Performance with very large tables

From

"Shoaib Mir"

Date:

15 January 2007, 08:48:59

If you go with Java, you can make it faster by using setFetchSize (JDBC functionality) from client and that will help you with the performance in case of fetching large amounts of data.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/15/07, Jan van der Weijde < Jan.van.der.Weijde@attachmate.com> wrote:

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto: alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes

> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about
the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Performance with very large tables

From

"Jan van der Weijde"

Date:

15 January 2007, 08:54:45

Unfortunately a large C program has already been written.. But if a function like PQsetFetchSize() was available in libpq, that would also solve the problem.

From: Shoaib Mir [mailto:shoaibmir@gmail.com]
Sent: Monday, January 15, 2007 13:49
To: Jan van der Weijde
Cc: Alban Hertroys; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

If you go with Java, you can make it faster by using setFetchSize (JDBC functionality) from client and that will help you with the performance in case of fetching large amounts of data.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/15/07, Jan van der Weijde < Jan.van.der.Weijde@attachmate.com> wrote:

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto: alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes

> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about
the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Performance with very large tables

From

Richard Huxton

Date:

15 January 2007, 08:54:54

Jan van der Weijde wrote:
> That is exactly the problem I think. However I do not deliberately
> retrieve the entire table. I use the default settings of the PostgreSQL
> installation and just execute a simple SELECT * FROM table.
> I am using a separate client and server (both XP in the test
> environment), but that should not make much difference.
> I would expect that the default behavior of PostgreSQL should be such
> that without LIMIT, a SELECT returns records immediately.

No it doesn't. You've asked for all the records, so it assumes you want
all the records. If you want a few at a time, use a cursor.
--
   Richard Huxton
   Archonet Ltd

Re: Performance with very large tables

From

"Gregory S. Williamson"

Date:

15 January 2007, 09:02:50

Limit is somewhat magical ... at least to a degree. Not sure about cursors since I am not currently using them.

select count(*) from bill_rpt_work;
  count
---------
 2317451
(1 row)

Time: 1709.829 ms

billing=# \d bill_rpt_work
          Table "reporting.bill_rpt_work"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 report_id     | integer               |
 client_id     | character varying(10) |
 contract_id   | integer               | not null
 rate          | numeric               | not null
 appid         | character varying(10) | not null
 userid        | text                  | not null
 collection_id | integer               | not null
 client_name   | character varying(60) |
 use_sius      | integer               | not null
 is_subscribed | integer               | not null
 hits          | numeric               | not null
 sius          | numeric               | not null
 total_amnt    | numeric               | not null
 royalty_total | numeric               |
Indexes:
    "billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
    "billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
    "billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
    "$2" FOREIGN KEY (client_id) REFERENCES "work".clients(client_id)

billing=# select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id;
Cancel request sent

after more than 10 seconds

THEN:

select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id limit 1000;
 report_id | client_id  | contract_id |  rate  |  appid   |                    userid
           | collection_id |                 client_name                 | use_sius | is_subscr
ibed |    hits     |          sius          |         total_amnt         |       royalty_total

-----------+------------+-------------+--------+----------+------------------------------------
-----------+---------------+---------------------------------------------+----------+----------
-----+-------------+------------------------+----------------------------+---------------------
-------
<...deleted details...>
Time: 52.745 ms

THEN:

billing=#  select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id
limit100000; 
 report_id | client_id  | contract_id |   rate   |  appid   |                                 u
serid                                 | collection_id |                     client_name
             | use_sius | is_subscribed |     hits     |          sius          |         total
_amnt         |       royalty_total
-----------+------------+-------------+----------+----------+----------------------------------
--------------------------------------+---------------+----------------------------------------
-------------+----------+---------------+--------------+------------------------+--------------
--------------+----------------------------
<...deleted details...>
Time: 1043.582 ms

Noticibly longer but not bad ...

But with no limit it takes quite a while:
select * from bill_rpt_work order by report_id, client_id, contract_id, rate, appid, userid, collection_id;
<...>
Time: 132033.171 ms

but with a limit, even a fairly large one (a little less than half the table) it was markedly faster. Maybe try more
thanhalf, see if there's a limit to what you can do ... 

YMMV, HTH, yadda yadda ...

Greg Williamson
DBA
GlobeXplorer LLC (part of Digital Globe Inc.)

-----Original Message-----
From:    pgsql-general-owner@postgresql.org on behalf of Jan van der Weijde
Sent:    Mon 1/15/2007 4:44 AM
To:    Alban Hertroys
Cc:    pgsql-general@postgresql.org
Subject:    Re: [GENERAL] Performance with very large tables

That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes

> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about
the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45ab760c142921470421014&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:45ab760c142921470421014!
-------------------------------------------------------

Re: Performance with very large tables

From

"Shoaib Mir"

Date:

15 January 2007, 09:04:46

An old post on the archives might help you --> http://archives.postgresql.org/pgsql-interfaces/2005-11/msg00010.php that was an attempt to simulate functionality similar to setFetchSize in JDBC.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/15/07, Jan van der Weijde < Jan.van.der.Weijde@attachmate.com> wrote:

Unfortunately a large C program has already been written.. But if a function like PQsetFetchSize() was available in libpq, that would also solve the problem.

From: Shoaib Mir [mailto:shoaibmir@gmail.com]
Sent: Monday, January 15, 2007 13:49
To: Jan van der Weijde
Cc: Alban Hertroys; pgsql-general@postgresql.org

Subject: Re: [GENERAL] Performance with very large tables

If you go with Java, you can make it faster by using setFetchSize (JDBC functionality) from client and that will help you with the performance in case of fetching large amounts of data.

---------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/15/07, Jan van der Weijde < Jan.van.der.Weijde@attachmate.com> wrote:
That is exactly the problem I think. However I do not deliberately
retrieve the entire table. I use the default settings of the PostgreSQL
installation and just execute a simple SELECT * FROM table.
I am using a separate client and server (both XP in the test
environment), but that should not make much difference.
I would expect that the default behavior of PostgreSQL should be such
that without LIMIT, a SELECT returns records immediately.

Thank you,
Jan

-----Original Message-----
From: Alban Hertroys [mailto: alban@magproductions.nl]
Sent: Monday, January 15, 2007 12:49
To: Jan van der Weijde
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

Jan van der Weijde wrote:
> Thank you.
> It is true he want to have the first few record quickly and then
> continue with the next records. However without LIMIT it already takes

> a very long time before the first record is returned.
> I reproduced this with a table with 1.1 million records on an XP
> machine and in my case it took about 25 seconds before the select
> returned the first record. I tried it both interactively with pgAdmin
> and with a C-application using a cursor (with hold). Both took about
the same time.

Are you sure you don't retrieve the entire result set first, and only
start iterating it after that? Notably the fact that LIMIT changes this
behaviour seems to point in that direction.

A quick calculation shows that (provided my assumption holds true)
fetching each record takes about 12.5 usec on average (25s / 2m
records). A quick test on our dev-db fetches (~40k records) in 5 usec
average, so that looks reasonable to me (apples and oranges, I know).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Performance with very large tables

From

Shane Ambler

Date:

15 January 2007, 11:10:38

Jan van der Weijde wrote:
> That is exactly the problem I think. However I do not deliberately
> retrieve the entire table. I use the default settings of the PostgreSQL

You will want to increase the default settings and let PostgreSQL use as
much RAM as you have - especially when retrieving a large dataset.

Some good points of reference is -

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

http://www.varlena.com/GeneralBits/Tidbits/perf.html

http://www.powerpostgresql.com/PerfList

> installation and just execute a simple SELECT * FROM table.
> I am using a separate client and server (both XP in the test
> environment), but that should not make much difference.
> I would expect that the default behavior of PostgreSQL should be such
> that without LIMIT, a SELECT returns records immediately.
>
> Thank you,
> Jan


--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

Re: Performance with very large tables

From

Bruno Wolff III

Date:

16 January 2007, 14:07:01

On Mon, Jan 15, 2007 at 11:52:29 +0100,
  Jan van der Weijde <Jan.van.der.Weijde@attachmate.com> wrote:
> Does anyone have a suggestion for this problem ? Is there for instance
> an alternative to LIMIT/OFFSET so that SELECT on large tables has a good
> performance ?

Depending on exactly what you want to happen, you may be able to continue
where you left off using a condition on the primary key, using the last
primary key value for a row that you have viewed, rather than OFFSET.
This will still be fast and will not skip rows that are now visible to
your transaction (or show duplicates when deleted rows are no longer visible
to your transaction).

Another option would be to do all of the selects in a single serializable
transaction. This will use the same snapshot for all of the selects, so
you won't have rows appear or disappear on you do to other concurrent
transactions.

Re: Performance with very large tables

From

Bruno Wolff III

Date:

16 January 2007, 14:12:17

On Tue, Jan 16, 2007 at 12:06:38 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
>
> Depending on exactly what you want to happen, you may be able to continue
> where you left off using a condition on the primary key, using the last
> primary key value for a row that you have viewed, rather than OFFSET.
> This will still be fast and will not skip rows that are now visible to
> your transaction (or show duplicates when deleted rows are no longer visible
> to your transaction).

I should have mentioned that you also will need to use an ORDER BY clause
on the primary key when doing things this way.

Re: Performance with very large tables

From

"Jan van der Weijde"

Date:

23 January 2007, 12:13:27

Hi Bruno,

Good to read that your advice to me is the solution I was considering!
Although I think this is something PostgreSQL should solve internally, I
prefer the WHERE clause over a long lasting SERIALIZABLE transaction.

Thanks,
Jan

-----Original Message-----
From: Bruno Wolff III [mailto:bruno@wolff.to]
Sent: Tuesday, January 16, 2007 19:12
To: Jan van der Weijde; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance with very large tables

On Tue, Jan 16, 2007 at 12:06:38 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
>
> Depending on exactly what you want to happen, you may be able to
continue
> where you left off using a condition on the primary key, using the
last
> primary key value for a row that you have viewed, rather than OFFSET.
> This will still be fast and will not skip rows that are now visible to
> your transaction (or show duplicates when deleted rows are no longer
visible
> to your transaction).

I should have mentioned that you also will need to use an ORDER BY
clause
on the primary key when doing things this way.