Thread: Anything I can do to speed up this query?
I have a table that has roughly 200,000 entries and many columns.

The query is very simple:

SELECT Field1, Field2, Field3... FieldN FROM TargetTable;

TargetTable has an index on Field1.

The thing is, on this machine with 1 Gig of RAM, the above query still takes
about 20 seconds to finish, and I need it to run faster, ideally in around
5 seconds.

I have already increased the following configuration settings:

Test=> show sort_mem;
 sort_mem
----------
 262144
(1 row)

Test=> show shared_buffers ;
 shared_buffers
----------------
 60800
(1 row)

And the EXPLAIN ANALYZE result is as follows:

                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on TargetTable  (cost=0.00..28471.72 rows=210872 width=988)
 (actual time=0.037..6084.385 rows=211286 loops=1)
 Total runtime: 6520.499 ms

What else can I improve? Thanks in advance.

Wei
On Tuesday 05 December 2006 3:56 pm, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.

I suspect it's the time needed to ship the data over the wire. Don't forget
you're asking for an awful lot of data...

> Wei

jan

--
--------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

            Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------
Forgot to mention the version I am using: PostgreSQL 7.4.13.

Thanks

On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.
>
> I have already increased the following configuration settings:
>
> Test=> show sort_mem;
>  sort_mem
> ----------
>  262144
> (1 row)
>
> Test=> show shared_buffers ;
>  shared_buffers
> ----------------
>  60800
> (1 row)
>
> And the EXPLAIN ANALYZE result is as follows:
>
>                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on TargetTable  (cost=0.00..28471.72 rows=210872 width=988)
>  (actual time=0.037..6084.385 rows=211286 loops=1)
>  Total runtime: 6520.499 ms
>
> What else can I improve? Thanks in advance.
>
> Wei
I am running this on the same machine as the database, though.

Thanks

On Tue, 2006-12-05 at 16:02 -0500, Jan de Visser wrote:
> On Tuesday 05 December 2006 3:56 pm, Wei Weng wrote:
> > I have a table that has roughly 200,000 entries and many columns.
> >
> > The query is very simple:
> >
> > SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
> >
> > TargetTable has an index on Field1.
> >
> > The thing is, on this machine with 1 Gig of RAM, the above query still takes
> > about 20 seconds to finish, and I need it to run faster, ideally in around
> > 5 seconds.
>
> I suspect it's the time needed to ship the data over the wire. Don't forget
> you're asking for an awful lot of data...
>
> > Wei
>
> jan
On Tue, 2006-12-05 at 14:56, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.

You're basically asking for everything in the table, right? If you're not
using a cursor, then it's gonna take the time to grab the data and then
transfer it across the wire.

If you wrap that query in a cursor:

begin;
declare bubba cursor for select * from targettable;
fetch 100 from bubba;
-- repeat as necessary
commit; -- or rollback, doesn't really matter

That way you can start getting data before the whole result set is
returned. You won't get the data any faster, but you can start spewing it
at the user almost immediately, which makes it feel faster.
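(A slightly fuller sketch of such a session, using the OP's column names;
the cursor name and fetch size here are arbitrary placeholders:)

begin;
declare c cursor for
    select Field1, Field2, Field3 from TargetTable;
fetch 100 from c;   -- returns the first 100 rows
fetch 100 from c;   -- the next 100; repeat until no rows come back
close c;
commit;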
On Dec 5, 2006, at 21:56 , Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;

This is the very definition of a sequential scan: you're reading 200,000
rows from that table, and the performance of doing this is constrained by
how fast PostgreSQL can read the data from (at worst) disk or (at best)
the disk cache. It's bound to be slow on any database system.

> Test=> show shared_buffers ;
>  shared_buffers
> ----------------
>  60800
> (1 row)

That's 475MB. How large is your table? You can find out with this query:

  select relpages * 8192 from pg_class where relname ilike 'TargetTable';

Note that unlike the 8.x series, 7.4 apparently doesn't use shared_buffers
that much for caching tuples across queries. In other words, a large
shared_buffers setting might not have much of an effect.

Also: how much memory is left available to the OS cache?

Alexander.
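(A variant of that query that reports the size in MB along with the
planner's row estimate -- just a sketch, assuming the default 8 KB block
size and the OP's table name:)

select relname,
       relpages * 8192 / (1024 * 1024) as size_mb,
       reltuples::bigint               as approx_rows
from   pg_class
where  relname ilike 'TargetTable';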
On 12/05/06 14:56, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.
>
> I have already increased the following configuration settings:
> [snip]
> And the EXPLAIN ANALYZE result is as follows:
>
>                              QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on TargetTable  (cost=0.00..28471.72 rows=210872 width=988)
>  (actual time=0.037..6084.385 rows=211286 loops=1)
>  Total runtime: 6520.499 ms
>
> What else can I improve? Thanks in advance.

Are you dumping this to the screen or to a {file|application}? Video is
slow, so that would take time.

Also, MVCC tends to fragment data in very multi-user situations. Maybe the
data is not localized on disk?

As for the Seq Scan, Alexander is correct. What is the point of using an
index if there are no aggregates or WHERE, ORDER BY, GROUP BY, etc. clauses?

--
Ron Johnson, Jr.
Jefferson LA USA
On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.
>
> Test=> show shared_buffers ;
>  shared_buffers
> ----------------
>  60800

Do you mean that you want the data to be cached, or do you need it to be
fast the first time?

If you want it to be cached, you might be better served with a lower
shared_buffers setting, to leave more memory to the OS cache.

If you just need more speed reading from the disks, you probably need a
faster disk subsystem, although you should make sure the table does not
contain a lot of dead rows, by doing a VACUUM FULL or a CLUSTER once
before trying again.

If Field1, Field2 ... FieldN are a small subset of the table's row width,
you should consider vertical partitioning, to minimize the I/O needed for
this particular query, although this will not help much if the subset is
not fixed.

gnari
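(A rough sketch of those two suggestions; the narrow table name and the
column list are made up, and the narrow copy would have to be kept in sync
by the application or by triggers:)

-- reclaim space held by dead rows
VACUUM FULL ANALYZE TargetTable;

-- vertical partitioning: a narrow table holding only the columns
-- this query actually needs
CREATE TABLE TargetTable_narrow AS
    SELECT Field1, Field2, Field3 FROM TargetTable;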
Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.
>
> The thing is, on this machine with 1 Gig of RAM, the above query still takes
> about 20 seconds to finish, and I need it to run faster, ideally in around
> 5 seconds.
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on TargetTable  (cost=0.00..28471.72 rows=210872 width=988)
>  (actual time=0.037..6084.385 rows=211286 loops=1)
>  Total runtime: 6520.499 ms

That's 988 * 211286 =~ 200MB of data. Since the EXPLAIN ANALYZE completes
in 6.5 secs, that would mean you're spending 13.5 seconds building the
result set, transferring it and processing it at the client end. That will
take up at least 400MB of RAM (realistically more) - I'd suggest you'd be
better off with a cursor, unless you really need the whole thing in one go.

If you do need all the data at once, you'll want a faster CPU and faster
RAM, I guess.

--
  Richard Huxton
  Archonet Ltd
On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
> I have a table that has roughly 200,000 entries and many columns.
>
> The query is very simple:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>
> TargetTable has an index on Field1.

I think I have discovered the reason why the query runs so slow. The
original query has an ORDER BY Field1 clause that I forgot to put in my
email. So the query actually looks like this:

SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1 DESC;

What is the effective way to optimize this query (or to optimize the
system) so it runs a little faster than it does now?

Thanks, and I really appreciate all the help I've gotten so far.

Wei
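(For reference, capturing a plan for the real query would be along these
lines -- the column list here is just the OP's placeholder names:)

EXPLAIN ANALYZE
    SELECT Field1, Field2, Field3 FROM TargetTable ORDER BY Field1 DESC;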
On Wed, 2006-12-06 at 17:34 -0500, Wei Weng wrote:
> On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
> > I have a table that has roughly 200,000 entries and many columns.
> ...
> I think I have discovered the reason why the query runs so slow. The
> original query has an ORDER BY Field1 clause that I forgot to put in my
> email.
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1 DESC;

In that case you did not provide us with a useful EXPLAIN ANALYZE.

> What is the effective way to optimize this query (or to optimize the
> system) so it runs a little faster than it does now?

You might consider CLUSTER.

gnari
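(A sketch of what that would look like on 7.4, which takes the index name
first; the index name targettable_field1_idx here is made up -- use
whatever the index on Field1 is actually called:)

-- rewrites the table in the physical order of the named index
CLUSTER targettable_field1_idx ON TargetTable;
ANALYZE TargetTable;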
On Dec 6, 2006, at 2:34 PM, Wei Weng wrote:
> On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
>> I have a table that has roughly 200,000 entries and many columns.
>>
>> The query is very simple:
>>
>> SELECT Field1, Field2, Field3... FieldN FROM TargetTable;
>>
>> TargetTable has an index on Field1.
>
> I think I have discovered the reason why the query runs so slow. The
> original query has an ORDER BY Field1 clause that I forgot to put in my
> email.
>
> So the query looks like this:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1 DESC;
>
> What is the effective way to optimize this query (or to optimize the
> system) so it runs a little faster than it does now?
>
> Thanks, and I really appreciate all the help I've gotten so far.

Clustering the table on the index used for the sort might help, but
performance of the above is probably worse than your original example, due
to the added random-access overhead caused by the index scan (assuming the
sort uses an index). If the table changes infrequently, you could consider
creating an in-order copy of the data (using INSERT INTO) so that you
don't need to use an index to order it.

If you need the full list of results before the application can do
anything (so cursors won't help), then you'll either need to reduce the
amount of data scanned and returned (optimize fields for byte size, move
any fields not needed by this query to another table, reduce the number of
rows in the table, etc.) or scan it faster (faster disks and CPU).

If you have multiple CPUs on the database side, and you are not I/O bound,
you could consider using table partitioning to break the table up into
pieces where each piece contains a fraction of the whole (say one
quarter). Then issue four concurrent queries from the application and
reassemble the results on that end. You might not need to use table
partitioning if you can efficiently determine the "cleave" points at run
time; then you would just use a where clause to select the proper range.

-Casey
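(A rough sketch of the "cleave point" idea with made-up boundary values
B1 < B2 < B3; each query would be issued from a separate connection and the
application concatenates the result sets in descending range order:)

-- connection 1
SELECT Field1, Field2 FROM TargetTable
 WHERE Field1 >= 'B3'                   ORDER BY Field1 DESC;
-- connection 2
SELECT Field1, Field2 FROM TargetTable
 WHERE Field1 >= 'B2' AND Field1 < 'B3' ORDER BY Field1 DESC;
-- connection 3
SELECT Field1, Field2 FROM TargetTable
 WHERE Field1 >= 'B1' AND Field1 < 'B2' ORDER BY Field1 DESC;
-- connection 4
SELECT Field1, Field2 FROM TargetTable
 WHERE Field1 < 'B1'                    ORDER BY Field1 DESC;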
On 12/06/06 16:34, Wei Weng wrote:
> On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
[snip]
> I think I have discovered the reason why the query runs so slow. The
> original query has an ORDER BY Field1 clause that I forgot to put in my
> email.
>
> So the query looks like this:
>
> SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1 DESC;
>
> What is the effective way to optimize this query (or to optimize the
> system) so it runs a little faster than it does now?
>
> Thanks, and I really appreciate all the help I've gotten so far.

Is the index ASC or DESC?

--
Ron Johnson, Jr.
Jefferson LA USA
[ Marcus, you should follow up to the lists, so that other people can
benefit from the discussion ]

On Thu, 2006-12-07 at 09:25 +0100, Marcus Engene wrote:
> Ragnar wrote:
> > On Wed, 2006-12-06 at 17:34 -0500, Wei Weng wrote:
> >> On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:
> >>> I have a table that has roughly 200,000 entries and many columns.
> >>
> >> SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1
> >> DESC;
> >
> > you might consider CLUSTER
>
> Would putting the index on a separate tablespace on another harddisk
> have a similar effect?

We haven't had any real information from the OP. As far as I can tell, he
has not answered any questions about his case, so we really have no idea
where his problem is.

If he has extra hard disks that are not used, there may be many ways of
taking advantage of that.

gnari