Thread: TB-sized databases
Hi all,

I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware and config parameters, maybe a little advocacy). I can't speak on that since I don't have any DBs approaching that size.

The other part of this puzzle is that he's torn between MS SQL Server (running on Windows and unsupported by us) and PostgreSQL (running on Linux... which we would fully support). If any of you have ideas of how well PostgreSQL compares to SQL Server, especially in TB-sized databases, that would be much appreciated.

We're running PG 8.2.5, by the way.

Peter
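(For context on the "config parameters" part of the question: on 8.2 the tuning usually starts with a handful of postgresql.conf settings like the following. The values are illustrative starting points for a dedicated box with plenty of RAM, not recommendations from anyone on this thread.)

# Illustrative postgresql.conf starting points for a multi-TB 8.2 database
# on a dedicated machine with, say, 16 GB of RAM. Assumed values, not advice
# from the list; every one of these depends on workload.
shared_buffers = 2GB                # often ~10-25% of RAM
effective_cache_size = 12GB         # what the OS page cache will realistically hold
work_mem = 64MB                     # per sort/hash, per backend: keep it modest
maintenance_work_mem = 1GB          # speeds up index builds and VACUUM
checkpoint_segments = 64            # spread out checkpoint I/O during bulk loads
wal_buffers = 1MB
random_page_cost = 3.0              # lower it if data is mostly cached or on fast arrays
default_statistics_target = 100     # better estimates on big, skewed tables
constraint_exclusion = on           # needed for partition pruning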
Peter Koczan wrote:
> He is wondering how well PostgreSQL scales with TB-sized databases and
> what can be done to help optimize them.

Well, I can't speak to MS SQL Server because all of our clients run PostgreSQL ;) but I can tell you we have many that are in the 500 GB - 1.5 TB range. All perform admirably as long as you have the hardware behind them and are doing correct table structuring (such as table partitioning).

Sincerely,

Joshua D. Drake
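(A minimal sketch of the kind of table partitioning Joshua is referring to, using 8.2-style inheritance plus CHECK constraints. Table and column names are invented for illustration.)

-- 8.2-style range partitioning: a parent table, child tables with CHECK
-- constraints, and constraint_exclusion = on so the planner skips partitions.
CREATE TABLE events (
    id      bigint,
    logged  timestamp,
    payload text
);

CREATE TABLE events_2007_11 (
    CHECK (logged >= '2007-11-01' AND logged < '2007-12-01')
) INHERITS (events);

CREATE TABLE events_2007_12 (
    CHECK (logged >= '2007-12-01' AND logged < '2008-01-01')
) INHERITS (events);

-- Index each child; the parent itself stays empty.
CREATE INDEX events_2007_11_logged_idx ON events_2007_11 (logged);
CREATE INDEX events_2007_12_logged_idx ON events_2007_12 (logged);

-- With constraint_exclusion = on, a query such as
--   SELECT count(*) FROM events
--   WHERE logged >= '2007-12-01' AND logged < '2007-12-08';
-- only touches events_2007_12.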
We have a several-TB database in production and it works well on an HP rx1620 (dual Itanium2, MSA 20), running Linux. It's read-only storage for astronomical catalogs with about 4 billion objects. We have a custom index for spherical coordinates which provides great performance.

Regards,

Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
I had a client that tried to use MS SQL Server to run a 500 GB+ database. The database simply collapsed. They switched to Teradata and it is running well. That database is now 1.5 TB+.

Currently I have clients running huge PostgreSQL databases and they are happy. In one client's database the biggest table is 237 GB+ (just one table!) and PostgreSQL runs the database without problems using partitioning, triggers and rules (on PostgreSQL 8.2.5).

Pablo
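(The "triggers and rules" Pablo mentions are typically used to route INSERTs on the parent table into the correct partition. A rough sketch against the invented "events" partitions above might look like this; it is not Pablo's actual schema.)

-- Rule-based INSERT routing for the invented "events" partitions above,
-- so applications can keep writing to the parent table.
CREATE RULE events_insert_2007_11 AS
    ON INSERT TO events
    WHERE (NEW.logged >= '2007-11-01' AND NEW.logged < '2007-12-01')
    DO INSTEAD
    INSERT INTO events_2007_11 VALUES (NEW.id, NEW.logged, NEW.payload);

CREATE RULE events_insert_2007_12 AS
    ON INSERT TO events
    WHERE (NEW.logged >= '2007-12-01' AND NEW.logged < '2008-01-01')
    DO INSTEAD
    INSERT INTO events_2007_12 VALUES (NEW.id, NEW.logged, NEW.payload);

-- INSERT INTO events VALUES (1, '2007-12-03 10:00', 'example row');
-- ends up in events_2007_12. A trigger can do the same job and tends to
-- scale better when there are hundreds of partitions.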
I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of experience with Postgres running on Linux, and not much with SQL Server on Windows, of course the former would be a better choice for you. You stand a much better chance working with tools you know.
Thanks all. This is just what I needed.

Peter
On Tue, 2007-11-27 at 14:18 -0600, Peter Koczan wrote:
> Thanks all. This is just what I needed.

All of these responses have cooked quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives or fact-based business data warehouses.

The main thing is that TB-sized databases are performance critical, so how well PostgreSQL, or any other RDBMS, can handle them really depends on your workload.

Anyway, my reason for replying to this thread is that I'm planning changes for PostgreSQL 8.4+ that will allow us to get bigger and faster databases. If anybody has specific concerns then I'd like to hear them, so I can consider those things in the planning stages.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Simon Riggs wrote:
> Anyway, my reason for replying to this thread is that I'm planning
> changes for PostgreSQL 8.4+ that will allow us to get bigger and
> faster databases. If anybody has specific concerns then I'd like to
> hear them, so I can consider those things in the planning stages.

It would be nice to do something about SELECTs, so that we can retrieve a row set from a huge table using an indexable criterion without falling back to a full scan.

In my opinion, by definition, a huge database will sooner or later have tables far bigger than the available RAM (and the same goes for their indexes). I think such queries need to be answered through indexes smart enough to be fast on disk.

Pablo
On Tue, 27 Nov 2007, Pablo Alcaraz wrote:
> It would be nice to do something about SELECTs, so that we can retrieve
> a row set from a huge table using an indexable criterion without falling
> back to a full scan.

You mean: Be able to tell Postgres "Don't ever do a sequential scan of this table. It's silly. I would rather the query failed than have to wait for a sequential scan of the entire table."

Yes, that would be really useful, if you have huge tables in your database.

Matthew

--
Trying to write a program that can't be written is... well, it can be an
enormous amount of fun!                 -- Computer Science Lecturer
In response to Matthew <matthew@flymine.org>:
> You mean: Be able to tell Postgres "Don't ever do a sequential scan of
> this table. It's silly. I would rather the query failed than have to wait
> for a sequential scan of the entire table."
>
> Yes, that would be really useful, if you have huge tables in your
> database.

Is there something wrong with:

set enable_seqscan = off

?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote:
> Is there something wrong with:
> set enable_seqscan = off
> ?

Nothing wrong with enable_seqscan = off, except that it is an all-or-nothing kind of thing... if you want the big table to never use a seqscan, but a medium table which is joined in should use one, then what do you do? And setting enable_seqscan = off doesn't actually mean the planner can't use a sequential scan for the query if no other alternative exists. In any case it doesn't mean "please throw an error if you can't do this without a sequential scan".

In fact an even more useful option would be to ask the planner to throw an error if the expected cost exceeds a certain threshold...

Cheers,
Csaba.
In response to Csaba Nagy <nagy@ecircle-ag.com>:
> Nothing wrong with enable_seqscan = off, except that it is an
> all-or-nothing kind of thing...

If that's true, then I have a bug report to file:

test=# set enable_seqscan=off;
SET
test=# show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

test=# set enable_seqscan=on;
SET
test=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)

It looks to me to be session-alterable.

> if you want the big table to never use a seqscan, but a medium table
> which is joined in should use one, then what do you do? And setting
> enable_seqscan = off doesn't actually mean the planner can't use a
> sequential scan for the query if no other alternative exists. In any
> case it doesn't mean "please throw an error if you can't do this
> without a sequential scan".

True. It would still choose some other plan.

> In fact an even more useful option would be to ask the planner to throw
> an error if the expected cost exceeds a certain threshold...

Interesting concept.

--
Bill Moran
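(For what it's worth, the usual way to scope this setting to a single statement rather than a whole session is SET LOCAL inside a transaction, roughly as below. Table and column names are hypothetical, and this is still per-query rather than per-table, so Csaba's objection stands.)

-- Scope the setting to one statement only; it reverts automatically at
-- COMMIT/ROLLBACK. Table and column names are hypothetical.
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT *
  FROM big_table
 WHERE indexed_col = 42;
COMMIT;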
"Bill Moran" <wmoran@collaborativefusion.com> writes: > In response to Matthew <matthew@flymine.org>: > >> On Tue, 27 Nov 2007, Pablo Alcaraz wrote: >> > it would be nice to do something with selects so we can recover a rowset >> > on huge tables using a criteria with indexes without fall running a full >> > scan. >> >> You mean: Be able to tell Postgres "Don't ever do a sequential scan of >> this table. It's silly. I would rather the query failed than have to wait >> for a sequential scan of the entire table." >> >> Yes, that would be really useful, if you have huge tables in your >> database. > > Is there something wrong with: > set enable_seqscan = off > ? This does kind of the opposite of what you would actually want here. What you want is that if you give it a query which would be best satisfied by a sequential scan it should throw an error since you've obviously made an error in the query. What this does is it forces such a query to use an even *slower* method such as a large index scan. In cases where there isn't any other method it goes ahead and does the sequential scan anyways. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
On Wed, 28 Nov 2007, Csaba Nagy wrote:
> In fact an even more useful option would be to ask the planner to throw
> an error if the expected cost exceeds a certain threshold...

And even better if the option can be overridden for a specific transaction or connection. That way it can be set relatively low for normal operations, but when you need to run an expensive query you can change it for that query.

David Lang
In response to Gregory Stark <stark@enterprisedb.com>:
> This does kind of the opposite of what you would actually want here. What
> you want is that if you give it a query which would be best satisfied by
> a sequential scan, it should throw an error, since you've obviously made
> an error in the query.

Ah. I misunderstood the intent of the comment.

--
Bill Moran
On Wed, 2007-11-28 at 08:54 -0500, Bill Moran wrote:
> If that's true, then I have a bug report to file:
[snip]
> It looks to me to be session-alterable.

I didn't mean that it can't be set per session; I meant that it is not fine-grained enough to target the affected table: it affects _all_ tables in a query... and big tables are rarely alone in a query.

Cheers,
Csaba.
Matthew wrote:
> You mean: Be able to tell Postgres "Don't ever do a sequential scan of
> this table. It's silly. I would rather the query failed than have to wait
> for a sequential scan of the entire table."
>
> Yes, that would be really useful, if you have huge tables in your
> database.

Thanks. That would be nice too. I don't want Postgres to fall back so easily to a sequential scan when a field is indexed. Even if it concludes that the index is *huge* and does not fit in RAM, I want PostgreSQL to use the index anyway, because the table is *more than huge* and a sequential scan will take hours. I'll put some examples in a following mail.

Regards

Pablo
On Wed, 28 Nov 2007, Gregory Stark wrote:
> What this does instead is force such a query to use an even *slower*
> method, such as a large index scan. And in cases where there isn't any
> other method, it goes ahead and does the sequential scan anyway.

The query planner is not always right. I would like an option like "set enable_seqscan = off" but with the added effect of making Postgres return an error if there is no alternative to scanning the whole table, because I have obviously made a mistake setting up my indexes. I would effectively be telling Postgres "For this table, I *know* that a full table scan is dumb for all of my queries, even if the statistics say otherwise."

Of course, it would have to be slightly intelligent, because there are circumstances where a sequential scan doesn't necessarily mean a full table scan (for instance if there is a LIMIT), and where an index scan *does* mean a full table scan (for instance, selecting the whole table and ordering by an indexed field).

Matthew

--
Existence is a convenient concept to designate all of the files that an
executable program can potentially process.   -- Fortran77 standard
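(To make Matthew's two caveats concrete, against a hypothetical big_table with an index on indexed_col:)

-- A sequential scan that is NOT a full table scan: it can stop after 10 rows.
SELECT * FROM big_table LIMIT 10;

-- An index scan that IS effectively a full table scan: every heap page gets
-- visited, just in index order.
SELECT * FROM big_table ORDER BY indexed_col;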
Simon Riggs wrote:
> Anyway, my reason for replying to this thread is that I'm planning
> changes for PostgreSQL 8.4+ that will allow us to get bigger and
> faster databases. If anybody has specific concerns then I'd like to
> hear them, so I can consider those things in the planning stages.

I am dealing with a very huge database. I am not sure whether all of these things can be solved with the current Postgres version using some configuration parameters; I'll be happy to read your suggestions and ideas about these queries.

In my opinion there are queries that will need to be tuned for "huge databases" (huge database = a database whose relevant tables and indexes are, or will be, far bigger than all the RAM available):

-- example table
CREATE TABLE homes (
 id bigserial,
 name text,
 location text,
 bigint money_win,
 int zipcode;
);
CREATE INDEX money_win_idx ON homes(money_win);
CREATE INDEX zipcode_idx ON homes(zipcode);

SELECT max( id) from homes;
I think the information needed to get the max row quickly could be found using the pk index. Idem min( id).

SELECT max( id) from homes WHERE id > 8000000000;
Same, but useful to find out the same thing in partitioned tables (using id as the partition criterion). It would be nice if Postgres did not need the WHERE clause to realize it does not have to scan every single partition, but only the last one. Idem min( id).

SELECT * from homes WHERE money_win = 1300000000;
Postgres concludes too easily that it must do a sequential scan to solve this kind of query when the table (or the index) does not fit in memory and the number of rows returned is not near 1 (example: if the query returns 5000 rows). Same with filters like 'WHERE money_win >= xx' or 'WHERE money_win BETWEEN xx AND yy'. But I do not know if this behavior is because I got the postgresql configuration wrong or I missed something.

SELECT count( *) from homes;
It would be *cute* if Postgres stored this value and only recalculated it if it thinks the stored value is wrong (example: after an abnormal shutdown).

SELECT zipcode, count( zipcode) as n from homes GROUP BY zipcode;
It would be *very cute* if Postgres could store these values (or is that already there?) in the index or wherever, and only recalculate them if it thinks the stored values are wrong (example: after an abnormal shutdown).

In my opinion, partitioned tables will be the usual case in "huge databases", not the exception. It would be important (for me at least) that these queries can be solved quickly when they run against partitioned tables. Maybe one or more of these queries could be solved using some kind of optimization, but I have not discovered which ones (I'll be happy to read suggestions :D).
I am sure a lot of (or all of) these queries could be solved using some kind of triggers or sequences to store the needed information. But in general the information is already there right now (isn't it?) and the queries just need the server to look in the right place. A trigger or function written in one of the pgsql-supported languages will probably consume far more CPU to work out the same information that already exists, and we would need to do it inside transactions (more performance cost) just to be safe if the server has an abnormal shutdown.

Currently I have several 250 GB+ tables with billions of rows (small rows, like the homes table example). I partitioned them and distributed the partitions and indexes across different tablespaces, etc. I think "I did not need" as many partitions as I have right now (300+ for some tables and growing); I would only need enough partitions to spread the tables across different tablespaces. I created so many partitions because the performance with really big tables is not good enough for me when the programs run these kinds of queries, and the insert/update speed gets worse and worse over time. I expect a couple of tables to be 1 TB+ in a few months... buying more and more RAM is an option but not a solution, because eventually the database will be far bigger than the RAM available.

Last but not least, it would be *excellent* if this kind of optimization were possible without weird non-standard SQL statements. I think PostgreSQL would be better with huge databases if it could solve these kinds of queries in the fastest way by itself, or at least let us tell it to choose a different strategy. I could help it via postgresql.conf to activate/deactivate some behavior, or use some system table to specify the strategy I want for certain tables (like autovacuum does right now with per-table vacuum exceptions), or use non-standard DDL to define that strategy. But the important thing is that the programmers must be able to use standard SQL for selects/inserts/updates/deletes with 'where' and 'group by' clauses. In my case the programs are built with Java + JPA, so standard SQL (though not DDL) is important to keep things as they are. :)

Well, that's my 2 cents of feedback.

Regards

Pablo

PD: Sorry for my broken English.
Pablo Alcaraz wrote:
> In my opinion there are queries that will need to be tuned for "huge
> databases" (huge database = a database whose relevant tables and indexes
> are, or will be, far bigger than all the RAM available):
>
> -- example table
> CREATE TABLE homes (
>  id bigserial,
>  name text,
>  location text,
>  bigint money_win,
>  int zipcode;
> );
> CREATE INDEX money_win_idx ON homes(money_win);
> CREATE INDEX zipcode_idx ON homes(zipcode);

Your example does not work, so I created my own for your first item.

alvherre=# create table test (a int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
alvherre=# insert into test select * from generate_series(1, 100000);
INSERT 0 100000
alvherre=# analyze test;
ANALYZE

> SELECT max( id) from homes;
> I think the information needed to get the max row quickly could be found
> using the pk index. Idem min( id).

alvherre=# explain analyze select max(a) from test;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.054..0.057 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=0.041..0.043 rows=1 loops=1)
           ->  Index Scan Backward using test_pkey on test  (cost=0.00..3148.26 rows=100000 width=4) (actual time=0.034..0.034 rows=1 loops=1)
                 Filter: (a IS NOT NULL)
 Total runtime: 0.143 ms
(6 rows)

> SELECT max( id) from homes WHERE id > 8000000000;
> Same, but useful to find out the same thing in partitioned tables (using
> id as the partition criterion). It would be nice if Postgres did not need
> the WHERE clause to realize it does not have to scan every single
> partition, but only the last one. Idem min( id).

Yeah, this could be improved.

> SELECT * from homes WHERE money_win = 1300000000;
> Postgres concludes too easily that it must do a sequential scan to solve
> this kind of query when the table (or the index) does not fit in memory
> and the number of rows returned is not near 1 (example: if the query
> returns 5000 rows). Same with filters like 'WHERE money_win >= xx' or
> 'WHERE money_win BETWEEN xx AND yy'.

There are thresholds for switching from an index scan to a seqscan. It depends on the selectivity of the clauses.

> SELECT count( *) from homes;
> It would be *cute* if Postgres stored this value and only recalculated it
> if it thinks the stored value is wrong (example: after an abnormal
> shutdown).

This is not as easy as you put it, for reasons that have been discussed at length. I'll only say that there are workarounds to make counting quick.

> SELECT zipcode, count( zipcode) as n from homes GROUP BY zipcode;
> It would be *very cute* if Postgres could store these values (or is that
> already there?) in the index or wherever, and only recalculate them if it
> thinks the stored values are wrong (example: after an abnormal shutdown).

Same as above.

> Last but not least, it would be *excellent* if this kind of optimization
> were possible without weird non-standard SQL statements.

Right. If you can afford to sponsor development, it could make them a reality sooner.
-- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4" "You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick." (Andrew Sullivan)
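(One of the workarounds Alvaro alludes to for slow count(*) is a trigger-maintained counter table. A rough, untested sketch against Pablo's "homes" example might look like the following; the auxiliary table, function and trigger names are invented, and it assumes a working version of the homes table.)

-- Untested sketch: keep a running row count for "homes" in a side table.
CREATE TABLE homes_rowcount (n bigint NOT NULL);
INSERT INTO homes_rowcount SELECT count(*) FROM homes;   -- seed it once

CREATE OR REPLACE FUNCTION homes_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE homes_rowcount SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE homes_rowcount SET n = n - 1;
    END IF;
    RETURN NULL;   -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER homes_count
    AFTER INSERT OR DELETE ON homes
    FOR EACH ROW EXECUTE PROCEDURE homes_count_trig();

-- "SELECT n FROM homes_rowcount" now replaces SELECT count(*) FROM homes.
-- Caveat: the single counter row serializes concurrent writers, which can
-- hurt on a heavily-updated table.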
On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote:
> In fact an even more useful option would be to ask the planner to throw
> an error if the expected cost exceeds a certain threshold...

Well, I've suggested it before:

statement_cost_limit on pgsql-hackers, 1 March 2006

Would people like me to re-write and resubmit this patch for 8.4?

Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers.

Any bright ideas, or is it that we want it and we don't care about the possible difficulties?

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote:
> It would be nice to do something about SELECTs, so that we can retrieve
> a row set from a huge table using an indexable criterion without falling
> back to a full scan.
>
> In my opinion, by definition, a huge database will sooner or later have
> tables far bigger than the available RAM (and the same goes for their
> indexes). I think such queries need to be answered through indexes smart
> enough to be fast on disk.

OK, I agree with this one.

I'd thought that index-only plans were only for OLTP, but now I see they can also make a big difference with DW queries. So I'm very interested in this area now.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
On Wed, 28 Nov 2007, Simon Riggs wrote:
> statement_cost_limit on pgsql-hackers, 1 March 2006
>
> Would people like me to re-write and resubmit this patch for 8.4?

Yes please. The more options, the better.

> Tom's previous concerns were along the lines of "How would you know what
> to set it to?", given that the planner costs are mostly arbitrary numbers.
>
> Any bright ideas, or is it that we want it and we don't care about the
> possible difficulties?

I think this is something that the average person should just knuckle down and work out. At the moment, on my work's system, we call EXPLAIN before queries to find out if they will take too long. This feature would improve performance by saving us from passing every query through the planner twice.

Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not
trying to be speciesist - I wouldn't want to detract you from such a
wonderful creature, but, well, there isn't a lot there, is there?
                                       -- Computer Science Lecturer
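(Matthew's EXPLAIN-before-running approach can be wrapped in a small plpgsql helper that refuses to execute anything whose estimated cost exceeds a threshold, which is more or less a hand-rolled statement_cost_limit. The sketch below is illustrative and untested; the function name, regexp and limit are invented, not anything proposed on the list.)

-- Illustrative sketch: refuse to run a query whose estimated total cost
-- exceeds a limit, by parsing the first line of EXPLAIN output.
CREATE OR REPLACE FUNCTION run_if_cheap(q text, cost_limit numeric)
RETURNS void AS $$
DECLARE
    line     record;
    est_cost numeric;
BEGIN
    FOR line IN EXECUTE 'EXPLAIN ' || q LOOP
        -- Top plan line looks like: Seq Scan on t  (cost=0.00..123.45 rows=... width=...)
        est_cost := substring(line."QUERY PLAN" from E'cost=[0-9.]+\\.\\.([0-9.]+)');
        EXIT WHEN est_cost IS NOT NULL;
    END LOOP;
    IF est_cost > cost_limit THEN
        RAISE EXCEPTION 'estimated cost % exceeds limit %', est_cost, cost_limit;
    END IF;
    EXECUTE q;
END;
$$ LANGUAGE plpgsql;

-- SELECT run_if_cheap('DELETE FROM big_table WHERE indexed_col = 42', 1000000);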
On Wed, 28 Nov 2007, Simon Riggs wrote:
> Tom's previous concerns were along the lines of "How would you know what
> to set it to?", given that the planner costs are mostly arbitrary numbers.

Arbitrary numbers are fine if they are relatively consistent with each other.

Will a plan with an estimated cost of 1,000,000 take approximately 100 times as long as one with a cost of 10,000? Or, more importantly, will a plan with an estimated cost of 2,000 reliably take longer than one with an estimated cost of 1,000?

David Lang
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... > > Well, I've suggested it before: > > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? > > Tom's previous concerns were along the lines of "How would know what to > set it to?", given that the planner costs are mostly arbitrary numbers. Hm, that's only kind of true. Since 8.mumble seq_page_cost is itself configurable meaning you can adjust the base unit and calibrate all the parameters to be time in whatever unit you choose. But even assuming you haven't so adjusted seq_page_cost and all the other parameters to match the numbers aren't entirely arbitrary. They represent time in units of "however long a single sequential page read takes". Obviously few people know how long such a page read takes but surely you would just run a few sequential reads of large tables and set the limit to some multiple of whatever you find. This isn't going to precise to the level of being able to avoid executing any query which will take over 1000ms. But it is going to be able to catch unconstrained cross joins or large sequential scans or such. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Gregory Stark <stark@enterprisedb.com> writes:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> Tom's previous concerns were along the lines of "How would you know what
>> to set it to?", given that the planner costs are mostly arbitrary numbers.

> Hm, that's only kind of true.

The units are not the problem. The problem is that you are staking non-failure of your application on the planner's estimates being pretty well in line with reality. Not merely in line enough that it picks a reasonably cheap plan, but in line enough that if it thinks plan A is 10x more expensive than plan B, then the actual ratio is indeed somewhere near 10.

Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take.

You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

			regards, tom lane
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> The units are not the problem. The problem is that you are staking
> non-failure of your application on the planner's estimates being
> pretty well in line with reality.
>
> Given that this list spends all day every day discussing cases where the
> planner is wrong, I'd have to think that that's a bet I wouldn't take.

I think you have a point, but the alternative is often much worse.

If an SQL statement fails because of too high a cost, we can investigate the problem and re-submit. If a website slows down because somebody allowed a very large query to execute, then everybody is affected, not just the person who ran the bad query. Either way the guy that ran the query loses, but without constraints in place one guy can also kill everybody else.

> You could probably avoid this risk by setting the cutoff at something
> like 100 or 1000 times what you really want to tolerate, but how
> useful is it then?

Still fairly useful, as long as we understand it's a blunt instrument.

If the whole performance of your system depends upon indexed access, then rogue queries can have disastrous, unpredictable consequences. Many sites construct their SQL dynamically, so a mistake in a seldom-used code path can let killer queries through. Even the best DBAs have been known to make mistakes.

e.g. An 80GB table has 8 million blocks in it.
- So putting statement_cost_limit = 1 million would allow some fairly large queries but prevent anything that did a SeqScan (or worse).
- Setting it to 10 million is going to prevent things like sorting the whole table without a LIMIT.
- Setting it to 100 million is going to prevent unconstrained product joins etc.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
* Simon Riggs (simon@2ndquadrant.com) wrote:
> I think you have a point, but the alternative is often much worse.

I'm not convinced you've outlined the consequences of implementing a plan cost limit sufficiently.

> If an SQL statement fails because of too high a cost, we can investigate
> the problem and re-submit. If a website slows down because somebody
> allowed a very large query to execute, then everybody is affected, not
> just the person who ran the bad query. Either way the guy that ran the
> query loses, but without constraints in place one guy can also kill
> everybody else.

It's entirely possible (likely, even) that most of the users accessing a webpage are using the same queries and the same tables. If the estimates for those tables end up changing enough that PG pushes the plan cost above the plan cost limit, then *all* of the users would be affected. The plan cost isn't going to change for just one user if it's the same query that a bunch of users are running. I'm not sure that handling the true 'rogue query' case with this limit would actually be a net improvement overall in a website-based situation.

I could see it being useful to have a 'notice_on_high_cost_query' variable, where someone working in a data warehouse situation would get a notice if the query he's hand-crafting has a very high cost (in which case he could ctrl-c it if he thinks something is wrong, rather than waiting 5 hours before realizing he forgot a join clause), but the website with the one rogue query run by one user seems a stretch.

Thanks,

Stephen
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Given that this list spends all day every day discussing cases where the
> planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> You could probably avoid this risk by setting the cutoff at something
> like 100 or 1000 times what you really want to tolerate, but how
> useful is it then?

It would still be useful in the sense that if the planner is making wrong estimates, you must correct it somehow... raise the statistics target, rewrite the query or do other tweaking; you should do something. An error is sometimes better than gradually decreasing performance caused by, for example, a statistics target that is too low. So if the error is thrown because of a wrong estimate, it is still a valid error, raising a signal that the DBA has to do something about it.

It's still true that if the planner estimates too low, it will raise no error and will take the resources. But that's just what we have now, so it wouldn't be a regression of any kind...

Cheers,
Csaba.
Simon Riggs wrote:
> Well, I've suggested it before:
>
> statement_cost_limit on pgsql-hackers, 1 March 2006
>
> Would people like me to re-write and resubmit this patch for 8.4?
>
> Any bright ideas, or is it that we want it and we don't care about the
> possible difficulties?

Knowing how to set it is a problem - but a possibly bigger one is a monster query crippling your DW system - so I'd say let's have it.

Cheers

Mark
Simon Riggs wrote:
> I'd thought that index-only plans were only for OLTP, but now I see they
> can also make a big difference with DW queries. So I'm very interested
> in this area now.

If that's true, then you want to get behind the work Gokulakannan Somasundaram (http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has done in relation to thick indexes. I would have thought that concept particularly useful in DW. Only having to scan indexes on a number of join tables would be a huge win for some of these types of queries.

My tiny point of view would say that is a much better investment than setting up the proposed parameter. I can see the use of the parameter though. Most of the complaints about indexes carrying visibility information are about update/delete contention. I would expect that in a DW those things aren't in the critical path like they are in many other applications. Especially with partitioning, where previous partitions don't get many updates, I would think there could be great benefit. I would think that many of Pablo's requests up-thread would get a significant performance benefit from this type of index.

But as I mentioned at the start, that's my tiny point of view and I certainly don't have the resources to direct what gets looked at for PostgreSQL.

Regards

Russell Smith
On Fri, 2007-11-30 at 17:41 +1100, Russell Smith wrote:
> If that's true, then you want to get behind the work Gokulakannan
> Somasundaram
> (http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has
> done in relation to thick indexes. I would have thought that concept
> particularly useful in DW. Only having to scan indexes on a number of
> join tables would be a huge win for some of these types of queries.

Hmm, well I proposed that in Jan/Feb, but I'm sure others have also. I don't think it's practical to add visibility information to *all* indexes, but I like Heikki's Visibility Map proposal much better.

> My tiny point of view would say that is a much better investment than
> setting up the proposed parameter.

They are different things entirely, with dissimilar dev costs also. We can have both.

> I can see the use of the parameter though.

Good

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
On 11/29/07, Gregory Stark <stark@enterprisedb.com> wrote:
> Obviously few people know how long such a page read takes, but surely you
> would just run a few sequential reads of large tables and set the limit
> to some multiple of whatever you find.
>
> This isn't going to be precise to the level of being able to avoid
> executing any query which will take over 1000 ms. But it is going to be
> able to catch unconstrained cross joins, large sequential scans and the
> like.

Isn't that what statement_timeout is for? Since this is entirely based on estimates, using arbitrary fuzzy numbers for this seems fine to me; precision isn't really the goal.
> Isn't that what statement_timeout is for? Since this is entirely based
> on estimates, using arbitrary fuzzy numbers for this seems fine to me;
> precision isn't really the goal.

There's an important difference from statement_timeout: this proposal would avoid consuming any resources at all if the planner estimates the query can't be executed in a reasonable time, whereas statement_timeout will let a bad query run for at least statement_timeout before killing it...

Cheers,
Csaba.
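(For reference, the existing facility being compared against is set like this; the values and role name are only examples.)

-- statement_timeout is what exists today: it kills the query only after it
-- has already been running (and consuming resources) for the given time.
-- The value is in milliseconds; 60000 and "report_user" are just examples.
SET statement_timeout = 60000;                          -- this session only
ALTER USER report_user SET statement_timeout = 60000;   -- default for one role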
Hi Peter,

If you run into a scaling issue with PG (and at those scales, 1 TB+, you will), you can deploy Greenplum DB, which is PG 8.2.5 compatible. A large internet company (look for press soon) is in production with a 150 TB database on a system capable of doing 400 TB, and we have others in production at 60 TB, 40 TB, etc. We can provide references when needed - note that we had 20 successful customer references supporting Gartner's magic quadrant report on data warehouses, which put Greenplum in the "upper visionary" area of the magic quadrant - which only happens if your customers can scale (see this: http://www.esj.com/business_intelligence/article.aspx?EditorialsID=8712).

In other words, no matter what happens you'll be able to scale up with your Postgres strategy.

- Luke

On 11/26/07 10:44 AM, "Pablo Alcaraz" <pabloa@laotraesquina.com.ar> wrote:
> Currently I have clients running huge PostgreSQL databases and they are
> happy. In one client's database the biggest table is 237 GB+ (just one
> table!) and PostgreSQL runs the database without problems using
> partitioning, triggers and rules (on PostgreSQL 8.2.5).
On Thursday 29 November 2007 11:14, Simon Riggs wrote:
> If the whole performance of your system depends upon indexed access, then
> rogue queries can have disastrous, unpredictable consequences. Many
> sites construct their SQL dynamically, so a mistake in a seldom-used
> code path can let killer queries through. Even the best DBAs have been
> known to make mistakes.

If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level?

> e.g. An 80GB table has 8 million blocks in it.
> - So putting statement_cost_limit = 1 million would allow some fairly
> large queries but prevent anything that did a SeqScan (or worse).
> - Setting it to 10 million is going to prevent things like sorting the
> whole table without a LIMIT.
> - Setting it to 100 million is going to prevent unconstrained product
> joins etc.

I think you're completely overlooking the effect disk latency has on query times. We run queries all the time that can vary from 4 hours to 12 hours based solely on the amount of concurrent load on the system, even though they always plan with the same cost.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Nov 28, 2007, at 7:27 AM, Bill Moran wrote:
> Is there something wrong with:
> set enable_seqscan = off

Note that in cases of very heavy skew, that won't work. It only adds 10M to the cost estimate for a seqscan, and it's definitely possible to have an index scan that looks even more expensive.

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Robert,

On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote:
> If the whole performance of your system depends upon indexed access, then
> maybe you need a database that gives you a way to force index access at
> the query level?

That sounds like a request for hints, which is OT here, ISTM.

The issue is that if somebody issues a "large query" then it will be a problem whichever plan the query takes. Forcing index scans can make a plan more expensive than a seq scan in many cases.

> I think you're completely overlooking the effect disk latency has on query
> times. We run queries all the time that can vary from 4 hours to 12 hours
> based solely on the amount of concurrent load on the system, even though
> they always plan with the same cost.

Not at all. If we had statement_cost_limit then it would be applied after planning and before execution begins. The limit would be based upon the planner's estimate, not the likely actual execution time.

So yes, a query may vary in execution time by a large factor as you suggest, and it would be difficult to set the proposed parameter accurately. However, the same is also true of statement_timeout, which we currently support, so I don't see this point as a blocker.

Which leaves us at the burning question: would you use such a facility, or would the difficulty in setting it exactly prevent you from using it for real?

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
On Thu, Dec 06, 2007 at 09:38:16AM +0000, Simon Riggs wrote:
> The issue is that if somebody issues a "large query" then it will be a
> problem whichever plan the query takes. Forcing index scans can make a
> plan more expensive than a seq scan in many cases.

OTOH, the planner can really screw up queries on really large databases. IIRC, the planner can use things like unique constraints to get some idea, e.g., of how many rows will result from a join. Unfortunately, the planner can't apply those techniques to certain constructs common in really large DBs (e.g., partitioned tables--how do you do a unique constraint on a partitioned table?). I've got some queries that the planner thinks will return on the order of 10^30 rows for that sort of reason. In practice, the query may return 10^3 rows, and the difference between the seq scan and the index scan is the difference between a query that takes a few seconds and a query that I will never run to completion.

I know the goal would be to make the planner understand those queries better, but for now the answer seems to be to craft the queries very carefully and run EXPLAIN first, making sure you see index scans in the right places.

Mike Stone
Michael Stone <mstone+postgres@mathom.us> writes: > OTOH, the planner can really screw up queries on really large databases. > IIRC, the planner can use things like unique constraints to get some > idea, e.g., of how many rows will result from a join. Unfortunately, > the planner can't apply those techniques to certain constructs common in > really large db's (e.g., partitioned tables--how do you do a unique > constraint on a partitioned table?) I've got some queries that the > planner thinks will return on the order of 10^30 rows for that sort of > reason. In practice, the query may return 10^3 rows, and the difference > between the seq scan and the index scan is the difference between a > query that takes a few seconds and a query that I will never run to > completion. I know the goal would be to make the planner understand > those queries better, Indeed, and if you've got examples where it's that far off, you should report them. regards, tom lane
On Thu, 6 Dec 2007, Tom Lane wrote: > Indeed, and if you've got examples where it's that far off, you should > report them. Oo, oo, I have one! So, this query bit us a while back. We had two tables being joined together in a query by a key column. The key column was an integer, and for the first table it had a range from zero to a bazillion. For the second table, it had a range from half a bazillion to half a bazillion plus a hundred. The first table had a bazillion rows, and the second table had only about a hundred. Both tables were well ordered on the key column. Both tables had an index on the key column. So, our query was like this: SELECT * FROM table1, table2 WHERE table1.key = table2.key LIMIT 1 ... because we wanted to find out if there were *any* hits between the two tables. The query took hours. To understand why, let's look at the query without the LIMIT. For this query, Postgres would perform a nested loop, iterating over all rows in the small table, and doing a hundred index lookups in the big table. This completed very quickly. However, adding the LIMIT meant that suddenly a merge join was very attractive to the planner, as it estimated the first row to be returned within milliseconds, without needing to sort either table. The problem is that Postgres didn't know that the first hit in the big table would be about half-way through, after doing an index sequential scan for half a bazillion rows. We fixed this query by changing it to: SELECT * FROM table1, table2 WHERE table1.key = table2.key AND table1.key >= (SELECT MIN(key) FROM table2) AND table1.key <= (SELECT MAX(key) FROM table2) ... which artificially limited the index sequential scan of table1 (the big table) to start from the earliest possible hit and only continue to the last possible hit. This query completed quickly, as the min and max could be answered quickly by the indexes. Still, it's a pity Postgres couldn't work that out for itself, having all the information present in its statistics and indexes. AIUI the planner doesn't peek into indexes - maybe it should. Matthew -- In the beginning was the word, and the word was unsigned, and the main() {} was without form and void...
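To make the shape of the problem concrete, here is a self-contained sketch of the scenario described above (table and index names are illustrative, and the data sizes are obviously scaled down):

  CREATE TABLE table1 (key integer, payload text);   -- the "bazillion"-row table
  CREATE TABLE table2 (key integer, payload text);   -- the ~100-row table
  CREATE INDEX table1_key_idx ON table1 (key);
  CREATE INDEX table2_key_idx ON table2 (key);

  -- the problematic form: with LIMIT 1 the planner may pick a merge join,
  -- without accounting for how far into table1's key range it must scan
  -- before reaching table2's first key
  SELECT * FROM table1, table2 WHERE table1.key = table2.key LIMIT 1;

  -- the workaround from above: clamp table1's scan to table2's key range
  SELECT * FROM table1, table2
  WHERE table1.key = table2.key
    AND table1.key >= (SELECT min(key) FROM table2)
    AND table1.key <= (SELECT max(key) FROM table2)
  LIMIT 1;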
Matthew <matthew@flymine.org> writes: > ... For this query, Postgres would perform a nested loop, > iterating over all rows in the small table, and doing a hundred index > lookups in the big table. This completed very quickly. However, adding the > LIMIT meant that suddenly a merge join was very attractive to the planner, > as it estimated the first row to be returned within milliseconds, without > needing to sort either table. > The problem is that Postgres didn't know that the first hit in the big > table would be about half-way through, after doing a index sequential scan > for half a bazillion rows. Hmm. IIRC, there are smarts in there about whether a mergejoin can terminate early because of disparate ranges of the two join variables. Seems like it should be straightforward to fix it to also consider whether the time-to-return-first-row will be bloated because of disparate ranges. I'll take a look --- but it's probably too late to consider this for 8.3. regards, tom lane
On Thu, 6 Dec 2007, Tom Lane wrote: > Matthew <matthew@flymine.org> writes: > > ... For this query, Postgres would perform a nested loop, > > iterating over all rows in the small table, and doing a hundred index > > lookups in the big table. This completed very quickly. However, adding the > > LIMIT meant that suddenly a merge join was very attractive to the planner, > > as it estimated the first row to be returned within milliseconds, without > > needing to sort either table. > > > The problem is that Postgres didn't know that the first hit in the big > > table would be about half-way through, after doing a index sequential scan > > for half a bazillion rows. > > Hmm. IIRC, there are smarts in there about whether a mergejoin can > terminate early because of disparate ranges of the two join variables. > Seems like it should be straightforward to fix it to also consider > whether the time-to-return-first-row will be bloated because of > disparate ranges. I'll take a look --- but it's probably too late > to consider this for 8.3. Very cool. Would that be a planner cost estimate fix (so it avoids the merge join), or a query execution fix (so it does the merge join on the table subset)? Matthew -- I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-) -- Linus Torvalds
Matthew <matthew@flymine.org> writes: > On Thu, 6 Dec 2007, Tom Lane wrote: >> Hmm. IIRC, there are smarts in there about whether a mergejoin can >> terminate early because of disparate ranges of the two join variables. > Very cool. Would that be a planner cost estimate fix (so it avoids the > merge join), or a query execution fix (so it does the merge join on the > table subset)? Cost estimate fix. Basically what I'm thinking is that the startup cost attributed to a mergejoin ought to account for any rows that have to be skipped over before we reach the first join pair. In general this is hard to estimate, but for mergejoin it can be estimated using the same type of logic we already use at the other end. After looking at the code a bit, I'm realizing that there's actually a bug in there as of 8.3: mergejoinscansel() is expected to be able to derive numbers for either direction of scan, but if it's asked to compute numbers for a DESC-order scan, it looks for a pg_stats entry sorted with '>', which isn't gonna be there. It needs to know to look for an '<' histogram and switch the min/max. So the lack of symmetry here is causing an actual bug in logic that already exists. That makes the case for fixing this now a bit stronger ... regards, tom lane
On Thu, Dec 06, 2007 at 11:13:18AM -0500, Tom Lane wrote: >Indeed, and if you've got examples where it's that far off, you should >report them. Yeah, the trick is to get it to a digestible test case. The basic scenario (there are more tables & columns in the actual case) is a set of tables partitioned by date with a number of columns in one table referencing rows in the others:

Table A (~5bn rows / 100's of partitions)
  time
  Bkey1
  Ckey1
  Bkey2
  Ckey2

Table B (~1bn rows / 100's of partitions)
  Bkey
  Bval

Table C (~.5bn rows / 100's of partitions)
  Ckey
  Cval

Bkey and Ckey are unique, but the planner doesn't know that. Mike Stone
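A minimal sketch of why the planner can't be told about that uniqueness with 8.x-style inheritance partitioning (all names and ranges are illustrative, and the real tables are partitioned by date rather than by key):

  CREATE TABLE b (bkey bigint, bval text);
  CREATE TABLE b_part1 (CHECK (bkey >=       0 AND bkey < 1000000)) INHERITS (b);
  CREATE TABLE b_part2 (CHECK (bkey >= 1000000 AND bkey < 2000000)) INHERITS (b);

  -- a unique constraint can only be declared per child table, so it only
  -- enforces (and advertises) uniqueness within one partition:
  ALTER TABLE b_part1 ADD UNIQUE (bkey);
  ALTER TABLE b_part2 ADD UNIQUE (bkey);
  -- there is no way to declare "bkey is unique across all of b", which is
  -- the fact the join selectivity estimate would need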
Tom Lane wrote: > Michael Stone <mstone+postgres@mathom.us> writes: >> OTOH, the planner can really screw up queries on really large databases. >> ... I've got some queries that the >> planner thinks will return on the order of 10^30 rows for that sort of >> reason. In practice, the query may return 10^3 rows.... > > Indeed, and if you've got examples where it's that far off, you should > report them. If I read this right, I've got quite a few cases where the planner expects 1 row but gets over 2000. And within the plan, it looks like there's a step where it expects 511 rows and gets 2390779 which seems to be off by a factor of 4600x. Also shown below it seems that if I use "OFFSET 0" as a "hint" I can force a much (10x) better plan. I wonder if there's room for a pgfoundry project for a patch set that lets us use more hints than OFFSET 0. Ron

logs=# analyze;
ANALYZE
logs=# explain analyze select * from fact natural join d_ref natural join d_uag
       where ref_host = 'download.com.com'
         and ref_path = '/[path_removed].html'
         and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=69175.963..141550.628 rows=2474 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=3094.740..139361.235 rows=2390779 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=45.937..45.948 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=3048.770..135653.875 rows=2390779 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=1713.148..1713.148 rows=2390779 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=62.841..62.841 rows=2 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=62.813..62.823 rows=2 loops=1)
               Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
 Total runtime: 141563.733 ms
(13 rows)

############ using "offset 0" to force a better plan.

logs=# explain analyze select * from fact natural join
       (select * from d_ref natural join d_uag
        where ref_host = 'download.com.com'
          and ref_path = '/[path_removed].html'
          and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'
        offset 0) as a;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=2659.251..14703.343 rows=2474 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=114.968..115.140 rows=2 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=114.964..115.127 rows=2 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=75.891..75.900 rows=2 loops=1)
                     Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=19.582..19.597 rows=1 loops=2)
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=2240.090..7288.145 rows=1237 loops=2)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=2221.539..2221.539 rows=0 loops=2)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=1633.032..1633.032 rows=2390779 loops=2)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=150.614..150.614 rows=77306 loops=2)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14710.870 ms
(15 rows)
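The general shape of the trick, for anyone wanting to apply it elsewhere: a LIMIT/OFFSET clause keeps the subquery from being flattened into the outer query, so the dimension lookups are planned and executed on their own before the fact table is touched. A sketch with illustrative names and filter values:

  SELECT *
  FROM fact
  NATURAL JOIN (SELECT *
                FROM d_dim1 NATURAL JOIN d_dim2
                WHERE d_dim1.val = 'filter one'
                  AND d_dim2.val = 'filter two'
                OFFSET 0) AS dims;   -- OFFSET 0 acts as an optimization fence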
I wrote: > Hmm. IIRC, there are smarts in there about whether a mergejoin can > terminate early because of disparate ranges of the two join variables. > Seems like it should be straightforward to fix it to also consider > whether the time-to-return-first-row will be bloated because of > disparate ranges. I've posted a proposed patch for this: http://archives.postgresql.org/pgsql-patches/2007-12/msg00025.php regards, tom lane
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Tom Lane wrote: >> There's something fishy about this --- given that that plan has a lower >> cost estimate, it should've picked it without any artificial >> constraints. > I think the reason it's not picking it was discussed back in this thread > too. > http://archives.postgresql.org/pgsql-performance/2005-03/msg00675.php > http://archives.postgresql.org/pgsql-performance/2005-03/msg00684.php > My offset 0 is forcing the outer join. > [Edit: Ugh - meant cartesian join - which helps this kind of query.] Ah; I missed the fact that the two relations you want to join first don't have any connecting WHERE clause. The concern I mentioned in the above thread was basically that I don't want the planner to go off chasing Cartesian join paths in general --- they're usually useless and would result in an exponential explosion in the number of join paths considered in many-table queries. However, in this case the reason that the Cartesian join might be relevant is that both of them are needed in order to form an inner indexscan on the big table. I wonder if we could drive consideration of the Cartesian join off of noticing that. It'd take some rejiggering around best_inner_indexscan(), or somewhere in that general vicinity. Way too late for 8.3, but something to think about for next time. regards, tom lane
Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > >> Also shown below it seems that if I use "OFFSET 0" as a "hint" >> I can force a much (10x) better plan. I wonder if there's room for >> a pgfoundry project for a patch set that lets us use more hints >> than OFFSET 0. >> > There's something fishy about this --- given that that plan has a lower > cost estimate, it should've picked it without any artificial > constraints. I think the reason it's not picking it was discussed back in this thread too. http://archives.postgresql.org/pgsql-performance/2005-03/msg00675.php http://archives.postgresql.org/pgsql-performance/2005-03/msg00684.php My offset 0 is forcing the outer join. [Edit: Ugh - meant cartesian join - which helps this kind of query.] > What PG version are you using?

logs=# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
(1 row)

> Do you perhaps have a low setting for join_collapse_limit?

logs=# show join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)

Actually, IIRC back in that other thread, "set join_collapse_limit =1;" helped http://archives.postgresql.org/pgsql-performance/2005-03/msg00663.php
Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >> Tom Lane wrote: >>> ...given that that plan has a lower cost estimate, it >>> should've picked it without any artificial constraints. > >> I think the reason it's not picking it was discussed back... >> http://archives.postgresql.org/pgsql-performance/2005-03/msg00675.php > ... > > The concern I mentioned in the above thread was basically that I don't > want the planner to go off chasing Cartesian join paths in general... > However, in this case the reason that the Cartesian join might be > relevant is that both of them are needed in order to form an inner > indexscan on the big table.... Interesting.... I think Simon mentioned last time that this type of query is quite common for standard star schema data warehouses. And it seems to me the Cartesian join on the dimension tables is often pretty harmless since each dimension table would often return just 1 row; and the size of the fact table is such that it's useful to touch it as little as possible. > Way too late for 8.3, but something to think about for next time. No problem.. we've been working around it since that last thread in early '05 with early 8.0, IIRC. :-) Thanks to the excellent postgres hints system ("offset 0" and "set join_collapse_limit=1") we can get the plans we want pretty easily. :-)
On Thursday 06 December 2007 04:38, Simon Riggs wrote: > Robert, > > On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: > > If the whole performance of your system depends upon indexed access, then > > maybe you need a database that gives you a way to force index access at > > the query level? > > That sounds like a request for hints, which is OT here, ISTM. > If you want to eat peas, and someone suggests you use a knife, can I only argue the validity of using a knife? I'd rather just recommend a spoon. > > I think you're completely overlooking the effect disk latency has on > > query times. We run queries all the time that can vary from 4 hours to > > 12 hours based solely on the amount of concurrent load on the > > system, even though they always plan with the same cost. > > Not at all. If we had statement_cost_limit then it would be applied > after planning and before execution begins. The limit would be based > upon the planner's estimate, not the likely actual execution time. > This is nice, but it doesn't prevent "slow queries" reliably (which seemed to be in the original complaints), since query time cannot be directly traced back to statement cost. > So yes, a query may vary in execution time by a large factor as you > suggest, and it would be difficult to set the proposed parameter > accurately. However, the same is also true of statement_timeout, which > we currently support, so I don't see this point as a blocker. > > Which leaves us at the burning question: Would you use such a facility, > or would the difficulty in setting it exactly prevent you from using it > for real? I'm not sure. My personal instincts are that the solution is too fuzzy for me to rely on, and if it isn't reliable, it's not a good solution. If you look at all of the things people seem to think this will solve, I think I can raise an alternative option that would be a more definitive solution:

"prevent queries from taking longer than x" -> statement_timeout
"prevent planner from switching to bad plan" -> hint system
"prevent query from consuming too many resources" -> true resource restrictions at the database level

I'm not so much against the idea of a statement cost limit, but I think we need to realize that it does not really solve as many problems as people think, that in cases where it will help it will often do so poorly, and that there are probably better solutions available to those problems. Of course if you back me into a corner I'll agree a poor solution is better than no solution, so... -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
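Of the three, only the timeout exists today; it can at least be scoped so that ad-hoc users are constrained without touching batch jobs. A minimal sketch (role names and values are illustrative):

  ALTER ROLE adhoc_reporting SET statement_timeout = 300000;  -- 5 minutes, in ms
  ALTER ROLE batch_etl SET statement_timeout = 0;             -- 0 disables the timeout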
Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >> Tom Lane wrote: >>> There's something fishy about this --- given that that plan has a lower >>> cost estimate, it should've picked it without any artificial >>> constraints. One final thing I find curious about this is that the estimated number of rows is much closer in the "offset 0" form of the query. Since the logic itself is identical, I would have expected the estimated total number of rows for both forms of this query to be identical. Any reason the two plans estimate a different total number of rows? (explain statements for the two forms of the same query from earlier in the thread here: http://archives.postgresql.org/pgsql-performance/2007-12/msg00088.php )
On Fri, 2007-12-07 at 12:45 -0500, Robert Treat wrote: > On Thursday 06 December 2007 04:38, Simon Riggs wrote: > > > I think you're completely overlooking the effect disk latency has on > > > query times. We run queries all the time that can vary from 4 hours to > > > 12 hours based solely on the amount of concurrent load on the > > > system, even though they always plan with the same cost. > > > > Not at all. If we had statement_cost_limit then it would be applied > > after planning and before execution begins. The limit would be based > > upon the planner's estimate, not the likely actual execution time. > > > > This is nice, but it doesn't prevent "slow queries" reliably (which seemed to > be in the original complaints), since query time cannot be directly traced > back to statement cost. Hmm, well it can be directly traced, just not with the accuracy you desire. We can improve the accuracy, but then we would need to run the query first in order to find out it was killing us. > > So yes, a query may vary in execution time by a large factor as you > > suggest, and it would be difficult to set the proposed parameter > > accurately. However, the same is also true of statement_timeout, which > > we currently support, so I don't see this point as a blocker. > > > > Which leaves us at the burning question: Would you use such a facility, > > or would the difficulty in setting it exactly prevent you from using it > > for real? > > I'm not sure. My personal instincts are that the solution is too fuzzy for me > to rely on, and if it isn't reliable, it's not a good solution. If you look at > all of the things people seem to think this will solve, I think I can raise > an alternative option that would be a more definitive solution: > > "prevent queries from taking longer than x" -> statement_timeout. > > "prevent planner from switching to bad plan" -> hint system > > "prevent query from consuming too many resources" -> true resource > restrictions at the database level I like and agree with your list, as an overview. I differ slightly on specifics. > I'm not so much against the idea of a statement cost limit, but I think we > need to realize that it does not really solve as many problems as people > think, in cases where it will help it often will do so poorly, and that there > are probably better solutions available to those problems. Of course if you > back me into a corner I'll agree a poor solution is better than no solution, > so... statement_cost_limit isn't a panacea for all performance ills, it's just one weapon in the armoury. I'm caught somewhat in that whatever I propose as a concrete next step, somebody says I should have picked another. Oh well. On specific points: With hints I prefer a declarative approach; I'll discuss that later in the release cycle. The true resource restrictions sound good, but it's still magic numbers. How many I/Os are allowed before you kill the query? How much CPU? Those are still going to be guessed at. How do we tell the difference between a random I/O and a sequential I/O - there's no difference as far as Postgres is concerned in the buffer manager, but it can cause a huge performance difference. Whether you use real resource limits or statement cost limits you still need to work out the size of your table and then guess at appropriate limits. Every other system I've seen uses resource limits, but the big problem is that they are applied after something has been running for a long time. It's kinda like saying I'll treat the gangrene when it reaches my knee.
I prefer to avoid the problem before it starts to hurt at all, so I advocate learning the lessons from other systems, not simply following them. But having said that, I'm not against having them; it's up to the administrator how they want to manage their database, not me. What resource limit parameters would you choose? (temp disk space etc..) -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
I have _not_ added a TODO for this item. Let me know if one is needed. --------------------------------------------------------------------------- Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > > Tom Lane wrote: > >> There's something fishy about this --- given that that plan has a lower > >> cost estimate, it should've picked it without any artificial > >> constraints. > > > I think the reason it's not picking it was discussed back in this thread > > too. > > http://archives.postgresql.org/pgsql-performance/2005-03/msg00675.php > > http://archives.postgresql.org/pgsql-performance/2005-03/msg00684.php > > My offset 0 is forcing the outer join. > > [Edit: Ugh - meant cartesian join - which helps this kind of query.] > > Ah; I missed the fact that the two relations you want to join first > don't have any connecting WHERE clause. > > The concern I mentioned in the above thread was basically that I don't > want the planner to go off chasing Cartesian join paths in general --- > they're usually useless and would result in an exponential explosion > in the number of join paths considered in many-table queries. > > However, in this case the reason that the Cartesian join might be > relevant is that both of them are needed in order to form an inner > indexscan on the big table. I wonder if we could drive consideration > of the Cartesian join off of noticing that. It'd take some rejiggering > around best_inner_indexscan(), or somewhere in that general vicinity. > > Way too late for 8.3, but something to think about for next time. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I have _not_ added a TODO for this item. Let me know if one is needed. > > Please do, I think it's an open issue. > > * Consider Cartesian joins when both relations are needed to form an > indexscan qualification for a third relation Done, with URL added. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Anything that helps pgsql perform well in a TB-sized database environment is a Good Thing (r) :D Pablo Bruce Momjian wrote: > I have _not_ added a TODO for this item. Let me know if one is needed.
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> I have _not_ added a TODO for this item. Let me know if one is needed. > > Please do, I think it's an open issue. > > * Consider Cartesian joins when both relations are needed to form an > indexscan qualification for a third relation > Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part of this join? In the couple of cases where this happened to me, it was in queries on a textbook star schema like this:

select * from fact
  join dim1 using (dim1_id)
  join dim2 using (dim2_id)
 where dim1.value = 'something'
   and dim2.valuex = 'somethingelse'
   and dim2.valuey = 'more';

and looking up all the IDs before hitting the huge fact table. Often in these cases the where clauses on the dimension tables are on values with a unique constraint. If I understand right - if the constraint can prove it'll return at most 1 row - that means the cartesian join is provably safe from blowing up. Not sure if that's redundant with the condition you mentioned, or if it's yet a separate condition where we might also want to consider cartesian joins. Ron M
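A sketch of the schema shape being described, with the unique constraints that would let the planner prove the "at most one row" property (all names are illustrative):

  CREATE TABLE dim1 (dim1_id serial PRIMARY KEY, value text UNIQUE);
  CREATE TABLE dim2 (dim2_id serial PRIMARY KEY,
                     valuex text, valuey text,
                     UNIQUE (valuex, valuey));
  CREATE TABLE fact (dim1_id integer REFERENCES dim1,
                     dim2_id integer REFERENCES dim2,
                     measure numeric);

  -- each dimension lookup below can match at most one row, so a Cartesian
  -- join of the two dimension results is provably tiny before fact is touched
  SELECT *
  FROM fact
  JOIN dim1 USING (dim1_id)
  JOIN dim2 USING (dim2_id)
  WHERE dim1.value = 'something'
    AND dim2.valuex = 'somethingelse'
    AND dim2.valuey = 'more';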
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Would another possible condition for considering > Cartesian joins be be: > * Consider Cartesian joins when a unique constraint can prove > that at most one row will be pulled from one of the tables > that would be part of this join? What for? That would still lead us to consider large numbers of totally useless joins. regards, tom lane
Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >> Would another possible condition for considering >> Cartesian joins be: > >> * Consider Cartesian joins when a unique constraint can prove >> that at most one row will be pulled from one of the tables >> that would be part of this join? > > What for? That would still lead us to consider large numbers of totally > useless joins. > > regards, tom lane Often I get order-of-magnitude better queries by forcing the cartesian join even without multi-column indexes. Explain analyze results below. Here's an example with your typical star schema: fact is the central fact table; d_ref is a dimension table for the referrer; d_uag is a dimension table for the useragent. Forcing the cartesian join using "offset 0" makes the query take 14 ms (estimated cost 7575). If I don't force the cartesian join the query takes over 100 ms (estimated cost 398919). Indexes are on each dimension; but no multi-column indexes (since the ad-hoc queries can hit any permutation of dimensions).

logs=# explain analyze select * from fact natural join
       (select * from d_ref natural join d_uag
        where ref_host = 'www.real.com'
          and ref_path = '/products/player/more_info/moreinfo.html'
          and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='
          and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'
        offset 0) as a;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=14.152..14.192 rows=4 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.084..0.102 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.082..0.096 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.056..0.058 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.020..0.029 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=14.053..14.066 rows=4 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=14.016..14.016 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=2.258..2.258 rows=7960 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=9.960..9.960 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14.332 ms
(15 rows)

logs=# explain analyze select * from fact natural join
       (select * from d_ref natural join d_uag
        where ref_host = 'www.real.com'
          and ref_path = '/products/player/more_info/moreinfo.html'
          and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='
          and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'
       ) as a;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=78.777..107.038 rows=4 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=6.311..101.843 rows=7960 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.021..0.029 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=6.273..91.645 rows=7960 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=5.117..5.117 rows=7960 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=0.069..0.069 rows=1 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.059..0.062 rows=1 loops=1)
               Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
 Total runtime: 107.193 ms
(13 rows)
Ron Mayer wrote: > Tom Lane wrote: >> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >>> Would another possible condition for considering >>> Cartesian joins be: >>> * Consider Cartesian joins when a unique constraint can prove >>> that at most one row will be pulled from one of the tables >>> that would be part of this join? >> >> What for? That would still lead us to consider large numbers of totally >> useless joins. > > Often I get order-of-magnitude better queries by forcing the cartesian > join even without multi-column indexes. Ah - and sometimes even two orders of magnitude improvement: 1.1 seconds with the Cartesian join, 200 seconds if it doesn't use it.

logs=# explain analyze select * from fact natural join
       (select * from d_ref natural join d_uag
        where ref_host = 'www.real.com'
          and ref_path = '/products/player/more_info/moreinfo.html'
          and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='
          and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'
        offset 0) as a;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=1118.741..1119.207 rows=122 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.526..0.542 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.524..0.537 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.168..0.170 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.347..0.355 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=1118.196..1118.491 rows=122 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=1115.565..1115.565 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=813.859..813.859 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=8.667..8.667 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 1122.245 ms
(15 rows)

logs=# explain analyze select * from fact natural join
       (select * from d_ref natural join d_uag
        where ref_host = 'www.real.com'
          and ref_path = '/products/player/more_info/moreinfo.html'
          and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='
          and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'
       ) as a;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=114138.193..200622.416 rows=122 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=1524.600..199522.182 rows=1183470 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.023..0.033 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=1524.562..197627.135 rows=1183470 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=758.888..758.888 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=0.067..0.067 rows=1 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.058..0.060 rows=1 loops=1)
               Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
 Total runtime: 200625.636 ms
(13 rows)